March 26, 2008

How Mondrian uses XML to store the metadata of an OLAP server

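Mondrian is an open-source ROLAP server written in Java. It implements the XMLA and JOLAP specifications, and also defines its own client interface based on the MDX language. Mondrian is an OLAP server, not a data warehouse server, so its metadata consists mainly of OLAP modeling metadata and does not cover the transformation of external data sources into the database. In other words, Mondrian's metadata includes only the multidimensional logical model, the mapping from the relational database to that logical model, access rights, and similar information. Functionally, Mondrian supports shared dimensions and calculated members, and supports both star schemas and snowflake schemas.
Mondrian stores its metadata in physical XML files whose format is defined by its designers. Below is a brief introduction to how the metadata is stored.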

Element Description
Root element
<Schema> Collection of Cubes, Virtual cubes, Shared dimensions, and Roles.
Logical elements
<Cube> A collection of dimensions and measures, all centered on a fact table.
<VirtualCube> A cube defined by combining the dimensions and measures of one or more cubes.
<Dimension>
<DimensionUsage> Usage of a shared dimension by a cube.
<Hierarchy>
<Level>
<Property>
<Measure>
Physical elements
<Table> Fact- or dimension table.
<View> Defines a 'table' using a SQL query, which can have different variants for different underlying databases.
<Join> Defines a 'table' by joining a set of queries.
Access control
<Role> An access-control profile.
<SchemaGrant> A set of rights to a schema.
<CubeGrant> A set of rights to a cube.
<HierarchyGrant> A set of rights to a hierarchy and levels within that hierarchy.
<MemberGrant> A set of rights to a member and its children.
Other
<Parameter>
<Table>

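A schema defines a multidimensional database. It contains a logical model, consisting of cubes, hierarchies, and members, and a mapping of this model onto a physical model. The logical model can be queried using the MDX language. A Mondrian schema is described in an XML file. At present, the only way to create a schema is to edit the XML file in a text editor. The XML syntax is not too complicated, so this is not as hard as it sounds. A graphical tool for creating and modifying schemas is under development.
The most important components of a schema are cubes, measures, and dimensions. A cube is a collection of dimensions and measures in a particular subject area. A measure is a quantity that you are interested in measuring, for example the unit sales of a product or the cost price of inventory items.
A dimension is an attribute, or set of attributes, by which you can divide measures into sub-categories. For example, you may wish to break down product sales by the product's color, the gender of the customer, and the store in which the product was sold; color, gender, and store are all dimensions.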

Here is an example of a simple schema definition:
<Schema>
<Cube name="Sales">
<Table name="sales_fact_1997"/>
<Dimension name="Gender" foreignKey="customer_id">
<Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
<Table name="customer"/>
<Level name="Gender" column="gender" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Time" foreignKey="time_id">
<Hierarchy hasAll="false" primaryKey="time_id">
<Table name="time_by_day"/>
<Level name="Year" column="the_year" type="Numeric"
uniqueMembers="true"/>
<Level name="Quarter" column="quarter"
uniqueMembers="false"/>
<Level name="Month" column="month_of_year" type="Numeric"
uniqueMembers="false"/>
</Hierarchy>
</Dimension>
<Measure name="Unit Sales" column="unit_sales"
aggregator="sum" formatString="#,###"/>
<Measure name="Store Sales" column="store_sales"
aggregator="sum" formatString="#,###.##"/>
</Cube>
</Schema>

This schema contains one cube, Sales, with two dimensions, Time and Gender, and two measures, Unit Sales and Store Sales.
We can write an MDX query against this schema:
select {[Measures].[Unit Sales], [Measures].[Store Sales]} on columns,
{[Time].[1997].[Q1].descendants} on rows
from [Sales]
where [Gender].[F]
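This query refers to the Sales cube, to each of the dimensions [Measures], [Time], and [Gender], and to various members of those dimensions. The result is as follows: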
[Time] [Measures].[Unit Sales] [Measures].[Store Sales]
[1997].[Q1] 0 0
[1997].[Q1].[Jan] 0 0
[1997].[Q1].[Feb] 0 0
[1997].[Q1].[Mar] 0 0

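Below is a more detailed walkthrough of the schema definition.
A cube is a collection of one or more dimensions and measures, usually centered on a fact table, here "sales_fact_1997". The fact table holds the columns from which measures are calculated, and contains references to the tables that hold the dimensions.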
<Cube name="Sales">
<Table name="sales_fact_1997"/>
...
</Cube>
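Here the fact table is defined using the <Table> element. If the fact table is not in the default schema, you can specify an explicit schema using the "schema" attribute, for example: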
<Table schema="foodmart" name="sales_fact_1997"/>
You can also use the <View> and <Join> constructs to build more complicated SQL.
Measures
The Sales cube defines two measures, "Unit Sales" and "Store Sales".
<Measure name="Unit Sales" column="unit_sales"
aggregator="sum" formatString="#,###"/>
<Measure name="Store Sales" column="store_sales"
aggregator="sum" formatString="#,###.00"/>
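Each measure has a name, a corresponding column in the fact table, and an aggregator (usually "sum").
An optional format string specifies how the value is to be printed. Here we print unit sales with no decimal places (since unit sales is an integer) and store sales with two decimal places. The symbols ',' and '.' are locale-sensitive, so if you were running in Italy, store sales might appear as "48.123,45". You can achieve even more precise effects using advanced format strings. Note that measure values come from the cells of the cube, not directly from the columns.
Dimensions
The Gender dimension consists of a single hierarchy with a single level.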
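To see the locale effect concretely, here is a small sketch using java.text.DecimalFormat, which understands the same '#', '0', ',' and '.' pattern characters as these basic format strings. It is only an illustration under that assumption: Mondrian has its own formatter, and advanced format strings are not covered. The class name MeasureFormat is made up.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Sketch: render a measure value with a format string such as "#,###.00".
// In the pattern, ',' and '.' are placeholders; the separators actually
// printed come from the locale's DecimalFormatSymbols.
class MeasureFormat {
    static String format(String pattern, double value, Locale locale) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale));
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("#,###", 1234, Locale.US));            // Unit Sales style: 1,234
        System.out.println(format("#,###.00", 48123.45, Locale.US));     // Store Sales style: 48,123.45
        System.out.println(format("#,###.00", 48123.45, Locale.ITALY));  // same pattern in Italy: 48.123,45
    }
}
```

The pattern stays the same across locales; only the symbols passed to DecimalFormat change, which is the behavior the paragraph above describes.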
<Dimension name="Gender" foreignKey="customer_id">
<Hierarchy hasAll="true" primaryKey="customer_id">
<Table name="customer"/>
<Level name="Gender" column="gender" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
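For any given sale, the Gender dimension is the gender of the customer who bought the product. It is expressed by joining the fact table column "sales_fact_1997.customer_id" to the dimension table column "customer.customer_id".
The "gender" column contains two values, 'F' and 'M', so the Gender dimension contains the members [Gender].[F] and [Gender].[M]. Because hasAll="true", the system generates a special 'all' level containing just one member, [All Genders].
A dimension can contain more than one hierarchy: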
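In relational terms, the Gender dimension is a join plus a group-by. The following sketch (hypothetical class name and made-up sample data, not FoodMart) joins fact rows to a customer lookup on customer_id, sums unit sales per gender member, and adds the single [All Genders] total that hasAll="true" produces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: aggregate unit sales by the Gender dimension.
class GenderRollup {
    record Sale(int customerId, int unitSales) {}

    static Map<String, Integer> unitSalesByGender(List<Sale> facts, Map<Integer, String> customerGender) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int all = 0;
        for (Sale s : facts) {
            // Equivalent of sales_fact_1997.customer_id = customer.customer_id
            String gender = customerGender.get(s.customerId());
            if (gender == null) {
                continue; // no matching dimension row: the fact drops out, as in an inner join
            }
            result.merge("[Gender].[" + gender + "]", s.unitSales(), Integer::sum);
            all += s.unitSales();
        }
        // The generated 'all' level has a single member holding the grand total.
        result.put("[Gender].[All Genders]", all);
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = Map.of(1, "F", 2, "M", 3, "F");
        List<Sale> facts = List.of(new Sale(1, 5), new Sale(2, 3), new Sale(3, 2));
        System.out.println(unitSalesByGender(facts, customers));
        // {[Gender].[F]=7, [Gender].[M]=3, [Gender].[All Genders]=10}
    }
}
```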
<Dimension name="Time" foreignKey="time_id">
<Hierarchy hasAll="false" primaryKey="time_id">
<Table name="time_by_day"/>
<Level name="Year" column="the_year" type="Numeric"
uniqueMembers="true"/>
<Level name="Quarter" column="quarter"
uniqueMembers="false"/>
<Level name="Month" column="month_of_year" type="Numeric"
uniqueMembers="false"/>
</Hierarchy>
<Hierarchy name="Time Weekly" hasAll="false" primaryKey="time_id">
<Table name="time_by_week"/>
<Level name="Year" column="the_year" type="Numeric"
uniqueMembers="true"/>
<Level name="Week" column="week"
uniqueMembers="false"/>
<Level name="Day" column="day_of_week" type="String"
uniqueMembers="false"/>
</Hierarchy>
</Dimension>
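The first hierarchy is not given a name. By default, a hierarchy has the same name as its dimension, so the first hierarchy is called "Time". These hierarchies do not have much in common: they do not even use the same table, except that both join to the same column, "time_id", in the fact table. The reason for putting two hierarchies in one dimension is that this is meaningful to end users. If two hierarchies belong to the same dimension, MDX enforces this and does not allow both of them to be used in the same query.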
A dimension can live in the fact table:
<Cube name="Sales">
<Table name="sales_fact_1997"/>
...
<Dimension name="Payment method">
<Hierarchy hasAll="true">
<Level name="Payment method" column="payment_method" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
</Cube>
Each dimension contains a hierarchy made up of one or more levels.

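Most dimensions have only one hierarchy, but sometimes a dimension has several. For example, you may want to aggregate the time dimension from days to months, quarters, and years; or from days to weeks and years. Both hierarchies go from days to years, but the aggregation paths differ. Most hierarchies have an 'all' member, which contains all members of the hierarchy and therefore represents their total. It is usually named 'All something', for example 'All stores'.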
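The two aggregation paths can be illustrated with java.time: the same day is placed on a Year > Quarter > Month path and on a Year > Week path. The sketch assumes ISO week numbering for the weekly hierarchy; a real time_by_week table may define its weeks differently. Around New Year the two paths can even assign the same day to different years.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.List;

// Sketch: one day, two roll-up paths.
class TimePaths {
    // Calendar path: [year, quarter, month]
    static List<Integer> calendarPath(LocalDate day) {
        return List.of(day.getYear(), day.get(IsoFields.QUARTER_OF_YEAR), day.getMonthValue());
    }

    // Weekly path: [week-based year, ISO week number]
    static List<Integer> weeklyPath(LocalDate day) {
        return List.of(day.get(IsoFields.WEEK_BASED_YEAR), day.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2008, 12, 29);  // a Monday
        System.out.println(calendarPath(d));       // [2008, 4, 12]
        System.out.println(weeklyPath(d));         // [2009, 1]
    }
}
```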

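Star schemas and snowflake schemas
Mondrian supports both star schemas and snowflake schemas. Below is how to model a snowflake schema, which requires the <Join> operator. For example: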
<Cube name="Sales">
...
<Dimension name="Product" foreignKey="product_id">
<Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
<Join leftKey="product_class_id" rightAlias="product_class" rightKey="product_class_id">
<Table name="product"/>
<Join leftKey="product_type_id" rightKey="product_type_id">
<Table name="product_class"/>
<Table name="product_type"/>
</Join>
</Join>
...
</Hierarchy>
</Dimension>
</Cube>
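This defines a "Product" dimension made up of three tables. The fact table joins to "product" (via the foreign key "product_id"), "product" joins to "product_class" (via the foreign key "product_class_id"), and "product_class" joins to "product_type" (via the foreign key "product_type_id"). We use nested <Join> elements: a <Join> takes two operands, and each operand may be a table, a join, or a query.
The tables are ordered by their number of rows: "product" has the most rows, so it appears first and joins to the fact table; then come "product_class" and "product_type", which, at the far end of the snowflake, have the fewest rows.
Note that the outer <Join> element has a rightAlias attribute. This is necessary because the right-hand side of the join (the inner <Join> element) may consist of many tables. A leftAlias attribute is not needed in this case, because the leftKey column clearly comes from the "product" table.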
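The nested joins amount to following foreign keys outward from the fact table. A sketch with maps standing in for the three tables; the class name, the ids and the "Dairy" type name are hypothetical sample data.

```java
import java.util.Map;

// Sketch: resolve fact.product_id to a product type by walking the snowflake
// product -> product_class -> product_type. Returns null if any link is missing.
class ProductSnowflake {
    static String productTypeOf(int productId,
                                Map<Integer, Integer> productToClass,  // product.product_id -> product_class_id
                                Map<Integer, Integer> classToType,     // product_class.product_class_id -> product_type_id
                                Map<Integer, String> typeNames) {      // product_type.product_type_id -> name
        Integer classId = productToClass.get(productId);
        if (classId == null) return null;
        Integer typeId = classToType.get(classId);
        if (typeId == null) return null;
        return typeNames.get(typeId);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> productToClass = Map.of(10, 100, 11, 101);
        Map<Integer, Integer> classToType = Map.of(100, 7);
        Map<Integer, String> typeNames = Map.of(7, "Dairy");
        System.out.println(productTypeOf(10, productToClass, classToType, typeNames)); // Dairy
        System.out.println(productTypeOf(11, productToClass, classToType, typeNames)); // null: class 101 has no type
    }
}
```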

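Shared dimensions
When generating SQL for a join, Mondrian needs to know which column to join on. If you are joining to a multi-table join, you must tell it which of those tables, and which column, to join to.
Because a shared dimension does not belong to a cube, you must give it an explicit table (or other data source). When you use it in a particular cube, you specify the foreign key. The following example shows the Store Type dimension joined to the Sales cube using the foreign key sales_fact_1997.store_id, and to the Warehouse cube using the foreign key warehouse.warehouse_store_id: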
<Dimension name="Store Type">
<Hierarchy hasAll="true" primaryKey="store_id">
<Table name="store"/>
<Level name="Store Type" column="store_type" uniqueMembers="true"/>
</Hierarchy>
</Dimension>

<Cube name="Sales">
<Table name="sales_fact_1997"/>
...
<DimensionUsage name="Store Type" source="Store Type" foreignKey="store_id"/>
</Cube>

<Cube name="Warehouse">
<Table name="warehouse"/>
...
<DimensionUsage name="Store Type" source="Store Type" foreignKey="warehouse_store_id"/>
</Cube>

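Virtual cubes
Parent-child hierarchies
A conventional hierarchy has a rigid set of levels, and members adhere to those levels. For example, in the Product hierarchy, any member of the Product Name level has a parent in the Brand Name level, which in turn has a parent in the Product Subcategory level. This structure is sometimes too rigid for real-world data.
A parent-child hierarchy has only one level (not counting the 'all' level), but any member can have a parent in the same level. A classic example is the Employees hierarchy: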
<Dimension name="Employees" foreignKey="employee_id">
<Hierarchy hasAll="true" allMemberName="All Employees" primaryKey="employee_id">
<Table name="employee"/>
<Level name="Employee Id" uniqueMembers="true" type="Numeric"
column="employee_id" nameColumn="full_name"
parentColumn="supervisor_id" nullParentValue="0">
<Property name="Marital Status" column="marital_status"/>
<Property name="Position Title" column="position_title"/>
<Property name="Gender" column="gender"/>
<Property name="Salary" column="salary"/>
<Property name="Education Level" column="education_level"/>
<Property name="Management Role" column="management_role"/>
</Level>
</Hierarchy>
</Dimension>
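The important attributes here are parentColumn and nullParentValue:
The parentColumn attribute is the name of the column that links a member to its parent member; in this case, it is the foreign key that points to the employee's supervisor. The <ParentExpression> child element of <Level> is equivalent to the parentColumn attribute, but allows you to define an arbitrary SQL expression, just like the <Expression> element. The parentColumn attribute (or <ParentExpression> element) is the only indication to Mondrian that the hierarchy has a parent-child structure.
The nullParentValue attribute is the value which indicates that a member has no parent. The default is nullParentValue="null", but because many databases do not index null values, schemas often use other values such as 0 or -1 instead.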
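A minimal sketch of how rows of (employee_id, supervisor_id) form a parent-child hierarchy: rows whose supervisor_id equals nullParentValue (0, as in the example above) become top-level members, and a member's rolled-up value includes all of its descendants. The class, record and method names are hypothetical, and the code assumes the supervisor chain contains no cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: build a parent-child hierarchy from (id, parentId) rows.
class EmployeeTree {
    record Row(int id, int supervisorId, double salary) {}

    static final int NULL_PARENT_VALUE = 0; // plays the role of nullParentValue="0"

    // Members with no parent sit directly under the 'all' member.
    static List<Integer> roots(List<Row> rows) {
        List<Integer> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.supervisorId() == NULL_PARENT_VALUE) result.add(r.id());
        }
        return result;
    }

    // Salary of an employee plus everyone who reports to them, directly or indirectly.
    static double rolledUpSalary(int id, List<Row> rows) {
        Map<Integer, List<Row>> children = new HashMap<>();
        Map<Integer, Double> own = new HashMap<>();
        for (Row r : rows) {
            children.computeIfAbsent(r.supervisorId(), k -> new ArrayList<>()).add(r);
            own.put(r.id(), r.salary());
        }
        return sum(id, children, own);
    }

    private static double sum(int id, Map<Integer, List<Row>> children, Map<Integer, Double> own) {
        double total = own.getOrDefault(id, 0.0);
        for (Row child : children.getOrDefault(id, List.of())) {
            total += sum(child.id(), children, own); // recursion assumes an acyclic hierarchy
        }
        return total;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 0, 100), new Row(2, 1, 50), new Row(3, 2, 20), new Row(4, 0, 10));
        System.out.println(roots(rows));             // [1, 4]
        System.out.println(rolledUpSalary(1, rows)); // 170.0
    }
}
```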

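Physical structure
Member readers
A member reader is a means of accessing members. Hierarchies are usually built on a dimension table, so they are populated using SQL. But even if your data does not live in an RDBMS, you can make it appear as a hierarchy by writing a Java class (a custom member reader).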
Here are a couple of examples:
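DateSource (to be written) generates a time hierarchy. Conventionally, data warehouse tools generate a table with one row per day. The problem is that this table must be loaded, and more rows must be added as time passes. DateSource generates date members in memory, on demand.
FileSystemSource (to be written) presents a file system as a hierarchy of directories and files. Like the time hierarchy created by DateSource, this is a virtual hierarchy: the member for a particular file is only created when, and if, that file's parent directory is expanded.
ExpressionMemberReader (to be written) creates a hierarchy based on an expression.
A custom member reader must implement the mondrian.rolap.MemberSource interface. If you need to implement a larger set of member operations, implement the mondrian.rolap.MemberReader interface; otherwise, Mondrian wraps your reader in a mondrian.rolap.CacheMemberReader. Your member reader must have a public constructor that takes (Hierarchy, Properties) parameters and throws no checked exceptions.
Member readers are declared using the memberReaderClass attribute of the <Hierarchy> element; any <Parameter> child elements are passed to the constructor through its Properties argument.
Here is an example: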
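The DateSource idea (generate members on demand instead of loading a table with one row per day) can be sketched in plain Java. This is a hypothetical illustration that does not implement mondrian.rolap.MemberSource; members are plain strings such as "2008", "2008-02" and "2008-02-29".

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: children of a time member are computed only when the
// member is expanded; nothing is stored in advance.
class LazyDateMembers {
    static List<String> children(String member) {
        List<String> result = new ArrayList<>();
        if (member.length() == 4) {                       // a year such as "2008": its 12 months
            int year = Integer.parseInt(member);
            for (int m = 1; m <= 12; m++) {
                result.add(YearMonth.of(year, m).toString());       // "2008-01" ... "2008-12"
            }
        } else if (member.length() == 7) {                // a month such as "2008-02": its days
            YearMonth ym = YearMonth.parse(member);
            for (int d = 1; d <= ym.lengthOfMonth(); d++) {
                result.add(ym.atDay(d).toString());                 // "2008-02-01" ...
            }
        }                                                 // a day is a leaf and has no children
        return result;
    }

    public static void main(String[] args) {
        System.out.println(children("2008").size());    // 12
        System.out.println(children("2008-02").size()); // 29 (2008 is a leap year)
    }
}
```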
<Dimension name="Has bought dairy">
<Hierarchy hasAll="true" memberReaderClass="mondrian.rolap.HasBoughtDairySource">
<Level name="Has bought dairy" uniqueMembers="true"/>
<Parameter name="expression" value="not used"/>
</Hierarchy>
</Dimension>
Cell readers
<Measure name="name" cellReaderClass="com.foo.MyCellReader"/>
The class "com.foo.MyCellReader" must implement the interface mondrian.olap.CellReader.

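Access control
You can define access-control profiles (roles) as part of the schema, and set the role when establishing a connection.
Defining roles
Roles are defined by <Role> elements, which occur as direct children of the <Schema> element.
Here is an example of a role: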
<Role name="California manager">
<SchemaGrant access="none">
<CubeGrant cube="Sales" access="all">
<HierarchyGrant hierarchy="[Store]" access="custom" topLevel="[Store].[Store Country]">
<MemberGrant member="[Store].[USA].[CA]" access="all"/>
<MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
</HierarchyGrant>
<HierarchyGrant hierarchy="[Customers]" access="custom" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
<MemberGrant member="[Customers].[USA].[CA]" access="all"/>
<MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
</HierarchyGrant>
<HierarchyGrant hierarchy="[Gender]" access="none"/>
</CubeGrant>
</SchemaGrant>
</Role>
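The <SchemaGrant> element defines the default access for objects in the schema. The access attribute can be "all" or "none"; specific objects inherit this access unless it is overridden for them. In this example, because access="none", the user can only browse the "Sales" cube, to which access is explicitly granted.
The <CubeGrant> element defines access to a cube. As with <SchemaGrant>, the access attribute can be "all" or "none", and can be overridden for specific sub-objects within the cube.
The <HierarchyGrant> element defines access to a hierarchy. The access attribute can be "all", meaning all members are visible; "none", meaning the hierarchy's very existence is hidden from the user; or "custom". With custom access, you can use the topLevel attribute to define the highest visible level (preventing users from rolling up, for example from seeing revenue rolled up to the Store Country level); use the bottomLevel attribute to define the lowest visible level (here, preventing users from drilling down to individual customers' details); or control which sets of members the user can see by nesting <MemberGrant> elements.
You can only define <MemberGrant> elements if their enclosing <HierarchyGrant> has access="custom". A member grant gives (or removes) access to a given member and all of its children.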
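The rule that a member grant applies to a member and all of its children can be sketched as "the grant on the member itself or its closest ancestor wins". The sketch below models only the <MemberGrant>s of the [Store] hierarchy in the California manager role; it ignores topLevel/bottomLevel and how Mondrian presents ancestors of granted members, and it assumes member names never contain ".[" inside a name. MemberAccess is a made-up name.

```java
import java.util.Map;

// Hypothetical sketch: resolve visibility of a member from <MemberGrant>
// entries (true = access="all", false = access="none") in a hierarchy with
// access="custom". Members with no applicable grant are hidden.
class MemberAccess {
    static boolean canSee(String member, Map<String, Boolean> grants) {
        String current = member;
        while (true) {
            Boolean grant = grants.get(current);
            if (grant != null) {
                return grant;                      // closest grant wins
            }
            int cut = current.lastIndexOf(".[");
            if (cut < 0) {
                return false;                      // reached the hierarchy root without a grant
            }
            current = current.substring(0, cut);   // step up to the parent member
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> grants = Map.of(
                "[Store].[USA].[CA]", true,
                "[Store].[USA].[CA].[Los Angeles]", false);
        System.out.println(canSee("[Store].[USA].[CA].[San Francisco]", grants)); // true
        System.out.println(canSee("[Store].[USA].[CA].[Los Angeles]", grants));   // false
        System.out.println(canSee("[Store].[USA].[OR]", grants));                 // false
    }
}
```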

posted @ 2008-03-26 22:30 edsonjava | Reads (928) | Comments (0)

A discussion of JPivot/Mondrian

I changed quite a lot of JPivot/Mondrian code, and also fixed a JPivot bug.

I enhanced JPivot's JFreeChart and drillthrough display, and it is finally good enough to hand to other people.

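First, performance. I took an idle IBM x445 PC server (4 × 2 GHz CPUs, 8 GB RAM, 2 × 146 GB disks) running Windows 2000 with AWE and the 3 GB option enabled. Then I installed Oracle 10g in data warehouse mode, using 4 GB of AWE memory, about 4.5 GB of memory in total. Next I built a wide table of 16 million user records, with bitmap indexes throughout, plus about 20 other dimension tables. After that it was simple: write the Mondrian cube and configure JPivot. The end result: each Mondrian GROUP BY operation takes at most 30 seconds, usually around 20 seconds, which users can basically accept. I asked a friend who uses NCR, and he said NCR, with its own database, gets roughly the same performance. PS: a quiet question: can performance be improved further from here?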

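Second, direction. We currently use two OLAP setups: JPivot + Mondrian, which is ROLAP, and BusinessObjects Intelligence + Essbase, which is MOLAP. My current impression is that because the database is so powerful, ROLAP and MOLAP perform about the same. ROLAP can also share the same tables directly with the reporting system, whereas MOLAP needs a tool to build the cubes and transform the data, so MOLAP takes more development and maintenance effort than ROLAP. In addition, business departments usually end up wanting to see the detail data, which MOLAP front-end tools often do not support well, while JPivot has no such problem. In short, I have not yet seen a reason that requires MOLAP. I heard that Huawei used to use Microsoft's OLAP, could not keep it going, and went straight back to BO reports.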

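Problems with JPivot: it is too complicated to operate; users need a clear understanding of OLAP concepts, so ordinary users cannot use it. Its integration with Mondrian is not tight enough: Mondrian does not provide drillthrough, which JPivot implements on its own, so data types and formats get lost. There is no limit on the amount of drillthrough detail data, which leads to out-of-memory errors. The interface is rather ugly and the interaction style is unconventional. JPivot uses its own MVC framework, which is hard to integrate with other frameworks. Overall, though, JPivot is no longer a toy; it is entirely usable for enterprise work, aimed at high-end business analysts.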

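Open-sourcing my changes is difficult. On the one hand, JPivot keeps being upgraded; on the other, while modifying it I threw caution to the wind, referencing Mondrian code all over JPivot and deleting the parts of Mondrian I had no use for. So I will post everything that can be shared here in this post. First, my improved interface:
1. The icons are taken from Pentaho.
2. JPivot actually supports 3D pie charts, but the option is not enabled. I first upgraded JFreeChart to 1.0.2, then made the pie charts, line charts and others look better.
3. Drillthrough is JPivot's killer feature compared with other OLAP products, but many details were unfinished, and I filled in most of them. As the interface shows, I added CSV export (by modifying the WCF library) and capped exports at 200,000 rows (by modifying JPivot). The "Visit Count" label shown in the interface is the measure's name; it should really show "Visit Time", and I have no fix for that yet. I also fixed some incorrect numberformat and dateformat handling.
4. Excel export looked ugly, but because Excel itself supports only 256 colors and cannot show the web page's background colors, I changed it to show only blue borders, with a white background throughout. The attached RAR contains the web CSS files, the Excel generation code and the JPivot chart generation code; take whatever you need.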

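I also fully integrated JPivot into my own system. Cubes and MDX queries can be written in the system's web interface, with a one-to-many relationship between a cube and its MDX queries. Cubes are validated against an XSD. While developing a cube and its MDX you can preview at any time, and then publish an MDX query from the interface as a standalone OLAP analysis. The next goal is to integrate data permissions with JPivot: since the cube XML is generated automatically by the system, Mondrian's role configuration can also be generated automatically from the system's settings. This code is deeply tied to my system and framework, so I will not post it; you can probably build it yourselves in two days.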

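I also built an extremely brute-force feature: for every JPivot cell displayed in the interface, it fetches the drillthrough data for the measure one cell at a time, then generates CSV files and packs them into a zip for the user to download or to send to other interfaces. I spent a whole week studying Mondrian's code hoping to avoid drilling through cell by cell so naively, but failed...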

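While using JPivot, I found that Mondrian was one of the bottlenecks for data retrieval performance. After some study, we modified some JPivot and WCF code to fit our own project. Below are the changes I made; I would like to hear your opinions.
1. Bypass Mondrian: fetch data through a DLL and then generate XML data. I found that bypassing Mondrian and writing my own DLL to call MS SQL Server 2000 OLAP was very fast.
2. Change the display. As mentioned above, JPivot's interface is, first, not pretty and, second, inconvenient to use. For example, picking dimensions means drilling in level by level, which is tedious.
a. Changing how dimensions are picked: following Microsoft's approach, we built a tree for picking them. Studying JPivot's code showed that fetching data directly with JPivot's code was very slow, so I combined AJAX with JPivot to generate the tree structure dynamically. To pick a dimension, you drag it from the tree into the selected-dimension text box; the corresponding MDX is generated from the conditions and the data is displayed.
b. Changing the data display style, separated from dimension picking and MDX generation. I display the data differently: a frame split into top and bottom parts, each of which can be expanded to fill the page with a button.
3. Integration into our own framework. In my view this was rather troublesome; a small mistake caused a lot of trouble. Because we develop with JSF, I wrote some components modeled on JPivot to support development, mostly changing it to resemble the MS SQL Server 2000 OLAP analysis style.
Outstanding requirements: the JFreeChart features still need strengthening.
Personal impression: JPivot is very good, but it cannot be used straight out of the box. Many people who use JPivot have to modify a lot, and modifying it is hard: Java, J2EE, XML, XSLT, JavaScript, taglibs... you need to know many things.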

posted @ 2008-03-26 22:28 edsonjava | Reads (2246) | Comments (1)

February 23, 2008

Display tag library 1.1.1

This documentation is related to the displaytag 1.1.x releases.

The latest available release is 1.1.1

Displaytag 1.1 offers several enhancements over 1.0: the most notable news are support for partial lists and enhanced decorator APIs, but there is also a lot more. Be sure to read the migration guide for upgrading an existing application from displaytag 1.0. A full changelog is also available.

Overview

The display tag library is an open source suite of custom tags that provide high-level web presentation patterns which will work in an MVC model. The library provides a significant amount of functionality while still being easy to use.

What can I do with it?

Actually the display tag library can just... display tables! Give it a list of objects and it will handle column display, sorting, paging, cropping, grouping, exporting, smart linking and decoration of a table in a customizable XHTML style.

The tables in the sample images below were generated from lists using the <display:table> tag:

sample tables produced with the display:table tag

posted @ 2008-02-23 23:47 edsonjava | Reads (528) | Comments (0)

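Existing open-source Java BI front-end frameworks

Recently I have been evaluating open-source BI front-end frameworks in the Java world, and I am sharing some notes I jotted down.
I have only looked at them for a few days and have not had time to organize the notes, so this may read as disorganized and not very deep.

Currently, the more common BI front-end frameworks (for business intelligence projects) in the Java world are mainly the open-source frameworks Pentaho, SpagoBI, OpenI, and JasperIntelligence.

Each has its own strengths and weaknesses; here is a brief introduction: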

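Lightweight:

OpenI uses the Mondrian and JPivot frameworks, with JasperReports as its reporting engine and R-Project as its data mining interface.

It is relatively easy to develop with and learn, and OpenI supports Microsoft's data warehouse (via XMLA). However, its internationalization is poor (Chinese text is garbled), so it needs substantial rework.

JasperIntelligence is also a lightweight project. It has the best support for JasperReports, so its reporting part is quite good.

Heavyweight:

Pentaho and SpagoBI are two fairly large frameworks that integrate many open-source projects; JFreeReport, Mondrian, Kettle, and Weka are basically all used. They are particularly well suited to developing large, complex projects.

Pentaho is used more widely in China and has somewhat more documentation. Notably, Chinese-language support for it online is quite good: many volunteers have translated its documentation, which makes development much easier for us.

Pentaho has modules for everything: a workflow engine, a central repository, an auditing component, a report designer, an ETL tool, an OLAP server, multidimensional display, and data mining components.

Pentaho has also received substantial investment, so its development has a lot of momentum, and paid official commercial editions are planned.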

http://blog.csdn.net/dust_bug/archive/2006/09/18/1240753.aspx

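This is the "Pentaho Source Code Reading Report" (《Pentaho源代码阅读报告》), which gives a very comprehensive introduction to Pentaho's architecture.

Pentaho's Chinese forum is at http://www.bipub.org/

Compared with SpagoBI, Pentaho is more powerful, especially in its workflow, which is quite well done.

The official demos are at http://www.pentaho.com/products/demos/

SpagoBI is also very capable, especially the recently released version 1.9. The demos at http://spagobi.eng.it:8080/sbiportal/faces/public/exo (or http://spagobi.eng.it:8080/sbiportal) show many of SpagoBI's features.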

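Afterword
Because these BI frameworks are all open-source front-end frameworks, their core parts still rely on open-source projects such as

Mondrian, JPivot, and JFreeReport. As a result, setting up a suitable framework takes a large share of project time, but once the framework is in place, you can basically turn out reports like an assembly line.

However, adding new features on top of the original functionality is rather troublesome; a single new feature may take a long time to implement.

In addition, permission management in these open-source frameworks is not very strong and may need rework.

Internationalization is also an issue. OpenI, for example, does not support Chinese at all and must be modified.

posted @ 2008-02-23 23:38 edsonjava | Reads (963) | Comments (0)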

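Several BI-related open-source tools

We all know the story of the blind men and the elephant: each blind man understood the elephant differently, because each knew only the part he had touched. An enterprise that wants to avoid repeating this mistake needs business intelligence (BI). Experts believe BI matters to an enterprise as much as intelligence matters to an individual, and the experience of European and American companies also shows that business intelligence is an effective way for enterprises to avoid the dangers of ignorance and half-knowledge. Business intelligence aims to make full use of the large amounts of data and information an enterprise collects in its daily operations, turning them into information and knowledge that eliminate ignorance and guesswork.

There are many open-source tools that support BI, but most of them focus on one area. For example, CloverETL focuses on ETL, JPivot on multidimensional analysis and presentation, and Mondrian is an OLAP server. Projects such as Bee, Pentaho, and SpagoBI, on the other hand, provide complete solutions to business intelligence problems.

ETL tools

Open-source ETL tools mainly include CloverETL and Octopus.

(1) CloverETL is a Java ETL framework for transforming structured data. It supports conversion between many character sets (such as ASCII, UTF-8, and ISO-8859-1); supports JDBC as well as dBase and FoxPro data files; and supports XML-based transformation descriptions.

(2) Octopus is a Java-based ETL tool that also supports JDBC data sources and XML-based transformation definitions. Octopus provides generic methods for data transformation; users can define transformation flows by implementing the transformation interface or by writing JScript code.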

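OLAP servers

(1) Lemur is mainly aimed at HOLAP. Although it is written in C++, it can be called from programs in other languages. Lemur supports basic operations such as slicing, dicing, and pivoting.

(2) Mondrian is aimed at ROLAP and consists of four layers: the presentation layer, the calculation layer, the aggregation layer, and the storage layer.

● Presentation layer: what the end user ultimately sees on screen, and the interaction with the user. There are many ways to present multidimensional data, including pivot tables and pie, bar, and line charts.

● Calculation layer: parses, validates, and executes MDX queries.

● Aggregation layer: an aggregation is a set of computed values (cells) held in memory, qualified by a set of dimension column values. The calculation layer sends requests for cells; if the requested cells are not in the cache and cannot be derived by rolling up a cached aggregation, the aggregation layer sends a request to the storage layer. The aggregation layer is a data cache: cell data coming from the database is aggregated and then provided to the calculation layer. Its main purpose is to improve system performance.

● Storage layer: provides aggregated cell data and the members of dimension tables. Three kinds of data need to be stored: fact data, aggregations, and dimensions.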
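The aggregation layer's role as a cache can be sketched as a map in front of the storage layer. The class below is hypothetical: the Function stands in for the SQL query sent to the database, cells are keyed by a made-up coordinate string, and the roll-up of cached aggregations mentioned above is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: the calculation layer asks for a cell by coordinates;
// the storage layer is consulted only on a cache miss.
class CellCache {
    private final Map<String, Double> cells = new HashMap<>();
    private final Function<String, Double> storage; // stand-in for a SQL query
    private int storageRequests = 0;

    CellCache(Function<String, Double> storage) {
        this.storage = storage;
    }

    double get(String coordinates) {
        Double value = cells.get(coordinates);
        if (value == null) {                 // miss: go to the storage layer once
            storageRequests++;
            value = storage.apply(coordinates);
            cells.put(coordinates, value);
        }
        return value;
    }

    int storageRequests() {
        return storageRequests;
    }

    public static void main(String[] args) {
        CellCache cache = new CellCache(coords -> 42.0); // pretend every query returns 42
        cache.get("[Time].[1997].[Q1]|[Measures].[Unit Sales]");
        cache.get("[Time].[1997].[Q1]|[Measures].[Unit Sales]");
        System.out.println(cache.storageRequests());     // 1
    }
}
```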

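OLAP clients

JPivot is a JSP tag library for displaying OLAP tables that lets users perform typical OLAP operations such as slice, dice, drill-up, and drill-down. JPivot uses the Mondrian server, and analysis results can be exported to Excel or PDF.

Database management systems

The main open-source options include MonetDB, MySQL, MaxDB, and PostgreSQL, all of which are used to support BI environments. MySQL, MaxDB, and PostgreSQL all support one-way data replication. The Bizgres project aims to make PostgreSQL the open-source standard for data warehousing and BI, building a complete database platform dedicated to BI environments.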

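Complete open-source BI solutions

1. The Pentaho BI Platform from Pentaho

It is a process-centric, solution-oriented framework with business intelligence components. The BI platform is process-centric because its central controller is a workflow engine. The workflow engine uses process definitions to define the business intelligence processes that execute on the platform. Processes can easily be customized, and new processes can be added. The platform includes components and reports for analyzing the performance of these processes. The BI platform is solution-oriented because its operations are specified in process definitions and in action documents that specify each activity. Together, these processes and operations define the solution to a business intelligence problem. This BI solution can easily be integrated with business processes outside the platform. A solution definition can contain any number of processes and operations.

The BI platform includes a BI framework, BI components, a BI workbench, and a desktop inbox. The BI workbench is a set of design and administration tools integrated into the Eclipse environment. These tools let business analysts or developers create reports, dashboards, analysis models, business rules, and BI processes. The Pentaho BI Platform is built on servers, engines, and components, providing J2EE server, security and permission control, portal, workflow, rules engine, charting, collaboration, content management, data integration, multidimensional analysis, and system modeling functions. Most of these components are standards-based and can be replaced with other products.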

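2. ObjectWeb

The project recently released SpagoBI 1.8. SpagoBI is a BI solution based on Mondrian + JPivot that can generate real-time reports with OpenLaszlo, providing a complete open-source solution for business intelligence projects. It covers all aspects of a BI system, including data mining, querying, analysis, reporting, and dashboards. SpagoBI uses an architecture that integrates a core system with functional modules, which ensures the platform's stability and consistency while giving the system strong extensibility. Users do not need to use all of SpagoBI's modules; they can use just some of them.

SpagoBI builds on many existing open-source projects, such as Spago and Spagosi. SpagoBI therefore inherits Spago's features and technical characteristics and uses them to manage business intelligence objects such as reports, OLAP analyses, dashboards, scorecards, and data mining models. SpagoBI supports monitoring and management of the BI system, including control, verification, authentication, and distribution workflows for business intelligence objects. SpagoBI uses portlet technology to publish all BI objects to end users, so BI objects can be integrated into whatever portal system has already been chosen for the enterprise's particular needs.

3. The Bee project

This project is a suite of tools supporting the implementation of business intelligence projects, including an ETL tool and an OLAP server. Bee's ETL tool uses the Perl-based BEI; processes are described through a user interface and stored as XML. Users must code the transformations themselves. Bee's ROLAP server provides multi-pass SQL generation and powerful cache management (using the MySQL database management system). The ROLAP server offers rich client applications through a SOAP application interface. A web portal serves as the main user interface, supporting report design, presentation, and administration through a web browser, and analysis results can be exported to Excel, PDF, PNG, PowerPoint, text, XML, and other formats.

Features of the Bee project: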

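● Simple, fast data access;

● Support for both predefined reports and real-time queries;

● Easy report customization through drag and drop;

● Easy control over complete reports;

● High-quality data presentation in tables and charts.

posted @ 2008-02-23 23:29 edsonjava | Reads (535) | Comments (0)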

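OLAP: Mondrian & JPivot

java / zongfeng, posted 2005-01-21 23:05

Mondrian is an OLAP tool, and JPivot is a taglib that displays its results; together, these two tools can produce complex statistical summaries and display them.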


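OLAP: online analytical processing, the real-time analysis of large amounts of data. Its operations are usually read-only. "Online" means that even with large amounts of data, the system must respond to queries fast enough.

OLAP uses a technique called multidimensional analysis. A relational database stores data as rows and columns, whereas a multidimensional table consists of axes and cells.

Mondrian consists of four layers: the presentation layer, the calculation layer, the aggregation layer, and the storage layer.

Presentation layer: what the end user ultimately sees on screen, and the interaction with the user. There are many ways to present multidimensional data, including pivot tables and pie, bar, and line charts.

Calculation layer: parses, validates, and executes MDX queries.

Aggregation layer: an aggregation is a set of computed values (cells) held in memory, qualified by a set of dimension column values. The calculation layer sends requests for cells; if the requested cells are not in the cache and cannot be derived by rolling up a cached aggregation, the aggregation layer sends a request to the storage layer.

The aggregation layer is a data cache: cell data coming from the database is aggregated and then provided to the calculation layer. Its main purpose is to improve system performance.

Storage layer: provides aggregated cell data and the members of dimension tables. These layers need not run on the same machine, but the calculation and aggregation layers must be on the same machine.

Three kinds of data need to be stored: 1. fact data, 2. aggregations, 3. dimensions.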

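Specific meanings of terms in the configuration file:
1. cube: a collection of dimensions and measures

2. measure: a specific quantity being measured

3. dimension: an attribute, or set of attributes, by which measures can be categorized

Below are my modifications to JPivot (JPivot is a taglib for displaying Mondrian results).

Problem 1: showing row and column headings in Chinese. This is very simple: just set the encoding in your schema. For example, in FoodMart: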

<?xml version="1.0" encoding="gb2312"?>

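Then you can describe a Measure like this:

Any name attribute can be given a Chinese value, and JPivot will display the Chinese text automatically.

Problem 2: removing the Measures heading.

By default, the generated report contains this row: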
<tr>
<th rowspan="1" colspan="2" class="corner-heading" nowrap="nowrap"> </th><th rowspan="1" colspan="3" class="heading-heading" nowrap="nowrap"><img height="9" width="9" border="0" src="/jpivot/jpivot/table/drill-position-other.gif">Measures</th>
</tr>

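This row has a default heading, "Measures". If you do not want to delete the row but only want to change the heading, edit
WEB-INF/classes/com/tonbeller/jpivot/mondrian/resources.properties. Note that English text in this file works as-is, but Chinese text must be written as Unicode escapes of the form \uXXXX.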
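The escaping can be done by hand, or with the native2ascii tool shipped with older JDKs. A small sketch of the same conversion, plus a check that java.util.Properties reads the escaped value back unchanged. PropertiesEscape is a made-up name, and the sketch assumes the text contains no backslashes or leading spaces, which properties files treat specially.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: turn non-ASCII characters into backslash-u escapes so they survive
// in an ISO-8859-1 properties file such as resources.properties.
class PropertiesEscape {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);                                  // ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04X", (int) c));  // backslash, 'u', 4 uppercase hex digits
            }
        }
        return sb.toString();
    }

    // Reads "key=<escaped>" back the way a properties file is loaded.
    static String readBack(String escaped) {
        Properties p = new Properties();
        try {
            p.load(new StringReader("key=" + escaped));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }
}
```

For example, escape("Measures") returns "Measures" unchanged, while a Chinese heading comes back as a sequence of six-character escapes, one per character.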

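If you want to remove the row entirely, I am afraid that changing the configuration files and the XSL will not do it. I analyzed the code and ended up changing it at the code level.
The class to change is com.tonbeller.jpivot.table.ColumnAxisBuilderImpl:

In its constructor, change the call to setHierarchyHeader(NO_HEADER). This method accepts three possible values; after this change, the heading row is no longer displayed.

Problem 3: errors when generating charts.

In my tests, Chinese text in the generated charts was already fine, but every time a chart was generated a UTF encoding error was reported. Judging from the error, some file had the wrong encoding. At first I suspected the filter, but the filter code does not deal with encoding at all, and changing the encoding of many configuration files did not help either. Because the English example worked, I looked at JFreeChart's servlet (org.jfree.chart.servlet.DisplayChart), which JPivot calls to draw charts. This servlet generates a PNG file in a temporary directory and then writes it to the browser's response. I found that temporary directory (tomcat/temp) and saw that the correct Chinese image had already been generated there, so the image itself was fine and the problem occurred when it was written to the browser. Finally, looking at the example that produced English charts, I noticed that it generates not only the image in the HTML but also an image map. The map is generated entirely in code: the program builds an XML document, which chart.xsl transforms into the final HTML for the map. But no encoding declaration was added when the program built that XML, so the problem lay in the map generation.

The final code change: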

com.tonbeller.jpivot.chart.ChartComponent:

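In the render method, change it as follows:

// Declare gb2312 so the generated image-map XML is parsed with the right encoding
String desc = "<?xml version=\"1.0\" encoding=\"gb2312\"?>";
String xchart = desc + "\n" + "<xchart>" + writeImageMap(filename, info, false) + "</xchart>";
This sets the encoding for xchart.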

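Problem 4: changing JFreeChart's default fonts.

com.tonbeller.jpivot.chart.ChartComponent defines several fonts, but they are all Western fonts, so I changed them to SimSun:
Change all the font definitions to "SimSun".
That is not the end of it: if you only change the code, you still get an error saying there is no item matching "SimSun".
You also need to edit a configuration file, WEB-INF/jpivot/chart/chartpropertiesform.xml,
and add SimSun to it, in the following form:

These are my recent notes. I will extend this document to cover how to write Mondrian schemas and the MDX query language. Comments are welcome.
link1: Microsoft's MDX documentation in Chinese

posted @ 2008-02-23 23:20 edsonjava | Reads (886) | Comments (0)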

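XML + XSL: URL encoding problems with Chinese characters

In XML applications, URL information is often stored as XML data, and URL parameters may contain Chinese characters.
When the XML data is processed with DOM, the Chinese characters can be encoded in code.
But if only XSLT is used to display the XML data (data.xml + data.xsl), the URLs come out with encoding errors.
Even specifying the encoding (encoding="gb2312") still produces the same problem.
Testing shows that the cause is IE's caching mechanism: IE still assumes that the MIME content type of the new page (the linked URL) is text/xml.

Solutions:
1. Specify XML as the output document type (example: data.xsl)
<xsl:output method="xml" encoding="gb2312" media-type="text/xml" />
2. Open the link in a new window by adding an attribute that targets another window (example: data2.xsl)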
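Underneath the browser behavior is a byte-level fact: a percent-encoded Chinese parameter depends on the charset used to encode it, and decoding with a different charset produces garbage. A sketch using java.net.URLEncoder and URLDecoder (the Charset overloads need Java 10 or later); building the href with an explicit charset on the server side is an alternative to relying on the browser, not the fix described in this post. SearchLink is a made-up name.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Sketch: percent-encode a search word with an explicit charset when building
// the href, and decode it with a chosen charset.
class SearchLink {
    static String href(String baseUrl, String word, Charset charset) {
        // URLEncoder follows application/x-www-form-urlencoded rules (a space
        // becomes '+'), which suits query-string values such as q= or word=.
        return baseUrl + URLEncoder.encode(word, charset);
    }

    static String decodeParam(String encoded, Charset charset) {
        // Decoding with a different charset than was used to encode garbles the text.
        return URLDecoder.decode(encoded, charset);
    }
}
```

With StandardCharsets.UTF_8, the word "xml数据" from data.xml becomes xml%E6%95%B0%E6%8D%AE; encoding it with Charset.forName("GB2312") yields different bytes, so the target site must decode with whichever charset it expects.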
<xsl:attribute name="target">_blank</xsl:attribute>

examples:

/*** data.xml ***/

<?xml version="1.0" encoding="gb2312"?>
<?xml-stylesheet type="text/xsl" href="data.xsl"?>
<root>
<search>
  <url>http://www.google.com/search?q=</url>
  <word>xml数据</word>
</search>
<search>
  <url>http://www1.baidu.com/baidu?word=</url>
  <word>xml数据</word>
</search>
<search>
  <url>http://www.google.com/search?q=</url>
  <word>极限编程(xp)</word>
</search>
<search>
  <url>http://www1.baidu.com/baidu?word=</url>
  <word>极限编程(xp)</word>
</search>
</root>

/*** data.xsl ***/

<?xml version="1.0" encoding="gb2312"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" encoding="gb2312" media-type="text/xml" />

<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>

<xsl:template match="search">
<xsl:element name="a">
<xsl:attribute name="href"><xsl:value-of select="url" /><xsl:value-of select="word" /></xsl:attribute>
<xsl:value-of select="word" />
</xsl:element>
<br />
</xsl:template>

</xsl:stylesheet>

/*** data2.xsl ***/

<?xml version="1.0" encoding="gb2312"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>

<xsl:template match="search">
<xsl:element name="a">
  <xsl:attribute name="href"><xsl:value-of select="url" /><xsl:value-of select="word" /></xsl:attribute>
  
  <xsl:attribute name="target">_blank</xsl:attribute>
  <xsl:value-of select="word" />
</xsl:element>
<br />
</xsl:template>

</xsl:stylesheet>

posted @ 2008-02-23 23:08 edsonjava | Reads (533) | Comments (0)

July 26, 2007

Java Servlet API documentation (version 2.1a)

Package: javax.servlet.http
      Interfaces: HttpServletRequest; HttpServletResponse; HttpSession; HttpSessionBindingListener; HttpSessionContext.
      Classes: Cookie; HttpServlet; HttpSessionBindingEvent; HttpUtils.

      I. The HttpServletRequest interface
      Definition
      public interface HttpServletRequest extends ServletRequest;
      Used to handle the information in an HTTP request sent to a servlet.
      Methods
      1、getAuthType
      public String getAuthType();
      Returns the authentication scheme of this request.
      2、getCookies
      public Cookie[] getCookies();
      Returns an array containing all the cookies in this request. If the request has no cookies, returns an empty array.
      3、getDateHeader
      public long getDateHeader(String name);
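      Returns the value of the specified request header, converted to a long integer giving the number of milliseconds since 1970-01-01 (GMT).
      If the header cannot be converted, an IllegalArgumentException is thrown. If the request header does not exist, this method returns -1.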
      4、getHeader
      public String getHeader(String name);
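      Returns the value of a request header. (Translator's note: unlike the previous method, this one returns a String.)
      If the request header does not exist, this method returns null.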
      5、getHeaderNames
      public Enumeration getHeaderNames();
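      Returns an enumeration of String objects containing all the header names in the request.
      Some engines may not allow headers to be accessed this way; in that case, this method returns an empty enumeration.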
      6、getIntHeader
      public int getIntHeader(String name);
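      Returns the value of the specified request header, converted to an integer.
      If the header cannot be converted, an IllegalArgumentException is thrown. If the request header does not exist, this method returns -1.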
      7、getMethod
      public String getMethod();
      Returns the HTTP method used by this request (for example GET, POST, or PUT).
      8、getPathInfo
      public String getPathInfo();
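      Returns any extra path information in the request URL after the servlet path. If the request URL includes a query string, the query string is not included in the return value. The path must be URL-decoded before it is returned. If there is no path information after the servlet path, this method returns null.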
      9、getPathTranslated
      public String getPathTranslated();
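      Gets any extra path information after the servlet path of the request URL and translates it into a real path. The request URL must be URL-decoded before the translation. If there is no extra path information after the servlet path, this method returns null.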
      10、getQueryString
      public String getQueryString();
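      Returns the query string contained in the request URL. A query string is introduced by a "?" in a URL. If there is no query string, this method returns null.
      11、getRemoteUser
      public String getRemoteUser();
      Returns the name of the user making the request, as used for HTTP user authentication.
      If the request contains no user name, this method returns null.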
      12、getRequestedSessionId
      public String getRequestedSessionId();
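      Returns the session ID specified with this request. If the session ID supplied by the client is invalid for some reason, it will differ from the ID of the current session, and a new session will be created.
      If the request is not associated with a session, this method returns null.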
      13、getRequestURI
      public String getRequestURI();
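      Returns the part of the request URL, taken from the first line of the HTTP request, that identifies the requested resource. Any query string is not included in the return value. For example, for a request to the URL path /catalog/books?id=1, this method returns /catalog/books. The return value includes both the servlet path and the path info.
      If part of the URL path was URL-encoded, the return value must be decoded before it is returned.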
      14、getServletPath
      public String getServletPath();
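      Returns the part of the request URL that invokes the servlet. For example, if a servlet is mapped to the URL path /catalog/summer and a request uses the path /catalog/summer/casual, the part that invokes the servlet is /catalog/summer.
      If the servlet was not invoked through path matching, this method returns null.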
      15、getSession
      public HttpSession getSession();
      public HttpSession getSession(boolean create);
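      Returns the current valid session associated with this request. If the method is called without an argument and no session is associated with the request, a new session is created. If it is called with a boolean argument, a session is created only if the argument is true.
      To make sure the session is properly maintained, servlet developers must call this method before the response is committed.
      If the argument is false and no session is associated with the request, this method returns null.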
      16、isRequestedSessionIdValid
      public boolean isRequestedSessionIdValid();
      Checks whether the session associated with this request is currently valid. If the session used in the current request is invalid, it cannot be returned by getSession.
      17、isRequestedSessionIdFromCookie
      public boolean isRequestedSessionIdFromCookie();
      Returns true if the session ID for this request was supplied by the client in a cookie; otherwise returns false.
      18、isRequestedSessionIdFromURL
      public boolean isRequestedSessionIdFromURL();
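      Returns true if the session ID for this request was supplied by the client as part of the URL; otherwise returns false. Note that this method differs from isRequestedSessionIdFromUrl only in the capitalization of "URL".
      The following method is deprecated: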

      19、isRequestedSessionIdFromUrl
      public boolean isRequestedSessionIdFromUrl();
      Replaced by isRequestedSessionIdFromURL.
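Two of the behaviors above, getDateHeader's conversion of a date header to milliseconds and the split of a request URL into request URI, servlet path, path info and query string, can be sketched with the plain JDK. RequestSketch, its method names and the header Map are hypothetical stand-ins; this is not the javax.servlet API, and it does no percent-decoding.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Map;

// Sketch of request handling without a servlet container.
class RequestSketch {
    // getDateHeader-style: RFC 1123 date -> milliseconds since 1970-01-01 GMT;
    // -1 if the header is absent; IllegalArgumentException if it cannot be converted.
    // Note: this Map lookup is case-sensitive, unlike HTTP header names.
    static long dateHeader(Map<String, String> headers, String name) {
        String value = headers.get(name);
        if (value == null) {
            return -1;
        }
        try {
            return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Cannot convert header " + name + ": " + value, e);
        }
    }

    // Splits a request target for a servlet mapped to servletPath and returns
    // {requestURI, servletPath, pathInfo, queryString}; null marks an absent part.
    // Assumes the target starts with servletPath.
    static String[] splitPath(String target, String servletPath) {
        int q = target.indexOf('?');
        String uri = q < 0 ? target : target.substring(0, q);
        String query = q < 0 ? null : target.substring(q + 1);
        String pathInfo = uri.length() > servletPath.length() ? uri.substring(servletPath.length()) : null;
        return new String[] {uri, servletPath, pathInfo, query};
    }
}
```

For example, a header value of Sun, 06 Nov 1994 08:49:37 GMT converts to 784111777000, and /catalog/summer/casual?id=1 with servlet path /catalog/summer splits into request URI /catalog/summer/casual, path info /casual and query string id=1.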

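      II. The HttpServletResponse interface
      Definition

      public interface HttpServletResponse extends ServletResponse
      Describes an HTTP response returned to the client. This interface lets servlet programmers use the header information defined by the HTTP protocol.
      Fields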
      public static final int SC_CONTINUE = 100;
      public static final int SC_SWITCHING_PROTOCOLS = 101;
      public static final int SC_OK = 200;
      public static final int SC_CREATED = 201;
      public static final int SC_ACCEPTED = 202;
      public static final int SC_NON_AUTHORITATIVE_INFORMATION = 203;
      public static final int SC_NO_CONTENT = 204;
      public static final int SC_RESET_CONTENT = 205;
      public static final int SC_PARTIAL_CONTENT = 206;
      public static final int SC_MULTIPLE_CHOICES = 300;
      public static final int SC_MOVED_PERMANENTLY = 301;
      public static final int SC_MOVED_TEMPORARILY = 302;
      public static final int SC_SEE_OTHER = 303;
      public static final int SC_NOT_MODIFIED = 304;
      public static final int SC_USE_PROXY = 305;
      public static final int SC_BAD_REQUEST = 400;
      public static final int SC_UNAUTHORIZED = 401;
      public static final int SC_PAYMENT_REQUIRED = 402;
      public static final int SC_FORBIDDEN = 403;
      public static final int SC_NOT_FOUND = 404;
      public static final int SC_METHOD_NOT_ALLOWED = 405;
      public static final int SC_NOT_ACCEPTABLE = 406;
      public static final int SC_PROXY_AUTHENTICATION_REQUIRED = 407;
      public static final int SC_REQUEST_TIMEOUT = 408;
      public static final int SC_CONFLICT = 409;
      public static final int SC_GONE = 410;
      public static final int SC_LENGTH_REQUIRED = 411;
      public static final int SC_PRECONDITION_FAILED = 412;
      public static final int SC_REQUEST_ENTITY_TOO_LARGE = 413;
      public static final int SC_REQUEST_URI_TOO_LONG = 414;
      public static final int SC_UNSUPPORTED_MEDIA_TYPE = 415;
      public static final int SC_INTERNAL_SERVER_ERROR = 500;
      public static final int SC_NOT_IMPLEMENTED = 501;
      public static final int SC_BAD_GATEWAY = 502;
      public static final int SC_SERVICE_UNAVAILABLE = 503;
      public static final int SC_GATEWAY_TIMEOUT = 504;
      public static final int SC_HTTP_VERSION_NOT_SUPPORTED = 505;
      The HTTP status codes above are defined by HTTP/1.1.
      Methods
      1、addCookie
      public void addCookie(Cookie cookie);
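      Adds the specified cookie to the response. This method can be called multiple times to set more than one cookie. To set the appropriate headers, it should be called before the response is committed.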
      2、containsHeader
      public boolean containsHeader(String name);
      Checks whether the specified response header has been set.
      3、encodeRedirectURL
      public String encodeRedirectURL(String url);
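      Encodes the specified URL for use in the sendRedirect method. If no encoding is needed, returns the URL unchanged. This separate encoding method is provided because the rules for deciding whether to encode a URL differ for redirects. The URL must be absolute; relative URLs are not accepted and cause an IllegalArgumentException.
      All URLs passed to sendRedirect should be run through this method so that session tracking works correctly in all browsers.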
      4、encodeURL
      public String encodeURL(String url);
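      Encodes the specified URL by including the session ID in it. If no encoding is needed, returns the URL unchanged. Servlet engines must provide URL encoding because in some cases URLs have to be rewritten, for example when the request associated with the response contains a valid session that cannot be maintained by non-URL means such as cookies.
      All URLs emitted by a servlet should be run through this method so that session tracking works correctly in all browsers.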
      5、sendError
      public void sendError(int statusCode) throws IOException;
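      public void sendError(int statusCode, String message) throws IOException;
      Sends an error response to the client with the given status code. If a message argument is provided, it is sent as part of the response body; otherwise, the server returns the standard message for the error code.
      The response is committed immediately after this method is called, and the servlet must not produce any further output.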
      6、sendRedirect
      public void sendRedirect(String location) throws IOException;
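      Sends a temporary redirect response (SC_MOVED_TEMPORARILY) to the client using the given location. The location must be an absolute URL; relative URLs are not accepted and cause an IllegalArgumentException.
      This method must be called before the response is committed. The response is committed immediately after the call, and the servlet must not produce any further output.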
      7、setDateHeader
      public void setDateHeader(String name, long date);
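      Sets a response header with the given name and date value, where the date is a long integer giving the number of milliseconds since 1970-01-01 (GMT). If the header has already been set, the new value overwrites the current one.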
      8、setHeader
      public void setHeader(String name, String value);
      Sets a response header with the given name and field value. If the header has already been set, the new value overwrites the current one.
      9、setIntHeader
      public void setIntHeader(String name, int value);
      Sets a response header with the given name and integer value. If the header has already been set, the new value overwrites the current one.
      10、setStatus
      public void setStatus(int statusCode);
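      Sets the status code of the response. If the status code has already been set, the new value overwrites the current one.
      The following methods are deprecated: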
      11、encodeRedirectUrl
      public String encodeRedirectUrl(String url);
      Replaced by encodeRedirectURL.
      12、encodeUrl
      public String encodeUrl(String url);
      Replaced by encodeURL.
      13、setStatus
      public void setStatus(int statusCode, String message);
      这个方法设置了响应的状态码，如果状态码已经被设置，新的值将覆盖当前的值。如果提供了一个message，它也将会被作为响应体的一部分被发送。
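To make the units and the URL-rewriting behaviour above concrete, here is a small standard-library Java sketch. It is not the Servlet API: `httpDate` shows how the milliseconds-since-1970 value passed to setDateHeader maps to the GMT text form of an HTTP date header, and `encodeURL` is a hypothetical helper that rewrites a URL only when the session cannot be kept by cookies. The `;jsessionid=` path parameter is the name later Servlet specifications use and is an assumption here, as are the class and method names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ResponseSketch {
    // RFC 1123-style HTTP date, always rendered in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Formats the "milliseconds since 1970-01-01 GMT" value that setDateHeader takes.
    public static String httpDate(long millis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(millis));
    }

    // Hypothetical encodeURL-style helper: rewrite only when cookies cannot carry the session.
    public static String encodeURL(String url, String sessionId, boolean cookiesWork) {
        if (cookiesWork || sessionId == null) {
            return url; // no encoding needed, so the URL is returned unchanged
        }
        int q = url.indexOf('?');
        String path = (q < 0) ? url : url.substring(0, q);
        String query = (q < 0) ? "" : url.substring(q);
        // The session ID goes into the path, before any query string.
        return path + ";jsessionid=" + sessionId + query;
    }

    public static void main(String[] args) {
        System.out.println(httpDate(784111777000L)); // Sun, 06 Nov 1994 08:49:37 GMT
        System.out.println(encodeURL("/shop/cart?item=42", "ABC123", false)); // /shop/cart;jsessionid=ABC123?item=42
    }
}
```

A real container makes the "cookies work or not" decision itself, typically by checking whether the request arrived with a session cookie; the boolean parameter simply makes that decision explicit for the sketch.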

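      III. HttpSession interface
      Definition
      public interface HttpSession
      The servlet engine uses this interface to associate an HTTP client with an HTTP session. The association may persist for a specified period of time across multiple connections and requests. Sessions are used to maintain state and identify users across multiple page requests over the stateless HTTP protocol.
      A session can be maintained either with cookies or by URL rewriting.
      Methods
      1. getCreationTime
      public long getCreationTime();
      Returns the time at which the session was created, expressed as milliseconds since January 1, 1970 (GMT).
      2. getId
      public String getId();
      Returns the identifier assigned to this session. An HTTP session identifier is a unique string created and maintained by the server.
      3. getLastAccessedTime
      public long getLastAccessedTime();
      Returns the last time the client sent a request associated with this session, or -1 if the session is new. The time is expressed as milliseconds since January 1, 1970 (GMT).
      4. getMaxInactiveInterval
      public int getMaxInactiveInterval();
      Returns the maximum time, in seconds, that the servlet engine keeps the session alive while the client sends no requests. After this interval the servlet engine may terminate the session. If the session is never terminated, this method returns -1.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      5. getValue
      public Object getValue(String name);
      Returns the object bound to the session under the given name, or null if there is no such binding.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      6. getValueNames
      public String[] getValueNames();
      Returns the names of all objects bound to the session as an array.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      7. invalidate
      public void invalidate();
      Terminates the session. All data bound to it is removed, and bound objects are notified through the valueUnbound method of the HttpSessionBindingListener interface.
      8. isNew
      public boolean isNew();
      Returns whether the session is new. A session is considered new if the server has created it but has not yet received a request from the client that uses it. This means the client has not yet joined or acknowledged the session, and may not return the proper session identification information with its next request.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      9. putValue
      public void putValue(String name, Object value);
      Binds the given object to the session under the given name, replacing any existing binding with the same name. The valueBound method of the HttpSessionBindingListener interface is called at this point.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      10. removeValue
      public void removeValue(String name);
      Removes the binding of the object with the given name from the session. If no object is bound under that name, the method does nothing. The valueUnbound method of the HttpSessionBindingListener interface is called at this point.
      Calling this method after the session has been invalidated throws an IllegalStateException.
      11. setMaxInactiveInterval
      public void setMaxInactiveInterval(int interval);
      Sets the maximum time, in seconds, that the servlet engine keeps the session alive while the client sends no requests.
      The following method is deprecated:
      12. getSessionContext
      public HttpSessionContext getSessionContext();
      Returns the context in which the session is maintained. Like all other HttpSessionContext methods, this method is deprecated.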
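As a rough illustration of how the time values above fit together, the following sketch (hypothetical class and method names, not part of the Servlet API) decides whether a session has timed out. It assumes creation and last-access times in milliseconds since 1970, the inactive interval in seconds, and a negative interval meaning the session never times out; it falls back to the creation time when getLastAccessedTime would report -1 for a new session.

```java
public class SessionTimeoutSketch {
    /**
     * Hypothetical helper: has a session been idle longer than its maximum
     * inactive interval?
     *
     * @param creationMillis     creation time, ms since 1970-01-01 GMT
     * @param lastAccessedMillis last access time, or -1 for a new session
     * @param maxInactiveSeconds maximum idle time in seconds; negative means never expires
     * @param nowMillis          current time, ms since 1970-01-01 GMT
     */
    public static boolean isExpired(long creationMillis, long lastAccessedMillis,
                                    int maxInactiveSeconds, long nowMillis) {
        if (maxInactiveSeconds < 0) {
            return false; // the session is never timed out
        }
        // A new session has no last access yet, so measure from its creation.
        long reference = (lastAccessedMillis < 0) ? creationMillis : lastAccessedMillis;
        // Convert seconds to milliseconds before comparing (long arithmetic avoids overflow).
        return nowMillis - reference > maxInactiveSeconds * 1000L;
    }

    public static void main(String[] args) {
        long created = 0L;
        long lastAccess = 1_000L;
        int maxIdle = 1800; // 30 minutes
        System.out.println(isExpired(created, lastAccess, maxIdle, lastAccess + 1_800_000L)); // false
        System.out.println(isExpired(created, lastAccess, maxIdle, lastAccess + 1_800_001L)); // true
    }
}
```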

      IV. HttpSessionBindingListener interface
      Definition
      public interface HttpSessionBindingListener
      An object that implements this interface is notified when it is bound to or unbound from an HTTP session.
      Methods
      1. valueBound
      public void valueBound(HttpSessionBindingEvent event);
      Called when the object is bound to a session. The servlet engine should call this method when HttpSession.putValue is invoked.
      2. valueUnbound
      public void valueUnbound(HttpSessionBindingEvent event);
      Called when the object is unbound from a session. The servlet engine should call this method when HttpSession.removeValue is invoked.
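The binding callbacks can be modelled without a servlet container. The sketch below uses hypothetical stand-ins (`BindingListener`, `SimpleSession`) rather than the real javax.servlet types. It shows valueBound firing on putValue, valueUnbound firing on removeValue and invalidate, and IllegalStateException once the session is invalid. Notifying a listener that is replaced by a second putValue under the same name is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BindingSketch {
    // Hypothetical stand-in for HttpSessionBindingListener.
    public interface BindingListener {
        void valueBound(String name);
        void valueUnbound(String name);
    }

    // Hypothetical in-memory session showing when the callbacks fire.
    public static class SimpleSession {
        private final Map<String, Object> values = new HashMap<>();
        private boolean valid = true;

        public void putValue(String name, Object value) {
            checkValid();
            Object old = values.put(name, value);
            // Assumption: a listener replaced under the same name is told it was unbound.
            if (old instanceof BindingListener && old != value) {
                ((BindingListener) old).valueUnbound(name);
            }
            if (value instanceof BindingListener) {
                ((BindingListener) value).valueBound(name);
            }
        }

        public Object getValue(String name) {
            checkValid();
            return values.get(name);
        }

        public void removeValue(String name) {
            checkValid();
            Object old = values.remove(name); // no binding under this name: nothing happens
            if (old instanceof BindingListener) {
                ((BindingListener) old).valueUnbound(name);
            }
        }

        public void invalidate() {
            checkValid();
            valid = false;
            // Copy first so listener callbacks cannot disturb the iteration.
            Map<String, Object> snapshot = new HashMap<>(values);
            values.clear();
            snapshot.forEach((name, v) -> {
                if (v instanceof BindingListener) {
                    ((BindingListener) v).valueUnbound(name);
                }
            });
        }

        private void checkValid() {
            if (!valid) {
                throw new IllegalStateException("session already invalidated");
            }
        }
    }

    // A listener that simply records the notifications it receives.
    public static class RecordingListener implements BindingListener {
        public final List<String> events = new ArrayList<>();
        public void valueBound(String name) { events.add("bound:" + name); }
        public void valueUnbound(String name) { events.add("unbound:" + name); }
    }

    public static void main(String[] args) {
        SimpleSession session = new SimpleSession();
        RecordingListener cart = new RecordingListener();
        session.putValue("cart", cart);
        session.removeValue("cart");
        System.out.println(cart.events); // [bound:cart, unbound:cart]
    }
}
```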

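      V. HttpSessionContext interface
      Definition
      This interface is deprecated.
      public interface HttpSessionContext
      This object is a single entity associated with a group of HTTP sessions.
      The interface has been deprecated for security reasons and remains in the current version only for compatibility. Its methods return values that mimic the definitions in earlier versions.
      Methods
      1. getSession
      public HttpSession getSession(String sessionId);
      Formerly returned the session associated with this session ID. Now returns null.
      2. getIds
      public Enumeration getIds();
      Formerly returned a list of all session IDs in this context. Now returns an empty enumeration.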

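      VI. Cookie class
      Definition
      public class Cookie implements Cloneable
      This class represents a cookie. For the definition of cookies, see the Netscape Communications Corporation specification or RFC 2109.
      Constructor
      public Cookie(String name, String value);
      Defines a cookie with a name-value pair. The name must be acceptable to HTTP/1.1.
      Names beginning with the character $ are reserved by RFC 2109.
      If the given name is not acceptable to HTTP/1.1, the constructor throws an IllegalArgumentException.
      Methods
      1. getComment
      public String getComment();
      Returns the comment describing the purpose of this cookie, or null if no comment is defined.
      2. getDomain
      public String getDomain();
      Returns the domain in which this cookie may appear, or null if no domain is defined.
      3. getMaxAge
      public int getMaxAge();
      Returns the maximum age specified for this cookie. If no maximum age has been defined, returns -1.
      4. getName
      public String getName();
      Returns the name of the cookie.
      5. getPath
      public String getPath();
      Returns the prefix of all URL paths for which this cookie is valid, or null if none is defined.
      6. getSecure
      public boolean getSecure();
      Returns true if this cookie may only be sent over a secure channel, false otherwise.
      7. getValue
      public String getValue();
      Returns the value of the cookie.
      8. getVersion
      public int getVersion();
      Returns the version of the cookie. Version 1 is defined by RFC 2109; version 0 is defined by the Netscape Communications Corporation specification. Newly constructed cookies use version 0 by default.
      9. setComment
      public void setComment(String purpose);
      Sets a comment describing the purpose of this cookie; if the user agent (browser) presents the cookie to the user, the cookie is described with this comment. Version 0 cookies do not support this attribute.
      10. setDomain
      public void setDomain(String pattern);
      Sets the domain attribute of the cookie, which specifies the domain in which the cookie may appear. A domain pattern begins with a dot (.foo.com) and means that hosts in that DNS zone (for example www.foo.com, but not a.b.foo.com) can see the cookie. By default, a cookie is returned only to the host that saved it.
      11. setMaxAge
      public void setMaxAge(int expiry);
      Sets the maximum age of the cookie in seconds, after which the cookie expires. A negative value means the cookie is not stored persistently and is deleted when the browser exits; 0 causes the cookie to be deleted from the client.
      12. setPath
      public void setPath(String uri);
      Sets the path attribute of the cookie. The client returns the cookie only to paths that begin with the given path string.
      13. setSecure
      public void setSecure(boolean flag);
      Indicates that the cookie may only be sent over a secure channel such as HTTPS. This should be set only when the server that produced the cookie sent it over a secure protocol.
      14. setValue
      public void setValue(String newValue);
      Sets the value of the cookie. Binary data should be BASE64-encoded.
      With version 0 cookies, the value must not contain white space, braces ({}), parentheses (()), equals signs (=), commas (,), double quotes ("), slashes (/), question marks (?), at signs (@), colons (:), or semicolons (;).
      15. setVersion
      public void setVersion(int v);
      Sets the version of the cookie.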
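Two of the rules above lend themselves to a quick check. The sketch below is illustrative only (hypothetical names, and host names are compared case-sensitively for brevity). `domainMatches` applies the setDomain rule that `.foo.com` covers www.foo.com but not a.b.foo.com, and `isValidV0Value` rejects the characters a version 0 value must not contain. Because standard BASE64 output can contain `/` and `=`, the sketch encodes binary data with the URL-safe BASE64 alphabet and no padding; that choice is ours, not the API's.

```java
import java.util.Base64;

public class CookieRulesSketch {
    // Characters a version 0 cookie value must not contain (white space is checked separately).
    private static final String V0_FORBIDDEN = "{}()=,\"/?@:;";

    public static boolean isValidV0Value(String value) {
        for (char c : value.toCharArray()) {
            if (Character.isWhitespace(c) || V0_FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // setDomain rule: ".foo.com" is visible to www.foo.com but not to a.b.foo.com,
    // i.e. the host may add exactly one label in front of the pattern.
    public static boolean domainMatches(String host, String pattern) {
        if (!pattern.startsWith(".") || !host.endsWith(pattern)) {
            return false;
        }
        String prefix = host.substring(0, host.length() - pattern.length());
        return !prefix.isEmpty() && prefix.indexOf('.') < 0;
    }

    // Binary data as a cookie value: the URL-safe alphabet without padding avoids
    // '/' and '=', which version 0 values must not contain.
    public static String encodeBinary(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        System.out.println(domainMatches("www.foo.com", ".foo.com")); // true
        System.out.println(domainMatches("a.b.foo.com", ".foo.com")); // false
        System.out.println(encodeBinary(new byte[] {(byte) 0xFB, (byte) 0xFF, 0x01})); // -_8B
    }
}
```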

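      VII. HttpServlet class
      Definition
      public abstract class HttpServlet extends GenericServlet implements
         Serializable
      This abstract class simplifies writing HTTP servlets. It extends the GenericServlet class and provides a framework for handling the HTTP protocol.
      The service method in this class supports standard HTTP methods such as GET and POST by dispatching each request to the appropriate handler method (for example doGet or doPost).
      Methods
      1. doDelete
      protected void doDelete(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP DELETE operation, which lets a client request that a URL be removed from the server. This operation may have side effects for which the user must take responsibility.
      By default, this method returns an HTTP BAD_REQUEST error. To handle DELETE requests, you must override it.
      2. doGet
      protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP GET operation, which lets the client simply "get" a resource from an HTTP server. Overriding this method automatically provides support for the HEAD method as well.
      GET operations should be safe and free of side effects, and should be safely repeatable.
      By default, this method returns an HTTP BAD_REQUEST error.
      3. doHead
      protected void doHead(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP HEAD operation. By default, the operation is executed as an unconditional GET that returns no data to the client, only the header information, including the content length.
      Like GET, this operation should be safe and free of side effects, and should be safely repeatable.
      The default implementation handles HTTP HEAD automatically, so subclasses do not need to implement this method.
      4. doOptions
      protected void doOptions(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP OPTIONS operation. It automatically determines which HTTP methods are supported. For example, if a servlet author writes a subclass of HttpServlet that overrides doGet, doOptions returns the following header:
      Allow: GET,HEAD,TRACE,OPTIONS
      You normally do not need to override this method.
      5. doPost
      protected void doPost(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP POST operation. The request body carries data that the servlet should act on.
      This operation may have side effects, for example updating stored data or making an online purchase.
      By default, this method returns an HTTP BAD_REQUEST error. To handle POST operations, you must override this method in your HttpServlet subclass.
      6. doPut
      protected void doPut(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP PUT operation, which is similar to sending a file via FTP.
      This operation may have side effects, for example updating stored data or making an online purchase.
      By default, this method returns an HTTP BAD_REQUEST error. To handle PUT operations, you must override this method in your HttpServlet subclass.
      7. doTrace
      protected void doTrace(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      Called by this class's service method to handle an HTTP TRACE operation. The default implementation produces a response containing all the header fields sent in the TRACE request.
      When developing servlets, in most cases you do not need to override this method.
      8. getLastModified
      protected long getLastModified(HttpServletRequest request);
      Returns the time the requested entity was last modified. To support conditional GET operations, you should override this method to reflect the last-modified time accurately. This helps browsers and proxy caches reduce the load on server and network resources and work more efficiently. The value returned is the number of milliseconds since January 1, 1970 (GMT).
      The default implementation returns a negative number, which means the last-modified time is unknown and cannot be used by a conditional GET operation.
      9. service
      protected void service(HttpServletRequest request,
            HttpServletResponse response) throws ServletException,
            IOException;
      public void service(ServletRequest request, ServletResponse response)
            throws ServletException, IOException;
      This is the HTTP-specific version of the servlet's service method: it dispatches each request to the method of this class that supports it.
      When developing servlets, in most cases you do not need to override this method.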
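The dispatching done by service and the Allow header computed by doOptions can be imitated in plain Java. `MiniServlet` and `HelloServlet` below are hypothetical stand-ins, not javax.servlet classes. Handlers return a status code instead of writing a response, the defaults return 400 (BAD_REQUEST) as described above, unknown methods get 501, and reflection detects which do* methods a subclass overrides.

```java
import java.util.ArrayList;
import java.util.List;

public class DispatchSketch {
    static final int SC_OK = 200;
    static final int SC_BAD_REQUEST = 400;
    static final int SC_NOT_IMPLEMENTED = 501;

    // Hypothetical stand-in for HttpServlet; handlers return a status code.
    public abstract static class MiniServlet {
        protected int doGet()    { return SC_BAD_REQUEST; }
        protected int doPost()   { return SC_BAD_REQUEST; }
        protected int doPut()    { return SC_BAD_REQUEST; }
        protected int doDelete() { return SC_BAD_REQUEST; }

        // Like service(): route the request to the matching do* method.
        public final int service(String method) {
            switch (method) {
                case "GET":
                case "HEAD":    return doGet(); // HEAD reuses GET; a real container drops the body
                case "POST":    return doPost();
                case "PUT":     return doPut();
                case "DELETE":  return doDelete();
                case "OPTIONS":
                case "TRACE":   return SC_OK;   // handled by the base class itself
                default:        return SC_NOT_IMPLEMENTED;
            }
        }

        // Like doOptions(): list the methods this subclass actually supports.
        public final String allowHeader() {
            List<String> allow = new ArrayList<>();
            if (overrides("doGet"))    { allow.add("GET"); allow.add("HEAD"); }
            if (overrides("doPost"))   allow.add("POST");
            if (overrides("doPut"))    allow.add("PUT");
            if (overrides("doDelete")) allow.add("DELETE");
            allow.add("TRACE");
            allow.add("OPTIONS");
            return String.join(",", allow);
        }

        // True if some class between getClass() and MiniServlet declares the method.
        private boolean overrides(String name) {
            for (Class<?> c = getClass(); c != MiniServlet.class; c = c.getSuperclass()) {
                try {
                    c.getDeclaredMethod(name);
                    return true;
                } catch (NoSuchMethodException e) {
                    // not declared at this level; keep walking up the hierarchy
                }
            }
            return false;
        }
    }

    // A servlet that only overrides doGet.
    public static class HelloServlet extends MiniServlet {
        @Override
        protected int doGet() { return SC_OK; }
    }

    public static void main(String[] args) {
        MiniServlet s = new HelloServlet();
        System.out.println(s.service("GET"));  // 200
        System.out.println(s.service("POST")); // 400
        System.out.println(s.allowHeader());   // GET,HEAD,TRACE,OPTIONS
    }
}
```

With only doGet overridden, `allowHeader()` produces `GET,HEAD,TRACE,OPTIONS`, matching the example header given for doOptions.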

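      VIII. HttpSessionBindingEvent class
      Definition
      public class HttpSessionBindingEvent extends EventObject
      This event is delivered to an HttpSessionBindingListener when the listener is bound to or unbound from an HttpSession. Unbinding may also be the result of the session being terminated or invalidated.
      The event source is HttpSession.putValue or HttpSession.removeValue.
      Constructor
      public HttpSessionBindingEvent(HttpSession session, String name);
      Constructs a new HttpSessionBindingEvent from the session that caused the event and the name of the object being bound or unbound.
      Methods
      1. getName
      public String getName();
      Returns the name of the object being bound or unbound.
      2. getSession
      public HttpSession getSession();
      Returns the session to which the object is being bound or from which it is being unbound.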

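      IX. HttpUtils class
      Definition
      public class HttpUtils
      A collection of static utility methods useful to HTTP servlets.
      Methods
      1. getRequestURL
      public static StringBuffer getRequestURL(HttpServletRequest
            request);
      Reconstructs on the server the URL the client used to make the request. The result reflects different protocols (such as http and https) and ports, but does not include the query string.
      This method returns a StringBuffer rather than a String so that servlet developers can modify the URL efficiently.
      2. parsePostData
      public static Hashtable parsePostData(int len,
            ServletInputStream in);
      Parses a stream of data of MIME type application/x-www-form-urlencoded and builds a hash table of key-value pairs. The keys are strings, and each value is the list of values for that key. A key may appear one or more times in the POST data; each occurrence adds its value to the list stored in the hash table under that key.
      Data read from the POST body is URL-decoded: + is converted to a space, and hexadecimal escapes (such as %xx) are converted to characters.
      If the POST data is invalid, the method throws an IllegalArgumentException.
      3. parseQueryString
      public static Hashtable parseQueryString(String s);
      Parses a query string and builds a hash table of key-value pairs, in which each value is the list of values for that key. A key may appear one or more times; each occurrence adds its value to the list stored in the hash table under that key.
      Data read from the query string is URL-decoded: + is converted to a space, and hexadecimal escapes (such as %xx) are converted to characters.
      If the query string is invalid, the method throws an IllegalArgumentException.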
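parseQueryString can be approximated with the standard library. This sketch (hypothetical class name) returns a Hashtable from each key to all of its values, lets java.net.URLDecoder turn `+` into a space and decode `%xx` escapes, and throws IllegalArgumentException for a pair without `=` or a malformed escape. Decoding as UTF-8 is an assumption of this sketch.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringSketch {
    public static Hashtable<String, String[]> parseQueryString(String s) {
        if (s == null) {
            throw new IllegalArgumentException("query string is null");
        }
        Map<String, List<String>> collected = new LinkedHashMap<>();
        for (String pair : s.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2" and an empty string
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("no '=' in: " + pair);
            }
            // URLDecoder turns '+' into a space and %xx escapes into characters;
            // it throws IllegalArgumentException for malformed escapes.
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            // Each occurrence of a key appends to that key's list of values.
            collected.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        Hashtable<String, String[]> result = new Hashtable<>();
        collected.forEach((k, v) -> result.put(k, v.toArray(new String[0])));
        return result;
    }

    public static void main(String[] args) {
        Hashtable<String, String[]> m = parseQueryString("a=1&b=x+y&a=%41");
        System.out.println(Arrays.toString(m.get("a"))); // [1, A]
        System.out.println(Arrays.toString(m.get("b"))); // [x y]
    }
}
```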

posted @ 2007-07-26 16:34 edsonjava | Views (460) | Comments (0)

July 16, 2007

CVSNT 2.5.03 Installation on Windows 2003

Author: Bo Berglund
Notice:
This guide is written as an installation help for CVSNT 2.5.03 and higher on Windows 2003 server.
Most of the discussion is also valid for installation on Windows XP-Pro (see below for an important setting).
NOTE! You cannot use XP-Home for CVSNT!
The guide uses the Innosetup based installer that I maintain but similar results can probably be obtained by using the Innosetup installer published by Oliver Giesen as well.
I am not using the MSI installer from the official CVSNT website since I cannot accept non-opensource software if anything else is available.

Table of contents
CVSNT Installation
Configuring the server
Adding CVS users
Adding CVS administrators
Disabling pserver as security measure
The cvs passwd command for adding users
Managing pserver and sserver users
Using the SSPI protocol
Fine-tuning user access of CVS
Using spaces with CVSNT

Links:
CVSNT Auditing Configuration Tutorial
Innosetup CVSNT Installer download
CVSMailer homepage, Automatic email on commits and other events
ViewCvs Installer download
CVSNT command reference
CVSNT download (where you can download the latest CVSNT versions)

Karl Fogel's book 'Open Source development with CVS'
The free part of Karl Fogel's book in HTML format
DevGuy's CVS information pages
CVS-Gui (WinCvs) homepage
WinCvs Daily use guide
WinCvs 1.3 manual (PDF format)

WinCvs download (on SourceForge)

Installation of the CVSNT server

File system type
Make sure your system is only using the NTFS file system!
Also make sure you are logged on as an administrator of the PC (using an account with administrative privileges).
And most important: Use the local disk on the CVSNT server!

IMPORTANT for XP-Pro users:
You MUST switch off Simple File Sharing, which is the default for XP (as recommended by Microsoft to make XP somewhat compatible with Win95-98-ME)!
You do this by opening a Windows Explorer and then use the menu command Tools/Folder Options. Select the View tab and scroll down to the bottom where you find this item. Uncheck it now!
Simple File Sharing

Now for the actual installation and configuration:

1. Get the latest release of CVSNT
Get the latest CVSNT Innosetup installation from Innosetup CVSNT Installer download

2. Create CVS directories
Create two directories on the target machine, c:\cvsrepos and c:\cvsrepos\cvstemp. If you have a separate disk partition to spare for CVS, use that instead. The important point is that the disk the repository is located on uses NTFS.

3. Directory security and permissions
Give c:\cvsrepos\cvstemp security settings that allow full control for all accounts, including SYSTEM.
Important:
The cvstemp directory must NOT be located in either c:\WINNT\Temp or anywhere in the "C:\Documents and Settings" tree because these locations have imposed restrictions on user access!
Notice that on XP-Pro out of the box from Microsoft the permissions cannot be set like this until "Simple File Sharing" is switched off (see above). So you must do this if you use XP-Pro. XP-Home is totally unsuitable for CVSNT!

4. Install CVSNT
Run the downloaded CVSNT setup file and make sure to change the installation path to c:\programs\cvsnt (I am paranoid about removing any spaces in paths used by cvs!)
Start screen:
Install screen #1

License agreement:
Install screen #2

Install directory selection:
Note:
I strongly recommend that you install CVSNT to a path that does NOT contain any embedded spaces, for example like this:
Install screen #3

Installation component selection screen:
Install screen #4

Start menu selection:
Install screen #5

Task selection screen:
Install screen #6

Ready to install!
Install screen #7

Install in progress
Install screen #8

Release notes
Install screen #9

Installation done!
Install screen #10

Configuring the CVSNT server and repository

1. CVSNT Control Panel configuration
CVSNT is configured from the CVSNT Control Panel, which can be reached via the shortcut link placed under the Start menu during installation.

Now open the CVSNT control panel applet and do the following:

2. Shut down the CVSNT service
Check that the CVSNT Service is not running (Start button is enabled). This is the initial screen showing that both services are running:

Configuration screen #1
If it is started then stop it. You can leave the Lock Service running.

3. Repository creation
The tab will initially look like this:

Configuration screen #2

4. Add repository
Now you will add a repository to the server. This is done using the "Add" button. When you click this a dialogue shows up where you will define your repository.

Empty repo

5. Repository folder
Click the ellipsis button for Location to bring up the folder browser.
Now you can browse to the location you want for your repository and add a new folder here.
NOTE:
I strongly advise NOT to use paths with embedded spaces for CVS!

Browse for folder

6. Name repository
Now fill in the description and the name of the repository as well.
NOTE:
Do NOT accept the suggested name, which is the same as the folder path!
Instead only use the bare folder name with a leading / like this:

CVSNT AddRepository

7. Initializing the repository
When you click the OK button there will be a dialog where CVSNT offers to initialize the new repository.
When you click Yes then the new folder will be converted to a real repository:

8. First repository added!
Now the list of repositories has been populated with the first repository:

CVSNT Repository

You can add as many as you like (almost) but please do not fall for the temptation to use one repository for each and every project! There are a lot of possibilities to streamline the development process using CVSNT, but many of these use the virtual modules concept and this is only possible within a single repository.

9. Server Settings
Now go on to the Server Settings tab.
Here the default settings are all right for now, except the Temporary Directory setting.

Serversettings

NOTICE about Domains:
You can set the Default domain entry to either the CVSNT server PC name (as in the example above) or the domain name to which the CVSNT server belongs. CVSNT will strip the domain part from all accounts that log on using the default domain before processing. All other logons will be processed using their complete names (DOMAIN\username). The result of this is that all users that "belong" to the domain specified in this box will be logged using only the account name, likewise these usernames will be supplied to the administrative scripts without the domain name. All others will have a domain name added. This must be accounted for in any admin script used.
The CVSROOT/users file is one such admin file that needs to be handled with care concerning domain and non-domain entries.

Temp dir: Use the ellipsis button to browse for the folder prepared for this purpose above:

Tempdir

10. Compatibility
On the next tab (Compatibility Options) there is nothing you need to change for now:

Serversettings screen #1

11. Plugins and protocols
The Plugins tab defines many of the extra features of CVSNT, including some aspects of the connection protocols. The screen lists the available plugins; select a line and click the Configure button to configure that plugin:

Serversettings screen #1

12. Sserver configuration
Here is the configuration window for the SSERVER protocol plugin. Please set it like this:

SSPI config screen

13. Advanced settings
The final tab on the Control Panel deals with advanced configuration settings and you need not change anything here.

Configuration screen #1

14. Apply configuration changes
Now click the Apply button! This is really important, nothing will happen unless you do this! Note that after you have done this the Apply button is disabled.

15. Start the CVSNT service
Go back to the first tab and click the Start button. After a few moments the Stop button will be highlighted.
Now CVSNT runs (success!)

16. Restart the server
In order for you to be able to use the command line cvs you need to have the path variable set to include the location of the cvs.exe just installed (c:\programs\cvsnt). Since the installer will have put this into the system path variable it will work if you restart the server.
You can check this by going to a command window and typing the command:
cvs --ver
If this results in an error message, restart the server PC before continuing.

Adding and managing CVS users for pserver and sserver access

This step is only needed if you plan to use the sserver or pserver protocols with this CVS server. If all your users are on Windows PCs, pserver is not recommended because it has inherent security flaws. Use SSPI instead, because that protocol integrates much better with Windows. If you decide to go with sspi (recommended), you can skip the discussion of how to add and manage users in this section.

1. Creating CVS accounts on the server
In order for pserver and sserver to work you have to define CVS users, but before you can do this you need to create two real accounts on the server. These accounts will be used by the CVS users as the working accounts.
You need one account which will be a CVS administrative account and one which will be a normal user account. Note that the CVS administrator need not be a server administrator!

Usermanager

The two accounts are added through the Users dialog in Computer Management.
I have used the account names cvsadmin and cvsuser as shown above.

2. Adding CVS users
Open a command window and do the following (replace items <text> with the real values from your system).

set cvsroot=:sspi:<computername>:/TEST
cvs passwd -a <account name>

You will now be asked to enter a password for this user. This password is only for CVS use so it should not be the real system password! Enter the password twice.
Now the CVSROOT/passwd file will be created and the user you entered will be added to the list in this file.
This step is necessary if you are going to use the pserver or sserver protocol in the future since there is no way to log in with pserver/sserver unless there is a passwd file present with the user listed.

Important note:
Any user entered like this MUST be an NT user on the local system! CVS will not accept any user login that is not connected to a "real" account.

3. Aliasing CVS users to real accounts
In order to have many CVS user logins you don't need to create masses of system accounts! Instead you can "alias" a CVS login to a "real" account using this command:

cvs passwd -r <real accountname> -a <cvs login name>

What will happen now is that to CVS the user will be known and registered as the CVS login given in the command, but for file operations that will encounter permission issues the commands will be executed in the context of the real system account that was aliased. This makes it possible to use NTFS file system permissions to limit access to certain parts of the repository to some users. You simply create a system account for which you set limited permissions and then you alias the CVS login to this user.

Note that this command will fail if there is a space embedded in the real account name! DON'T ever use spaces in these contexts!!!!! (But using quotes may solve the problem like this:
cvs passwd -r "system admin" -a "new user"
Since I don't have a valid user with embedded space I could not check the quotes trick with the valid user name parameter, but adding a CVS login with space embedded *can* be done with quotes.)

Examples:
cvs passwd -r cvsuser -a charlie

or if you want the new user to be a CVS administrator:

cvs passwd -r cvsadmin -a rogerh

Note about Domain users:
You can add domain users with the following command:
cvs passwd -r <real accountname> -D <domain name> -a <cvs login name>
This command is reported by a user to have worked for him. I cannot check it because I don't have a domain. But based on information from the mail list I think that it will only work if there is a trust between the CVSNT server PC and the domain controller. If the CVSNT server PC is a member of the domain then this is the case.

The server is now ready to be used, and you can check the sserver and pserver functionality as follows:

4. Testing the CVS connection with sserver
Open another command window and type:
set cvsroot=:sserver:<user>@<computername>:/TEST
Replace <user> and <computername> with valid entries, for example:
set cvsroot=:sserver:charlie@cvsserver:/TEST
Then:
cvs login (enter password on prompt)
cvs ls -l -R
(this should give you a list of the files in TEST/CVSROOT)

5. Testing the CVS connection with pserver
Open another command window and type:
set cvsroot=:pserver:<user>@<computername>:/TEST
Replace <user> and <computername> with valid entries, for example:
set cvsroot=:pserver:charlie@cvsserver:/TEST
Then:
cvs login (enter password on prompt)
cvs ls -l -R
(this should give you a list of the files in TEST/CVSROOT)

6. Testing the CVS connection from another PC
Open a command window on another PC where you have installed the CVSNT in client only mode and type:
set cvsroot=:sserver:<user>@<computername>:/TEST
Replace <user> and <computername> with valid entries, for example:
set cvsroot=:sserver:charlie@cvsserver:/TEST
Then:
cvs login (enter password on prompt)
cvs ls -l -R
(this should give you a list of the files in TEST/CVSROOT)
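All three tests use the same CVSROOT shape: a protocol between colons, an optional user before `@`, the server name, and the repository name after the next colon. The Java sketch below (hypothetical names, not part of CVSNT) splits such a string into those parts. It covers only the forms shown in this guide and ignores port numbers and other options.

```java
public class CvsRootSketch {
    // Hypothetical holder for the parts of a CVSROOT string.
    public static final class CvsRoot {
        public final String method;     // e.g. "pserver", "sserver", "sspi"
        public final String user;       // null when no "user@" part is given
        public final String host;       // the CVSNT server name
        public final String repository; // e.g. "/TEST"

        CvsRoot(String method, String user, String host, String repository) {
            this.method = method;
            this.user = user;
            this.host = host;
            this.repository = repository;
        }
    }

    // Parses ":method:[user@]host:/repository" (no ports or extra options).
    public static CvsRoot parse(String cvsroot) {
        if (!cvsroot.startsWith(":")) {
            throw new IllegalArgumentException("expected a leading ':' in: " + cvsroot);
        }
        int methodEnd = cvsroot.indexOf(':', 1);
        if (methodEnd < 0) {
            throw new IllegalArgumentException("missing ':' after protocol: " + cvsroot);
        }
        String method = cvsroot.substring(1, methodEnd);
        String rest = cvsroot.substring(methodEnd + 1); // [user@]host:/repository
        int colon = rest.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' before repository: " + cvsroot);
        }
        String userHost = rest.substring(0, colon);
        String repository = rest.substring(colon + 1);
        int at = userHost.lastIndexOf('@');
        String user = (at < 0) ? null : userHost.substring(0, at);
        String host = (at < 0) ? userHost : userHost.substring(at + 1);
        return new CvsRoot(method, user, host, repository);
    }

    public static void main(String[] args) {
        CvsRoot r = parse(":sserver:charlie@cvsserver:/TEST");
        System.out.println(r.method + " " + r.user + " " + r.host + " " + r.repository);
        // sserver charlie cvsserver /TEST
    }
}
```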

If you cannot get this far, for example if the login fails, then you should check the Windows Firewall settings on the CVSNT server:

7. Modifying Windows Firewall to allow CVS calls

Go to Control Panel
Open the Windows Firewall item.
Select the Exceptions tab
Click the "Add port" button
Enter the name CVSNT and port number 2401 as a TCP port
Accept back to the main screen
Make sure Windows Firewall is set to ON

Administrating the repository, users with admin rights

There have been a number of reports that people have not been able to add users or execute the cvs admin command even though they were members of the Administrators group or even of Domain Admins. To avoid this, there is a simple way to manage who has admin rights on the CVSNT server: the CVSROOT/admin file.
Here is how to:

Create a text file called admin (no extension) inside the CVSROOT directory of the repository.
Edit this file by adding, on separate lines, the login names of the users you want to give administrative privileges on the CVS server.

The file could look like this:


cvsadmin
charlie
jennifer
john

Now each of these users is able to add new users, change their passwords and use the cvs admin command.

Disabling the pserver protocol

If you are exposing your CVSNT server to the Internet, you should disable the :pserver: protocol because its security is too weak. Only the login password is 'encrypted', and only barely so. All other traffic is in cleartext...
To protect your data you should use the :sspi: protocol instead (and set its encryption flag of course).
As an alternative with the same basic functionality as pserver you can use sserver instead. This uses encrypted connections by default and is probably better if you want to add cvs logins that do not correspond to real accounts (see above).
Disabling any protocol on the CVSNT server is done through the CVSNT Control Panel Plugins tab.
Select the :pserver: protocol line and click Configure. This will bring up a dialogue where you can just uncheck the checkbox to disable the protocol:

Configuration screen #1

Adding new pserver users using the cvs passwd command

As soon as you have logged on using pserver or sserver with a cvs login name that is the same as a local system admin or is aliased to an admin account or is listed in the CVSROOT/admin file then you can add and delete CVS user logins with the passwd command. Here is the full syntax for this command:

Usage:
cvs passwd [-a] [-x] [-X] [-r real_user] [-R] [-D domain] [username]
-a Add user
-x Disable user
-X Delete user
-r Alias username to real system user
-R Remove alias to real system user
-D Use domain password

Example:
cvs passwd -r charlie -a john
This adds a CVS login john with a system alias to account charlie. When the command is executed there will be a password dialogue that asks for the password of john twice for confirmation. Note that this is NOT the actual system password of account john, it is the CVS login password only used by CVSNT.
After the command completes there will be a new line in the CVSROOT/passwd file looking somewhat like this:
john:KacIT8t1F/SKU:charlie
The part between the two colons is the DES-encrypted password you typed in; the CVSNT service uses it during login to validate john. Once john is accepted, the account charlie is used instead, so the password plays no further role. The CVSNT service has full privileges to act on charlie's behalf, and this is what it does.
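The passwd line format described above (login, encrypted password, optional real account) can be read with a few lines of Java. This is a hypothetical helper for illustration, not a CVSNT tool. Treating the password and alias fields as optional, as for a user added without a password or without an alias, is an assumption.

```java
public class PasswdLineSketch {
    // Hypothetical view of one CVSROOT/passwd line: login:password[:real_account]
    public static final class Entry {
        public final String login;        // CVS login name, e.g. "john"
        public final String passwordHash; // encrypted CVS-only password, possibly empty
        public final String realAccount;  // aliased system account, or null if not aliased

        Entry(String login, String passwordHash, String realAccount) {
            this.login = login;
            this.passwordHash = passwordHash;
            this.realAccount = realAccount;
        }

        // The account CVSNT acts as for file operations: the alias if present.
        public String effectiveAccount() {
            return (realAccount != null) ? realAccount : login;
        }
    }

    public static Entry parse(String line) {
        String[] fields = line.split(":", -1); // -1 keeps empty trailing fields
        if (fields.length > 3 || fields[0].isEmpty()) {
            throw new IllegalArgumentException("unexpected passwd line: " + line);
        }
        String hash = (fields.length > 1) ? fields[1] : "";
        String real = (fields.length > 2 && !fields[2].isEmpty()) ? fields[2] : null;
        return new Entry(fields[0], hash, real);
    }

    public static void main(String[] args) {
        Entry e = parse("john:KacIT8t1F/SKU:charlie");
        System.out.println(e.login + " runs as " + e.effectiveAccount()); // john runs as charlie
    }
}
```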

Managing pserver and sserver users

If you plan on using pserver or sserver with a fairly large number of different user logins then you might want to do as follows (also described above):

Create a local user on the CVSNT server by the name of "cvsuser".
Login to the cvs server using an admin account.
Add the logins with the following command to alias to the cvsuser:
cvs passwd -r cvsuser -a <login user name>
You will be asked twice for the login password.

You may add as many pserver users this way as you like. They will all be individually identified by the login name even though the operations on the repository will be done in the cvsuser account context. Mail systems will recognize these user names as well (see below).

Using the SSPI protocol for CVSNT access

A few years ago the SSPI protocol was added to CVSNT. It works over TCP/IP, so it can traverse firewalls more easily. Like :ntserver:, which is now deprecated, the :sspi: protocol does not need a login; instead it uses the login you made when you started your workstation.

Limiting user access with sspi
When used normally, sspi accepts connections from all system users that authenticate against the system (local or domain). Often this is not really what we want; instead we want the same mechanism as is used with :pserver:, where the CVSROOT/passwd file limits the logins accepted by CVSNT to those listed in the file.
With :sspi: this is quite possible, you only have to list the account login names that you want to give CVS access in the passwd file. You also have to set the parameter
SystemAuth = No
in the CVSROOT/config file.
Note that in this case there is no need for entering passwords into the passwd file, sspi uses the system login and the passwd file is only used as a list of accepted users. So simply issuing this command when logged in as a CVS administrator will work:
cvs passwd -a newuser
(press enter twice to tell CVSNT that no password is used)

Fine-tuning user access of CVS

The NTFS file system permissions can be used to tune the access to the CVS repository with more granularity than the passwd file allows. Here is how it is done:

Create a number of NT user groups where members can be added and removed easily.
Don't use aliases in the login scheme, let each user login as himself, for example using :sspi:.
Set permissions (read/write, read only, no access) on the module level in the repository using the CVS groups as tokens.
Give membership to the CVS user groups as needed to individual NT accounts

Using spaces with CVSNT

CVSNT tries its best to handle spaces embedded in file and directory names. But there still are instances where the use of spaces breaks the CVS functionality badly. So my recommendations are:

Install CVSNT to a path that does not contain spaces.
Place the repository on a path not containing spaces.
If you install additional software like PERL or RCS, don't use spaces!
Instruct your users not to use spaces in directory names that are to be handled by CVS.

People may argue that CVSNT handles these issues for them, but in my experience this is only partly true. For example the loginfo and notify script parsing breaks fully when handling files with embedded spaces. There are other places as well...
Also by allowing spaces you will make it impossible to later move the repository to a *nix system (Unix, Linux etc).
So you are much better off prohibiting spaces up front!

Afterword

This tutorial was written on 2005-11-16 and is based on CVSNT version 2.5.03.2148.
The test system is Windows Enterprise Server 2003 with SP1 installed, running in Virtual PC 2004 SP1 on my development PC. The server is not a member of a domain.

Comments? Send me a message: Bo Berglund

posted @ 2007-07-16 14:22 edsonjava | Views (2346) | Comments (0)

May 18, 2007

Design patterns in Jive: Jive source code study, part 2

2 Jive and Design Patterns

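The Jive forum system uses a large number of design patterns to implement a range of features elegantly. Because design patterns are general and widely understood, they help more people quickly understand the Jive forum source code and extend it dynamically according to a shared "contract". What other benefits does using design patterns bring?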

2.1 Design Patterns

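A design pattern is a catalogued summary of code design experience that has been used repeatedly and is known to most developers. Design patterns are used to make code reusable, easier for others to understand, and more reliable. They are unquestionably a win for the author, for other developers, and for the system alike: they turn coding into true engineering and are a cornerstone of software engineering.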

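The GoF book Design Patterns (GoF, the "Gang of Four", is the common name for its four authors) was the first to raise design patterns to a theoretical level and formalize them, presenting 23 basic design patterns. Since then, many new design patterns have emerged in the development of reusable object-oriented software.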

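Many people know that Java is a fully object-oriented design and programming language. However, because of their education and experience, most programmers and designers come from traditional procedural languages, so it is hard for them to switch their habits of thought completely to object-oriented design and development. Learning design patterns helps with, and consolidates, this transition.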

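Everyone who finishes studying design patterns feels something like a rebirth, which can be explained in many ways. Perhaps the most fitting explanation is that it gives you a new perspective for looking at and solving problems. Cultivating this new way of thinking is foundational training, which is why design patterns are one of the essential basic courses for learning Java.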

Because design pattern concepts are rather abstract and can be difficult for beginners, studying them through the Jive forum system is a good choice.

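Mastering design patterns helps programmers and designers develop applications, and even general-purpose frameworks, with a stronger eye for reusability and scalability. A framework is a set of cooperating classes that make up a reusable design for a specific class of software. It mainly distills the parts of applications that are reused again and again and acts as a kind of template: a structural template.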

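A framework usually defines design parameters such as the overall structure of the application architecture and the relationships among classes and objects, so that application implementers can concentrate on the specific details of the application itself. Frameworks emphasize design reuse, while design patterns are the smallest reusable units of design, so frameworks inevitably use design patterns again and again. The design and development of general-purpose frameworks will be discussed in later chapters.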

其实Jive论坛本身也形成了一个基于Web结构的通用框架系统，因为它很多设计思想是可以重用的，例如设定一个总体入口，通过入口检查用户的访问控制权限，当然还有其他各方面的功能实现方式都是值得在其他系统中借鉴的，也正因为它以模式的形式表现出来，这种可重用性和可借鉴性就更强。

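2.2 ForumFactory and the Factory Pattern

The factory pattern is one of the main, most commonly used GoF patterns. It provides an interface for creating objects and encapsulates the details of object creation, so that code using an object does not need to care how the object is produced.

Among the GoF patterns, the factory pattern comes in two forms: Factory Method and Abstract Factory. The main difference is that a factory method creates product objects of a single product interface, while an abstract factory creates product objects of several product interfaces, which makes it somewhat similar to the Builder pattern. In everyday practice, Factory Method is the one used most.

Take the class SampleOne as an example. To create an instance of SampleOne: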

SampleOne sampleOne = new SampleOne();

If SampleOne has several similar classes, such as SampleTwo and SampleThree, their instances are created as follows:

SampleTwo sampleTwo = new SampleTwo();

SampleThree sampleThree = new SampleThree();

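These three classes actually share some characteristics, like the books, toys, and cosmetics sold in an online store: although they are different concrete products, they have a common feature and can be abstracted as "goods". Many things in daily life can be abstracted into an interface in this way. If these three classes can be abstracted into a common interface SampleIF, the statements above become: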

SampleIF sampleOne = new SampleOne();

SampleIF sampleTwo = new SampleTwo();

SampleIF sampleThree = new SampleThree();

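In practice, you often do not need all three objects at once; instead you pick one of them depending on the situation. This is where a factory method comes in. Create an abstract class called SampleFactory:

public abstract class SampleFactory {
    public abstract SampleIF creator();
}

This abstract factory class has an abstract method creator with no implementation; the implementation is deferred to a subclass. Create the subclass SampleFactoryImp: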

public class SampleFactoryImp extends SampleFactory {
    public SampleIF creator() {
        // Decide which concrete product to return based on other factors;
        // here we assume SampleOne should be returned
        return new SampleOne();
    }
}

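SampleFactoryImp chooses whether to return SampleOne, SampleTwo, or SampleThree according to the concrete situation, which can be many things: the result of other computations in the context, or simply a setting in a configuration file.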
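As a minimal sketch of the configuration-driven case, assuming the decision is read from a `java.util.Properties` entry named `sample.type` (a hypothetical key, not from the original), the factory subclass could pick the concrete product like this. The `name()` method on SampleIF is also an illustrative addition.

```java
import java.util.Properties;

// Product interface and three concrete products, as in the text above.
interface SampleIF {
    String name();
}

class SampleOne implements SampleIF {
    public String name() { return "SampleOne"; }
}

class SampleTwo implements SampleIF {
    public String name() { return "SampleTwo"; }
}

class SampleThree implements SampleIF {
    public String name() { return "SampleThree"; }
}

// The abstract creator: subclasses decide which product to build.
abstract class SampleFactory {
    public abstract SampleIF creator();
}

// Hypothetical concrete creator that takes its decision from configuration.
class SampleFactoryImp extends SampleFactory {
    private final Properties config;

    SampleFactoryImp(Properties config) {
        this.config = config;
    }

    @Override
    public SampleIF creator() {
        // "sample.type" is an assumed configuration key; default to "one".
        String type = config.getProperty("sample.type", "one");
        switch (type) {
            case "two":   return new SampleTwo();
            case "three": return new SampleThree();
            default:      return new SampleOne();
        }
    }
}
```

The caller only ever sees SampleIF, so switching products means changing a configuration value, not the calling code.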

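The Factory Method example above involves one abstract product interface, SampleIF. If there are other, completely different product interfaces, such as Product, one subclass SampleFactoryImp can only implement the production of one product family; another product family may need another subclass, SampleFactoryImpTwo. Multiple product families produced by multiple factory methods in this way form the Abstract Factory pattern.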
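A rough sketch of that idea, assuming two product interfaces (SampleIF and Product, both named in the text) and two families. The concrete product classes, the `describe()` method, and the base class name AbstractSampleFactory are hypothetical. Each concrete factory creates one consistent family:

```java
// Two unrelated product interfaces.
interface SampleIF { String describe(); }
interface Product  { String describe(); }

// Family 1
class SampleOne implements SampleIF { public String describe() { return "SampleOne"; } }
class ProductOne implements Product { public String describe() { return "ProductOne"; } }

// Family 2
class SampleTwo implements SampleIF { public String describe() { return "SampleTwo"; } }
class ProductTwo implements Product { public String describe() { return "ProductTwo"; } }

// Abstract factory: one creation method per product interface.
abstract class AbstractSampleFactory {
    public abstract SampleIF createSample();
    public abstract Product createProduct();
}

// Each concrete factory produces the products of exactly one family.
class SampleFactoryImp extends AbstractSampleFactory {
    public SampleIF createSample() { return new SampleOne(); }
    public Product createProduct() { return new ProductOne(); }
}

class SampleFactoryImpTwo extends AbstractSampleFactory {
    public SampleIF createSample() { return new SampleTwo(); }
    public Product createProduct() { return new ProductTwo(); }
}
```

Swapping the factory instance switches every product at once, which is what keeps the products of a family consistent with each other.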

As discussed earlier, Jive sets up a single entry point to the forum; this entry point is ForumFactory. Here is the main code of ForumFactory:

public abstract class ForumFactory {

    private static Object initLock = new Object();
    private static String className = "com.Yasna.forum.database.DbForumFactory";
    private static ForumFactory factory = null;

    public static ForumFactory getInstance(Authorization authorization) {
        if (authorization == null) {
            return null;
        }
        // The following uses the Singleton pattern, discussed in section 2.3
        if (factory == null) {
            synchronized (initLock) {
                if (factory == null) {
                    ... // read the current className from the configuration file
                    try {
                        // Load the class dynamically
                        Class c = Class.forName(className);
                        factory = (ForumFactory) c.newInstance();
                    }
                    catch (Exception e) {
                        return null;
                    }
                }
            }
        }
        // Return a proxy that restricts access to the forum according to the authorization
        return new ForumFactoryProxy(authorization, factory, factory.getPermissions(authorization));
    }

    // Create a concrete instance of the product interface Forum
    public abstract Forum createForum(String name, String description)
        throws UnauthorizedException, ForumAlreadyExistsException;

    // Create a concrete instance of the product interface ForumThread
    public abstract ForumThread createThread(ForumMessage rootMessage)
        throws UnauthorizedException;

    // Create a concrete instance of the product interface ForumMessage
    public abstract ForumMessage createMessage();

    ....
}

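ForumFactory provides many abstract methods, such as createForum, createThread, and createMessage(), each of which creates a concrete object of its own product interface. These three interfaces are the basic business objects analyzed earlier: Forum, ForumThread, and ForumMessage. ForumFactory does not implement these creation methods itself; the implementation is deferred to its subclasses.

The subclass of ForumFactory is com.Yasna.forum.database.DbForumFactory, a database-backed implementation: DbForumFactory implements the three methods createForum, createThread, and createMessage() against the database. This design also leaves room to switch dynamically to another product family. For an XML-based implementation, you could write a concrete factory subclass XmlForumFactory that implements the three creation methods.

So at its single entry point the Jive forum uses the Abstract Factory pattern to dynamically create the various products the forum needs, as shown in Figure 3-4.

Figure 3-4 ForumFactory abstract factory pattern

In Figure 3-4, XmlForumFactory and DbForumFactory are two concrete implementations of the abstract factory ForumFactory, while Forum, ForumThread, and ForumMessage are three abstract product interfaces; different factory implementations produce different product objects.

Viewing Jive's single entry point through the Abstract Factory pattern lets you grasp the rough relationships among these classes in one step. Because a well-known design pattern is used, readers of the source can quickly understand the structure of the whole system and how it fits together; Figure 3-4 already outlines Jive's main framework structure.

Careful readers may notice that the getInstance method in ForumFactory above is rather puzzling; it is discussed in section 2.3.

2.3 The Single Entry Point and the Singleton Pattern

The getInstance method of ForumFactory above uses the Singleton pattern. The Singleton pattern ensures that a class has one and only one instance and provides a global point of access to it.

As mentioned earlier, ForumFactory is the single entry point through which Jive clients access the database system, and all client requests must go through this one ForumFactory. Without the Singleton pattern, a client would create a ForumFactory instance with a statement like this: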

ForumFactory factory = new DbForumFactory();

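If every client request executed this statement, each request would create a different factory instance, which clearly does not meet the design requirements; hence the Singleton pattern is needed.

There are several ways to implement the Singleton pattern in Java. The most common and safest is the following: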

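public class Singleton {
    private Singleton() {}

    // The class defines an instance of itself internally. Strange, isn't it?
    // Note that it is private, for internal use only
    private static Singleton instance = new Singleton();

    // A static method that gives outside code direct access to the instance
    public static Singleton getInstance() {
        return instance;
    }
}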

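This Singleton implementation takes two statements: the first creates the class's own instance, and the second provides a method through which outside code accesses it. It is also best to make the constructor private so that other programmers cannot create instances directly with new Singleton().

Here is another Java Singleton implementation: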

public class Singleton {
    private Singleton() {}

    private static Singleton instance = null;

    public static synchronized Singleton getInstance() {
        if (instance == null)
            instance = new Singleton();
        return instance;
    }
}

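The code above uses a check: the instance is created only if instance is null. This is called lazy initialization. Note the synchronized modifier on getInstance(); it is important. Without synchronized, several Singleton instances could be created when getInstance() is first called by multiple threads at the same time.

There has been much discussion of lazily initialized Singletons involving double-checked locking (DCL); interested readers can look into it further. The first form is generally considered safer, but the second can be used when the class needs input parameters at initialization.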
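For reference, here is a sketch of the two lazy variants usually recommended under the Java 5 memory model (JSR-133). Double-checked locking is only safe when the field is `volatile`, and the initialization-on-demand holder idiom gets laziness without explicit locking. The class names are illustrative, not from Jive.

```java
// Double-checked locking: correct only because 'instance' is volatile
// (Java 5+ memory model), which prevents another thread from seeing a
// partially constructed object.
class LazySingleton {
    private static volatile LazySingleton instance;

    private LazySingleton() {}

    static LazySingleton getInstance() {
        LazySingleton result = instance;      // single volatile read on the fast path
        if (result == null) {
            synchronized (LazySingleton.class) {
                result = instance;            // re-check while holding the lock
                if (result == null) {
                    instance = result = new LazySingleton();
                }
            }
        }
        return result;
    }
}

// Initialization-on-demand holder: the JVM initializes Holder (and thus
// INSTANCE) only on the first call to getInstance(), and class
// initialization is guaranteed to be thread-safe.
class HolderSingleton {
    private HolderSingleton() {}

    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```

The holder form cannot take runtime parameters. When initialization needs input (the case mentioned above, such as reading a class name from configuration), the volatile DCL form is the one to adapt.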

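Jive's ForumFactory uses the second, lazily initialized form so that the concrete ForumFactory subclass can be configured dynamically. getInstance reads the current factory implementation from the configuration file. To switch to XmlForumFactory, you do not need to modify the ForumFactory code; you only set className to XmlForumFactory in the configuration file. The concrete ForumFactory object is then created through the following dynamic loading mechanism: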

Class c = Class.forName(className);

factory = (ForumFactory)c.newInstance();

This uses Java reflection: an object can be created simply by specifying the value of className at runtime.
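A minimal sketch of the same mechanism, assuming the class name comes from a `Properties` entry named `forumFactory.className` (a hypothetical key). AbstractForumFactory, DbFactory, and XmlFactory are stand-ins for Jive's classes, not the real ones. It uses `getDeclaredConstructor().newInstance()` because `Class.newInstance()` has been deprecated since Java 9.

```java
import java.util.Properties;

// Stand-ins for ForumFactory and its subclasses (hypothetical names).
abstract class AbstractForumFactory {
    abstract String storage();
}

class DbFactory extends AbstractForumFactory {
    String storage() { return "database"; }
}

class XmlFactory extends AbstractForumFactory {
    String storage() { return "xml"; }
}

class FactoryLoader {
    // Instantiate the configured factory class; fall back to DbFactory.
    static AbstractForumFactory load(Properties config) throws ReflectiveOperationException {
        String className = config.getProperty("forumFactory.className",
                                              DbFactory.class.getName());
        Class<?> c = Class.forName(className);
        // Class.newInstance() is deprecated since Java 9; call the no-arg constructor instead.
        return (AbstractForumFactory) c.getDeclaredConstructor().newInstance();
    }
}
```

Switching implementations means editing the configuration entry; FactoryLoader and its callers are never recompiled.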

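The goal of the Singleton pattern is to control object creation, and it is often used to control access to resources such as database or socket connections. A singleton gives every thread one shared point of access to such a resource; combined with synchronization, it can ensure that only one thread uses the resource at a time. Because Java has no global variables, a singleton can sometimes play that role, but note that the guarantee holds only within a single JVM.

2.4 Access Control and the Proxy Pattern

Looking closely, you will find that the return value at the end of ForumFactory's getInstance method is somewhat odd. According to the Singleton concept, it should return the factory instance directly, but it returns a new ForumFactoryProxy instance instead, which departs from the intent of the Singleton pattern: each call to ForumFactory.getInstance returns not the unique ForumFactory instance but a new object. This is done for access-permission control. Setting aside the merits of this approach, let us first look at the Proxy pattern.

The Proxy pattern is one of the structural design patterns. A proxy is a stand-in, or shadow, object for the real object, and its main purpose is to control access to the real object. This control can serve many purposes, for example improving performance or accessing remote objects (the remote proxy, which will be discussed in a later chapter).

One of the main purposes of this control is to restrict clients' access rights to the real object. Jive distinguishes roles and permissions, so operations on Forum, ForumThread, and ForumMessage must pass the permission check before they can proceed.

Take the createForum method in ForumFactoryProxy as an example. ForumFactoryProxy is in fact another factory implementation of ForumFactory; its createForum is implemented as follows: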

public Forum createForum(String name, String description)
        throws UnauthorizedException, ForumAlreadyExistsException
{
    if (permissions.get(ForumPermissions.SYSTEM_ADMIN)) {
        Forum newForum = factory.createForum(name, description);
        return new ForumProxy(newForum, authorization, permissions);
    }
    else {
        throw new UnauthorizedException();
    }
}

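This method performs a permission check to determine whether the user is a system administrator. If so, it obtains a new Forum object directly from the createForum method of factory (the DbForumFactory object) and then returns a ForumProxy, a proxy subclass of Forum. Because Forum also has many properties and methods that require permission checks, ForumProxy plays a role similar to ForumFactoryProxy.

Jive has the following proxy classes:

· ForumFactoryProxy: the proxy between the client and DbForumFactory. Every call from the client to a DbForumFactory method first passes through the corresponding ForumFactoryProxy method. The same applies to the classes below.

· ForumProxy: the proxy between the client and DbForum. To study any method of a Forum object, look first at the corresponding ForumProxy method.

· ForumMessageProxy: the proxy between the client and DbForumMessage.

· ForumThreadProxy: the proxy between the client and DbForumThread.

User and Group also have corresponding proxy classes.

As the analysis above shows, every data object has a proxy. If the system has many data objects, this one-to-one proxy relationship produces a large number of proxy classes and makes the system less clean, so dynamic proxies can be used to replace all of these proxy classes; the implementation is discussed in a later chapter.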
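As a rough illustration of the dynamic-proxy idea only, here is a sketch using the JDK's `java.lang.reflect.Proxy`: a single InvocationHandler can guard any interface. The `Forum` interface, the set of admin-only method names, and the boolean admin flag are simplified stand-ins, not Jive's permission API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Set;

// Simplified stand-in for Jive's Forum interface.
interface Forum {
    String getName();
    void setName(String name);
}

class SimpleForum implements Forum {
    private String name;
    SimpleForum(String name) { this.name = name; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// One generic handler replaces a hand-written proxy class per interface.
class PermissionHandler implements InvocationHandler {
    private final Object target;
    private final boolean isAdmin;
    private final Set<String> adminOnly;   // method names that require admin rights

    PermissionHandler(Object target, boolean isAdmin, Set<String> adminOnly) {
        this.target = target;
        this.isAdmin = isAdmin;
        this.adminOnly = adminOnly;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (adminOnly.contains(method.getName()) && !isAdmin) {
            throw new SecurityException("Not authorized: " + method.getName());
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();   // rethrow the target's own exception
        }
    }

    // Wrap any object in a proxy that implements the given interface.
    static <T> T protect(T target, Class<T> iface, boolean isAdmin, Set<String> adminOnly) {
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface },
                new PermissionHandler(target, isAdmin, adminOnly)));
    }
}
```

The same `protect` call would work for ForumThread, ForumMessage, and so on, which is the saving the paragraph above points to.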

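2.5 Paged Batch Queries and the Iterator Pattern

The Iterator pattern provides a way to access the elements of a collection sequentially without exposing the collection's internal representation. It is useful for accessing large amounts of data; Iterator in the Java Collections API is one implementation of the pattern.

As discussed in earlier chapters, when a user queries a large amount of data, the database layer should return a Collection rather than a raw ResultSet. But there is a problem: if the data is large, it must be displayed across several pages, and loading all of it into a Collection at once would hurt performance. With the Iterator pattern, the whole collection need not be materialized; the database is queried for an element's data only when iteration reaches that element.

Take the display of thread topics in the forum as an example: it is impossible to show all topics on one page, so they are shown page by page, as in Figure 3-5.

In Figure 3-5, all forum threads are shown across 15 pages. The following statements in Forum.jsp, which displays the forum, produce this result: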

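ResultFilter filter = new ResultFilter(); // set up the result filter
filter.setStartIndex(start); // set the start index
filter.setNumResults(range); // set the number of results
ForumThreadIterator threads = forum.threads(filter); // get the iterator
while (threads.hasNext()) {
    ForumThread thread = (ForumThread) threads.next();
    // display each thread topic in turn, i.e. each row in Figure 3-5
}

Figure 3-5 Paged display of all threads

The code above mainly obtains a ForumThreadIterator instance from Forum's threads method. Following the earlier proxy-pattern analysis, to study a Forum method we first look at the corresponding method in ForumProxy and then at the concrete implementation in DbForum. In ForumProxy, the threads method is: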

public ForumThreadIterator threads(ResultFilter resultFilter) {

ForumThreadIterator iterator = forum.threads(resultFilter);

return new ForumThreadIteratorProxy(iterator, authorization, permissions);

}

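It first calls the concrete threads method in DbForum. Tracing into DbForum, its threads method is:

public ForumThreadIterator threads(ResultFilter resultFilter) {
    // Build the SQL query for the range specified in resultFilter
    String query = getThreadListSQL(resultFilter, false);
    // Get the block of thread IDs that starts at the filter's start index
    long [] threadBlock = getThreadBlock(query.toString(), resultFilter.getStartIndex());
    // Compute the start and end points of the query range
    int startIndex = resultFilter.getStartIndex();
    int endIndex;
    // If number of results is set to infinite, set endIndex to the total
    // number of threads in the forum.
    if (resultFilter.getNumResults() == ResultFilter.NULL_INT) {
        endIndex = (int)getThreadCount(resultFilter);
    } else {
        endIndex = resultFilter.getNumResults() + startIndex;
    }
    return new ForumThreadBlockIterator(threadBlock, query.toString(),
            startIndex, endIndex, this.id, factory);
}

ResultFilter is a query-result class that can filter or sort forum threads and messages, so the query range can be tailored to the user's requirements. For example, to find all messages a user posted in this forum last year, you only need to create one ResultFilter object that represents this query.

In the threads method above, the first step is to build the corresponding dynamic SQL query, then use it to query the database for the IDs of all ForumThreads in the query range, and then take from this ID set the subset for the current page. This is the crucial step.

This crucial step involves two important methods, getThreadListSQL and getThreadBlock:

· getThreadListSQL: builds the SQL query string query; Jive's implementation of this method is rather tedious.

· getThreadBlock: gets the ID subset for the current page. How is the start position of the subset determined? The code of getThreadBlock shows that it simply calls ResultSet.next() repeatedly to skip forward to the start position.

The threads method above finally returns a ForumThreadBlockIterator, a subclass of the abstract class ForumThreadIterator. ForumThreadIterator implements the Collections API's Iterator interface to declare itself an iterator. ForumThreadBlockIterator implements the following methods: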

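public boolean hasNext(); // is there a next element?
public boolean hasPrevious(); // is there a previous element?
public Object next() throws java.util.NoSuchElementException; // get the next element

The "Block" in ForumThreadBlockIterator means "page". One of its main fields, threadBlock, holds the IDs of all ForumThreads on one page. The next() method walks through the ForumThreads in threadBlock; once the whole page has been traversed, it fetches the data of the next page (block).

ForumThreadBlockIterator's important method getElement does two things:

· If the current index is beyond the current page, it uses getThreadBlock to fetch the ID subset of the next page;

· If the current index is within the current page, it fetches the complete data object by ID and returns it.

The getElement method of ForumThreadBlockIterator is: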

private Object getElement(int index) {
    if (index < 0) { return null; }
    // Check whether the requested element is within the current block (current page)
    if (index < blockStart ||
            index >= blockStart + DbForum.THREAD_BLOCK_SIZE) {
        try {
            // Get the Forum instance from the cache
            DbForum forum = factory.cacheManager.forumCache.get(forumID);
            // Load the page (block) that contains index
            this.threadBlock = forum.getThreadBlock(query, index);
            this.blockID = index / DbForum.THREAD_BLOCK_SIZE;
            this.blockStart = blockID * DbForum.THREAD_BLOCK_SIZE;
        } catch (ForumNotFoundException fnfe) {
            return null;
        }
    }
    Object element = null;
    // Compute the element's position relative to the current block
    int relativeIndex = index % DbForum.THREAD_BLOCK_SIZE;
    // Make sure index isn't too large
    if (relativeIndex < threadBlock.length) {
        try {
            // Get the actual thread object from the cache
            element = factory.cacheManager.threadCache.get(
                    threadBlock[relativeIndex]);
        } catch (ForumThreadNotFoundException tnfe) { }
    }
    return element;
}

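ForumThreadBlockIterator implements the core paged-query functionality. On its way back to the client, the ForumThreadBlockIterator object is intercepted by ForumThreadIteratorProxy: looking back at the threads method in ForumProxy, what it finally returns to the calling client, Forum.jsp, is a ForumThreadIteratorProxy instance.

ForumThreadIteratorProxy is also a subclass of the iterator ForumThreadIterator. One of its methods is: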

public Object next() {

return new ForumThreadProxy((ForumThread)iterator.next(), authorization,

permissions);

}

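This statement returns a ForumThreadProxy instance, i.e. a proxy for a ForumThread. Jive's use of the Proxy pattern for access control here is not very elegant; it feels as if proxies are "flying" everywhere, and it could be refactored.

As shown above, when producing the multi-page query results shown in Figure 3-5, Jive takes the following steps:

(1) First query the IDs of all objects that match the query conditions. Note that only the IDs are fetched, not the objects themselves, which saves a great deal of memory.

(2) Treat each page as a block; whenever the next page is entered, fetch the IDs of all objects on that page.

(3) When outputting the objects on the current page, first try the cache; if an object is not cached, fetch its complete data from the database by ID.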
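The three steps can be sketched without a database. In this hypothetical version, a `blockLoader` function returns the IDs of one block (standing in for getThreadBlock), and an `objectLoader` function, backed by a shared cache map, resolves each ID to a full object (step 3). BlockIterator, the block size, and both function parameters are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;
import java.util.function.LongFunction;

// Iterates over objects whose IDs are fetched one block (page) at a time.
class BlockIterator<T> implements Iterator<T> {
    private final int endIndex;                     // exclusive
    private final int blockSize;
    private final IntFunction<long[]> blockLoader;  // blockStart -> IDs in that block
    private final LongFunction<T> objectLoader;     // ID -> full object (e.g. a DB read)
    private final Map<Long, T> cache;               // shared object cache

    private int index;
    private long[] block = new long[0];
    private int blockStart = -1;                    // -1 means no block loaded yet

    BlockIterator(int startIndex, int endIndex, int blockSize,
                  IntFunction<long[]> blockLoader, LongFunction<T> objectLoader,
                  Map<Long, T> cache) {
        this.index = startIndex;
        this.endIndex = endIndex;
        this.blockSize = blockSize;
        this.blockLoader = blockLoader;
        this.objectLoader = objectLoader;
        this.cache = cache;
    }

    @Override
    public boolean hasNext() {
        return index < endIndex;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Step 2: load the block of IDs containing index if it is not the current one.
        if (blockStart < 0 || index < blockStart || index >= blockStart + blockSize) {
            blockStart = (index / blockSize) * blockSize;
            block = blockLoader.apply(blockStart);
        }
        int relative = index - blockStart;          // same as index % blockSize
        if (relative >= block.length) {
            throw new NoSuchElementException("block shorter than expected");
        }
        long id = block[relative];
        index++;
        // Step 3: reuse the cached object if present, otherwise load it by ID.
        return cache.computeIfAbsent(id, key -> objectLoader.apply(key));
    }
}
```

Only one block of IDs is held at a time, and objects already in the cache are never reloaded; both properties are what make the approach cheap for large result sets.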

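This approach fetches only what is about to be displayed. Compared with the usual batch-query approach of fetching all data at once and then traversing the ResultSet, Jive's approach is the better choice.

The above covers the batch display of ForumThreads; messages (ForumMessage) are handled similarly. Each ForumThread may contain many messages (a collection of ForumMessage objects), which cannot all be shown on one page either, so the Iterator pattern is used there too. Displaying all messages in a thread is done by ForumThread's messages() method; here is how its proxy class ForumThreadProxy does it: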

public Iterator messages(ResultFilter resultFilter) {

Iterator iterator = thread.messages(resultFilter);

return new IteratorProxy(JiveGlobals.MESSAGE, iterator, authorization, permissions);

}

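The principle is essentially the same: each returns an Iterator proxy class, and these proxy classes perform the user permission checks.

Jive also fetches all data at once and then traverses the ResultSet in some places. This approach mainly suits queries that read all rows in one go, for example querying all current forums. First, the SQL statement: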

SELECT forumID FROM jiveForum

retrieves the forumID of every Forum. This code is in the forums method of DbForumFactory.java:

public Iterator forums() {
    if (forums == null) {
        LongList forumList = new LongList();
        Connection con = null;
        PreparedStatement pstmt = null;
        try {
            con = ConnectionManager.getConnection();
            // GET_FORUMS is "SELECT forumID FROM jiveForum"
            pstmt = con.prepareStatement(GET_FORUMS);
            ResultSet rs = pstmt.executeQuery();
            while (rs.next()) {
                forumList.add(rs.getLong(1)); // put every forum ID into forumList
            }
        } catch (SQLException sqle) {
            sqle.printStackTrace();
        } finally {
            …
        }
        // Assumed (missing from the excerpt): cache the collected IDs in the forums field
        forums = forumList.toArray();
    }
    return new DatabaseObjectIterator(JiveGlobals.FORUM, forums, this);
}

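The forums method returns a DatabaseObjectIterator. DatabaseObjectIterator is also an iterator, but its implementation is simpler than ForumThreadBlockIterator's: it keeps a single cursor that walks through the complete ID set and, again, fetches each complete data object by ID.

In short, Jive implements batch queries in two ways: the approach represented by ForumThreadBlockIterator suits very large data sets that must be shown across many pages, while DatabaseObjectIterator is recommended for showing a small amount of data on one page.

2.6 Filters and the Decorator Pattern

The Decorator pattern dynamically adds extra responsibilities to an object, or changes some of its behavior. It is like painting something: a new coat is added over the surface of the original object.

The Decorator pattern has two main roles: the object being painted (the decoratee) and the object that paints it (the decorator). Both implement the same interface.

Let us first illustrate the Decorator pattern with a simple example.

First create an interface: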

public interface Work
{
    public void insert();
}

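This is an abstract interface for pile-driving work; the action insert means inserting something. Insert what? The following implementation inserts a square wooden peg:

public class SquarePeg implements Work {
    public void insert() {
        System.out.println("Inserting square peg");
    }
}

This alone might meet the needs of pile driving, but the soil may be very hard, so a hole must be drilled before the square peg is inserted. How can that be done? You can write a Decorator class that also implements the Work interface, but whose insert method does something extra:

import java.util.ArrayList;
import java.util.List;

public class Decorator implements Work {
    private Work work;
    // The extra behaviors are bundled in this list
    private List<String> others = new ArrayList<String>();

    public Decorator(Work work) {
        this.work = work;
        others.add("Drilling a hole"); // prepare the extra behavior
    }

    public void insert() {
        otherMethod();
        work.insert();
    }

    public void otherMethod() {
        for (String other : others) {
            System.out.println(other + " in progress");
        }
    }
}

Decorator's insert method first runs otherMethod() and only then calls SquarePeg's insert method. The painter Decorator has added a new behavior, drilling a hole, to the painted object SquarePeg. The client calls it like this:

Work squarePeg = new SquarePeg();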

Work decorator = new Decorator(squarePeg);

decorator.insert();

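This example adds only one new behavior (drilling a hole). If there are many similar behaviors, the advantage of the Decorator pattern becomes clear: the behaviors can be mixed and matched from another angle (for example, by organizing new painter subclasses), so you do not need to create a class for every combination of behaviors, which reduces the system's complexity.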
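To make the mix-and-match point concrete, here is a sketch that restructures the example so that each extra behavior is its own small decorator. DrillHole and CheckLevel are hypothetical names. As an assumption for testability, insert takes a list that records the steps instead of printing them, so its signature differs from the Work interface above.

```java
import java.util.List;

interface Work {
    void insert(List<String> log);
}

class SquarePeg implements Work {
    public void insert(List<String> log) { log.add("insert square peg"); }
}

// Base decorator: holds the wrapped Work and implements the same interface.
abstract class WorkDecorator implements Work {
    protected final Work work;
    WorkDecorator(Work work) { this.work = work; }
}

// Adds a step before the wrapped work.
class DrillHole extends WorkDecorator {
    DrillHole(Work work) { super(work); }
    public void insert(List<String> log) {
        log.add("drill hole");
        work.insert(log);
    }
}

// Adds a step after the wrapped work.
class CheckLevel extends WorkDecorator {
    CheckLevel(Work work) { super(work); }
    public void insert(List<String> log) {
        work.insert(log);
        log.add("check level");
    }
}
```

For example, `new CheckLevel(new DrillHole(new SquarePeg()))` combines both extras. Any subset or order of decorators is built at runtime by nesting constructors, with no class per combination.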

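The Decorator pattern avoids packing many dynamic features, which may or may not be needed, into the decoratee. Instead, while the system is actually running, the decorator checks which features need to be loaded and loads them dynamically.

The Jive forum implements message filtering: for example, HTML in message bodies can be filtered out, and Java code in message bodies can be displayed with special formatting. There are many such filters, and not all of them are needed in practice. For example, some forums do not need to filter HTML out of messages; the forum administrator decides dynamically which filters to enable. Moreover, new filters may be developed at any time. Forcing all filtering behaviors into a single interface would mean changing that interface's code whenever a new filter is added, which is risky.

The Decorator pattern solves this problem of adding functionality dynamically at runtime. Let us see how Jive does it.

As discussed earlier, Jive's main objects are ForumFactory, Forum, ForumThread, and ForumMessage, related as shown in Figure 3-2. A ForumMessage object is therefore obtained from the getMessage method of its parent ForumThread, but in the actual code ForumThread.getMessage delegates to ForumFactory to get the ForumMessage object. Here is getMessage in DbForumThread, the ForumThread subclass: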

public ForumMessage getMessage(long messageID)

throws ForumMessageNotFoundException

{

return factory.getMessage(messageID, this.id, forumID);

}

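This is an odd delegation, probably made deliberately with the filtering feature in mind. Let us look at getMessage in DbForumFactory, the concrete subclass of ForumFactory: getMessage passes the ForumMessage object from the database through the filters before returning it. (Note: Jive's original getMessage code distinguishes cacheable and non-cacheable filters and is rather complex; since all the actual filters are cacheable, it is simplified as follows.)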

protected ForumMessage getMessage(long messageID, long threadID, long forumID)

throws ForumMessageNotFoundException

{

DbForumMessage message = cacheManager.messageCache.get(messageID);

// Do a security check to make sure the message comes from the thread.

if (message.threadID != threadID) {

throw new ForumMessageNotFoundException();

}

ForumMessage filterMessage = null;

try {

// Apply the global filters

filterMessage = filterManager.applyFilters(message);

Forum forum = getForum(forumID);

// Apply this forum's own filters

filterMessage = forum.getFilterManager().applyFilters(filterMessage);

}

catch (Exception e) { }

return filterMessage;

}

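The code above is in fact the client-side code of the Decorator pattern: message, the DbForumMessage instance, is the decoratee. The applyFilters method of filterManager or forum.getFilterManager() applies all the filters to message. This is analogous to the following statement in the earlier example: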

Work decorator = new Decorator(squarePeg);

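forum.getFilterManager() returns all filter classes currently configured in the database. Each Forum has its own set of filter classes, created through the following statement: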

FilterManager filterManager = new DbFilterManager();

DbFilterManager keeps all filters in its field ForumMessageFilter[] filters, and its applyFilters method applies them as follows:

public ForumMessage applyFilters(ForumMessage message) {
    for (int i = 0; i < filters.length; i++) {
        if (filters[i] != null) {
            message = filters[i].clone(message);
        }
    }
    return message;
}

ForumMessageFilter is another implementation of ForumMessage: through the painter ForumMessageFilter, the painted DbForumMessage gains new behavior (filtering), as shown in Figure 3-6.

Figure 3-6 The Decorator pattern

This forms a slightly more complex Decorator pattern. HTMLFilter filters HTML code, while JavaCodeHighLighter formats Java code. Here is the HTMLFilter code:

public class HTMLFilter extends ForumMessageFilter {

public ForumMessageFilter clone(ForumMessage message){

HTMLFilter filter = new HTMLFilter();

filter.message = message;

return filter;

}

public boolean isCacheable() {

return true;

}

public String getSubject() {

return StringUtils.escapeHTMLTags(message.getSubject());

}

public String getBody() {

return StringUtils.escapeHTMLTags(message.getBody());

}

}

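HTMLFilter overrides ForumMessage's getSubject() and getBody() methods, changing their original behavior. This is similar to the method in the earlier example:

public void insert(){
    otherMethod();
    work.insert();
}

Both change the behavior of the decoratee.

HTMLFilter also uses the Prototype pattern, defined as: specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype. By this definition, Java's clone technique is one implementation of the Prototype pattern.

HTMLFilter's clone method creates a new instance of the same kind from the current HTMLFilter instance. As a result, concurrent requests do not all have to be processed through the same filter instance, which improves performance. However, HTMLFilter's clone method is implemented with new, which may be slower than calling Object's native clone() method directly.

DbFilterManager uses reflection to create each filter instance, including HTMLFilter, according to the configuration, but only one instance of each filter exists. To spare large numbers of users from contending for a single filter instance, cloning is used; this practical technique is worth borrowing in your own applications.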
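A compact sketch of the whole arrangement: one prototype instance per filter type, and applyFilters wrapping each message in fresh filter instances created by clone(message). Message, PlainMessage, TrimFilter, HtmlEscapeFilter, and FilterChain are simplified hypothetical stand-ins for Jive's classes, and the escaping covers only `&`, `<`, and `>`.

```java
// Simplified stand-in for ForumMessage.
interface Message {
    String getBody();
}

class PlainMessage implements Message {
    private final String body;
    PlainMessage(String body) { this.body = body; }
    public String getBody() { return body; }
}

// A filter is both a decorator (it wraps a Message) and a prototype (clone).
abstract class MessageFilter implements Message {
    protected Message message;   // wrapped message; null on the prototype itself

    // Create a new filter of the same type around the given message.
    abstract MessageFilter clone(Message message);
}

class TrimFilter extends MessageFilter {
    MessageFilter clone(Message message) {
        TrimFilter f = new TrimFilter();
        f.message = message;
        return f;
    }
    public String getBody() { return message.getBody().trim(); }
}

class HtmlEscapeFilter extends MessageFilter {
    MessageFilter clone(Message message) {
        HtmlEscapeFilter f = new HtmlEscapeFilter();
        f.message = message;
        return f;
    }
    public String getBody() {
        // Escape '&' first so the entities produced below are not escaped again.
        return message.getBody()
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}

class FilterChain {
    private final MessageFilter[] prototypes;   // one shared instance per filter type

    FilterChain(MessageFilter... prototypes) { this.prototypes = prototypes; }

    Message applyFilters(Message message) {
        for (MessageFilter p : prototypes) {
            if (p != null) {
                message = p.clone(message);     // fresh wrapper per message
            }
        }
        return message;
    }
}
```

Because every call builds new wrapper objects, the shared prototypes hold no per-message state, so concurrent requests never contend for them.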

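2.7 Thread Watches and the Observer Pattern

The Observer pattern defines a one-to-many dependency between objects: when the observed object changes, all objects that depend on it are notified and act accordingly.

The advantage of the Observer pattern is that it decouples the observed object from its observers, so the observed object can continue with its own behavior unaffected. The pattern suits "event-triggered" situations.

In Jive, a user may be interested in a particular thread and want to be notified by e-mail of any new discussion in it, so he subscribes to (watches) the thread. Implementing this feature involves sending e-mail. As discussed in earlier chapters, sending e-mail may be delayed for network reasons. If e-mail were sent to all subscribers immediately, at the moment someone replies to the thread, the replying user might wait a long time without getting a normal response.

With the Observer pattern, the reply triggers an observer, which sends the e-mail on another thread; after issuing the notification, the observed object continues with its original logic.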
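A minimal sketch of this asynchronous notification, using a standard `java.util.concurrent.Executor`. The Topic and Watcher types are hypothetical, and the e-mail sending is replaced by a plain callback. addMessage returns as soon as the notifications are queued, so a slow watcher never delays the poster.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

// Observer: something that wants to hear about new messages in a topic.
interface Watcher {
    void onNewMessage(String topic, String message);
}

// Subject: stores messages and notifies its watchers on a background executor.
class Topic {
    private final String name;
    private final List<String> messages = new CopyOnWriteArrayList<>();
    private final List<Watcher> watchers = new CopyOnWriteArrayList<>();
    private final Executor notifier;   // e.g. a small thread pool

    Topic(String name, Executor notifier) {
        this.name = name;
        this.notifier = notifier;
    }

    void addWatcher(Watcher w) {
        watchers.add(w);
    }

    void addMessage(String message) {
        messages.add(message);         // the poster's own work
        notifyWatchers(message);       // only queues work, returns immediately
    }

    private void notifyWatchers(String message) {
        for (Watcher w : watchers) {
            // Each (possibly slow) notification runs off the caller's thread.
            notifier.execute(() -> w.onNewMessage(name, message));
        }
    }
}
```

CopyOnWriteArrayList lets watchers be added while a notification loop is running without a ConcurrentModificationException.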

Let us look at Jive's WatchManager interface:

public interface WatchManager {
    // Normal watch: the user will clearly notice the updates when revisiting the thread
    public static final int NORMAL_WATCH = 0;
    // Notify the user by e-mail when the thread changes
    public static final int EMAIL_NOTIFY_WATCH = 1;

    // How many days a watch is kept; the default is 30 days
    public void setDeleteDays(int deleteDays) throws UnauthorizedException;
    public int getDeleteDays();

    // Whether e-mail notification is enabled
    public boolean isEmailNotifyEnabled() throws UnauthorizedException;
    public void setEmailNotifyEnabled(boolean enabled) throws UnauthorizedException;

    // The body of the notification e-mail
    public String getEmailBody() throws UnauthorizedException;
    public void setEmailBody(String body) throws UnauthorizedException;

    // The subject of the notification e-mail
    public String getEmailSubject() throws UnauthorizedException;
    public void setEmailSubject(String subject) throws UnauthorizedException;

    …

    // Create a watch on a thread for a user
    public void createWatch(User user, ForumThread thread, int watchType)
        throws UnauthorizedException;

    // Delete a user's watch on a thread
    public void deleteWatch(User user, ForumThread thread, int watchType);

    // Get all threads that a user is watching
    public Iterator getWatchedForumThreads(User user, int watchType)
        throws UnauthorizedException;

    // Whether a user is watching the thread
    public boolean isWatchedThread(User user, ForumThread thread, int watchType)
        throws UnauthorizedException;

    …
}

DbWatchManager is an implementation of WatchManager; it stores in the database which users are watching which threads. The WatchManager object is created together with DbForumFactory.

DbWatchManager has a very important method that WatchManager does not declare, the notification method:

protected void notifyWatches(ForumThread thread) {

//If watches are turned on.

if (!emailNotifyEnabled) {

return;

}

// Notify all users watching this thread

EmailWatchUpdateTask task = new EmailWatchUpdateTask(this, factory, thread);

TaskEngine.addTask(task);

}

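This method notifies, by e-mail, all users who watch or subscribe to this thread. How is the notification method itself triggered? Functionally, it should be triggered when a new message is posted.

The last line of DbForumThread's addMessage is: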

factory.watchManager.notifyWatches(this);

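This actually calls DbWatchManager's notifyWatches method, so all watchers of the thread are indeed triggered when a new message is added.

When notifying users by e-mail, notifyWatches hands the sending to TaskEngine. EmailWatchUpdateTask is a task class (it is passed to TaskEngine.addTask as a Runnable), and TaskEngine is a task manager that runs tasks such as EmailWatchUpdateTask on request. TaskEngine is in fact a simple thread pool: its worker threads keep checking a queue for runnable tasks and run any they find.

import java.util.LinkedList;
import java.util.Timer;

public class TaskEngine {
    // Task list (the queue)
    private static LinkedList taskList = null;
    // Worker threads
    private static Thread[] workers = null;
    private static Timer taskTimer = null;
    private static Object lock = new Object();

    static {
        // Initialize the task timer according to the configuration file
        taskTimer = new Timer(true);
        // By default, 7 threads are used to run tasks
        workers = new Thread[7];
        taskList = new LinkedList();
        for (int i = 0; i < workers.length; i++) {
            // TaskEngineWorker is a simple Runnable
            TaskEngineWorker worker = new TaskEngineWorker();
            workers[i] = new Thread(worker);
            workers[i].setDaemon(true);
            workers[i].start(); // start the TaskEngineWorker thread
        }
    }

    // TaskEngineWorker inner class
    private static class TaskEngineWorker implements Runnable {
        private boolean done = false;

        public void run() {
            while (!done) {
                // Run the next task
                nextTask().run();
            }
        }
    }

    // nextTask() returns a runnable task; it is the reader (consumer) of the task queue
    private static Runnable nextTask() {
        synchronized (lock) {
            // If there are no tasks, block here
            while (taskList.isEmpty()) {
                try {
                    lock.wait(); // wait to be notified
                } catch (InterruptedException ie) { }
            }
            // Take the oldest task: tasks are added at the head and removed from the tail (FIFO)
            return (Runnable) taskList.removeLast();
        }
    }

    public static void addTask(Runnable r) {
        addTask(r, Thread.NORM_PRIORITY);
    }

    // This is the producer side of the task queue
    public static void addTask(Runnable task, int priority) {
        synchronized (lock) {
            taskList.addFirst(task);
            // Wake up all threads waiting on lock; this is Java's
            // wait/notify mechanism (see any reference on threads)
            lock.notifyAll();
        }
    }

    …
}

TaskEngine sets up a message queue and two kinds of threads. The producer side (any thread that calls addTask) puts objects into the queue; the worker threads take objects out of the queue, and if the queue is empty they block there until an object arrives. Because these objects are themselves Runnable tasks, the workers run them directly after taking them out.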
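Since Java 5, `java.util.concurrent` provides the queue-and-worker plumbing that TaskEngine writes by hand with wait()/notifyAll(). Here is a minimal sketch of the same producer/consumer model, assuming a fixed number of daemon workers; the class name SimpleTaskEngine is hypothetical. `BlockingQueue.take()` does the blocking that the hand-written wait loop does.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SimpleTaskEngine {
    // FIFO queue shared by producers (addTask callers) and consumers (workers).
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    SimpleTaskEngine(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(this::workLoop, "task-worker-" + i);
            worker.setDaemon(true);   // do not keep the JVM alive
            worker.start();
        }
    }

    // Producer side: add() never blocks on an unbounded queue.
    void addTask(Runnable task) {
        tasks.add(task);
    }

    // Consumer side: take() blocks while the queue is empty.
    private void workLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                tasks.take().run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // restore the flag and exit the loop
            } catch (RuntimeException e) {
                e.printStackTrace();                  // one failing task must not kill the worker
            }
        }
    }
}
```

In application code, `Executors.newFixedThreadPool(7)` would normally replace a hand-written class like this entirely.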

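The model TaskEngine builds is very similar to JMS (Java Message Service). Their functions are alike, but JMS is a distributed message-publishing mechanism that can run across multiple servers and is far more powerful, whereas TaskEngine is built on plain threads and cannot work across JVMs. TaskEngine can be seen as a micro-level component and JMS as a macro-level architectural system. JMS will be discussed in later chapters.

That covers Jive's implementation of the Observer pattern. Jive implements it with fairly basic threading concepts, which helps in understanding many low-level foundations of J2EE; web containers, for instance, are built on the thread-pool principle.

The JDK provides a convenient Observer API, java.util.Observable and java.util.Observer, which is very simple to use: the observed class extends Observable and uses the following statements to signal a change:

setChanged();

notifyObservers(name); // as soon as this runs, the observers are triggered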

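The observer only needs to implement the Observer interface and its update method, which handles the object passed in when the observed object fires. For example:

Product prices in an online store may change. If the home page should automatically display discounted products when prices drop, the Observer pattern makes this much easier. First, the product is the observed object:

import java.util.Observable;

public class Product extends Observable {
    private float price;

    public float getPrice() { return price; }

    public void setPrice(float price) {
        this.price = price;
        // The product price changed; notify the observers
        setChanged();
        notifyObservers(Float.valueOf(price));
    }

    ...
}

The price observer implements the Observer interface:

import java.util.Observable;
import java.util.Observer;

public class PriceObserver implements Observer {
    private float price = 0;

    public void update(Observable obj, Object arg) {
        if (arg instanceof Float) {
            price = ((Float) arg).floatValue();
            System.out.println("PriceObserver: price changed to " + price);
        }
    }
}

With that, a simple Observer pattern is implemented.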
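Note that java.util.Observable and java.util.Observer have been deprecated since Java 9. Here is a sketch of the same price-watching example with `java.beans.PropertyChangeSupport` (in the `java.desktop` module), which also delivers the old and new values. The method name addPriceListener is an illustrative choice.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

class Product {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private float price;

    // Register an observer interested only in the "price" property.
    void addPriceListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener("price", listener);
    }

    float getPrice() { return price; }

    void setPrice(float newPrice) {
        float oldPrice = price;
        price = newPrice;
        // Values are boxed to Float; no event fires if old and new are equal.
        changes.firePropertyChange("price", oldPrice, newPrice);
    }
}
```

A listener reads `evt.getNewValue()` (and `evt.getOldValue()` if it needs the drop amount), so there is no separate setChanged() step to forget.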

posted @ 2007-05-18 18:46 edsonjava | Views (720) | Comments (0)
