Mondrian and OLAP
Mondrian 是使用java编写的OLAP引擎,它执行MDX语言描述的查询,可以从关系数据库中获取数据。
Online Analytical Processing (OLAP)
OLAP是指实时地分析大数据量,与Online Transaction Processing (OLTP)不同。
Year
|
2000
|
2001
|
Growth
|
Product
|
Dollar sales
|
Unit sales
|
Dollar sales
|
Unit sales
|
Dollar sales
|
Unit sales
|
Total
|
$7,073
|
2,693
|
$7,636
|
3,008
|
8%
|
12%
|
Books
|
$2,753
|
824
|
$3,331
|
966
|
21%
|
17%
|
Fiction
|
$1,341
|
424
|
$1,202
|
380
|
-10%
|
-10%
|
Non-fiction
|
$1,412
|
400
|
$2,129
|
586
|
51%
|
47%
|
Magazines
|
$2,753
|
824
|
$2,426
|
766
|
-12%
|
-7%
|
— Greetings cards
|
$1,567
|
1,045
|
$1,879
|
1,276
|
20%
|
22%
|
从上表可以看到year和Product为dimensions (维度),measures 为'Unit sales' and 'Dollar sales'。
Layers of a Mondrian system
Mondrian OLAP System 包含4个层:
1、the presentation layer 用于描述最终用户所看到的,其表现形式有很多,如:pivot表、pie、line和bar图、maps和动态图。由用户提问,OLAP服务器返回答案。 2、the dimensional layer 这一层解析、验证和执行MDX 查询,首先计算轴,再计算轴上所有单元的值。metadata用于描述空间模型,及空间模型怎样映射到关系模型上。 3、the star layer 它负责维护一个聚合的cache,aggregation 聚合是一些内存中的测量值(cells),及一些维度值。如果请求的cells不在cache中,则aggregation manager 向the storage layer发出请求。 4、and the storage layer the storage layer为RDBMS。
What is MDX?
'multi-dimensional expressions'为Mondrian执行的查询。 下面为一个基本的查询: SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS, {[Product].members} ON ROWS FROM [Sales] WHERE [Time].[1997].[Q2]
What is a schema?
schema定义了一个多维数据库。它包含一个逻辑模型,由多个cube、hierarchies、members及一个到物理模型的映射组成。 逻辑模型包含:cubes, dimensions, hierarchies, levels, and members.
A cube is a collection of dimensions and measures in a particular subject area.
A measure is a quantity that you are interested in measuring, for example, unit sales of a product, or cost price of inventory items.
A dimension is an attribute, or set of attributes, by which you can divide measures into sub-categories. For example, you might wish to break down product sales by their color, the gender of the customer, and the store in which the product was sold; color, gender, and store are all dimensions. 例如: <Schema> <Cube name="Sales"> <Table name="sales_fact_1997"/> <Dimension name="Gender" foreignKey="customer_id"> <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id"> <Table name="customer"/> <Level name="Gender" column="gender" uniqueMembers="true"/> </Hierarchy> </Dimension> <Dimension name="Time" foreignKey="time_id"> <Hierarchy hasAll="false" primaryKey="time_id"> <Table name="time_by_day"/> <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/> <Level name="Quarter" column="quarter" uniqueMembers="false"/> <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/> </Hierarchy> </Dimension> <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/> <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/> <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]"> <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/> </CalculatedMember> </Cube> </Schema>
MDX查询为:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS, {[Time].[1997].[Q1].descendants} ON ROWS FROM [Sales]
WHERE [Gender].[F]
[Time] | [Measures].[Unit Sales] | [Measures].[Store Sales] | [1997].[Q1] | 0 | 0 | [1997].[Q1].[Jan] | 0 | 0 | [1997].[Q1].[Feb] | 0 | 0 | [1997].[Q1].[Mar] | 0 | 0 |
A member is a point within a dimension determined by a particular set of attribute values. The gender hierarchy has the two members 'M' and 'F'. 'San Francisco', 'California' and 'USA' are all members of the store hierarchy. A hierarchy is a set of members organized into a structure for convenient analysis. For example, the store hierarchy consists of the store name, city, state, and nation. The hierarchy allows you form intermediate sub-totals: the sub-total for a state is the sum of the sub-totals of all of the cities in that state, each of which is the sum of the sub-totals of the stores in that city. A level is a collection of members which have the same distance from the root of the hierarchy. A dimension is a collection of hierarchies which discriminate on the same fact table attribute (say, the day that a sale occurred).
mondrian.properties mondrian有一个配置文件可以定义它是如何运行的。
a simple star schema
Cube [Sales] has two measures [Unit sales] and [Dollar sales]
Dimension [Product] has levels [All Products] , [Manufacturer] , [Brand] , [Prodid]
Dimension [Time] has levels [All Time] , [Year] , [Quarter] , [Month] , [Day]
Dimension [Customer] has levels [All Customers] , [State] , [City] , [Custid]
Dimension [Payment Method] has levels [All Payment Methods] , [Payment Method]
|