Data Mining - Lyle School of Engineering

Download Report

Transcript Data Mining - Lyle School of Engineering

CSE 5331/7331
Fall 2007
Dimensional Modeling
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
CSE 5331/7331 F'07
1
Dimensional Modeling





View data in a hierarchical manner more as
business executives might
Useful in decision support systems and mining
Dimension: collection of logically related
attributes; axis for modeling data.
Facts: data stored
Ex: Dimensions – products, locations, date
Facts – quantity, unit price
CSE 5331/7331 F'07
2
Multidimensional Model Example
Fig 2 [1]
CSE 5331/7331 F'07
3
Cube view of Data
Fig 4 [1]
CSE 5331/7331 F'07
4
Aggregation Hierarchies
CSE 5331/7331 F'07
5
Multidimensional Schemas

Star Schema shows facts and dimensions
– Center of the star has facts shown in fact tables
– Outside of the facts, each diemnsion is shown
separately in dimension tables
– Access to fact table from dimension table via join
SELECT Quantity, Price
FROM Facts, Location
Where (Facts.LocationID = Location.LocationID) and
(Location.City = ‘Dallas’)
– View as relations, problem volume of data and
indexing
CSE 5331/7331 F'07
6
Star Schema
CSE 5331/7331 F'07
7
Flattened Star
CSE 5331/7331 F'07
8
Normalized Star
CSE 5331/7331 F'07
9
Snowflake Schema
CSE 5331/7331 F'07
10
OLAP Introduction
OLAP by Example
http://perso.orange.fr/bernard.lupin/englis
h/index.htm
 What is OLAP?
http://www.olapreport.com/fasmi.htm

CSE 5331/7331 F'07
11
OLAP








Online Analytic Processing (OLAP): provides more
complex queries than OLTP.
OnLine Transaction Processing (OLTP): traditional
database/transaction processing.
Dimensional data; cube view
Support ad hoc querying
Require analysis of data
Can be thought of as an extension of some of the basic
aggregation functions available in SQL
OLAP tools may be used in DSS systems
Mutlidimentional view is fundamental
CSE 5331/7331 F'07
12
OLAP Implementations

MOLAP (Multidimensional OLAP)
– Multidimential Database (MDD)
– Specialized DBMS and software system capable of
supporting the multidimensional data directly
– Data stored as an n-dimensional array (cube)
– Indexes used to speed up processing

ROLAP (Relational OLAP)
– Data stored in a relational database
– ROLAP server (middleware) creates the
multidimensional view for the user
– Less Complex; Less efficient

HOLAP (Hybrid OLAP)
– Not updated frequently – MDD
– Updated frequently - RDB
CSE 5331/7331 F'07
13
OLAP Operations
Roll Up
Drill Down
Single Cell
CSE 5331/7331 F'07
Multiple Cells
Slice
Dice
14
OLAP Operations






Simple query – single cell in the cube
Slice – Look at a subcube to get more
specific information
Dice – Rotate cube to look at another
dimension
Roll Up – Dimension Reduction; Aggregation
Drill Down
Visualization: These operations allow the
OLAP users to actually “see” results of an
operation.
CSE 5331/7331 F'07
15
Relationship Between Topcs
CSE 5331/7331 F'07
16
Decision Support Systems
Tools and computer systems that assist
management in decision making
 What if types of questions
 High level decisions
 Data warehouse – data which supports
DSS

CSE 5331/7331 F'07
17
Starflake
Fig 2 [4]
CSE 5331/7331 F'07
18
Hierarchy of Data Cubes
Fig 4 [4]
CSE 5331/7331 F'07
19
Unified Dimensional Model
Microsoft Cube View
 SQL Server 2005
http://msdn2.microsoft.com/enus/library/ms345143.aspx
http://cwebbbi.spaces.live.com/Blog/cns!1pi7ET
ChsJ1un_2s41jm9Iyg!325.entry
 MDX AS2005
http://msdn2.microsoft.com/enus/library/aa216767(SQL.80).aspx

CSE 5331/7331 F'07
20
Bibliography
[1] Anne-Muriel Arigon, Anne Tchounikine, and Maryvonne Miquel, “Handling
Multiple Points of View in a Multimedia Data Warehouse,” ACM Transactions on
Multimedia Computing, Communications and Applications, Vol. 2, No. 3, August
2006, Pages 199–218.
[2] S. Nicholson, “The Bibliomining Process: Data Warehousing and Data Mining
for Library Decision-Making,” Information Technology and Libraries, 22(4),
2003.
[3] S. Nicholson, “The Basis for Biliomining: Frameworks for Bringing Together
Usage-Based Data Mining and Bibliometrics through Data Warehousing in
Digital Library Services,” Information Processing & Management, 42(3), May
2006, pp 785-804.
[4] Jane You, Tharam Dillon, James Liu, Edwige Pissaloux, “On Hierarchical
Multimedia Information Retrieval,” You, J.; Proceedings of the 2001
International Conference on Image Processing, 7-10 Oct 2001, pp 729 – 732.
[5] Torsten Priebe and Gunther Pernul, “Ontology-based Integration of OLAP and
Information Retrieval,” Proceedings of the 14th International Workshop on
Database and expert Systems Applications, 2003.
CSE 5331/7331 F'07
21