Transcript sigmod02

Cube Explorer:
Online Exploration of Data Cubes
Jiawei Han, Jianyong Wang, Guozhu Dong,
Jian Pei, Ke Wang
Mining Guided Cube Explorer
Novel Algorithms and Methods:
 Faster Creation of Iceberg Cube
 Predictive Gradient Analysis
 Multi-dimensional Gradient Mining
 Association and Sequence Cube Analysis
Integrated with commercial software such as Microsoft
OLE DB for DM, OLAP, RDBMS, and DBMiner
Cube Explorer Benefits
For Users:
 Superior performance and scalability
 Saves analysis cost and time: reusable mining
queries that work directly on OLAP and relational data
 Easy to use: SQL-like mining, integrated with data
sources
 Leverages OLAP and data warehouse engines:
versatile functionality and strong synergy
Iceberg Cube Exploration Demo
 Novel H-Tree Iceberg Cube Creation
 Cube Computation with Complex Measures
Dataset: large retail POS transaction data
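
To make the idea of an iceberg cube concrete, here is a minimal sketch that aggregates every combination of dimensions and keeps only the cells whose count reaches a minimum support threshold. It is a naive enumeration, not the H-Tree method named on this slide, and the function name `iceberg_cube`, the column names, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def iceberg_cube(rows, dims, measure, min_count):
    """Naive iceberg cube: aggregate every subset of `dims` and keep
    only the cells that contain at least `min_count` rows.

    Returns {(group_by_dims, cell_values): (count, total_of_measure)}.
    """
    cells = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(lambda: [0, 0.0])  # cell -> [count, sum]
            for row in rows:
                key = tuple(row[d] for d in group)
                agg[key][0] += 1
                agg[key][1] += row[measure]
            for key, (count, total) in agg.items():
                if count >= min_count:  # the iceberg condition
                    cells[(group, key)] = (count, total)
    return cells


# Hypothetical POS rows, invented for this example.
rows = [
    {"store": "S1", "item": "TV", "month": "Jan", "sales": 500.0},
    {"store": "S1", "item": "TV", "month": "Feb", "sales": 450.0},
    {"store": "S1", "item": "DVD", "month": "Jan", "sales": 120.0},
    {"store": "S2", "item": "TV", "month": "Jan", "sales": 480.0},
]
cube = iceberg_cube(rows, ("store", "item", "month"), "sales", min_count=2)
```

With `min_count=2`, cells such as (item = DVD) are dropped because only one row supports them. A real implementation would avoid scanning the data once per group-by, which is the cost that structures like the H-Tree are meant to reduce.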
Iceberg Cube Exploration Results

3D Visualization (Scatter Plot)
Gradient Mining Issues
1: “Which products, when sold with ‘TV’, significantly
change the profit of ‘TV’?”
Answer:
-TV profit is up 10% when sold with DVD
-TV profit is down 5% when sold with VCR
2: “How did housing prices in Big City change in
2001 compared with 2000?”
Answer:
-Downtown apartments went up 15% while suburban
houses went down 5%
How to Mine Meaningful Changes?
1 Naïve and manual method
Compute two sub-cubes
 Big City housing in 2000
 Big City housing in 2001
Tremendous costs
 Space
 Time
2 Innovation
Only interesting changes are wanted: use a “gradient constraint” to
capture and predict significant changes automatically
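
As a rough illustration of what a gradient constraint expresses, the sketch below compares the same cell across two years and keeps only the pairs whose measure ratio departs from 1 by at least a threshold. It shows the constraint itself, not the mining algorithm: it still assumes both sets of cells are already available, which is the cost the slide calls tremendous. The function name `significant_changes`, the cell layout, and the prices are hypothetical; the prices were chosen to echo the Big City housing example.

```python
def significant_changes(base, target, min_change):
    """Return (cell, ratio) pairs where the measure in `target` differs
    from the measure in `base` by at least `min_change` (relative)."""
    results = []
    for cell, old in base.items():
        new = target.get(cell)
        if new is None or old == 0:
            continue  # cell missing in one year, or ratio undefined
        ratio = new / old
        if abs(ratio - 1.0) >= min_change:  # the gradient constraint
            results.append((cell, ratio))
    return results


# Hypothetical average prices per (area, housing type) cell.
prices_2000 = {
    ("downtown", "apartment"): 200.0,
    ("suburb", "house"): 300.0,
    ("suburb", "apartment"): 150.0,
}
prices_2001 = {
    ("downtown", "apartment"): 230.0,
    ("suburb", "house"): 285.0,
    ("suburb", "apartment"): 151.5,
}
changes = significant_changes(prices_2000, prices_2001, min_change=0.04)
```

Here the suburban apartment cell changes by only about 1% and is filtered out, while the other two cells pass the constraint.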
Gradient Mining Prediction Demo

Which products, when sold with ‘Muffins’, change the sales
of ‘Muffins’?
Select ‘Muffins’ as the promotion itemset and average sales as the measure:
Gradient Mining: Results 1

Most profitable patterns (Ratio > 1)
Rule #1: Cereal increases ‘muffins’ avg. sales by 8%
Gradient Mining: Results 2

Least profitable patterns (Ratio < 1)
Rule #1: Ice cream reduces ‘muffins’ avg. sales by 4%
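
The slides do not define how the ratio behind these rules is computed. One plausible reading, sketched below as an assumption, is the average sales of the promoted item in baskets that also contain a companion item, divided by its average sales over all baskets that contain it. The function name `co_sale_ratios` and the transactions are invented for illustration and do not reproduce the demo's figures.

```python
from collections import defaultdict


def co_sale_ratios(transactions, target):
    """For each companion item, return
    (avg sales of `target` in baskets with that item)
    / (avg sales of `target` over all baskets containing `target`).

    Each transaction maps item name -> sales amount in that basket.
    """
    with_target = [t for t in transactions if target in t]
    if not with_target:
        return {}
    baseline = sum(t[target] for t in with_target) / len(with_target)
    sums = defaultdict(lambda: [0.0, 0])  # item -> [target sales, baskets]
    for t in with_target:
        for item in t:
            if item != target:
                sums[item][0] += t[target]
                sums[item][1] += 1
    return {item: (total / n) / baseline for item, (total, n) in sums.items()}


# Hypothetical baskets, invented for this example.
transactions = [
    {"muffins": 4.0, "cereal": 3.0},
    {"muffins": 6.0, "cereal": 2.5},
    {"muffins": 2.0, "ice cream": 5.0},
    {"muffins": 4.0},
]
ratios = co_sale_ratios(transactions, "muffins")
```

A ratio above 1 corresponds to the "most profitable" patterns and a ratio below 1 to the "least profitable" ones; in this toy data cereal lands above 1 and ice cream below 1.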
Gradient Mining: Visualization

Results plotted using 3D bar graph