Transcript Talk Slides
Prediction Cubes
Bee-Chung Chen, Lei Chen,
Yi Lin and Raghu Ramakrishnan
University of Wisconsin - Madison
Subset Mining
We want to find interesting subsets of the dataset
  Interestingness: defined by the "model" built on a subset
  Cube space: a combination of dimension-attribute values defines a candidate subset (just like regular OLAP)
We want the measures to represent decision/prediction behavior
  Summarize a subset using the "model" built on it
  Big change from regular OLAP!
The Idea
Build OLAP data cubes in which cell values represent decision/prediction behavior
  In effect, build a tree for each cell/region in the cube; observe that this is not the same as a collection of trees used in an ensemble method!
  The idea is simple, but it leads to promising data mining tools
Ultimate objective: exploratory analysis of the entire space of "data mining choices"
  Choice of algorithms, data conditioning parameters, …
Example (1/7): Regular OLAP
Goal: Look for patterns of unusually high numbers of applications.

Fact table (Z: dimensions, Y: measure):

  Location   Time     # of App.
  …          …        …
  AL, USA    Dec, 04  2
  …          …        …
  WY, USA    Dec, 04  3

[Figure: dimension hierarchies. Location: All → Country (Japan, USA, Norway, …) → State (AL, …, WY). Time: All → Year (85, 86, …, 04) → Month (Jan 86, …, Dec 86).]
Example (2/7): Regular OLAP
Goal: Look for patterns of unusually high numbers of applications.

Cell value: Number of loan applications

[Figure: roll up aggregates the finest [State, Month] counts into coarser regions (e.g., [Country, Year]); drill down splits them back into finer regions. The fact table (Location, Time, # of App.) is the same as on the previous slide.]
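For readers who want to see the roll-up and drill-down mechanics concretely, here is a minimal pandas sketch over a made-up loan-application fact table (the column names and values are illustrative assumptions, not from the talk):

```python
import pandas as pd

# Made-up fact table: one row per (country, state, year, month) with an application count.
fact = pd.DataFrame({
    "country": ["USA", "USA", "CA",  "CA"],
    "state":   ["AL",  "WY",  "AB",  "YT"],
    "year":    [2004,  2004,  2004,  2004],
    "month":   ["Dec", "Dec", "Jan", "Jan"],
    "n_apps":  [2,     3,     20,    15],
})

# Finer cells: [State, Month]
fine = fact.groupby(["country", "state", "year", "month"])["n_apps"].sum()

# Roll up to coarser cells: [Country, Year]
coarse = fact.groupby(["country", "year"])["n_apps"].sum()

print(fine)    # drilling down is simply returning to the finer grouping
print(coarse)
```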
Example (3/7): Decision Analysis
Goal: Analyze a bank's loan decision process w.r.t. two dimensions: Location and Time.

Fact table D (Z: dimensions; X: predictors; Y: class):

  Location   Time     Race    Sex   …   Approval
  AL, USA    Dec, 04  White   M     …   Yes
  …          …        …       …     …   …
  WY, USA    Dec, 04  Black   F     …   No

A cube subset Z(D) is the training data for a model h(X, Z(D)), e.g., a decision tree.

[Figure: dimension hierarchies as before. Location: All → Country (Japan, USA, Norway, …) → State (AL, …, WY). Time: All → Year (85, 86, …, 04) → Month (Jan 86, …, Dec 86).]
Example (3/7): Decision Analysis
Are there branches (and time windows) where approvals were closely tied to sensitive attributes (e.g., race)?
  Suppose you partitioned the training data by location and time, chose the partition for a given branch and time window, and built a classifier. You could then ask, "Are the predictions of this classifier closely correlated with race?"
Are there branches and times with decision making reminiscent of 1950s Alabama?
  Requires comparison of classifiers trained using different subsets of data.
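A minimal sketch of that partition-and-train idea, using scikit-learn and made-up column names and data (all assumptions for illustration): train one classifier per [location, time] cell and check how often its predictions agree with race alone.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for the fact table D.
D = pd.DataFrame({
    "location": ["AL, USA", "WY, USA"] * 50,
    "time":     ["Dec, 04"] * 100,
    "race":     ["White", "Black", "Black", "White"] * 25,
    "sex":      ["M", "F", "F", "M"] * 25,
    "approval": [1, 0] * 25 + [1, 1] * 25,
})

for (loc, t), cell in D.groupby(["location", "time"]):
    X = pd.get_dummies(cell[["race", "sex"]])   # predictors
    y = cell["approval"]                        # class label
    h = DecisionTreeClassifier(max_depth=3).fit(X, y)
    preds = h.predict(X)
    # Crude proxy: fraction of cases where the prediction matches "race == White".
    agreement = (preds == (cell["race"] == "White").astype(int)).mean()
    print(f"{loc}, {t}: prediction/race agreement = {agreement:.2f}")
```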
Example (4/7): Prediction Cubes
For each cell:
1. Build a model using the data in that cell's cube subset, e.g., [USA, Dec 04](D)
2. Evaluate that model; the result becomes the cell value

Data [USA, Dec 04](D):

  Location   Time     Race    Sex   …   Approval
  AL, USA    Dec, 04  White   M     …   Y
  …          …        …       …     …   …
  WY, USA    Dec, 04  Black   F     …   N

Model h(X, [USA, Dec 04](D)), e.g., a decision tree.

Measure in a cell:
• Accuracy of the model
• Predictiveness of Race measured based on that model
• Similarity between that model and a given model

[Figure: the resulting prediction cube at level [Country, Month] (rows CA, USA, …; columns Jan–Dec of 2003, 2004, …) with cell values such as 0.2–0.9.]
Example (5/7): Model-Similarity
Given:
• Data table D
• Target model h0(X)
• Test set without labels

  Data table D:
  Location   Time     Race    Sex   …   Approval
  AL, USA    Dec, 04  White   M     …   Yes
  …          …        …       …     …   …
  WY, USA    Dec, 04  Black   F     …   No

For each cell at level [Country, Month]: build a model on that cube subset of D, apply both it and h0(X) to the test set, and record their similarity as the cell value.

[Figure: the resulting similarity cube (rows CA, USA, …; columns Jan–Dec of 2003, 2004, …) with values such as 0.2–0.9, alongside an unlabeled test set over (Race, Sex, …) and the two models' prediction columns.]

Finding: the loan decision process in USA during Dec 04 was similar to a discriminatory decision model h0(X).
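One concrete reading of the similarity measure above, as a hedged illustration: apply the cell's model and the target model h0 to the unlabeled test set and take the fraction of test examples on which they predict the same class (names are assumptions; the talk leaves the exact similarity function open).

```python
import numpy as np

def prediction_similarity(h_cell, h0, X_test):
    """Cell value for the model-similarity prediction cube:
    fraction of test examples on which the model built on the cube
    subset (h_cell) and the given target model h0 agree."""
    return float(np.mean(h_cell.predict(X_test) == h0.predict(X_test)))
```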
Example (6/7): Predictiveness
Given:
• Data table D
• Attributes V
• Test set without labels

  Data table D:
  Location   Time     Race    Sex   …   Approval
  AL, USA    Dec, 04  White   M     …   Yes
  …          …        …       …     …   …
  WY, USA    Dec, 04  Black   F     …   No

For each cell at level [Country, Month]: build two models on that cube subset, h(X) using all predictors and h(X − V) excluding the attributes V, apply both to the test set, and record the predictiveness of V as the cell value.

[Figure: the resulting predictiveness cube (rows CA, USA, …; columns Jan–Dec of 2003, 2004, …), the two models' prediction columns on the test set, and the unlabeled test set over (Race, Sex, …).]

Finding: Race was an important predictor of the loan approval decision in USA during Dec 04.
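And a matching sketch for attribute predictiveness: train one model with all predictors and one without the attributes V, then measure how much their predictions on the test set differ (the specific difference measure and helper names are assumptions).

```python
import numpy as np
from sklearn.base import clone

def predictiveness_of_V(model, X_train, y_train, X_test, V):
    """Cell value for the predictiveness cube: fraction of test examples
    on which h(X) and h(X - V) disagree, where both models are trained
    on the same cube subset (X_train, X_test are pandas DataFrames)."""
    h_full  = clone(model).fit(X_train, y_train)
    h_minus = clone(model).fit(X_train.drop(columns=V), y_train)
    disagree = h_full.predict(X_test) != h_minus.predict(X_test.drop(columns=V))
    return float(np.mean(disagree))
```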
Example (7/7): Prediction Cube
Cell value: Predictiveness of Race

[Figure: rolling up the [Country, Month] prediction cube (rows CA, USA, …; columns Jan–Dec of 2003, 2004, …) to a [Country, Year] cube, and drilling down to a [State, Month] cube (CA → AB, …, YT; USA → AL, …, WY); cell values range from about 0.1 to 0.9.]
Efficient Computation
Reduce prediction cube computation to data cube computation:
Represent a data-mining model as a distributive or algebraic (bottom-up computable) aggregate function, so that data-cube techniques can be directly applied.
Bottom-Up Data Cube Computation

Cell values: numbers of loan applications

  [Country, Year] cells:
                1985  1986  1987  1988
  Norway          10    30    20    24
  …               23    45    14    32
  USA             14    32    42    11

  Roll up over Location ([All, Year]):   47   107    76    67
  Roll up over Time ([Country, All]):    Norway 84,  … 114,  USA 99
  Roll up over both ([All, All]):        297

Each coarser cell is computed bottom-up from the finer cells beneath it.
Functions on Sets
Bottom-up computable functions: functions that can be computed using only summary information
Distributive function: α(X) = F({α(X1), …, α(Xn)})
  where X = X1 ∪ … ∪ Xn and Xi ∩ Xj = ∅
  E.g., Count(X) = Sum({Count(X1), …, Count(Xn)})
Algebraic function: α(X) = F({G(X1), …, G(Xn)})
  G(Xi) returns a fixed-length vector of values
  E.g., Avg(X) = F({G(X1), …, G(Xn)}), where
    G(Xi) = [Sum(Xi), Count(Xi)]
    F({[s1, c1], …, [sn, cn]}) = Sum({si}) / Sum({ci})
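A small Python illustration of the two definitions: Count is distributive (combine per-subset counts directly), while Avg is algebraic (carry the fixed-length summary G(Xi) = [Sum(Xi), Count(Xi)] and combine at the end).

```python
# Distributive: Count(X) = Sum({Count(X1), ..., Count(Xn)})
def count_from_parts(parts):
    return sum(len(p) for p in parts)

# Algebraic: Avg(X) = F({G(X1), ..., G(Xn)}) with G(Xi) = [Sum(Xi), Count(Xi)]
def G(xs):
    return (sum(xs), len(xs))

def avg_from_parts(parts):
    summaries = [G(p) for p in parts]
    return sum(s for s, _ in summaries) / sum(c for _, c in summaries)

parts = [[1, 2, 3], [4, 5], [6]]        # disjoint subsets X1, X2, X3
assert count_from_parts(parts) == 6
assert avg_from_parts(parts) == 3.5
```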
Scoring Function
Represent a model as a function of sets:
Conceptually, a machine-learning model h(X; Z(D)) is a scoring function Score(y, x; Z(D)) that gives each class y a score on a test example x:
  h(x; Z(D)) = argmax_y Score(y, x; Z(D))
  Score(y, x; Z(D)) ≈ p(y | x, Z(D))
Z(D): the set of training examples (a cube subset of D)
Bottom-up Score Computation
Key observations:
  Observation 1: Score(y, x; Z(D)) is a function of the cube subset Z(D); if it is distributive or algebraic, the bottom-up data-cube technique can be directly applied
  Observation 2: Having the scores for all the test examples and all the cells is sufficient to compute a prediction cube
    Scores → predictions → cell values
    The details depend on what each cell means (i.e., the type of prediction cube), but they are straightforward
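A toy sketch of how the two observations fit together, assuming the distributive, unnormalized kernel-sum score of a kernel-density-based classifier (all function names and the choice of accuracy as the cell value are illustrative; other cell types work analogously):

```python
import numpy as np

def base_cell_scores(test_X, train_X, train_y, classes, bandwidth=1.0):
    """Per-class unnormalized Gaussian-kernel scores for one base (finest-level) cell."""
    scores = np.zeros((len(test_X), len(classes)))
    for j, c in enumerate(classes):
        pts = train_X[train_y == c]
        for i, x in enumerate(test_X):
            scores[i, j] = np.exp(-((pts - x) ** 2).sum(axis=1) / (2 * bandwidth ** 2)).sum()
    return scores

def cell_accuracy(base_cells, test_X, test_y, classes):
    """Cell value for a coarser cell, computed bottom-up from its base cells."""
    # Observation 1: the score is distributive, so just sum the base-cell scores.
    total = sum(base_cell_scores(test_X, tX, ty, classes) for tX, ty in base_cells)
    # Observation 2: scores -> predictions -> cell value (here, test-set accuracy).
    preds = np.array(classes)[total.argmax(axis=1)]
    return float((preds == test_y).mean())
```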
Machine-Learning Models
Naïve Bayes:
  Scoring function: algebraic
Kernel-density-based classifier:
  Scoring function: distributive
Decision tree, random forest:
  Scoring function: neither distributive nor algebraic
PBE: Probability-based ensemble (new)
  Makes any machine-learning model distributive
  An approximation
Probability-Based Ensemble
[Figure: a single decision tree built directly on the [WA, 85] subset (all of Jan–Dec 1985) vs. the PBE version of that tree, which combines the decision trees built on the lowest-level cells beneath it (e.g., the individual months of 1985 within WA).]
Probability-Based Ensemble
Scoring function:
  h_PBE(x; S(D)) = argmax_y Score_PBE(y, x; S(D))
  Score_PBE(y, x; S(D)) = Sum_{i ∈ S} Score_PBE(y, x; b_i(D))
  Score_PBE(y, x; b_i(D)) = h(y | x; b_i(D)) · g(b_i | x)

h(y | x; b_i(D)): model h's estimate of p(y | x, b_i(D))
g(b_i | x): a model that predicts the probability that x belongs to base subset b_i(D)
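A minimal sketch of the PBE combination above, assuming per-base-subset models h_i that expose predict_proba (as estimates of h(y | x; b_i(D))) and a precomputed gating matrix of g(b_i | x) values; all names are illustrative.

```python
import numpy as np

def pbe_scores(h_models, g_weights, X):
    """Score_PBE(y, x; S(D)) = sum over base subsets b_i in S of
    h(y | x; b_i(D)) * g(b_i | x).

    h_models : fitted models, one per base subset b_i (same class ordering)
    g_weights: array of shape (len(X), len(h_models)) holding g(b_i | x)
    """
    return sum(h.predict_proba(X) * g_weights[:, [i]]
               for i, h in enumerate(h_models))

def pbe_predict(h_models, g_weights, X, classes):
    """h_PBE(x; S(D)) = argmax_y Score_PBE(y, x; S(D))."""
    scores = pbe_scores(h_models, g_weights, X)
    return np.array(classes)[scores.argmax(axis=1)]
```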
Outline
Motivating example
Definition of prediction cubes
Efficient prediction cube materialization
Experimental results
Conclusion
Experiments
Quality of PBE on 8 UCI datasets
  The quality of the PBE version of a model is slightly worse (0–6%) than the quality of the model trained directly on the whole training data
  [Figure: PBE, combining models built on the lowest-level cells of the [WA, 1985] subset, vs. a single model trained on the whole subset]
Efficiency of the bottom-up score computation technique
Case study on demographic data
Efficiency of Bottom-up Score Computation
Machine-learning models:
  J48: J48 decision tree
  RF: Random forest
  NB: Naïve Bayes
  KDC: Kernel-density-based classifier
Bottom-up method vs. exhaustive method:
  Bottom-up: PBE-J48, PBE-RF, NB, KDC
  Exhaustive: J48ex, RFex, NBex, KDCex
Synthetic Dataset
Dimensions: Z1, Z2 and Z3.
[Figure: dimension hierarchies. Z1 and Z2: All → {A, …, E} → {0, 1, …, 9}; Z3: All → {0, 1, …, n}.]

Decision rule:
  Condition               Rule
  Z1 > 1                  Y = I(4X1 + 3X2 + 2X3 + X4 + 0.4X6 > 7)
  else, if Z3 mod 2 = 0   Y = I(2X1 + 2X2 + 3X3 + 3X4 + 0.4X6 > 7)
  else                    Y = I(0.1X5 + X1 > 1)
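A sketch of how such a synthetic dataset could be generated from the decision rule above (the distributions of Z1–Z3 and X1–X6 are assumptions; the slide only specifies the rule itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

Z1 = rng.integers(0, 10, n)     # dimension values (assumed ranges)
Z2 = rng.integers(0, 10, n)
Z3 = rng.integers(0, 10, n)
X = rng.random((n, 6))          # predictors X1..X6 in columns 0..5

def I(cond):                    # indicator function
    return cond.astype(int)

Y = np.where(Z1 > 1,
             I(4*X[:, 0] + 3*X[:, 1] + 2*X[:, 2] + X[:, 3] + 0.4*X[:, 5] > 7),
             np.where(Z3 % 2 == 0,
                      I(2*X[:, 0] + 2*X[:, 1] + 3*X[:, 2] + 3*X[:, 3] + 0.4*X[:, 5] > 7),
                      I(0.1*X[:, 4] + X[:, 0] > 1)))
```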
Efficiency Comparison
[Figure: Execution Time (sec, 0–2500) vs. # of Records (40K–200K). Series using the exhaustive method: RFex, KDCex, NBex, J48ex. Series using bottom-up score computation: NB, KDC, RFPBE, J48PBE.]
Related Work: Building models on OLAP Results
Multi-dimensional regression [Chen, VLDB 02]
  Goal: detect changes of trends
  Build linear regression models for cube cells
Step-by-step regression in stream cubes [Liu, PAKDD 03]
Loglinear-based quasi cubes [Barbara, J. IIS 01]
  Use a loglinear model to approximately compress dense regions of a data cube
NetCube [Margaritis, VLDB 01]
  Build a Bayes net on the entire dataset to approximately answer count queries
Related Work (Contd.)
Cubegrades [Imielinski, J. DMKD 02]
  Extend cubes with ideas from association rules
  How does the measure change when we roll up or drill down?
Constrained gradients [Dong, VLDB 01]
  Find pairs of similar cell characteristics associated with big changes in measure
User-cognizant multidimensional analysis [Sarawagi, VLDBJ 01]
  Help users find the most informative unvisited regions in a data cube using the maximum-entropy principle
Multi-Structural DBs [Fagin et al., PODS 05, VLDB 05]
Take-Home Messages
A promising exploratory data analysis paradigm:
  Can use models to identify interesting subsets
  Concentrate only on subsets in cube space
    Those are meaningful subsets, and tractable
  Precompute results and provide users with an interactive tool
A simple way to plug "something" into cube-style analysis:
  Try to describe/approximate "something" by a distributive or algebraic function
Big Picture
Why stop with decision behavior? Can apply to other kinds of analyses too
Why stop at browsing? Can mine prediction cubes in their own right
Exploratory analysis of the mining space:
  Dimension attributes can be parameters related to the algorithm, data conditioning, etc.
Tractable evaluation is a challenge:
  Large number of "dimensions", real-valued dimension attributes, difficulties in compositional evaluation
  Active learning for experiment design, extending compositional methods
Community Information Management (CIM)
Anhai Doan
University of Illinois at Urbana-Champaign
Raghu Ramakrishnan
University of Wisconsin-Madison
Structured Web-Queries
Example Queries:
  How many alumni are top-10 faculty members?
    Wisconsin does very well, by the way
  Find trends in publications
    By topic, by conference, by alumni of schools
  Change tracking
    Alert me if my co-authors publish new papers or move to new jobs
Information is extracted from text sources on the web, then queried
Key Ideas
Communities are ideally scoped chunks of the web for which to build enhanced portals
  Relative uniformity in content, interests
  Can exploit "people power" via mass collaboration to augment extraction
CIM platform: facilitate collaborative creation and maintenance of community portals
  Extraction management
    Uncertainty, provenance, maintenance, compositional inference for refining extracted information
  Mass collaboration for extraction and integration
Watch for new DBWorld!
Challenges
User Interaction
  Declarative specification of background knowledge and user feedback
  Intelligent prompting for user input
  Explanation of results
Challenges
Extraction and Query Plans
  Starting from user input (ER schema, hints) and background knowledge (e.g., standard types, look-up tables), compile a query into an execution plan
    Must cover extraction, storage and indexing, and relational processing
    And maintenance!
  Algebra to represent such plans? Query optimizer?
    Handling uncertainty, constraints, conflicts, multiple related sources, ranking, modular architecture
Challenges
UI
Managing extracted data
Mapping between extracted metadata
and source data
Uncertainty of mapping
Conflicts (in user input, background
knowledge, or from multiple sources)
Evolution over time
34