Probabilistic Skylines on Uncertain Data
Download
Report
Transcript Probabilistic Skylines on Uncertain Data
Probabilistic Skylines on Uncertain Data
VLDB2007
Outline
Motivation
Traditional and Probabilistic Skyline
Problem Definition
Computation Problem and Algorithms
(Top down and Bottom up)
Experimental Results
Motivation
Skyline Analysis on NBA players performance
(#Assists)
Uncertainty
Each Player has multiple records
(#Rebounds)
Motivation
Skyline Analysis on NBA players with multiple records
Motivation
Skyline Analysis on NBA players with multiple records
Easy Approach – Averaging
Arbor (x) is better in assist than Eddy, but Eddy (point b)
dominates all games of Arbor (x). “not so fair to say Eddy is a worse
in assist than Arbor”
Bob (point a) bias the aggregate value
Motivation
Motivating result using Probabilistic Skyline
Olajuwon and Kobe Bryant are missing from Aggregate
Skyline but present in Probabilistic Skyline
Their performance vary a lot over games
Details in experiment analysis
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data
Uncertain Data
Dominance
Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data
Uncertain Data
Dominance
Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data
Uncertain Data
Dominance
Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data
Uncertain Data
Dominance
Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Probabilistic Skyline
Calculation of Probability Object A dominating Object C
Pr [A≺C] = 1/4*1/3 (4+..)
Probabilistic Skyline
Calculation of Probability Object A dominates Object C
Pr [A≺C] =1/4*1/3 (4+4+..)
Probabilistic Skyline
Calculation of Probability Object A dominates Object C
Pr [A≺C] = 1/4*1/3 (4+4+0)
Probabilistic Skyline
Probabilistic Skyline: From Dominance to Skyline
Intuition of finding Skyline, probability of an object not to be
dominated by other objects
0
(1/3)(1/3)
Computation Problem of p-skyline
First, each uncertain object may have many
instances. We have to process a large number of
instances.
Second, we have to consider many probabilities in
deriving the probabilistic skylines.
Algorithms (Top down and Bottom up)
Data
Multiple records of objects in the hope of
approximating the probability density function
Techniques:
Bounding
Pruning
Refining
Bottom-up Algorithm
Technique – Minimum Bounding Box (MBB)
Bottom-up Algorithm - Pruning Techniques (1/3)
using Umin, Umax to decide membership of p-skyline
For an uncertain object U and probability threshold p, if
Pr(Umin) < p, then U is not in the p-skyline. If Pr(Umax) ≥ p,
then U is in the p-skyline
Bottom-up Algorithm - Pruning Techniques (2/3)
using Umax to prune instances of objects
Let U and V be uncertain objects such that
U V . If u is an instance of U and Vmax ≺ u, then Pr(u) = 0.
C2 is dominated by Umax,
dominated by all instances in object D
Bottom-up Algorithm - Pruning Techniques (3/3)
using subset of instance to prune objects
Estimate Pr(Vmin) upper bound by Pr(Umax’)
Pr(Vmin) = (1 – |U’|/|U|)(..)(..)
If |U’| is large, more instances dominate Vmin, then Pr(Vmin) is low
Top-down Algorithm
Idea of bounding
The skyline probability of each subset of uncertain object can
be bounded using its MBB.
The skyline probability of the uncertain object can be bounded
as the weighted mean of the bounds of subsets.
Top-down Algorithm
supporting data structure : partition tree
D
B
A
C
D
B
A
C
B
A
D
C
Top-down Algorithm
partition tree for bounding
D
B
A
C
A’
B
A
D
A’
D’
C’
B’
D
C
C’
B’
C
B
A
D’
B’
A’
D’
C’
Compare the partition of U with other partition tree as follows: traverse
the partition tree of other uncertain object V, in the depth-first manner.
Top-down Algorithm
all possible situations during partition trees traversal
B D
A C
A
B
C
D
B
A
B’ D’
A’ C’
A’
D
C
B’
D’
C’
B’
A’
D’
C’
Top-down Algorithm
Pruning partition tree
B D
A C
A
B
C
D
B
A
D
C
Experiment
Other Experiment results
Scalability with respect to probability threshold
Conclusion
Multiple records
MBB