ICDE2011_StochasticS.. - School of Computer Science and
Download
Report
Transcript ICDE2011_StochasticS.. - School of Computer Science and
Stochastic Skyline Operator
Xuemin Lin
School of Computer Science
University of New South Wales
Australia
Joint Work with:
Ying Zhang (UNSW), Wenjie Zhang (UNSW),
Muhammad Aamir Cheema (UNSW)
Introduction: Skyline
2
a user preference ≺ is given on each dimension of Rd.
two points in Rd, u dominates v (u ≺ v)
i (1 ≤ i ≤ d), u.i ≺= v.i; j (1 ≤ j ≤ d), u.j ≺ v.j
Skyline:
Points not dominated by
another point.
Multiple criteria optimal
decision making: minimum
set of candidates of best
options regarding any
monotonic functions.
Skyline of Uncertain Objects
Probabilistic Skyline: (VLDB07, PODS09, etc)
Skyline probabilities by possible worlds.
Providing the probabilities not worse than any other objects.
Provide minimal candidate set of optimal solutions?
How to define optimal options?
How to characterize the minimum candidate set?
Expected Utility & Stochastic Order
Expected Utility Principle:
Given a set U of uncertain objects and a decreasing utility
function f, select U in U to maxmize E[f (U)].
Stochastic Order:
Given a family ℱ of utility functions, U ≺ℱ V if for each f in ℱ
E[f(U)] ≥ E [f(V)]
Decreasing Multiplicative Functions:
ℱ=
{
d
i 1
f i ( xi )} where f is nonnegative decreasing.
i
Low orthant order:
the stochastic order is defined over the family of decreasing
multiplicative functions.
Athlete
Instance 1
/probability
Instance 2
/probability
A
(1,4) / 0.5
(3,2) / 0.5
B
(2,5) / 0.5
(4,3) / 0.5
C
(5,1) / 0.01
(3,4) / 0.99
Utility function: f f1 (h) f 2 (t )
o f1 : nonnegative decreasing [0, ) [0,1]
o f 2 : nonnegative decreasing [0, ) [0,1]
1
1
f1 (4) f 2 (1) f1 (2) f 2 (3)
2
2
1
1
E[ f ( B)] f1 (5) f 2 (2) f1 (3) f 2 (4)
2
2
E[ f ( A)]
E[ f (C )]
1
99
f1 (1) f 2 (5)
f1 (4) f 2 (3)
100
100
e.g.
f1 (h) e a (1 h )
f 2 (t ) ebt a 0 b 0 ;
f 2 (t ) ebt a 0 b 0 ;
Contributions
Introduce a novel skyline operator: stochastic
skyline.
Guarantee the minimal candidate set to the
optimal solutions regarding decreasing
multiplicative functions.
NP-Completeness of computing stochastic
skyline regarding dimensionality d.
Novel statistic base pruning techniques.
Efficient partition base verification algorithms:
polynomial if d is fixed.
Problem Statement
Stochastic Order (lower orthant order):
Given U & V, U stochastically dominates
V (U ≺sd V) if for any x, U.cdf (x) ≥ V.cdf
(x) and exists y such that U.cdf (y) >
V.cdf (y).
U.cdf (x): probability mass of U in the
rectangular region R ((0,0,…0), x); see
the shaded region.
Stochastic Skyline: the objects in U not
stochastically dominated by any others,
called stochastic skyline.
Problem Statement: efficiently compute
stochastic skyline regarding discrete
cases.
Minimality of stochastic skyline
Stochastic skyline removes all objects not preferred by any
non-negative decreasing functions!
Framework
Phase 1: filtering.
Remove non-promising objects.
Phase 2: verification.
Test stochastic dominance between two objects.
BBS combing with a heap:
the “near” progressiveness
only need to test either U ≺sd V or V ≺sd U in
most cases (but not both).
Testing if U ≺sd V
Violation point: a point x in Rd+ is a violation point
regarding U ≺sd V if U.cdf (x) < V.cdf (x).
Testing algorithm: if no violation points, then U ≺sd V.
Not enough to test instances.
1
pu1 pu2
2
1
pv1 pv2 pv3
3
Reduce to Grid Points
Test if U.cdf ≥ V.cdf against grid points only (see (a)).
Testing the switching grid points only (see solid lines (b)).
Algorithm
Given a rectangular region R (x, y), if U.cdf (x) ≥ V.cdf (y), then no
violation point in R (x, y).
Partition base testing algorithm:
Get switching points
Initial check
Iteratively partition the grid to throw away non-promising sub-grids
Complexity
The algorithm runs O (dm log m + md (T (Uartree) + T (Vartree))) where m
is the number of instances in V.
NP-Complete regarding d.
Covert (the decision version of) the minimal set cover problem to a
special case of the testing problem.
Filtering Techniques
Pruning Rule 1: throw away fully dominated entries.
Filtering Techniques
Pruning Rules 2: applying Cantelli’s Inequality to get upper-bonds.
Size Estimation:
Expected size: size of stochastic skyline in Rd is
bounded by that of conventional skyline in Rd+1;
i.e.,
lnd (n)/(d+1)!
Empirical Study
C++ with STL compiled with GNU GCC on 2.4GHz Debian
Real data set: NBA player’s game-by-game statistics
Synthetic dataset: anti-correlated, correlated, independent
Summary
a novel skyline operator: stochastic skyline
guarantee minimality .
NP-complete to test stochastic order (lower orthant order) .
novel efficient algorithms to compute stochastic order.
Future work:
F is a set of all decreasing functions?
Thank you!