ICDE2011_StochasticS.. - School of Computer Science and

Download Report

Transcript ICDE2011_StochasticS.. - School of Computer Science and

Stochastic Skyline Operator
Xuemin Lin
School of Computer Science
University of New South Wales
Australia
Joint Work with:
Ying Zhang (UNSW), Wenjie Zhang (UNSW),
Muhammad Aamir Cheema (UNSW)
Introduction: Skyline
2


a user preference ≺ is given on each dimension of Rd.
two points in Rd, u dominates v (u ≺ v)
  i (1 ≤ i ≤ d), u.i ≺= v.i;  j (1 ≤ j ≤ d), u.j ≺ v.j
Skyline:


Points not dominated by
another point.
Multiple criteria optimal
decision making: minimum
set of candidates of best
options regarding any
monotonic functions.
Skyline of Uncertain Objects
Probabilistic Skyline: (VLDB07, PODS09, etc)


Skyline probabilities by possible worlds.
Providing the probabilities not worse than any other objects.
Provide minimal candidate set of optimal solutions?


How to define optimal options?
How to characterize the minimum candidate set?
Expected Utility & Stochastic Order
Expected Utility Principle:

Given a set U of uncertain objects and a decreasing utility
function f, select U in U to maxmize E[f (U)].
Stochastic Order:

Given a family ℱ of utility functions, U ≺ℱ V if for each f in ℱ
E[f(U)] ≥ E [f(V)]
Decreasing Multiplicative Functions:

ℱ=
{
d
i 1
f i ( xi )} where f is nonnegative decreasing.
i
Low orthant order:
the stochastic order is defined over the family of decreasing
multiplicative functions.
Athlete
Instance 1
/probability
Instance 2
/probability
A
(1,4) / 0.5
(3,2) / 0.5
B
(2,5) / 0.5
(4,3) / 0.5
C
(5,1) / 0.01
(3,4) / 0.99
Utility function: f  f1 (h)  f 2 (t )
o f1 : nonnegative decreasing [0, )  [0,1]
o f 2 : nonnegative decreasing [0, )  [0,1]
1
1
f1 (4) f 2 (1)  f1 (2) f 2 (3)
2
2
1
1
E[ f ( B)]  f1 (5) f 2 (2)  f1 (3) f 2 (4)
2
2
E[ f ( A)] 
E[ f (C )] 
1
99
f1 (1) f 2 (5) 
f1 (4) f 2 (3)
100
100
e.g.
f1 (h)  e a (1 h )
f 2 (t )  ebt a  0 b  0 ;
f 2 (t )  ebt a  0 b  0 ;
Contributions
 Introduce a novel skyline operator: stochastic
skyline.
 Guarantee the minimal candidate set to the
optimal solutions regarding decreasing
multiplicative functions.
 NP-Completeness of computing stochastic
skyline regarding dimensionality d.
 Novel statistic base pruning techniques.
 Efficient partition base verification algorithms:
polynomial if d is fixed.
Problem Statement
Stochastic Order (lower orthant order):
Given U & V, U stochastically dominates
V (U ≺sd V) if for any x, U.cdf (x) ≥ V.cdf
(x) and exists y such that U.cdf (y) >
V.cdf (y).
U.cdf (x): probability mass of U in the
rectangular region R ((0,0,…0), x); see
the shaded region.
Stochastic Skyline: the objects in U not
stochastically dominated by any others,
called stochastic skyline.
Problem Statement: efficiently compute
stochastic skyline regarding discrete
cases.
Minimality of stochastic skyline
Stochastic skyline removes all objects not preferred by any
non-negative decreasing functions!
Framework


Phase 1: filtering.
Remove non-promising objects.
Phase 2: verification.
Test stochastic dominance between two objects.
BBS combing with a heap:


the “near” progressiveness
only need to test either U ≺sd V or V ≺sd U in
most cases (but not both).
Testing if U ≺sd V

Violation point: a point x in Rd+ is a violation point
regarding U ≺sd V if U.cdf (x) < V.cdf (x).
Testing algorithm: if no violation points, then U ≺sd V.

Not enough to test instances.

1
pu1  pu2 
2
1
pv1  pv2  pv3 
3
Reduce to Grid Points
Test if U.cdf ≥ V.cdf against grid points only (see (a)).
Testing the switching grid points only (see solid lines (b)).
Algorithm

Given a rectangular region R (x, y), if U.cdf (x) ≥ V.cdf (y), then no
violation point in R (x, y).
Partition base testing algorithm:

Get switching points

Initial check

Iteratively partition the grid to throw away non-promising sub-grids
Complexity
The algorithm runs O (dm log m + md (T (Uartree) + T (Vartree))) where m
is the number of instances in V.
NP-Complete regarding d.
Covert (the decision version of) the minimal set cover problem to a
special case of the testing problem.
Filtering Techniques
Pruning Rule 1: throw away fully dominated entries.
Filtering Techniques
Pruning Rules 2: applying Cantelli’s Inequality to get upper-bonds.
Size Estimation:
Expected size: size of stochastic skyline in Rd is
bounded by that of conventional skyline in Rd+1;
i.e.,
lnd (n)/(d+1)!
Empirical Study



C++ with STL compiled with GNU GCC on 2.4GHz Debian
Real data set: NBA player’s game-by-game statistics
Synthetic dataset: anti-correlated, correlated, independent
Summary




a novel skyline operator: stochastic skyline
guarantee minimality .
NP-complete to test stochastic order (lower orthant order) .
novel efficient algorithms to compute stochastic order.
Future work:
 F is a set of all decreasing functions?
Thank you!