Transcript ppt

Adaptive Ordering of Pipelined
Stream Filters
Babu, Motwani, Munagala, Nishizawa, and Widom
SIGMOD 2004 Jun 13-18, 2004
presented by
Joshua Lee
Mingzhu Wei
Some slides are adapted from the presentation slides in sigmod2004
Presentation Overview
•
•
•
•
•
•
Motivation
A-GREEDY Algorithm
INDEPENDENT Algorithm
SWEEP Algorithm
LOCALSWAPS Algorithm
Adaptive Ordering of Multiway
Joins Over Streams
• Conclusion
Motivation
• Many queries on stream data involve processing
commutative sets of filters
• Each filter is a stateless predicate that indicates
whether a tuple should be further processed
• A tuple is output only if it satisfies all filters
• Applying filters in different orders can affect the
processing time of the query
• The changing nature of stream data requires
adaptive algorithms for filter ordering
Motivation: Example
• Consider the following example usage of a
stream processing system:
– A network traffic monitoring application built
on a stream engine is used to monitor the
amount of common traffic flowing through four
routers, R1, R2, R3, and R4 over the last ten
minutes.
Motivation: Filter Ordering
• If the filters are dependent,
then the order of probing
affects processing time
• For instance, it could be
known that most traffic
through R1 comes from R3
or R4, but rarely from R2
• For incoming data on R1,
probing W2 first would drop
a tuple quickest, thus
saving the time of probing
W3 and W4
Motivation: Filter Ordering
Challenges
• Filter selectivities may be correlated
• The candidate filter orderings to consider
for N filters is N!, which increases quickly
• Attributes of stream data and arrival
characteristics can change over time
Motivation: Adaptive Filter Ordering
Challenges
• Run-time overhead incurred for continuous
monitoring of statistics
• Convergence properties required to prove an
adaptive algorithm generates an approximation
to correct answer
• Speed of adaptivity in the face of changing data
characteristics
• There is a three-way tradeoff: Algorithms that
adapt quickly and have good convergence
properties incur a lot of run-time overhead
A-GREEDY Algorithm
• The cost of a query plan ordering is
• ti is the cost of processing one tuple in
filter i
• d(j|j-1) is the conditional probability that
filter j will drop a tuple, given that no filter
from 1 to j-1 dropped it
A-GREEDY Algorithm: Static
Version and Greedy Invariant
•
Static greedy algorithm
1. Choose the filter with highest unconditional drop
probability d(i|0) as the first filter
2. Choose the filter with the highest conditional drop
probability d(j|1) as the next filter
3. Choose the filter with highest conditional drop
probability d(k|2) as the next filter
4. And so on
•
Greedy Invariant occurs when a filter ordering
satisfies the following
A-GREEDY Algorithm: Profiler
• Two logical components of A-Greedy:
– profiler: continuously collects and maintains statistics
about filter selectivities and processing costs
– Reoptimizer: detects and corrects violations of the GI
in the current filter ordering
• Challenge faced by the profiler: n2n-1 conditional
selectivities for n filters
• Solution: profile of recently dropped tuples
A-GREEDY Algorithm: Profiler
• Profile: a sliding window of profile tuples
• A profile tuple contains n boolean attributes b1,…,bn
corresponding to n filters
• Dropped tuples are sampled with some probability p
• If a tuple e is chosen for profiling, it will be tested by all
remaining filters
• A new profile tuple e: bi = 1 if Fi drop e, otherwise bi=0
A-GREEDY Algorithm: Reoptimizer
• The reoptimizer maintains an ordering O
such that O satisfies the GI
• How does the reoptimizer use profile to
derive estimates of conditional selectivities?
– It incrementally maintains a view over the
profile window
A-GREEDY Algorithm: Reoptimizer
• Matrix: View over the profile window: n×n
• V[i, j]: number of tuples in the profile
window that were dropped by Ff(j) but not
dropped by Ff (1) Ff (2),…, F f (i-1)
V[i, j] is proportional to d (j|i-1)
A-GREEDY Algorithm: Run-time
Overhead and Adaptivity
• Profile-tuple creation: needs additional n-I
evaluations for a tuple dropped by Ff(i) .
• Profile-window maintenance: insertion and deletion
of profile tuples, averages of filter processing times
• Matrix-view update: every update would cause
access to up to n2/4 entries
• Detection and correction of GI violations
• Good convergence properties
• Fast adaptivity
Variants of A Greedy
• Can we sacrifice some of A-Greedy’s
convergence properties or adaptivity
speed to reduce its run-time overhead?
• Three variants of A greedy are proposed
SWEEP Algorithm
• Proceeds in stages
• During one stage, only checks for GI violations involving the filter at one
specific position j
• Do not need to maintain entire matrix view: bf (1) ,…,b f (j)
• For each profiled tuple, needs to additionally evaluate F f (j) only
• By rotating j over 2,…,n, eventually detects and corrects all GI violations
• Same convergence properties as A-Greedy
• Reduced view and need for additional evaluations
less overhead
• slower adaptivity: only 1 filter is profiled in each stage
INDEPENDENT Algorithm
• Assumes filters are independent we only need to maintain
estimates of unconditional selectivities
• Lower view maintenance overhead
• Fast adaptivity
• Convergence: if assumption holds, optimal. Otherwise,
can be O(n) times worse than GI orderings
LOCALSWAP Algorithm
• Monitors “local” violations only, i.e., violations involving
adjacent filters
• LocalSwaps detects situations where a swap between
adjacent filters
• Only need to maintain two diagonals of the view
• For each profiled tuple dropped by Ff(i), only needs to
additionally evaluate F f(i+1)
x
x
x
x
x
x
x
Adaptive Ordering of Multiway
Joins Over Streams
• MJoins maintain an ordering of {S0, S1,…, Sn-1}-{Si} for
each stream Si
• New tuples arriving from Si is joined with other stream
windows in that order
• Two-phase join algorithm
– Drop-probing phase: the new tuple is used to probe all other
windows in the specified order. If any window drops it, no further
processing will be needed for it
– output-generation phase: if no window drops the tuple, proceeds
as conventional MJoins
• Drop probing resembles pipelined filters
• A-Greedy and its variants can be used to determine the
orderings
Conclusion
• A-Greedy has good convergence
properties and fast adaptivity, but incurs
significant run-time overhead
• Three variants of A-Greedy are proposed,
each lying at a different points along the
tradeoff spectrum among convergence,
runtime overhead, and speed of adaptivity