Open Problems in Streaming

Download Report

Transcript Open Problems in Streaming

Open Problems in Streaming
Sampath Kannan
University of Pennsylvania
Outline
• Questions about the model(s)
• Relationship between streaming
model(s) and other models
• Algorithm design questions
Current Model(s)
• Input stream(s): one or more; read head can
only move to the right.
• Data order: stream is a sequence of items
permuted adversarially.
Var: Problem is about particular order in
the stream.
• Workspace: bounded; polylogarithmic?
• Approximate solutions; randomization; single
or multi pass.
Model too pessimistic?
Adversarial input ordering makes life
difficult.
Order-specific problems may not be
general enough.
What are the alternatives?
Average-case analysis?
Assume that the data in a stream is chosen
by an adversary but its ordering is random.
Open Problem: Is there a problem where
this really helps?
Conjecture: No! Simulate random order by
sampling from prefix of stream...
Prover-assisted streaming
Stream creator/prover allowed to annotate
stream (create separate stream of anntn).
Downstream verifier can use annotation to
compute... but must not be fooled by a
dishonest prover.
Element distinctness/set disjointness can be
verified with O(n) annotation [J. Zhang].
For what problems does prover-assistance
help?
Is there an example where annotation of
length o(n) makes streaming algorithm
possible?
Passes, space, approx. factor
What are the tradeoffs?
Munro and Paterson show trade-off
between space and number of passes
for exact computation of median.
Clustering: current trade-off for arbitrary
metric spaces:
O(n ) space for (21/  ) approximat ion.
Better tradeoffs?
Distributed Streams
[Gibbons & Tirthapura] study streaming
complexity where there are multiple
streams with one observer per stream.
Each observer computes a “sketch” from
which a function on the union of the streams
is computed.
If there is a canonical interleaving of streams
what functions of this canonical interleaving
can we compute?
Relation to other models
Worst-case one-round communication
complexity: Function f. Inputs partitioned
between Alice and Bob in worst possible
way. Number of bits Alice needs to
communicate for Bob to compute f is
WOCC(f).
Conj: PASS(f) = WOCC(f).
Issues: Non-uniformity in communication
complexity model: Even for the worst-case
partition, Alice and Bob could exploit the
fact that they know the partition.
Does this allow WOCC(f) < PASS (f)?
Can show: WOCC(f) < = PASS(f)
there exists f : WOCC(f) = 2
PASS(f) = 3 [Ishay,K
Strauss]
Algorithmic questions
Stream items: points from metric space
Compute diameter
(For d-dimensional
d Euclidean space
d

can approximate in O   space.)
 
(Feigenbaum, K, Zhang).
Better space by dimension reduction?
Clustering in specific metric spaces
Can we get constant factor approximation
with polylog space for Euclidean d-dimensional
space? Other specific metric spaces?
Computational geometry
If promised real convex hull has few sides
can we find approximate convex hull for a
stream of points?
Data Mining
Stream is interleaved output of one or
more Markov Chains. Many questions.
Given one or more Markov Chains and a
stream of states find max. probability
that stream generated by given chains.
Can be solved in space = product of number
of states of chains. (Batu, Guha, K)
Other questions:
Markov Chain(s) not given: Find most likely.
Assume good mix in stream.
Markov Chain(s) “hidden”... stream is not a
sequence of states but a sequence of edge
labels.
“Background” Markov Chain given. Find
motifs --- other Markov Chains that explain
“signals” in the stream.
Open Problem
Are there other open problems?