
Anti-Persistence
or
History Independent Data Structures
Moni Naor
Weizmann Institute
Vanessa Teague
Stanford
Why hide your history?

Core dumps
Losing your laptop
– The entire memory representation of data structures is exposed
Emailing files
– The editing history may be exposed (e.g. Word)
Maintaining lists of people
– Sports teams, party invitees
Making sure that nobody learns from history

A data structure has:
– A “legitimate” interface: the set of operations allowed to be performed on it
– A memory representation
The memory representation should reveal no information that cannot be obtained from the “legitimate” interface
History of history independence

Issue dealt with in the Cryptography and Data Structures communities
Micciancio (1997): history independent trees
– Motivation: incremental crypto
– Based on the “shape” of the data structure, not including the memory representation
– Stronger performance model!
Uniquely represented data structures
– Treaps (Seidel & Aragon), uniquely represented dictionaries
– Ordered hash tables (Amble & Knuth 1974)
More History

Persistent Data Structures: possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
– We want the opposite: anti-persistence
Oblivious RAM (Goldreich and Ostrovsky)
Overview

Definitions
History independent open addressing hashing
History independent dynamic perfect hashing
– Memory management
(Union Find)
Open problems
Precise Definitions

A data structure is
– history independent: if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on the memory representation
– strongly history independent: if, given any two sets of breakpoints along S1 and S2 s.t. corresponding points have identical contents, S1 and S2 induce the same probability distributions on the memory representation at those points
Relaxations

Statistical closeness
Computational indistinguishability
– Example where helpful: erasing
Allow some information to be leaked
– Total number of operations
– n-history independent: identical distributions if the last n operations were identical as well
Under-defined data structures: same query can yield several legitimate answers
– e.g. approximate priority queue
– Define identical content: no suffix T such that the set of permitted results returned by S1 ∘ T differs from the one returned by S2 ∘ T
History independence is easy (sort of)

If it is possible to decide the (lexicographically) “first” sequence of operations that produces a certain contents, just store the result of that sequence
This gives a history independent version of a huge class of data structures
Efficiency is the problem…
Dictionaries

Operations are insert(x), lookup(x) and possibly delete(x)
The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
Elements x ∈ U, some universe
Size of table/memory: N
Goal

Find a history independent implementation of dictionaries with good provable performance
Develop general techniques for history independence
Approaches

Unique representation
– e.g. array in sorted order
– Yields strong history independence
Secret randomness
– e.g. array in random order
– Only history independence
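To make the two approaches concrete, here is a minimal Python sketch (class names and details are ours, not from the talk): a sorted array whose layout is uniquely determined by its content, and a shuffled array whose layout is a uniformly random permutation of its content.

```python
import random

class SortedArrayDict:
    """Unique representation: the array always holds the content in
    sorted order, so the layout depends only on the current set
    (strongly history independent)."""
    def __init__(self):
        self.items = []

    def insert(self, x):
        if x not in self.items:
            self.items.append(x)
            self.items.sort()        # canonical layout: sorted order

    def lookup(self, x):
        return x in self.items

class ShuffledArrayDict:
    """Secret randomness: each new element lands at a uniformly random
    position, so for any insertion order the layout is a uniform random
    permutation of the content (history independent, but not strongly:
    an observer who sees one snapshot learns the hidden randomness)."""
    def __init__(self):
        self.items = []

    def insert(self, x):
        if x not in self.items:
            self.items.insert(random.randint(0, len(self.items)), x)

    def lookup(self, x):
        return x in self.items
```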
Open addressing: traditional version

Each element x has a probe sequence h1(x), h2(x), h3(x), ...
– Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ...
– Double hashing
– Uniform hashing
Element is inserted into the first free space in its probe sequence
– Search ends unsuccessfully at a free space
Efficient space utilization
– Almost all the table can be full
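A minimal sketch of the traditional scheme in Python (function names are ours); `probe(x, i)` is assumed to return the i-th cell of x's probe sequence:

```python
def insert_traditional(table, x, probe):
    """Place x in the first free cell along its probe sequence
    (assumes the table is not completely full)."""
    for i in range(1, len(table) + 1):
        cell = probe(x, i)
        if table[cell] is None:      # first free space: claim it
            table[cell] = x
            return
    raise RuntimeError("table is full")

def lookup_traditional(table, x, probe):
    """Follow x's probe sequence; reaching a free cell means x is absent."""
    for i in range(1, len(table) + 1):
        cell = probe(x, i)
        if table[cell] == x:
            return True
        if table[cell] is None:      # search ends unsuccessfully at a free space
            return False
    return False
```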
Open addressing: traditional version

Not history independent, because later-inserted elements move further along in their probe sequence.
[Animation: if there is no clash, y is simply inserted; if y probes a cell occupied by x, then since x arrived before y, y is the one that moves on.]
History independent version

At each cell i, decide elements’ priorities independently of insertion order
Call the priority function pi(x,y)
If there is a clash, move the element of lower priority
At each cell, priorities must form a total order
Insertion

[Animation: y probes the cell occupied by x; p2(x,y)? No, so x has the lower priority and is moved on while y takes the cell.]
Search

Same as in the traditional algorithm
In an unsuccessful search, can quit as soon as you find a lower-priority element
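A sketch of both operations, parameterized by the priority function (naming is ours): each occupied cell stores an (element, step) pair, where step is how far the element has advanced in its own probe sequence, and `priority(cell, a, b)` is assumed to return True iff a beats b at that cell.

```python
def insert_hi(table, x, probe, priority):
    """History independent insertion: on a clash, the lower-priority
    element is evicted and continues along its own probe sequence
    (assumes the table is not completely full)."""
    step = 1
    while True:
        cell = probe(x, step)
        if table[cell] is None:          # free cell: x settles here
            table[cell] = (x, step)
            return
        y, ystep = table[cell]
        if priority(cell, (x, step), (y, ystep)):
            table[cell] = (x, step)      # x wins: evict y ...
            x, step = y, ystep           # ... and keep inserting y
        step += 1                        # the loser tries its next probe

def lookup_hi(table, x, probe, priority):
    """Search as in the traditional algorithm, but an unsuccessful
    search can stop at the first element that x would have evicted."""
    step = 1
    while True:
        cell = probe(x, step)
        if table[cell] is None:
            return False
        y, ystep = table[cell]
        if y == x:
            return True
        if priority(cell, (x, step), (y, ystep)):
            return False                 # x outranks y here, so x is absent
        step += 1
```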
No deletions

Problematic in open addressing anyway
Strong history independence

Claim: For all hash functions and priority functions, the final configuration of the table is independent of the order of insertion.

Conclusion: Strongly history independent.
Proof of history independence

A static insertion algorithm (clearly history independent): try to insert all elements in parallel; at each cell, the highest-priority contender settles; gather up the rejects and restart them at their next probes.
[Animation: a few rounds of the static algorithm, e.g. p1(x2,x1), so insert x2; p3(x4,x5) and p3(x4,x6), so insert x4 and remove x5.]
Proof of history independence

Nothing moves further in the static algorithm than in the dynamic one
– By induction on rounds of the static algorithm
Vice versa
– By induction on the steps in the dynamic algorithm
Strongly history independent
Some priority functions

Global
– A single priority, independent of cell
Random
– Choose a random order at each cell
Youth-rules
– Call an element “younger” if it has moved less far along its probe sequence; younger elements get higher priority
Youth-rules

[Animation: x and y clash; p2(x,y) holds because x has taken fewer steps than y, so y is the one that moves on.]
Use a tie-breaker if steps are equal
This is a priority function
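Continuing the earlier sketch, Youth-rules as a priority function over the stored (element, step) pairs; the particular tie-breaker here (a fixed order on the keys themselves) is our choice, the talk only requires some tie-breaker:

```python
def youth_rules(cell, a, b):
    """Youth-rules: the element that has moved less far along its probe
    sequence ("younger") gets higher priority, independent of the cell."""
    (x, xstep), (y, ystep) = a, b
    if xstep != ystep:
        return xstep < ystep     # fewer steps taken: x wins
    return x < y                 # tie-breaker when steps are equal
```

This plugs directly into the earlier sketch, e.g. `insert_hi(table, x, probe, youth_rules)`.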
Specifying a scheme

Priority rule
– Choice of priority functions
– In Youth-rules: determined by the probe sequence
Probe functions
– How they are chosen
– Maintained
– Computed
Implementing Youth-rules

Let each hi be chosen from a pairwise independent collection
– For any two x and y, the r.v.’s hi(x) and hi(y) are uniform and independent
Let h1, h2, h3, … be chosen independently
– Example: hi(x) = (ai·x mod U + bi) mod N
– Space: 2 elements per function
Need only log N functions
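A sketch of such a family in Python (the constructor is ours; as on the appendix slide, U is the prime size of the universe and N the table size), producing the `probe(x, i)` used in the sketches above:

```python
import random

def make_probe_functions(U, N, count):
    """Draw `count` independent functions hi(x) = (ai*x mod U + bi) mod N.
    Each function is stored as just the pair (ai, bi): two values apiece."""
    coeffs = [(random.randrange(1, U), random.randrange(U))
              for _ in range(count)]

    def probe(x, i):
        a, b = coeffs[i - 1]     # probe positions are 1-indexed; i <= count
        return (a * x % U + b) % N

    return probe
```

Since only about log N functions are needed, `count` can be set to roughly `N.bit_length()`.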
Performance Analysis

Based on worst-case insertion sequence
The important parameter: α, the fraction of the table that is used (α·N elements)
Analysis of expected insertion time and search time (number of probes to the table)
– Have to distinguish successful and unsuccessful search
Analysis via the Static Algorithm

For insertions, the total number of probes in the static and dynamic algorithms is identical
– Easier to analyze the static algorithm
Key point for Youth-rules: in phase i, all unsettled elements are at the ith probe in their sequence
– Assures fresh randomness of hi(x)
Performance

For Youth-rules, implemented as specified:
– For any sequence of insertions, the expected probe-time for insertion is at most 1/(1-α)
– For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1-α)
Analysis based on the static algorithm
α is the fraction of the table that is used
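Written out, with one numeric instance (the instantiation is ours, not on the slide):

```latex
\mathbb{E}[\text{probes}] \;\le\; \frac{1}{1-\alpha},
\qquad \alpha = 0.9 \;\Rightarrow\; \mathbb{E}[\text{probes}] \le 10.
```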
Comparison to double hashing

Analysis of double hashing with truly random functions [Guibas & Szemeredi, Lueker & Molodowitch]
Truly random functions can be replaced by log n-wise independent functions (Schmidt & Siegel)
log n-wise independence is relatively expensive:
– either a lot of space or log n time
Youth-rules is a simple and provably efficient scheme with very little extra storage
– An extra benefit of considering history independence
Other Priority Functions

[Amble & Knuth]: log(1/(1-α)) for global
– Truly random hash functions
Experiments show about log(1/(1-α)) for most priority functions tried
Other types of data structures

Memory management (dealing with pointers)
– Memory allocation
Other state-related issues
Dynamic perfect hashing:
FKS scheme, dynamized

[Diagram: n elements x1, ..., x6 to be inserted; a top-level table of O(n) space hashes each element via h into buckets of sizes s0, ..., sk; bucket i has its own hash function hi and a low-level table of about si² cells; the low-level tables use O(n) space in total.]
The hi are perfect on their respective sets.
Rechoose h or some hi to maintain perfection and linear space.
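A minimal lookup sketch of the two-level structure (class and field names are ours, following the diagram):

```python
class FKSTable:
    """Two-level (FKS-style) table: a top-level function h picks a
    bucket; each bucket i has a hash function hi that is perfect
    (collision-free) on the bucket's elements, plus a subtable of
    about si**2 cells."""
    def __init__(self, h, buckets):
        self.h = h                 # top-level hash function
        self.buckets = buckets     # list of (hi, subtable) pairs

    def lookup(self, x):
        hi, subtable = self.buckets[self.h(x)]
        return subtable[hi(x)] == x    # two steps: one probe per level
```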
A subtle problem:
the intersection bias problem

Suppose we have:
– a set of states {σ1, σ2, ...}
– a set of objects {h1, h2, ...}
– a way to decide whether hi is “good” for σj
Keep a “current” h as states change
Change h only if it is no longer “good”
– Choose uniformly from the “good” ones for σ
Then this is not history independent
– h is biased towards the intersection of those good for the current σ and for previous states
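A toy simulation of the bias (entirely our construction): two states and three objects, with h1 the only object good for both.

```python
import random
from collections import Counter

GOOD = {1: ["h0", "h1"],     # objects "good" for state 1
        2: ["h1", "h2"]}     # objects "good" for state 2

def keep_while_good():
    """Start in state 1 with a uniformly chosen good h; after moving to
    state 2, keep h if it is still good, otherwise rechoose uniformly."""
    h = random.choice(GOOD[1])
    if h not in GOOD[2]:
        h = random.choice(GOOD[2])
    return h

print(Counter(keep_while_good() for _ in range(100_000)))
# ≈ 75% 'h1', 25% 'h2': biased towards the intersection {'h1'},
# whereas a history independent choice for state 2 would be 50/50.
```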
Dynamized FKS is not history independent

Does not erase upon deletion
Uses history-dependent memory allocation
Hash functions (h, h1, h2, ...) are changed whenever they cease to be “good”
– Hence they suffer from the intersection bias problem, since they are biased towards functions that were “good” for previous sets of elements
– Hence they leak information about past sets of elements
Making it history independent

Use history independent memory allocation
Upon deletion, erase the element and rechoose the appropriate hi; this solves the low-level intersection bias problem
Some other minor changes
Solve the top-level intersection bias problem...
Solving the top-level intersection bias problem

Can’t afford a top-level rehash on every deletion
Generate two “potential h”s, ĥ1 and ĥ2, at the beginning
– Always use the first “good” one
– If neither is good, rehash at every deletion
– If not using ĥ1, keep a top-level table for it for easy “goodness” checking (likewise for ĥ2)
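A sketch of that rule (names ours): the two candidates are drawn once, so the choice between them depends only on the current content.

```python
def current_top_level(candidates, elements, is_good):
    """Return the first of the pre-drawn candidates that is "good" for
    the current element set; None means neither is good, and the caller
    falls back to rehashing at every deletion."""
    for h in candidates:         # candidates = [h_hat1, h_hat2]
        if is_good(h, elements):
            return h
    return None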
Proof of history independence

The table’s state is defined by:
– The current set of elements
– The top-level hash function: always the first “good” ĥi, or rechosen each step
– The low-level hash functions: uniformly chosen from the perfect functions
– The arrangement of sub-tables in memory: uses history-independent memory allocation
– Some other history independent things
Performance

Lookup takes two steps
Insertion and deletion take expected amortized O(1) time
– There is a 1/poly chance that they will take more
Open Problems

Better analysis for Youth-rules, as well as other priority functions, with no random oracles
Efficient memory allocation
– ours is O(s log s)
Separations
– Between strong and weak history independence
– Between history independent and traditional versions
– e.g. for union find
Can persistence and (computational) history independence co-exist efficiently?
Conclusion

History independence can be subtle
We have two history independent hash tables:
– Based on open addressing: very space efficient, but no deletion
– Dynamic perfect hashing: allows deletion, constant-time lookup
Open addressing: implementing hash functions

For all i, generate random independent ai, bi
hi(x) = (ai·x mod U + bi) mod N
– U: size of the universe; prime
– N: size of the hash table
x’s probe sequence is h1(x), h2(x), h3(x), ...
We need log n hash functions
– n is the number of elements in the table