an approach to allow database outsourcing
Download
Report
Transcript an approach to allow database outsourcing
StrangerDB -Safe Data Management
with Untrusted Servers
Dennis Shasha ([email protected])
Joint work (past and present):
David Mazieres and Radu Sion
1
Goals
• Store private data in a public database:
backup, concurrency control, and query
processing
• Make unauthorized modifications evident
(safety)
• Protect data from being observed (privacy)
• Force server to deliver a consistent picture
to all honest users or be discovered
(consistency/serializability).
2
Methods
• Signatures for tamper-evidence
Clients/users share encryption for privacy
among themselves
• SUNDR-style [1] maintenance to detect
inconsistent transaction orders.
• Checksums for query efficiency.
1. "Building secure file systems out of Byzantine
storage", David Mazieres and Dennis Shasha, Principles
of Distributed Computing, 2002. pp. 108-117.
3
Tamper Evidence (safety)
-- first possibility
• Every modifier signs a collision-resistant
hash of the database after modification.
• Note: Need public signature keys provided
as certificates from a trusted certificate
authority.
• Simple model: one signature for whole
database. Can achieve much finer
granularity as we see later.
4
Collision-Resistant Hash Setup:
Inspired by Merkle Trees
sgn_user(HASH (root), ptr)
ENCRYPTED DATA BLOB 1
HASH1, ptrs
ENCRYPTED DATA BLOB 2
HASH2, ptrs
….
ENCRYPTED DATA BLOB n
5
Privacy – for now
• Every blob is encrypted using private key
encryption. A blob can be data or an index
page having pointers and hashes like the
root.
• Descending the Merkle tree requires
decryption as one goes.
6
Assumption: Clients are Good
• Note that if a client/user is malicious, that client
can download or modify any information to
which it has decryption keys and has access. It
could share this information with the server.
• So, if a client is not trustworthy, we can
guarantee nothing about data it can access.
• So, assume for now clients do not modify data
maliciously or reveal data to the server.
• Server could be bad.
7
Malicious servers and forks
• If a user u accesses a database at time t,
the user wants to be sure that the
database is current as of time t.
• A malicious server might give u a
database state reflecting only some
previous updates and v another state.
• Users cannot prevent such “forking
attacks” but would like to discover them
quickly.
8
Underlying Strategy to
ensure inter-user consistency
• Periodically, every pair of users exchange
their ideas of global history. If one member
of the pair has missed an update done by
the other, the histories won’t be consistent.
• For this to work in practice, we need some
encoding of that global history.
9
Strawman Implementation:
Global Log of Operations
• Imagine that we have a sequential log
consisting of every transaction that ever hit
the database and that this sequential log is
signed by the last transaction.
• Ex: order of transactions is T1 T2 T3 (done
by users u1 u2 u3).
Log after T3:
• (((((T1) sgn_u1) T2) sgn_u2) T3) sgn_u3
10
Ensuring Individual Consistency
• Log is held by the untrusted server.
• Every time a user appends to and signs the log,
the user first checks that the log he/she
previously signed is a prefix of log to be signed.
• Ex: if u is about to commit transaction T’ and has
previously committed T, then u makes sure that
the log now contains as a prefix the log from the
time T was committed.
11
Individual Log Consistency
Bob: “In my previous update, the log had (in left
to right order): Talice1, Tbob1
Now it has Talice1, Tbob1, Tmary1.
So my previous view of the log was a prefix of the
current one. Ok, so I’ll append my new
transaction: Talice1, Tbob1, Tmary1, Tbob2
Alice hears nothing of all this.
12
Ensuring Global Consistency
by detecting forking attacks
• Periodically, users exchange their views of the
log of all transactions. Each user verifies:
– The signatures of all the users
– Whether one global history is a prefix of the other or
not.
• If there has been a forking attack and u1 has not
seen a transaction of u2, then neither user’s log
will be a prefix of the other.
13
Global Log Consistency
Bob: “Alice, here is the log of all transactions as I
see it: Talice1, Tbob1, Tmary1, Tbob2
Alice: “That’s funny. Here is my log
Talice1, Tbob1, Tjill1. It is not a prefix of yours
because I have Jill’s transaction, but yours is not a
prefix of mine because you have Mary’s
transaction.”
Bob: “Are all the signatures
good?”
Alice: “Absolutely. See for
yourself. Server is being naughty
again.”
14
Semantic Objection
• Server might fail to update the data but assert
that the transactions are executed in the same
global order.
• Fix: associate with each transaction a collisionresistant hash of the state of the whole database
(root of Merkle). Call these h1, h2, h3
• (((((T1 h1) sgn_u1) T2 h2) sgn_u2) T3 h3) sgn_u3
• Transactions verify global hash upon data
access.
15
Practical objection: verification
time/space grows with transactions
• In this log-based (strawman)
implementation, each user must sign log
of all transactions since beginning of
history (last database dump) and must
exchange them.
• Alternative is to have each user update
his/her version for every transaction (even
read-only transactions).
17
Version Structure Detail
• Suppose user u creates the last version
structure. Then u increments his/her
version number (and no other) and signs
the structure, which contains:
sgn_u(hash_u, (u1, n1), (u2, n2), …)
where (ui, ni) means: ui is at version ni
and hash_u is the hash of the root of the
Merkle after u is done.
18
Three incrementally related
version structures
Bob: sgn_Bob(hash_Bob, (Bob,
6), (Alice, 12), (Bill, 4))
Alice: sgn_Alice(hash_Alice,
(Bob,6), (Alice,13), (Bill,4))
Bob: sgn_Bob(hash_Bob,
(Bob,7), (Alice, 13), (Bill,4))
20
Ordering Properties of version
structures
• Define a partial order on version structures:
vs1 < vs2 if the users in vs1 are a subset of the
users in vs2 and for every user u in vs1, the
version of u in vs1 (denoted vs1[u]) is less than
or equal to vs2[u] and for at least one user v,
vs1[v] < vs2[v].
• We say vs1 is “incrementally less than” vs2 if there
exists a u such that u signs vs2,
vs2[u] = vs1[u] + 1
and for all v, if v != u then vs2[v] = vs1[v].
21
Forking creates incomparable
version structures
Bob: sgn_Bob(hash_Bob), (Bob,
6), (Alice, 12), (Bill, 4))
Alice: sgn_Alice(hash_Alice,
(Bob,6), (Alice,13), (Bill,4))
Server forks and doesn’t show Bob this.
Bob: sgn_Bob(hash_Bob, (Bob,7),
(Alice, 12), (Bill,4))
Now, Bob and Alice’s last version
structures are incomparable, i.e.
unordered by <.
22
Once Incompatible,
No Turning Back
• Once Alice produces:
Alice: sgn_Alice(hash_Alice,
(Bob,6), (Alice,13), (Bill,4))
• And Bob produces:
Bob: sgn_Bob(hash_Bob, (Bob,7),
(Alice, 12), (Bill,4))
• if server shows Bob any of Alice’s updates
after Alice, 12, Bob will see
incomparability. Symmetric for Alice.
23
Net Effect:
Fork Leaves Indelible Mark
• If server forks Bob and Alice even once,
then server can never show either the
other’s updates.
• Human analogy: to catch a thief, put
indelible ink on dollar bills.
24
Signing Verification Protocol
• When a user u is ready to sign the version
structure vs_u constructed as above, u asks for
all version structures since u’s last signed
version structure.
• User u verifies that they are incrementally
related to one another.
• User u then enters its new version structure.
26
Detecting forks:
Global Version Exchange Protocol
Bob: “Alice, from time to time, a global version exchange
protocol begins. Let’s say an instance of the protocol starts
at time t. Every user sends its latest version structure (a
relatively small structure) preceding t to all other users.
Sending is done without mediation by server.”
Alice: “Then what?”
Bob: “Each user checks that the
version structures are ordered by <.”
Alice: “What if some user does not
send?”
Bob: “No problem. Validate the
ones that do send.”
29
Key database assurance:
Version Structures and
Serializability
• Serializability will be based on version
structure order.
• That is, transactions will serialize in the <
order of version structures.
• Obvious now that there is only one root of
the Merkle, but could generalize to having
several roots, each locked separately.
30
Serializability Lemma
• Lemma: If T1 T2 (conflict edge from T1 to T2),
vs1 is the version structure signed by user u1 for
T1, vs2 is the version structure signed by user
u2 for T2, all version structures among some set
of virtuous users U including u1 and u2 are
ordered, then vs1 < vs2.
• Proof Idea: Suppose user u1 issues T1 and user
u2 issues T2. For any conflict, there is some
data item x such that op1(x) precedes op2(x).
Will be reflected in Merkle…
31
Issue: Server could be framed
Bob (good): sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4))
Alice (good): sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4))
Server shows this to Bob. No fork.
Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7),
(Alice, 12), (Bill,4))
Bob pretends server has forked.
Upon global exchange, server looks guilty.
41
Server can avoid being framed
• Server signs the version structures from
users if it agrees they are legitimate. In the
case of previous figure, server will refuse
to sign Bob’s second version structure.
• Bob and the server can present their
evidence before security officer.
42
Server Proves Innocence
Bob (good):
sgn_server(sgn_Bob(hash(Bobdata),
(Bob, 6), (Alice, 12), (Bill, 4)))
Alice (good): sgn_server(
sgn_Alice(hash(Alicedata),
(Bob,6), (Alice,13), (Bill,4)))
Server shows this to Bob. No fork.
Bob (bad): sgn_Bob(hash(Bobdata), (Bob,7),
(Alice, 12), (Bill,4))
Server refuses to sign. Shows innocence.
43
Getting More from the Server
• At this point the server gives us blocks of
data. This means the client in fact does all
the database work.
• Is there any way the server could actually
execute basic sql queries but we could
verify either at the same time or later that
the server has not lied?
44
New Privacy Setup I
• A record is a sequence of cleartext field
values plus an encrypted part that may
encompass one or more fields and a
nonce (the record’s “encrypted blob”).
• Nonce ensures that if several rows share
the same encrypted blob, the server won’t
know.
• Encryption remains private key encryption.
45
Now Server Can Use Indexes
• Consider R(A, B, C, D, E, F), where the
first three fields are in clear text and last
three constitute the encrypted blob of each
record.
• Selections on first three fields can use
indexes, followed by decrypting and
scanning last three fields.
46
Sweeney Attack
• Voter registration gives: town, gender, zip,
birth date. Enough to identify Governor
William Weld in Massachusetts.
• Health records contain: zip code, birth
date, gender, ethnicity and diagnosis.
• Hence possible to infer health status of
governor.
•
L. Sweeney, Uniqueness of Simple Demographics in the U.S. Population,
LIDAPWP4. Carnegie Mellon University, Laboratory for International Data
Privacy, Pittsburgh, PA: 2000
47
Privacy Setup Revised:
avoid associations
• Sometimes, A, B, C individually may not
be secret but their association is secret. In
that case, could split R into three tables:
Ra(A, encrypt(B,C,D,E,F, nonce))
Rb(B, encrypt(A,C,D,E,F, nonce))
Rc(C, encrypt(A,B,D,E,F, nonce))
• Any selection on multiple fields will use
only one of Ra, Rb, Rc
• Avoid “Sweeney attack”.
48
New Privacy Guarantees
• Reveal cleartext fields to server but no
dangerous associations.
• Never reveal encrypted blob because
nonce hides duplicates.
• Extra-database associations, e.g. the
phone call a database user makes after
accessing the database, still possible, but
we don’t handle.
49
New, More Efficient, Approach
• Version structure still used to prevent
forking attack.
• Discard Merkle tree.
• Introduce a checksum data structure for
one or more attributes to detect tampering.
52
Checksum data structure
for some attribute A
• Partition the domain of cleartext field A.
• Associate with each partition (e.g. A
between v1 and v2) the rows having those
A values
(e.g. R_v1,v2 = {r | v1 <= r.A <= v2}
• Compute the checksum of R_v1,v2
• Do this for every partition.
53
Properties of the checksum
• Checksum should be an associative and
commutative function of the rows.
• Example: Use a one-way hash function of
each row and then the checksum is the
product of the hash values modulo a large
prime.
• M. Bellare and D. Micciancio. A new paradigm for collision-free
hashing: Incrementality at reduced cost. In Proceedings of
EuroCrypt, 1997.
54
Maintaining the checksum
data structure
• If a row r is inserted into some partition p having
checksum value v mod m, then compute the
one-way hash of r, H(r) and the new checksum
value becomes (H(r) * v) mod m.
• If r were deleted, then the new checksum value
becomes (v/H(r)) mod m
• Updates are treated as deletes followed by
inserts.
55
New Approach to Modification
• A transaction that does one or more
modifications simply records updates to
each partition (e.g. add x to partition p1,
delete y from p2, etc.).
• The transaction signs all of the above and
puts in an “operation log” stored on server
along with version structures.
• Sgn_u(insert/deletes, query text, version
structure)
56
Client Interaction with
Operation Log
Bob: Operation log will record my
changes to checksums and the inserts
and deletes I do. Associate a version
structure entry with an op log entry.
Alice: But do you write a new checksum
value?
Bob: No. I write changes only.
New checksums can be calculated later.
57
Query Processing
• Consider a simple selection query:
select * from R where A = 15 and B> 5
• Client sends that query to the server and
the server returns some rows based on A.
Client decrypts and selects based on B
values.
• Normally, client doesn’t check any further.
• Vast improvement in performance – server
is doing most of the database work.
58
Client verifies a query immediately
after server responds
Client asks the server for the rows in R from the
partition containing R.A = 15 and verifies: (i)
client received the correct subset of those rows;
(ii) those rows are consistent with the checksum
for that partition.
• Client may have to compute checksum from the
changes to this partition’s checksum in the
operation log. Client updates checksum in
operation log.
59
Verifying a Query
– At the moment
Client: Ok, server, you have just sent me a result, but I
want to check up on you. So, please send me: (1) all
rows in the partition including this value; (2) all of the
updates to the checksum for this partition since the
last recalculation of the checksum value
Server: why all of them?
Client: So I can verify, based on version structures,
that you are sending me all the modifications. I will
then recalculate the checksum for this partition.
60
Verifying a Query Q
-- well after the fact
• Client obtains rows of the partition that
contains the query response as of now.
Verifies against the current checksum.
• Client next obtains all modifications to the
partition that have occurred since Q (uses
version structures to verify that it has
received all mods).
• Client undoes those modifications and
verifies against Q.
61
Verifying a Query
– After delay
Client: Ok, server, five minutes ago, you answered a
query Q. I want to check on you. So first I will verify
that the partition containing that query is consistent
with the current checksum.
Server: So you’ll want the checksum for
that partition, all the rows for that partition,
just as before.
Client: Right. Once I verify that current rows in partition
are correct, I will want you to send me all modifications
back to the time when you executed Q. I will compute the
partition as it was then and verify Q.
62
Verifying a Dump
Client: Ok, server, I want to do a dump of the database.
So I will verify that every partition is consistent with the
current checksum.
Server: So you’ll want a lot of history.
Client: That’s right. But it’s only once a day.
63
What About Inserts/Deletes?
• These will have to be batched to hide the
associations among different fields of
rows.
• How to do this? For each table R having
cleartext attributes A, B, C, we already
have Ra, Rb, Rc to avoid Sweeny attack.
• In addition, we will have a table Rbatch
that will contain entirely encrypted blobs.
64
Using Rbatch
• Queries on say R.A = 14 will access (and
decrypt) Rbatch as well as Ra.
• When Rbatch becomes large, some client
with time on his hands (“de-batcher-athome”) will spread the encrypted blobs
from Rbatch to Ra, Rb, Rc, scrambling the
order of course.
65
Benefits/Costs of Query Approach
• Can use database engine as more than a
block server but have to support an
operation log and version structures.
• Can check on the server at any time.
• Checking and recalculation of checksums
can be done off-line and altruistically
(“checking-at-home”)
66
StrangerDB achieves
• Use virtues of servers:
– Queries, reliability, availability, historical
backup
• But avoid their vices: they might cheat.
• Encryption protects privacy.
• Signatures makes tampering evident.
• Histories/version structures + global
exchange make forking evident.
• Result: serializability, no forks, privacy.
67
What About Joins?
• In a typical join query, the join is between
a foreign key and a key:
select * from sale, product
where sale.productid = product.productid
and sale.customer = ‘Bob’
and product.category = ‘fishing’
• We would use the customer index for sale
and the category index for product and do
the join at the client site.
68
Alternative approach to joins
• Pre-join based on foreign keys but still
have one attribute in the clear:
saleprod(sale.prod, encrypt(sale atts,
customer atts, nonce))
• Treat this as a normal table.
• This “foreign key denormalization” has
same number of rows as sale, but wider.
69
Concurrency Control
• Accesing v’s data can be done by locking
hash (vdata).
• To increase concurrency, partition v’s rows
into k parts, each with its own hash. The
user u would then write k hashdata values
in the version structure.
• Transactions will lock the appropriate hash
values.
70
Making Version Structures Fast
• Verifying, signing, and sending in a version
structure may take time.
• The verification protocol involves many version
structures so is expensive part.
• To make this faster, server sends signed version
structures as they appear to users that have
subscribed to this list so most computation can
be done asynchronously.
• A user sends in its signed version structure
when user is ready and all is verified.
71
Controlling Version Structure Size
• As it stands, as the number of users
increases, the size of storage increases as
the square.
• Fortunately, it is seldom the case that so
many people trust one another.
• It is fine to have several subpopulations
each with shared version structure lists.
• Subpopulations could overlap but at some
cost to commits.
72
Implementation of Cryptographic
Assumptions
• Public key infrastructure to hold public
signature keys.
• Smartcards are used to authenticate user
and perhaps as a final private processing
node.
• If attached to a cell phone, smartcards can
be used for the server-independent
communication in the global version
exchange protocol.
73
View Maintenance
• Deferred view maintenance (or
equivalently triggers) is usually done by
the server on behalf of a transaction. In
this case it must be done by a user that
has the right to modify the data.
• If user u changes private data but sends a
summary of those updates to a user v who
maintains that view, then u must use a
public key of v to do so.
74
Read Write Asymmetry
Using Public Key Cryptography
• Current use case is each user or group
wants to implement its own data.
• Sometimes, as in view maintenance, one
user wants to share some data with
another (e.g. between patient and
hospital).
• Use ssh-style protocol with public keys.
Data is owned by patient but read by
hospital.
75
Indexes
• Suppose a user wants not all of his/her
data but only the people in some subset
(e.g. in his/her department).
• Indexes are simply merkle tree like
structures. So start at the root, get the next
page verifying the contents based on the
hash. Decrypt that page and proceed.
• Modifications return back up the tree.
76
Questions
• Questions and criticism welcome.
• If I’ve missed a significant reference,
please let me know now and by email.
• Thank you!
77
How to Access Data
• The data available to a user u is the data to
which that user has a decryption key.
• For each user v whose data u has access to, u
fetches that data using the index beginning with
pointer associated with hash(vdata).
• Decrypt the part of the index needed (i.e.
decrypt the root, then necessary child etc.)
• This gives a set of row ids.
• Fetch the rows with those rowids from database.
78
Modifying data
•
•
•
•
•
•
User u modifies certain rows that u owns.
Reflects modification in the indexes.
Encrypts.
Updates hash(udata) as appropriate.
Does signing verification protocol.
Commits the transaction.
79