Talk Slides - The Stanford University InfoLab
Download
Report
Transcript Talk Slides - The Stanford University InfoLab
Two Can Keep a Secret:
A Distributed Architecture for Secure
Database Services
Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan,
Hector Garcia-Molina, Krishnaram Kenthapadi,
Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas,
Ying Xu
Stanford University
1
Motivation
Data outsourcing growing in popularity
– Cheap, reliable data storage and management
Privacy concerns looming ever larger
– High-profile thefts (often insiders)
– Govt. legislation, e.g., California SB 1386
The Cure: Secure Database Service [KC04]
– Outsource to Database Service Provider (DSP)
– …but DSP cannot “see” data
2
The Crypto Approach
[HILM02, HIM04,AKSX04]
Client
Query Q
Answer
Encrypt
Client-side
Processor
DSP
Q’
“Relevant Data”
Problem: Q’ “SELECT *”
3
The Power of Two
Client
DSP1
DSP2
4
The Power of Two
Query Q
Q1
DSP1
Q2
DSP2
Client-side
Processor
Key: Ensure Cost (Q1)+Cost (Q2) Cost (Q)
5
Agenda
Defining Privacy Requirements
Tools for Database Decomposition
Example
Finding a “good” decomposition
Query reformulation and execution
Open Questions
6
Privacy according to SB 1386
”…first name or first initial and last name in combination
with any one or more of the following data elements,
when either the name or the data elements are not
encrypted:
(1) Social Security Number.
(2) Driver’s license number or California Identification
Card number.
(3) Account number, credit or debit card number, in
combination with any required security code…”
7
In the language of set theory…
{ Name, SSN},
{ Name, LicenceNo}
{ Name, CaliforniaID}
{ Name, AccountNumber}
{ Name, CreditCardNo, SecurityCode}
are all to be kept private.
A set is private if at least one of its elements is
“hidden”.
– Element in encrypted form ok
8
Defining A Private Decomposition
R1
R
R2
Given a set of privacy constraints
– Each constraint is a set of attributes
Adversary knows either R1 or R2
– Insider: views all data, queries at site
Ensure for each constraint:
– At least one attribute is “opaque” to adversary
– i.e., neither R1 nor R2 exposes all attributes
– We won’t define “opaque”
9
Relation Decomposition
Charter
“Bury
them
nature will
– Break
upand
“universal”
relation R into R1 and R2
care of
the rest”
–take
Lossless,
privacy-preserving
decomposition
– Note: Restriction to “relational” algebra
Tools of the Trade
–
–
–
–
Fragmentation
Encoding
Semantic Attribute Decomposition
Noise
10
Tools of the Trade: Fragmentation
Horizontal Fragmentation
– R = R1 U R2
– Not too exciting (yet?)
Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN},
R1 Name, R2 SSN
– Use tuple IDs for reassembly. R = R1 JOIN R2
11
Tools of the Trade: Encoding
Encode attribute across both R1 and R2
– Need both parts to reconstruct
– Why? Sensitive attributes, e.g., Email
– Different options with privacy vs. query cost
(computation, communication) trade-offs
E.g., One-time Pad
–
–
–
–
For each value v, construct random bit seq. r
R1 v XOR r, R2 r
Reconstruction: (v XOR r) XOR r = v
Perfect privacy, Expensive?
12
Tools of the Trade: Encoding (2)
Deterministic Encryption
– R1 EK (v) R2 K
– Leaks information. E.g., can detect equality
– Can push selections with equality predicate
Random addition
– R1 v+r , R2 r
– Can push aggregate SUM
– Problem: Information leak (what is
“opaque”?)
13
More Tools of the Trade
Semantic Attribute Decomposition
– Extract “public” data from private attrs.
– E.g., Area code of PhoneNo, Domain name of
Email
– Useful for filtering selections, aggregates
Adding Noise
– Add “dangling tuples” to R1 and R2
14
Example
An Employee relation: {Name, DoB, Position,
Salary, Gender, Email, Telephone, ZipCode}
Privacy Constraints
–
–
–
–
{Telephone}, {Email}
{Name, Salary}, {Name, Position}, {Name, DoB}
{DoB, Gender, ZipCode}
{Position, Salary}, {Salary, DoB}
Will use just Vertical Fragmentation and
Encoding.
15
Example (2)
Constraints
{Telephone}
{Email}
{Name, Salary}
{Name, Position}
{Name, DoB}
{DoB, Gender,ZipCode}
{Position, Salary}
{Salary, DoB}
R1
Salary
ID
Name
DoB
Position
Salary
Gender
Email
Telephone
ZipCode
ID
R2
16
Finding a “Good” Decomposition
Find a decomposition that
– Obeys all privacy constraints
– Minimizing execution cost for given workload
Complicated optimization problem
– After 3 layers of simplification, NP-hard to
approximate!
Multiple heuristics based on min-cuts and set
cover
– E.g., (1) Find lots of “efficient” decompositions
(2) Modify to obey constraints and pick the best
17
Query Reformulation and Execution
Overview
– Take original query Q on R
– Rewrite to get Q1(R1) and Q2(R2)
– Combine results
Key idea
–
–
–
–
Take query plan for Q
Replace R by R1 JOIN R2
Push down selects, projects, aggregates
Partition plan into two
Space of plans
– Different ways to do joins – symmetric, semi-joins
– Note: Q2 can depend on result of Q1
18
Open Questions
Does the idea work? Compare efficiency to an
encryption-based scheme
– Need evaluation methodology
Generalize decomposition
– Deal with multiple relations, functional dependencies,
normal forms
– Allow attribute replication
Expand space of decompositions
– We only considered simple encoding and vert.
fragmentation
19
Open Questions (2)
Details of the 2-database architecture
– How much functionality ends up on client
side?
– How to handle other DB functions? Access
Control? Constraint checking?
How about two logical DBs with disjoint
administration?
20