Talk Slides - The Stanford University InfoLab

Download Report

Transcript Talk Slides - The Stanford University InfoLab

Two Can Keep a Secret:
A Distributed Architecture for Secure
Database Services
Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan,
Hector Garcia-Molina, Krishnaram Kenthapadi,
Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas,
Ying Xu
Stanford University
1
Motivation

Data outsourcing growing in popularity
– Cheap, reliable data storage and management

Privacy concerns looming ever larger
– High-profile thefts (often insiders)
– Govt. legislation, e.g., California SB 1386

The Cure: Secure Database Service [KC04]
– Outsource to Database Service Provider (DSP)
– …but DSP cannot “see” data
2
The Crypto Approach
[HILM02, HIM04,AKSX04]
Client
Query Q
Answer
Encrypt
Client-side
Processor
DSP
Q’
“Relevant Data”
Problem: Q’  “SELECT *”
3
The Power of Two
Client
DSP1
DSP2
4
The Power of Two
Query Q
Q1
DSP1
Q2
DSP2
Client-side
Processor
Key: Ensure Cost (Q1)+Cost (Q2)  Cost (Q)
5
Agenda
Defining Privacy Requirements
 Tools for Database Decomposition
 Example


Finding a “good” decomposition
Query reformulation and execution

Open Questions

6
Privacy according to SB 1386
”…first name or first initial and last name in combination
with any one or more of the following data elements,
when either the name or the data elements are not
encrypted:
(1) Social Security Number.
(2) Driver’s license number or California Identification
Card number.
(3) Account number, credit or debit card number, in
combination with any required security code…”
7
In the language of set theory…
{ Name, SSN},
{ Name, LicenceNo}
{ Name, CaliforniaID}
{ Name, AccountNumber}
{ Name, CreditCardNo, SecurityCode}
are all to be kept private.
 A set is private if at least one of its elements is
“hidden”.

– Element in encrypted form ok
8
Defining A Private Decomposition
R1
R
R2

Given a set of privacy constraints
– Each constraint is a set of attributes

Adversary knows either R1 or R2
– Insider: views all data, queries at site

Ensure for each constraint:
– At least one attribute is “opaque” to adversary
– i.e., neither R1 nor R2 exposes all attributes
– We won’t define “opaque” 
9
Relation Decomposition

Charter
“Bury
them
nature will
– Break
upand
“universal”
relation R into R1 and R2
care of
the rest”
–take
Lossless,
privacy-preserving
decomposition
– Note: Restriction to “relational” algebra

Tools of the Trade
–
–
–
–
Fragmentation
Encoding
Semantic Attribute Decomposition
Noise
10
Tools of the Trade: Fragmentation

Horizontal Fragmentation
– R = R1 U R2
– Not too exciting  (yet?)

Vertical Fragmentation
– Partition attributes across R1 and R2
– E.g., to obey constraint {Name, SSN},
R1  Name, R2  SSN
– Use tuple IDs for reassembly. R = R1 JOIN R2
11
Tools of the Trade: Encoding

Encode attribute across both R1 and R2
– Need both parts to reconstruct
– Why? Sensitive attributes, e.g., Email
– Different options with privacy vs. query cost
(computation, communication) trade-offs

E.g., One-time Pad
–
–
–
–
For each value v, construct random bit seq. r
R1  v XOR r, R2  r
Reconstruction: (v XOR r) XOR r = v
Perfect privacy, Expensive?
12
Tools of the Trade: Encoding (2)

Deterministic Encryption
– R1  EK (v) R2  K
– Leaks information. E.g., can detect equality
– Can push selections with equality predicate

Random addition
– R1  v+r , R2  r
– Can push aggregate SUM
– Problem: Information leak (what is
“opaque”?)
13
More Tools of the Trade

Semantic Attribute Decomposition
– Extract “public” data from private attrs.
– E.g., Area code of PhoneNo, Domain name of
Email
– Useful for filtering selections, aggregates

Adding Noise
– Add “dangling tuples” to R1 and R2
14
Example
An Employee relation: {Name, DoB, Position,
Salary, Gender, Email, Telephone, ZipCode}
 Privacy Constraints

–
–
–
–

{Telephone}, {Email}
{Name, Salary}, {Name, Position}, {Name, DoB}
{DoB, Gender, ZipCode}
{Position, Salary}, {Salary, DoB}
Will use just Vertical Fragmentation and
Encoding.
15
Example (2)
Constraints
{Telephone}
{Email}
{Name, Salary}
{Name, Position}
{Name, DoB}
{DoB, Gender,ZipCode}
{Position, Salary}
{Salary, DoB}
R1
Salary
ID
Name
DoB
Position
Salary
Gender
Email
Telephone
ZipCode
ID
R2
16
Finding a “Good” Decomposition

Find a decomposition that
– Obeys all privacy constraints
– Minimizing execution cost for given workload

Complicated optimization problem
– After 3 layers of simplification, NP-hard to
approximate!

Multiple heuristics based on min-cuts and set
cover
– E.g., (1) Find lots of “efficient” decompositions
(2) Modify to obey constraints and pick the best
17
Query Reformulation and Execution

Overview
– Take original query Q on R
– Rewrite to get Q1(R1) and Q2(R2)
– Combine results

Key idea
–
–
–
–

Take query plan for Q
Replace R by R1 JOIN R2
Push down selects, projects, aggregates
Partition plan into two
Space of plans
– Different ways to do joins – symmetric, semi-joins
– Note: Q2 can depend on result of Q1
18
Open Questions

Does the idea work? Compare efficiency to an
encryption-based scheme
– Need evaluation methodology

Generalize decomposition
– Deal with multiple relations, functional dependencies,
normal forms
– Allow attribute replication

Expand space of decompositions
– We only considered simple encoding and vert.
fragmentation
19
Open Questions (2)

Details of the 2-database architecture
– How much functionality ends up on client
side?
– How to handle other DB functions? Access
Control? Constraint checking?

How about two logical DBs with disjoint
administration?
20