Limiting Disclosure in Hippocratic Databases

VLDB, August 31, 2004
Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt

Presentation Outline
• Hippocratic Databases framework for managing privacy, including the problem of limiting disclosure
• Overview of our proposal for integrating policy-driven disclosure control into an existing relational database environment
• Brief discussion of alternative cell-level enforcement models
• Optimized implementation of opt-in and opt-out choices
• Overview of performance evaluation
• Conclusions

Hippocratic Databases and Limited Disclosure
• Hippocratic Databases have been proposed as a framework for managing privacy-sensitive information
• Limited disclosure is one of the defining principles of this framework
• Limited disclosure includes three main ideas:
  – Privacy policy: organizations define a set of rules describing to whom data may be disclosed (recipients) and how the data may be used (purposes)
  – Consent: data subjects are given control over who may see their personal information and under what circumstances
  – Disclosure control: the database ensures that the privacy policy and data subject consent are enforced with respect to all data access
    – Limits the outflow of information from the database

Motivating Example
• Consider a group of athletes registering for a major international competition
• Personal information is collected from each athlete, possibly including
  – Name, Age, Nationality, Address, Phone number, Visa status
• Data must be managed according to the organizing committee’s privacy policy
  – Government officials are allowed to see visa information for the purpose of venue security
  – Team travel agents may see the contact information for athletes from their own country for making travel arrangements
  – Organizing committee may not disclose athletes’ information to journalists without the athlete’s consent

Limited Disclosure Framework Goals
• Provide techniques for enforcing a broad class of privacy policy rules
• Privacy policy enforcement should require little or no modification to existing application code
• Policy rules should be stored and managed by the database
• Provide limited disclosure enforcement at the cell level

Limited Disclosure Framework Overview
• Start with an existing database environment with associated applications
• Privacy policy is defined and stored in the database in privacy meta-data tables
• When providing information, data subjects also provide consent for various data uses
• Queries are modified so results respect privacy policy and consent
[Figure: framework diagram with the components Policy Definition, Privacy Meta-Data, Subject Consent, Consent Info, Query, Query Modifier, and Data Table]

Policy Definition
• Privacy policy is defined using one of the following XML-based policy definition languages
  – Platform for Privacy Preferences (P3P)
  – Enterprise Privacy Authorization Language (EPAL)

Privacy Meta-Data and Policy Meta-Language
• Privacy “meta-language” for expressing the privacy policy in the database
  – Not tied to one particular policy language
  – Many practical P3P and EPAL policies can be translated to this language
• Privacy policy is a set of rules of the form <data, purpose, recipient, condition>
  – Condition must be a predicate that can be expressed in SQL
• Privacy policy rules are stored in the database

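As a rough illustration of the <data, purpose, recipient, condition> rule form, the Python sketch below keeps a few rules in memory and looks up what a given purpose and recipient may see. The `PolicyRule` class, the `rules_for` helper, and the use of a condition ID such as "C1" in place of the full SQL predicate are hypothetical choices made for this example; in the framework described here the rules live in database meta-data tables instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PolicyRule:
    """One <data, purpose, recipient, condition> rule (hypothetical encoding)."""
    table: str                       # data: table the rule covers
    column: str                      # data: column the rule covers
    purpose: str
    recipient: str
    condition: Optional[str] = None  # condition ID or SQL predicate; None = unconditional


def rules_for(rules: List[PolicyRule], purpose: str,
              recipient: str) -> Dict[Tuple[str, str], Optional[str]]:
    """Return {(table, column): condition} for one purpose/recipient pair.

    Assumes at most one rule per column for a given purpose and recipient.
    Columns missing from the result are prohibited for that pair.
    """
    return {
        (r.table, r.column): r.condition
        for r in rules
        if r.purpose == purpose and r.recipient == recipient
    }


# A few rules from the athletes example; "C1" names a condition stored elsewhere.
RULES = [
    PolicyRule("Athletes", "Visa", "Security", "Gov't Off."),
    PolicyRule("Athletes", "Name", "Security", "Gov't Off."),
    PolicyRule("Athletes", "Name", "Articles", "Journalist", "C1"),
]

print(rules_for(RULES, "Articles", "Journalist"))  # {('Athletes', 'Name'): 'C1'}
```

A column with no matching rule is simply absent from the result, which matches the default of disclosing nothing that the policy does not explicitly allow.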
Privacy Meta-Data Example
• Government officials may see athletes’ visa information for security purposes.
• Journalists may only see athletes’ names for the purpose of writing articles with explicit consent.

Policy rules:
Policy | Rule | Purpose | Recipient | Table | Column | CondID
P1 | R1 | Security | Gov’t Off. | Athletes | Visa | -
P1 | R2 | Security | Gov’t Off. | Athletes | Name | -
P1 | R3 | Travel | Travel Ag. | Athletes | Name | -
P1 | R4 | Travel | Travel Ag. | Athletes | Phone | -
P1 | R5 | Articles | Journalist | Athletes | Name | C1
P1 | R6 | Articles | Journalist | Athletes | Address | C2

Conditions:
CondID | Predicate
C1 | “EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Name_choice = 1)”
C2 | “EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Address_choice = 1)”

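The conditions in this example are ordinary SQL predicates correlated with the Athletes table, so they can be evaluated directly. The sketch below does that in SQLite, assuming that Athlete_choices holds the journalist consent from the example later in the deck (1 = consent given, 0 = not given). SQLite requires identifiers containing # to be double-quoted, so the predicates are quoted accordingly; the helper `athletes_satisfying` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Athlete_choices ("Athlete#" INTEGER PRIMARY KEY,
                              Name_choice INTEGER, Address_choice INTEGER);
INSERT INTO Athletes VALUES (1, 'Michael Phelps'), (2, 'Natalie Coughlin'),
                            (3, 'Ian Thorpe'), (4, 'Jenny Thompson');
-- 1 = consent given, 0 = no consent (journalist choices, assumed)
INSERT INTO Athlete_choices VALUES (1, 1, 1), (2, 0, 0), (3, 0, 1), (4, 1, 0);
""")

# Conditions C1 and C2 from the Conditions table, with "Athlete#" quoted for SQLite.
C1 = """EXISTS (SELECT Name_choice FROM Athlete_choices
        WHERE Athletes."Athlete#" = Athlete_choices."Athlete#"
          AND Athlete_choices.Name_choice = 1)"""
C2 = """EXISTS (SELECT Name_choice FROM Athlete_choices
        WHERE Athletes."Athlete#" = Athlete_choices."Athlete#"
          AND Athlete_choices.Address_choice = 1)"""


def athletes_satisfying(predicate: str) -> list:
    """Return the Athlete# values for which a stored predicate holds.

    The predicate is trusted policy meta-data, so it is spliced into the SQL text.
    """
    sql = f'SELECT "Athlete#" FROM Athletes WHERE {predicate} ORDER BY "Athlete#"'
    return [row[0] for row in conn.execute(sql)]


print(athletes_satisfying(C1))  # [1, 4]
print(athletes_satisfying(C2))  # [1, 3]
```

Under rule R5, journalists would therefore see the names of athletes 1 and 4 only, and under R6 the addresses of athletes 1 and 3.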
Query Modification
• Implemented two alternative algorithms for modifying queries to incorporate policy rules and consent information
• Queries are modified in such a way that query results follow one of our cell-level semantic models

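One simple way to picture cell-level query modification is to replace each column in the SELECT list with an expression that yields the value only when a rule permits it. The following is an assumption-laden illustration, not either of the two algorithms referred to above: `mask_expr` and `rewrite_select` are hypothetical helpers, rules are given as a dict from column name to an optional SQL condition, and WHERE clauses, joins, and primary-key filtering are not handled.

```python
from typing import Dict, List, Optional


def mask_expr(column: str, rules: Dict[str, Optional[str]]) -> str:
    """SQL expression that replaces one column reference in the SELECT list."""
    if column not in rules:
        return f"NULL AS {column}"                      # no rule: always prohibited
    condition = rules[column]
    if condition is None:
        return column                                   # unconditional rule
    return f"CASE WHEN {condition} THEN {column} ELSE NULL END AS {column}"


def rewrite_select(columns: List[str], table: str,
                   rules: Dict[str, Optional[str]]) -> str:
    """Rewrite `SELECT columns FROM table` so each cell follows the rules."""
    select_list = ", ".join(mask_expr(c, rules) for c in columns)
    return f"SELECT {select_list} FROM {table}"


# Hypothetical rules for one purpose/recipient: Name needs consent, Phone is open.
rules = {"Name": "Name_choice = 1", "Phone": None}
print(rewrite_select(["Name", "Phone", "Age"], "Athletes", rules))
```

Because the masking lives inside the query, the application issues the same SELECT it always did and the database never returns a prohibited value.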
Enforcement Models
• Row (tuple)-level enforcement is insufficient for enforcing arbitrary policies when existing database schemas are not designed with the policy in mind

An Example
Table “Athletes”
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 111-1111
2 | Natalie Coughlin | 22 | Berkeley | 222-2222
3 | Ian Thorpe | 23 | Sydney | 333-3333
4 | Jenny Thompson | 31 | New York | 444-4444

Consent information for journalists writing stories (√ = consent given, X = no consent)
# | Athlete# | Name | Age | Address | Phone
1 | √ | √ | √ | √ | √
2 | X | X | X | X | X
3 | √ | X | X | √ | √
4 | √ | √ | X | X | X

Row-Level Enforcement
Using the “Athletes” table and the consent information for journalists writing stories shown above:
• Filter Athlete #2 because no consent is provided
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 111-1111
3 | Ian Thorpe | 23 | Sydney | 333-3333
4 | Jenny Thompson | 31 | New York | 444-4444
• Must either disclose prohibited information, or restrict information that should be available! Keeping rows 3 and 4 discloses Ian Thorpe’s name and age and Jenny Thompson’s age, address, and phone number, none of which journalists may see; filtering those rows instead hides cells they are allowed to see.

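The dilemma can be made concrete by counting cells. The sketch below encodes the journalist consent matrix from the example as Python lists (True = may disclose, an encoding chosen for this illustration) and applies two possible row-level rules: keep a row if any of its cells is permitted, or keep it only if all of them are. The `row_level` helper is hypothetical.

```python
COLUMNS = ["Athlete#", "Name", "Age", "Address", "Phone"]

# Journalist consent per athlete, in COLUMNS order (True = may disclose).
CONSENT = {
    1: [True, True, True, True, True],
    2: [False, False, False, False, False],
    3: [True, False, False, True, True],
    4: [True, True, False, False, False],
}


def row_level(keep_row):
    """Apply a row-level filter and return (leaked, hidden) cell counts.

    leaked: prohibited cells shown because their whole row was kept.
    hidden: permitted cells withheld because their whole row was dropped.
    """
    leaked = hidden = 0
    for cells in CONSENT.values():
        if keep_row(cells):
            leaked += cells.count(False)
        else:
            hidden += cells.count(True)
    return leaked, hidden


lenient = row_level(any)  # keep rows with any consent (filters only Athlete #2)
strict = row_level(all)   # keep only rows with consent for every column
print(lenient, strict)    # (5, 0) (0, 5)
```

Either way five cells are handled wrongly, which is the motivation for moving to cell-level enforcement.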
Enforcement Models
• Cell-level enforcement
  – Table Semantics model
  – Query Semantics model

Table Semantics Enforcement
1. “Mask” prohibited cells with the null value
2. Filter rows where the primary key is prohibited
3. Conceptually, the query is performed on top of this “view”

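The three steps can be approximated in SQL with a view. The SQLite sketch below assumes a hypothetical Athlete_Choices table holding the journalist consent from the example (1 = may disclose); the view masks each cell with CASE and drops rows whose Athlete# is prohibited. It illustrates the semantics only and is not the query-modification algorithm itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
                       Address TEXT, Phone TEXT);
INSERT INTO Athletes VALUES
  (1, 'Michael Phelps', 19, 'Baltimore', '111-1111'),
  (2, 'Natalie Coughlin', 22, 'Berkeley', '222-2222'),
  (3, 'Ian Thorpe', 23, 'Sydney', '333-3333'),
  (4, 'Jenny Thompson', 31, 'New York', '444-4444');

-- Hypothetical consent table for journalists: 1 = may disclose, 0 = prohibited.
CREATE TABLE Athlete_Choices ("Athlete#" INTEGER PRIMARY KEY,
  "Athlete#_Choice" INTEGER, Name_Choice INTEGER, Age_Choice INTEGER,
  Address_Choice INTEGER, Phone_Choice INTEGER);
INSERT INTO Athlete_Choices VALUES
  (1, 1, 1, 1, 1, 1), (2, 0, 0, 0, 0, 0), (3, 1, 0, 0, 1, 1), (4, 1, 1, 0, 0, 0);

-- Step 1: mask prohibited cells (a CASE without ELSE yields NULL).
-- Step 2: drop rows whose primary key is prohibited.
CREATE VIEW Athletes_Journalist AS
SELECT a."Athlete#" AS "Athlete#",
       CASE WHEN c.Name_Choice = 1 THEN a.Name END AS Name,
       CASE WHEN c.Age_Choice = 1 THEN a.Age END AS Age,
       CASE WHEN c.Address_Choice = 1 THEN a.Address END AS Address,
       CASE WHEN c.Phone_Choice = 1 THEN a.Phone END AS Phone
FROM Athletes a JOIN Athlete_Choices c ON a."Athlete#" = c."Athlete#"
WHERE c."Athlete#_Choice" = 1;
""")

# Step 3: the user's query runs on top of the masked, filtered view.
view_rows = conn.execute(
    'SELECT * FROM Athletes_Journalist ORDER BY "Athlete#"').fetchall()
name_age = conn.execute(
    'SELECT Name, Age FROM Athletes_Journalist ORDER BY "Athlete#"').fetchall()
print(name_age)
```

Querying the view for Name and Age returns ('Michael Phelps', 19), (None, None), and ('Jenny Thompson', None); the all-null row for Athlete #3 survives because that athlete's primary key may be disclosed. The inner join assumes every athlete has a consent row.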
Table Semantics Enforcement
• SQL’s null value represents “no value”
• Desirable semantics for prohibited values:
  – Comparison predicates applied to null never evaluate to true
  – Null does not join with other values
  – Null is ignored when aggregate functions are computed over a column

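These null behaviors are standard SQL and can be checked quickly in SQLite, which follows the standard on each of these points. The tables `t` and `u` below are throwaway examples, not part of the athletes schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (k INTEGER, age INTEGER);
INSERT INTO t VALUES (1, 19), (2, NULL), (3, 31);
CREATE TABLE u (k INTEGER);
INSERT INTO u VALUES (1), (NULL);
""")

# A comparison with NULL evaluates to unknown (returned as None), never true...
null_comparison = conn.execute("SELECT NULL > 30").fetchone()[0]
# ...so a WHERE clause silently skips the row whose age is NULL.
where_rows = conn.execute("SELECT k FROM t WHERE age > 18 ORDER BY k").fetchall()
# Aggregates over a column ignore NULL; COUNT(*) still counts the row itself.
avg_age, count_age, count_rows = conn.execute(
    "SELECT AVG(age), COUNT(age), COUNT(*) FROM t").fetchone()
# NULL never matches in an equi-join, not even another NULL.
join_rows = conn.execute(
    "SELECT x.k FROM u AS x JOIN u AS y ON x.k = y.k").fetchall()

print(null_comparison, where_rows, avg_age, count_age, count_rows, join_rows)
```

COUNT(*) is the one common aggregate that still sees a masked row, since it counts rows rather than values.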
Table Semantics Enforcement
Starting from Table “Athletes” and the consent information for journalists shown above:
1. Mask prohibited cells with null
2. Filter rows where the primary key is prohibited
Result:
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 111-1111
3 | null | null | Sydney | 333-3333
4 | Jenny Thompson | null | null | null

Query Semantics Enforcement
1. “Mask” prohibited cells with the null value
2. Execute the query on top of the masked table
3. Filter rows that are entirely null from the result set

Query Semantics Enforcement
Mask prohibited cells with null:
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 111-1111
null | null | null | null | null
3 | null | null | Sydney | 333-3333
4 | Jenny Thompson | null | null | null

Issue query:
    SELECT Name, Age
    FROM Athletes

Name | Age
Michael Phelps | 19
null | null
null | null
Jenny Thompson | null

Filter rows that are entirely null from the result set:

Query Semantics
Name | Age
Michael Phelps | 19
Jenny Thompson | null

Table Semantics
Name | Age
Michael Phelps | 19
null | null
Jenny Thompson | null

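The query-semantics result above can be reproduced with a masked subquery followed by an all-null filter. The SQLite sketch below keeps only the Name and Age consent columns in a hypothetical Athlete_Choices table (1 = may disclose) and, unlike table semantics, does not filter on the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
INSERT INTO Athletes VALUES (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
                            (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
-- Journalist consent for the two queried columns, 1 = may disclose (assumed layout)
CREATE TABLE Athlete_Choices ("Athlete#" INTEGER PRIMARY KEY,
                              Name_Choice INTEGER, Age_Choice INTEGER);
INSERT INTO Athlete_Choices VALUES (1, 1, 1), (2, 0, 0), (3, 0, 0), (4, 1, 0);
""")

QUERY_SEMANTICS = """
SELECT Name, Age FROM (
  -- Steps 1-2: mask prohibited cells (no primary-key filter) and run the query.
  SELECT CASE WHEN c.Name_Choice = 1 THEN a.Name END AS Name,
         CASE WHEN c.Age_Choice = 1 THEN a.Age END AS Age,
         a."Athlete#" AS sort_key  -- used only for ordering, never returned
  FROM Athletes a JOIN Athlete_Choices c ON a."Athlete#" = c."Athlete#"
) AS masked
-- Step 3: drop result rows in which every selected cell is null.
WHERE NOT (Name IS NULL AND Age IS NULL)
ORDER BY sort_key
"""

rows = conn.execute(QUERY_SEMANTICS).fetchall()
print(rows)  # [('Michael Phelps', 19), ('Jenny Thompson', None)]
```

The all-null filter is applied to the query's result columns, so which rows disappear depends on the query; this is why query semantics can filter more rows than table semantics.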
Query Modification Example (Table Semantics)
Original query:
    SELECT Name
    FROM Athletes
    WHERE Name = 'Michael Phelps'
Modified query:
    SELECT
      CASE WHEN EXISTS
        (SELECT Name_Choice
         FROM Athlete_Choices
         WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
         AND Athlete_Choices.Name_Choice = 1)
      THEN Name ELSE null END
    FROM Athletes
    WHERE Name = 'Michael Phelps'
    AND EXISTS
      (SELECT Athlete#_Choice
       FROM Athlete_Choices
       WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
       AND Athlete_Choices.Athlete#_Choice = 1)

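The modified query can be run essentially as written. In the SQLite sketch below, identifiers containing # are double-quoted, the name is passed as a parameter, and Athlete_Choices is a hypothetical table holding only the Athlete# and Name choices from the journalist example. The second query, MASKED_WHERE, is a variant added for illustration rather than taken from the slide: it evaluates the WHERE predicate on the masked Name, which is what running the query on top of the masked view implies. For Michael Phelps, whose name may be disclosed, both return his name; for Ian Thorpe, the slide's form returns a row containing only null, which reveals that a matching athlete exists, while the variant returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Athletes VALUES (1, 'Michael Phelps'), (2, 'Natalie Coughlin'),
                            (3, 'Ian Thorpe'), (4, 'Jenny Thompson');
-- Hypothetical consent table: 1 = may disclose
CREATE TABLE Athlete_Choices ("Athlete#" INTEGER PRIMARY KEY,
                              "Athlete#_Choice" INTEGER, Name_Choice INTEGER);
INSERT INTO Athlete_Choices VALUES (1, 1, 1), (2, 0, 0), (3, 1, 0), (4, 1, 1);
""")

# CASE expression that masks Name unless the athlete consented.
NAME_MASK = """CASE WHEN EXISTS
    (SELECT Name_Choice FROM Athlete_Choices
     WHERE Athletes."Athlete#" = Athlete_Choices."Athlete#"
       AND Athlete_Choices.Name_Choice = 1)
  THEN Name ELSE NULL END"""
# Row filter: the athlete's primary key itself must be disclosable.
PK_ALLOWED = """EXISTS
    (SELECT "Athlete#_Choice" FROM Athlete_Choices
     WHERE Athletes."Athlete#" = Athlete_Choices."Athlete#"
       AND Athlete_Choices."Athlete#_Choice" = 1)"""

# As on the slide: the WHERE clause tests the raw Name column.
SLIDE_REWRITE = f"SELECT {NAME_MASK} FROM Athletes WHERE Name = ? AND {PK_ALLOWED}"
# Variant: the WHERE clause tests the masked value instead.
MASKED_WHERE = f"SELECT {NAME_MASK} FROM Athletes WHERE ({NAME_MASK}) = ? AND {PK_ALLOWED}"


def run(sql: str, name: str) -> list:
    """Execute one of the rewritten queries for a given athlete name."""
    return conn.execute(sql, (name,)).fetchall()


print(run(SLIDE_REWRITE, "Michael Phelps"), run(MASKED_WHERE, "Michael Phelps"))
print(run(SLIDE_REWRITE, "Ian Thorpe"), run(MASKED_WHERE, "Ian Thorpe"))
```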
Database-level disclosure control
• The database is the best place to enforce limited disclosure
• More efficient, flexible, and secure than an application-level approach
  – Need not fetch prohibited data from the database
  – When applied naively at the cell level, an application-level approach leads to privacy leaks
    – Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30

Example: Difficulties of application-level disclosure control
Using Table “Athletes” and the consent information for journalists shown above:
1. Query the database; retrieve results to the application
Name | Age
Jenny Thompson | 31
2. Check policy and consent info; replace prohibited cells with null
Name | Age
Jenny Thompson | null
Based on this query, it is easy to infer that Jenny Thompson’s age is greater than 30!

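The inference can be reproduced by contrasting the two orders of evaluation. In the SQLite sketch below (hypothetical Athlete_Choices table, 1 = may disclose), `application_level` filters on the real ages and masks afterwards, while `database_level` masks first so the Age > 30 predicate only ever sees permitted values. Both function names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
INSERT INTO Athletes VALUES (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
                            (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
CREATE TABLE Athlete_Choices ("Athlete#" INTEGER PRIMARY KEY,
                              Name_Choice INTEGER, Age_Choice INTEGER);
INSERT INTO Athlete_Choices VALUES (1, 1, 1), (2, 0, 0), (3, 0, 0), (4, 1, 0);
""")


def application_level():
    """Run the query on real values, then mask prohibited cells in the application."""
    rows = conn.execute("""
        SELECT a.Name, a.Age, c.Name_Choice, c.Age_Choice
        FROM Athletes a JOIN Athlete_Choices c ON a."Athlete#" = c."Athlete#"
        WHERE a.Age > 30""").fetchall()
    return [(name if name_ok == 1 else None, age if age_ok == 1 else None)
            for name, age, name_ok, age_ok in rows]


def database_level():
    """Mask prohibited cells first, so the predicate sees only permitted values."""
    return conn.execute("""
        SELECT Name, Age FROM (
          SELECT CASE WHEN c.Name_Choice = 1 THEN a.Name END AS Name,
                 CASE WHEN c.Age_Choice = 1 THEN a.Age END AS Age
          FROM Athletes a JOIN Athlete_Choices c ON a."Athlete#" = c."Athlete#"
        ) AS masked
        WHERE Age > 30""").fetchall()


print(application_level())  # [('Jenny Thompson', None)]
print(database_level())     # []
```

The application-level version returns Jenny Thompson with a null age, yet her presence in the result already shows that her age exceeds 30; the database-level version returns no rows, since the only disclosable age (19) fails the predicate.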
Database-level disclosure control
• The database is a logical place to enforce limited disclosure
• More efficient and flexible than an application-level rule engine approach
  – Need not fetch prohibited data from the database
  – When applied naively at the cell level, an application-level approach leads to privacy leaks
    – Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
    – The alternative approach performs much of the query processing in the application
    – Even more complicated to compute aggregates and joins when some cells are prohibited!

Optimized Implementation of Opt-in and Opt-out Conditions
• It is important to note that SQL offers much flexibility for defining disclosure conditions
• In practice, simple opt-in and opt-out choices are often used to express subject consent and are extremely important
  – Sufficient for expressing P3P policy rules
  – Sufficient for expressing many HIPAA-mandated policies, for example
• Implemented several techniques for storing consent and optimizing this type of condition

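The slides do not spell out how an opt-in differs from an opt-out in the stored data. Under the usual meaning of the terms, the difference lies in the default when a subject has recorded no choice. The sketch below assumes that an unrecorded choice is represented as None and that True means the subject agreed; `may_disclose` is a hypothetical helper.

```python
from typing import Optional


def may_disclose(choice: Optional[bool], mode: str) -> bool:
    """Decide disclosure for one cell.

    choice: True (subject agreed), False (subject declined), None (no choice recorded).
    mode:   "opt-in" or "opt-out".
    """
    if mode == "opt-in":
        return choice is True        # disclose only on explicit agreement
    if mode == "opt-out":
        return choice is not False   # disclose unless explicitly declined
    raise ValueError(f"unknown mode: {mode}")


for choice in (True, False, None):
    print(choice, may_disclose(choice, "opt-in"), may_disclose(choice, "opt-out"))
```

In SQL, the opt-in test corresponds to something like `choice = 1` on a hypothetical choice column, whereas an opt-out test must also accept a missing choice, for example `COALESCE(choice, 1) = 1`.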
Optimized Implementation of Opt-in and Opt-out Conditions
• Several alternative storage techniques
  – Internal column (inline) representation
  – External, single table representation
  – External, multiple table representation

Optimized Implementation of Opt-in and Opt-out Conditions
Internal column representation
Table “Athletes”
Athlete# | Name | Age | Address | Phone | Athlete# (choice) | Name (choice) | Age (choice) | Address (choice) | Phone (choice)
1 | Michael Phelps | 19 | Baltimore | 1111111 | yes | yes | yes | yes | yes
2 | Natalie Coughlin | 23 | Berkeley | 2222222 | no | no | no | no | no
3 | Ian Thorpe | 23 | Sydney | 3333333 | yes | no | no | yes | yes
4 | Jenny Thompson | 31 | New York | 4444444 | yes | yes | no | no | no

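With the internal column representation, the rewritten query needs no join, because each choice sits in the same row as the data it governs. The SQLite sketch below stores the table from the slide; the names of the choice columns (Name_Choice and so on) are hypothetical, and the query enforces table semantics for SELECT Name, Phone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Choice columns stored inline next to the data (column names are hypothetical).
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
  Address TEXT, Phone TEXT,
  "Athlete#_Choice" TEXT, Name_Choice TEXT, Age_Choice TEXT,
  Address_Choice TEXT, Phone_Choice TEXT);
INSERT INTO Athletes VALUES
  (1, 'Michael Phelps', 19, 'Baltimore', '1111111', 'yes', 'yes', 'yes', 'yes', 'yes'),
  (2, 'Natalie Coughlin', 23, 'Berkeley', '2222222', 'no', 'no', 'no', 'no', 'no'),
  (3, 'Ian Thorpe', 23, 'Sydney', '3333333', 'yes', 'no', 'no', 'yes', 'yes'),
  (4, 'Jenny Thompson', 31, 'New York', '4444444', 'yes', 'yes', 'no', 'no', 'no');
""")

# Rewritten form of SELECT Name, Phone FROM Athletes under table semantics:
# each cell is masked by the choice stored in its own row.
rows = conn.execute("""
SELECT CASE WHEN Name_Choice = 'yes' THEN Name END AS Name,
       CASE WHEN Phone_Choice = 'yes' THEN Phone END AS Phone
FROM Athletes
WHERE "Athlete#_Choice" = 'yes'
ORDER BY "Athlete#"
""").fetchall()
print(rows)  # [('Michael Phelps', '1111111'), (None, '3333333'), ('Jenny Thompson', None)]
```

The price of this layout is a wider data table: every additional stored choice adds another column to each row.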
Optimized Implementation of Opt-in and Opt-out Conditions
External, single table representation
Table “Athletes”
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 1111111
2 | Natalie Coughlin | 23 | Berkeley | 2222222
3 | Ian Thorpe | 23 | Sydney | 3333333
4 | Jenny Thompson | 31 | New York | 4444444

Consent Table
ID | Athlete# | Name | Age | Address | Phone
1 | yes | yes | yes | yes | yes
2 | no | no | no | no | no
3 | yes | no | no | yes | yes
4 | yes | yes | no | no | no

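With a single external consent table, the rewritten query joins each data row to its consent row. In the SQLite sketch below, ID is assumed to join to Athletes.Athlete#, as the matching values on the slide suggest, and the query enforces table semantics for SELECT Name, Address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
                       Address TEXT, Phone TEXT);
INSERT INTO Athletes VALUES
  (1, 'Michael Phelps', 19, 'Baltimore', '1111111'),
  (2, 'Natalie Coughlin', 23, 'Berkeley', '2222222'),
  (3, 'Ian Thorpe', 23, 'Sydney', '3333333'),
  (4, 'Jenny Thompson', 31, 'New York', '4444444');
-- One consent row per athlete; ID is assumed to join to Athletes."Athlete#".
CREATE TABLE Consent (ID INTEGER PRIMARY KEY, "Athlete#" TEXT, Name TEXT,
                      Age TEXT, Address TEXT, Phone TEXT);
INSERT INTO Consent VALUES
  (1, 'yes', 'yes', 'yes', 'yes', 'yes'), (2, 'no', 'no', 'no', 'no', 'no'),
  (3, 'yes', 'no', 'no', 'yes', 'yes'), (4, 'yes', 'yes', 'no', 'no', 'no');
""")

# Rewritten form of SELECT Name, Address FROM Athletes under table semantics.
rows = conn.execute("""
SELECT CASE WHEN c.Name = 'yes' THEN a.Name END AS Name,
       CASE WHEN c.Address = 'yes' THEN a.Address END AS Address
FROM Athletes a JOIN Consent c ON c.ID = a."Athlete#"
WHERE c."Athlete#" = 'yes'
ORDER BY a."Athlete#"
""").fetchall()
print(rows)  # [('Michael Phelps', 'Baltimore'), (None, 'Sydney'), ('Jenny Thompson', None)]
```

Keeping the choices out of the data table leaves the existing schema untouched, at the cost of one join per modified query.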
Optimized Implementation of Opt-in and Opt-out Conditions
External, multiple table representation
Table “Athletes”
Athlete# | Name | Age | Address | Phone
1 | Michael Phelps | 19 | Baltimore | 1111111
2 | Natalie Coughlin | 23 | Berkeley | 2222222
3 | Ian Thorpe | 23 | Sydney | 3333333
4 | Jenny Thompson | 31 | New York | 4444444

Positive Consent Tables (one per column, listing the Athlete# of each athlete who consented)
Athlete#: 1, 3, 4
Name: 1, 4
Age: 1
Address: 1, 3
Phone: 1, 3

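With positive consent tables, a cell is disclosed only if the athlete's Athlete# appears in the table for that column, which can be tested with EXISTS. The SQLite sketch below creates only the consent tables needed for SELECT Name, Age (the table names Consent_AthleteNo, Consent_Name, and Consent_Age are hypothetical) and enforces table semantics. In this sketch each enforced choice adds one more subquery to the rewritten query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Athletes ("Athlete#" INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
                       Address TEXT, Phone TEXT);
INSERT INTO Athletes VALUES
  (1, 'Michael Phelps', 19, 'Baltimore', '1111111'),
  (2, 'Natalie Coughlin', 23, 'Berkeley', '2222222'),
  (3, 'Ian Thorpe', 23, 'Sydney', '3333333'),
  (4, 'Jenny Thompson', 31, 'New York', '4444444');
-- Positive consent tables: one per column, listing only athletes who said yes.
CREATE TABLE Consent_AthleteNo ("Athlete#" INTEGER PRIMARY KEY);
CREATE TABLE Consent_Name ("Athlete#" INTEGER PRIMARY KEY);
CREATE TABLE Consent_Age ("Athlete#" INTEGER PRIMARY KEY);
INSERT INTO Consent_AthleteNo VALUES (1), (3), (4);
INSERT INTO Consent_Name VALUES (1), (4);
INSERT INTO Consent_Age VALUES (1);
""")

# Rewritten form of SELECT Name, Age FROM Athletes under table semantics.
rows = conn.execute("""
SELECT CASE WHEN EXISTS (SELECT 1 FROM Consent_Name n
                         WHERE n."Athlete#" = a."Athlete#") THEN a.Name END AS Name,
       CASE WHEN EXISTS (SELECT 1 FROM Consent_Age g
                         WHERE g."Athlete#" = a."Athlete#") THEN a.Age END AS Age
FROM Athletes a
WHERE EXISTS (SELECT 1 FROM Consent_AthleteNo k WHERE k."Athlete#" = a."Athlete#")
ORDER BY a."Athlete#"
""").fetchall()
print(rows)  # [('Michael Phelps', 19), (None, None), ('Jenny Thompson', None)]
```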
Overview of Performance Experiments
• Implemented query modification algorithms on top of DB2 version 8.1
• Focused on measuring performance for unconditional rules and those with opt-in and opt-out choices
• Experimental setup
  – Synthetic dataset based on the Wisconsin Benchmark
  – Dual-processor 1.8 GHz AMD machine running Windows 2000 Server
  – 2 gigabytes of memory
  – 50 megabyte buffer pool
  – Queries run warm and cold
  – Here we report the warm numbers (error less than ±5% with 95% confidence)

[Figure: elapsed time (seconds) versus choice selectivity (%) for the Unmodified, Modified Internal, and Modified External Multiple queries]
• Measured performance of a query selecting all records from a 5-million-record table
• Compared performance of original and modified queries for varied choice selectivity
• Not surprisingly, performance is actually better for modified queries when privacy enforcement acts as an additional selection condition
  – Modified queries are able to use indexes on choice values
• Shows the importance of database-level privacy enforcement for performance

[Figure: elapsed time (seconds) versus data table size (1, 5, and 10 million records) for the Unmodified, Modified Internal, and Modified External Multiple queries; the full bar is elapsed time and the bottom portion is CPU time]
• Measured overhead cost using a query that selects all records
• Choice selectivity = 100%
  – Observed the worst-case scenario, in which no rows are filtered due to privacy constraints but all costs of cell-level checking are incurred
• Full bar represents elapsed time; bottom portion of bar is CPU time
• Much of the cost of privacy enforcement is CPU cost, so it scales well as queries become more I/O intensive

Additional Performance Results
• Cost of rewriting queries is small
  – Must only be done once if the query is pre-compiled
• Found that the query semantics enforcement model is often faster than table semantics because more rows are frequently filtered
• Tradeoffs between choice storage techniques
  – Number of choices stored for a particular table: as more choices are stored, performance of the internal representation suffers
  – Number of choices enforced for a particular query: as more choices are enforced, performance of the external multiple representation suffers
• Tradeoffs between query modification algorithms
  – Described in paper

Conclusions
• Limited disclosure is a necessary component of a comprehensive data privacy management system
• Proposed a framework enforcing limited disclosure at the database level
  – More efficient and flexible than application-level disclosure control
  – Techniques also have broader use for other applications requiring policy-driven, fine-grained disclosure control
• Framework can be deployed to an existing environment with minimal modification to legacy applications and existing schemas
• Query modification and consent storage approaches are efficient enough to be viable in practice

Questions