Transcript slides

Inference Problem
Privacy Preserving
Data Mining
Readings and Assignments
Required:
• Pfleeger: Chapter 7
Interesting reading:
• I. Moskowitz, M. H. Kang: Covert Channels – Here to
Stay?
http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.it
d.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994
zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
•
Jajodia, Meadows: Inference Problems in Multilevel
Secure Database Management Systems
http://www.acsac.org/secshelf/book001/book001.html, essay 24
Lecture 19
CSCE 522 - Farkas
2
Indirect Information Flow Channels
Covert channels
 Inference channels

Lecture 19
CSCE 522 - Farkas
3
Communication Channels


Overt Channel: designed into a system and
documented in the user's manual
Covert Channel: not documented. Covert
channels may be deliberately inserted into a
system, but most such channels are accidents
of the system design.
Lecture 19
CSCE 522 - Farkas
4
Covert Channel
Timing Channel: based on system times
 Storage channels: not time related
communication
 Can be turned into each other

Lecture 19
CSCE 522 - Farkas
5
Inference Channels
Non-sensitive
information
Lecture 19
+
Meta-data
CSCE 522 - Farkas
=
Sensitive
Information
6
Inference Channels
Statistical Database Inferences
 General Purpose Database Inferences

Lecture 19
CSCE 522 - Farkas
7
Statistical Databases

Goal: provide aggregate information about groups of
individuals
 E.g.,

Security risk: specific information about a particular
individual
 E.g.,

average grade point of students
grade point of student John Smith
Meta-data:
 Working
knowledge about the attributes
 Supplementary knowledge (not stored in database)
Lecture 19
CSCE 522 - Farkas
8
Types of Statistics


Macro-statistics: collections of related statistics presented in 2dimensional tables
Sex\Year
1997
1998
Sum
Female
4
1
5
Male
6
13
19
Sum
10
14
24
Micro-statistics: Individual data records used for statistics after
identifying information is removed
Lecture 19
Sex
Course
GPA
Year
F
CSCE 590
3.5
2000
M
CSCE 590
3.0
2000
F
CSCE 790
4.0
2001
CSCE 522 - Farkas
9
Statistical Compromise
Exact compromise: find exact value of an
attribute of an individual (e.g., John Smith’s
GPA is 3.8)
 Partial compromise: find an estimate of an
attribute value corresponding to an individual
(e.g., John Smith’s GPA is between 3.5 and
4.0)

Lecture 19
CSCE 522 - Farkas
10
Methods of Attacks and Protection

Small/Large Query Set Attack
 C:
characteristic formula that identifies groups of
individuals
If C identifies a single individual I, e.g., count(C) = 1
 Find out existence of property


If count(C and D)=1 means I has property D
If count(C and D)=0 means I does not have D
OR
 Find value of property

Lecture 19
Sum(C, D), gives value of D
CSCE 522 - Farkas
11
Small/Large Query Set Attack cont.
Protection from small/large query set attack:
query-set-size control
 A query q(C) is permitted only if
N-n  |C|  n , where n  0 is a parameter of
the database and N is all the records in the
database

Lecture 19
CSCE 522 - Farkas
12
Tracker attack
q(C) is disallowed
C=C1 and C2
T=C1 and ~C2
Tracker
C
C2
C1
q(C)=q(C1) – q(T)
Lecture 19
CSCE 522 - Farkas
13
Tracker attack
q(C and D) is disallowed
C=C1 and C2
T=C1 and ~C2
C
Tracker
C2
C1
C and D
q(C and D)=
q(T or C and D) – q(T)
Lecture 19
D
CSCE 522 - Farkas
14
Query overlap attack
Q(John)=q(C1)-q(C2)
C1
C2
Kathy
John
Paul
Eve
Max
Fred
Lecture 19
Mitch
CSCE 522 - Farkas
Protection:
query-overlap control
15
Insertion/Deletion Attack

Observing changes overtime
 q1=q(C)
 insert(i)
 q2=q(C)
 q(i)=q2-q1

Protection: insertion/deletion performed as
pairs
Lecture 19
CSCE 522 - Farkas
16
Statistical Inference Theory

Give unlimited number of statistics and correct
statistical answers, all statistical databases can
be compromised (Ullman)
Lecture 19
CSCE 522 - Farkas
17
Inferences in General-Purpose
Databases
Queries based on sensitive data
 Inference via database constraints
 Inferences via updates

Lecture 19
CSCE 522 - Farkas
18
Queries based on sensitive data
Sensitive information is used in selection
condition but not returned to the user.
 Example: Salary: secret, Name: public

NameSalary=$25,000

Protection: apply query of database views at
different security levels
Lecture 19
CSCE 522 - Farkas
19
Database Constraints
Integrity constraints
 Database dependencies
 Key integrity

Lecture 19
CSCE 522 - Farkas
20
Integrity Constraints
C=A+B
 A=public, C=public, and B=secret
 B can be calculated from A and C, i.e., secret
information can be calculated from public data

Lecture 19
CSCE 522 - Farkas
21
Database Dependencies
Metadata:
 Functional dependencies
 Multi-valued dependencies
 Join dependencies
 etc.
Lecture 19
CSCE 522 - Farkas
22
Functional Dependency


FD: A  B, that is for any two tuples in the relation,
if they have the same value for A, they must have the
same value for B.
Example: FD: Rank  Salary
Secret information: Name and Salary together
 Query1:
Name and Rank
 Query2: Rank and Salary
 Combine answers for query1 and 2 to reveal Name and
Salary together
Lecture 19
CSCE 522 - Farkas
23
Key integrity
Every tuple in the relation have a unique key
 Users at different levels, see different versions
of the database
 Users might attempt to update data that is not
visible for them

Lecture 19
CSCE 522 - Farkas
24
Example
Secret View
Name (key)
Black P
Red S
Salary
38,000 P
42,000 S
Address
Columbia S
Irmo S
Name (key)
Salary
Address
Black P
38,000 P
Null P
Public View
Lecture 19
CSCE 522 - Farkas
25
Updates
Public User:
Name (key)
Black P
Salary
38,000 P
Address
Null P
1. Update Black’s address to Orlando
2. Add new tuple: (Red, 22,000, Manassas)
If
Refuse update: covert channel
Allow update:
• Overwrite high data – may be incorrect
• Create new tuple – which data it correct
(polyinstantiation) – violate key constraints
Lecture 19
CSCE 522 - Farkas
26
Updates
Secret user:
Name (key)
Salary
Address
Black P
38,000 P
Columbia S
Red S
42,000 S
Irmo S
1. Update Black’s salary to 45,000
If
Refuse update: denial of service
Allow update:
• Overwrite low data – covert channel
• Create new tuple
– which data it correct
(polyinstantiation) – violate key constraints
Lecture 19
CSCE 522 - Farkas
27
Inference Problem
No general technique is available to solve the
problem
 Need assurance of protection
 Hard to incorporate outside knowledge

Lecture 19
CSCE 522 - Farkas
28
The Inference Problem
General Purpose Database:
Non-confidential data + Metadata 
Undesired Inferences
Web Enabled Data:
Non-confidential data + Metadata (data and
application semantics) + Computational Power +
Connectivity  Undesired Inferences
29
Correlated Inference
base
Base
fort
Public
place
address
Place
Public
basin
Water source
district
Water Source
Object[].
waterSource :: Object
basin :: waterSource
place :: Object
district :: place
address :: place
base :: Object
fort :: base
Confidential
Base
Water source
30
Inference Control
Public
Access Control
Confidentia
l
X
Misinfo
Organizational
Data
Attacker
X
Ontology
Web Data
Data Integration
and
Inferences
Inference Control
Public
Misinfo
Confidentia
l
Organizational
Data
ACCESS and INFERENCE CONTROL POLICY
• Logic-based inference detection
• Exact and partial disclosure
• Data and metadata protection
• Heterogeneous data manipulation
• Metadata discovery
Data Mining and Privacy

Statistical inference:
 K-anonymity
 Correlation

General inference:
 metadata
 Biased learning
 Pattern
Lecture 19
CSCE 522 - Farkas
33
Next Class

Midterm exam review
Lecture 19
CSCE 522 - Farkas
34