Anonymity and Privacy in Electronic Services
Privacy and anonymity
Claudia Diaz
Katholieke Universiteit Leuven
Dept. Electrical Engineering – ESAT/COSIC
UPC Seminar, Barcelona, November 24, 2008
Claudia Diaz (K.U.Leuven)
Introducing myself...
2000: I.T.S. de Telecomunicaciones, Universidad de Vigo
Master thesis (PfC) partially done at KULeuven as Erasmus student
2000-2001: Pre-doctoral student at COSIC/KULeuven
2001-2005: PhD student at COSIC/KULeuven
2005-now: Post-doc researcher at COSIC/KULeuven
My work:
Anonymity metrics
Anonymous communications: analysis and attacks
Traffic analysis
Stego file systems
Other PETs: private search, negative databases, ...
http://www.esat.kuleuven.be/~cdiaz
Introducing COSIC (1/2)
Group at the Dept. of Electrical Engineering of the
Katholieke Universiteit Leuven
COSIC: COmputer Security and Industrial Cryptography
Professors: Bart Preneel, Ingrid Verbauwhede and Vincent
Rijmen
5 post-docs (more arriving soon)
40 PhD students (more arriving soon)
5-15 visitors and external associated researchers at any given
time
Very international: 15-25 nationalities, ~60% non-Belgians
Introducing COSIC (2/2)
Research areas:
  Symmetric Key Crypto
  Public Key Crypto
  Secure Hardware Implementations
  Identity Management
  Privacy Technologies
  Mobile Secure Networks
  Software Security
Involved in 25-30 active projects
~10 European Projects (FP6 and FP7)
Coordinator of FP6 NoE ECRYPT, FP7 NoE ECRYPT II, FP7 IP TAS3
Biggest success: Rijndael encryption algorithm (AES)
… a few words on the scope of the talk
New field, still in development
Goal: give an idea of the problems, concepts and solutions for anonymity
Outline
Privacy and anonymity
Anonymous communications
Anonymity metrics
Social networks
Solove on “I have nothing to hide”
"the problem with the ‘nothing to hide’ argument is its
underlying assumption that privacy is about hiding bad
things.“
"Society involves a great deal of friction, and we are
constantly clashing with each other. Part of what makes
a society a good place in which to live is the extent to
which it allows people freedom from the
intrusiveness of others. A society without privacy
protection would be suffocation, and it might not be a
place in which most would want to live."
Diffie and Landau on Internet
eavesdropping
Governments are expanding their surveillance powers to protect
against crime and terrorism.
BUT:
Secrecy, lack of transparency, and diminished safeguards may easily
lead to abuses
“Will the government’s monitoring tools be any more secure than the
network they are trying to protect? If not, we run the risk that the
surveillance facilities will be subverted or actually used against the
U.S.A.”
They conclude: “Communication is fundamental to our species;
private communication is fundamental to both our national
security and our democracy. Our challenge is to maintain this
privacy in the midst of new communications technologies and
serious national security threats. But it is critical to make choices
that preserve privacy, communications security and the ability to
innovate. Otherwise, all hope of having a free society will vanish.”
Privacy properties from a technical
point of view
Anonymity
  Hiding the link between identity and action
Unlinkability
  Hiding the link between two or more actions / identities
Pseudonymity
  One-time pseudonyms / persistent pseudonyms
Unobservability
  Hiding user activity
Plausible deniability
  Not possible to prove the user knows / has done something
The concept of Privacy [Solove]
Privacy threats we are trying to protect against
(out of 16 identified by Solove)
Surveillance: monitoring of electronic transactions
  Preventive properties: anonymity, unobservability
Interrogation: forcing people to disclose information
  Preventive property: plausible deniability
Aggregation: combining several sources of information
  Preventive property: unlinkability
Identification: connecting data to individuals
  Preventive properties: anonymity and unlinkability
Anonymity – Data and Communication Layers
[Diagram: Alice and Bob communicating; anonymity concerns arise at the application (App), communication (Com), and IP layers]
Classical Security Model
• Confidentiality
• Integrity
• Authentication
• Non repudiation
• Availability
[Diagram: Alice communicates with Bob while Eve, a passive or active adversary, eavesdrops on the channel]
Anonymity – Concept and Model
[Diagram: a set of Alices sending to a set of Bobs; the anonymity set hides which Alice communicates with which Bob]
Anonymity Adversary
• Passive / Active
• Partial / Global
• Internal / External
[Diagram: the adversary may be the recipient or a third party]
Anonymity Adversary
The adversary will:
Try to find who is sending messages to whom.
Observe
All links (Global Passive Adversary)
Some links
Modify, delay, delete or inject messages.
Control some nodes in the network.
The adversary's limitations
Cannot break cryptographic primitives.
Cannot see inside nodes he does not control.
Soft privacy enhancing technologies
Hard privacy
Focus on data minimization
Adversarial data holder / service provider
Soft privacy
Policies, access control, liability, right to correct
information
Adversary: 3rd parties, corrupt insider in honest
SP, errors
BUT user has already lost control of her data
(Slide taken from G. Danezis)
Other privacy-enhancing technologies
Anonymous credentials / e-cash / ZK
protocols
Steganography / covert communication
Censorship resistance techniques
Anonymous publication
Private information retrieval (PIR)
Private search
K-anonymity
Location privacy
Outline
Privacy and anonymity
Anonymous communications
Anonymity metrics
Social networks
Concept of Mix: collect inputs
Router that hides
correspondence between
inputs and outputs
Concept of Mix: mix and flush
Router that hides
correspondence between
inputs and outputs
Functionality of Mixes
Mixes modify
The appearance of messages
Encryption / Decryption
Sender → Mix1 : {Mix2, {Rec, msg}KMix2}KMix1
Padding / Compression
Substitution of information (e.g., IP)
The flow of messages
Reordering
Delaying - Real-time requirements!
Dummy traffic - Cost of traffic!
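The layered encryption in the notation above (Sender → Mix1 : {Mix2, {Rec, msg}KMix2}KMix1) can be sketched with a toy symmetric cipher: a SHA-256 counter keystream stands in for real public-key encryption, and the "|" framing is an invented, illustrative packet format, not a real mix format.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy counter-mode keystream from SHA-256; NOT secure, illustration only.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

k_mix1, k_mix2 = b"key-mix1", b"key-mix2"   # hypothetical mix keys

# Build the layered message: {Mix2, {Rec, msg}K_Mix2}K_Mix1
inner = xor_encrypt(k_mix2, b"Rec|hello")       # innermost layer, for Mix2
outer = xor_encrypt(k_mix1, b"Mix2|" + inner)   # outer layer, for Mix1

# Mix1 peels its layer and learns only the next hop (Mix2):
layer1 = xor_encrypt(k_mix1, outer)
next_hop, payload = layer1.split(b"|", 1)

# Mix2 peels the final layer and learns the recipient and message:
plain = xor_encrypt(k_mix2, payload)
```

Each mix sees only its own layer: Mix1 learns the next hop but not the recipient, and Mix2 learns the recipient but not the sender.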
Pool Mixes
Based on the mix proposed by Chaum in 1981:
1. Collect N inputs
2. Shuffle
3. Flush (forward)
(steps 1-3 form one round)
Pool selection algorithm
No pool / Static pool / Dynamic pool
Influences the performance and anonymity provided
by the mix
Flushing condition
Time / Threshold
Deterministic / Random
Example of pool mix
Deterministic threshold
static pool Mix
Pool = 2
Threshold = 4
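One round of this mix can be sketched in a few lines (message names are hypothetical; the flushing rule follows the slide's parameters: wait for 4 new messages, keep a random pool of 2):

```python
import random

def pool_mix_round(pool, incoming, pool_size=2, threshold=4, rng=random):
    """One round of a deterministic-threshold static-pool mix.

    The mix waits for `threshold` new messages, merges them with the pool,
    then flushes all but `pool_size` randomly chosen messages."""
    assert len(incoming) == threshold
    buffer = pool + incoming
    rng.shuffle(buffer)            # hide the input/output correspondence
    kept = buffer[:pool_size]      # messages retained for future rounds
    flushed = buffer[pool_size:]
    return kept, flushed

rng = random.Random(42)
pool = ["m1", "m2"]                # messages left over from earlier rounds
kept, flushed = pool_mix_round(pool, ["m3", "m4", "m5", "m6"], rng=rng)
# 6 messages were in the mix this round: 4 are flushed, 2 stay in the pool.
```

Because any message can stay in the pool for several rounds, an observer cannot tell in which round a given input leaves the mix.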
Stop-and-Go Mix
Proposed by Kesdogan in 1998
Reordering strategy based on delaying
M/M/∞
Delays generated by the user from an
Exponential distribution
Timestamping to prevent active attacks
Trusted Time Service
Anonymity estimates based on the
assumption of Poisson incoming traffic
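The sender-side delay generation can be sketched as follows (the rate parameter and message count are arbitrary choices for illustration; each message is delayed independently, which is what gives the M/M/∞ behaviour under Poisson input):

```python
import random

def sg_delays(n_messages, rate=2.0, rng=None):
    """Sender-chosen delays for a Stop-and-Go mix: each message is held
    for an independent Exponential(rate) time before being forwarded."""
    rng = rng or random.Random()
    return [rng.expovariate(rate) for _ in range(n_messages)]

rng = random.Random(7)
delays = sg_delays(10000, rate=2.0, rng=rng)
mean = sum(delays) / len(delays)   # should be close to 1/rate = 0.5 seconds
```

The sender samples the delay for every hop and embeds it in the message, so the mix itself holds no reordering state.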
Network topology
Mixes are combined in networks in order to
Distribute trust
Improve availability
[Diagram: three topologies: cascade, fully connected network, restricted route network]
Cascades vs Free Route topologies
Flexibility of routing
  Advantage free routes
Surface of attack / Availability
  Advantage free routes
Intersection attacks
  Advantage cascades (anonymity set smaller but no partitioning possible)
Trust
  Advantage free routes (more choices available to the user)
Peer-to-peer vs client-server
architectures
Symmetric vs asymmetric systems
Surface of attack / Liability issues
  Advantage client-server
Availability
  Advantage client-server
Resources / incentives / quality of service
  Advantage peer-to-peer
Sybil attacks
  Advantage? Depends on admission controls (for peers/servers)
Deployed systems I
Anon.penet.fi (Helsingius 1993)
Simple proxy, substituted email headers
Kept table of correspondences nym-email
Brought down by legal attack in 1996
Type I Cypherpunk remailers (Hughes, Finney, 1996)
No tables (routing info in msgs themselves),
PGP encryption (no attacks based on content) –
attacks based on size are possible
Chains of mixes (distribution of trust)
Reusable reply blocks (source of insecurity)
Deployed systems II
Mixmaster (Cottrell, evolving since 1995)
Fixed size (padding / dividing large messages)
Integrity protection measures
Multiple paths for better reliability
No replies
Mixminion (Danezis, 2003)
SURBs (Single-Use Reply Blocks)
Packet format: detection of tagging attacks (all-or-nothing)
Forward security: trail of keys, updated with one-way
functions
Low-latency applications
Real-time requirements
  Web browsing, interactive applications, voice over IP, etc.
Stream-based instead of message-based communication
Bi-directional circuits
Delaying not an option
Ephemeral session keys, onion-encrypted (forward secrecy)
Proposed systems
  C-S: Onion Routing, ISDN Mixes, Web Mixes
  P2P: Crowds, P5 (broadcast), Herbivore (DC-nets)
Implemented systems: Tor, JAP
Onion encryption
[Diagram: a message for destination D is wrapped in successive layers of encryption for routers R1, R2 and R3]
Onion Routing
[Diagram: the message travels through routers R1, R2, R3 to destination D; each router peels off one layer of encryption]
TOR – adversary model
[Diagram: Tor's adversary model: a partial adversary who can observe or control a fraction of the network, not a global passive adversary]
Anonymizing http traffic not trivial
Difficult to conceal traffic pattern
  Difficult to pad
    Lots of padding: scalability / cost problem
    Little padding: not enough to conceal the pattern
Fingerprinting attacks
  Adversary observes only the user side
Vulnerable to strong adversaries (entry+exit)
  Internet exchanges
Dummy Traffic
Fake messages/traffic introduced to confuse the attacker
Indistinguishable from real traffic
Dummies improve anonymity by making traffic analysis more difficult
Necessary for unobservability
Expensive
(Some) Attacks on mix-based systems
Passive attacks
Long-term intersection attacks (statistical disclosure)
Persistent communication patterns
Extract social network information
Traffic correlation / confirmation
Fingerprinting
Source separation
Epistemic attacks (route selection)
Active attacks
N-1 attacks
Sybil
Tagging
Replay
DoS
Long-Term Intersection Attacks
Family of attacks with many variants:
  Disclosure attack (Agrawal, Kesdogan)
  Hitting set attack (Kesdogan)
  Statistical disclosure attack (Danezis, Serjantov)
  Extensions to SDA (Dingledine and Mathewson)
  Two-Sided SDA (Danezis, Diaz, Troncoso)
  Receiver-bound cover traffic (Mallesh, Wright)
Assumption: Alice has persistent communication relationships (she communicates repeatedly with her friends)
Intuition: if we observe rounds in which Alice sends, her likely recipients will appear frequently
Method: combine many observations (looking at who receives when Alice sends)
Result: we can create a vector that expresses Alice's sending probabilities (a sending profile)
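The statistical disclosure intuition can be simulated in miniature. This is a toy model, not any of the cited attacks verbatim: background senders pick recipients uniformly, Alice has an invented two-friend profile, and Eve subtracts the background frequencies from the frequencies observed in rounds where Alice sends.

```python
import random
from collections import Counter

rng = random.Random(1)
recipients = list(range(20))
alice_friends = [3, 7]              # ground truth, unknown to the attacker

def run_round(alice_sends, batch=10):
    """One threshold-mix round: the batch of recipients Eve observes."""
    out = [rng.choice(recipients) for _ in range(batch - 1)]
    out.append(rng.choice(alice_friends) if alice_sends
               else rng.choice(recipients))
    return out

with_alice, background = Counter(), Counter()
for _ in range(3000):
    with_alice.update(run_round(alice_sends=True))
    background.update(run_round(alice_sends=False))

# Estimated profile: observed frequency minus background frequency
total_w, total_b = sum(with_alice.values()), sum(background.values())
profile = {r: with_alice[r] / total_w - background[r] / total_b
           for r in recipients}
suspects = sorted(profile, key=profile.get, reverse=True)[:2]
```

After a few thousand rounds the two friends clearly dominate the estimated profile, which is exactly why persistent communication patterns defeat a single mix in the long run.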
Outline
Privacy and anonymity
Anonymous communications
Anonymity metrics
Social networks
Definition [PfiHan2000]
First clear definition of anonymity (2000)
Anonymity is the state of being not
identifiable within a set of subjects, the
anonymity set.
The anonymity set is the set of all possible
subjects who might cause an action or be
addressed.
Anonymity depends on:
The number of subjects in the anonymity set
The probability distribution of each subject in the
anonymity set being the target
Example: computing anonymity metrics
for a pool mix
Potentially infinite subjects in the recipient anonymity set
Probability of recipient Ri: pi = 2^-i
[Diagram: recipients R1 (p1 = 2^-1), R2 (p2 = 2^-2), R3 (p3 = 2^-3), R4 (p4 = 2^-4), ...]
Entropy: information-theoretic anonymity
metrics [DSCP02, SD02]
Entropy: a measure of the uncertainty of a random variable, i.e., the amount of information required on average to describe it
Increases with N and with the uniformity of the distribution

  H = - Σ_{i=1}^{N} pi log2(pi)

A distribution with entropy H is equivalent to a uniform distribution over 2^H subjects
Other information-theoretic metrics: min-entropy, max-entropy, Rényi entropy, relative entropy, mutual information, ...
Combining traffic analysis with social
network knowledge
“On the Impact of Social Network Profiling on
Anonymity” [DTS-PETS08]
Use of Bayesian inference to combine different
sources of information (SN knowledge, text mining)
Results:
“Combined” anonymity decreases with network growth (if
only one source of information is considered, then growing
the network does not decrease anonymity)
Anonymity degradation as more profiles become known
ERRORS: Bad quality SN profile information
Small profile errors lead to large errors in the results
If the adversary is not completely confident of the SN info → cannot have
any confidence in the results
Combinatorial approaches
Edman et al.
Consider deanonymization for a system as a whole
(instead of individual users)
Find perfect matching inputs/outputs
Perfect anonymity for t messages: t! equiprobable
combinations
GTDPV-WPES08
Edman et al.’s metric overestimates anonymity if users
send/receive more than one message
Generalization to users sending/receiving more than one
message
Divide and conquer algorithm to compute the metric
Upper and lower bounds on anonymity (easy to compute)
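A brute-force sketch of the whole-system view for tiny systems (this is an illustration of the idea, not the papers' algorithms: the adjacency matrix marks which input could have produced which output, the number of consistent perfect matchings is the matrix permanent, and anonymity is normalized by log(t!), the perfect-anonymity value):

```python
import math
from itertools import permutations

def count_perfect_matchings(adj):
    """Count perfect matchings of a bipartite 0/1 matrix (its permanent);
    brute force, so only feasible for small t."""
    t = len(adj)
    return sum(1 for perm in permutations(range(t))
               if all(adj[i][perm[i]] for i in range(t)))

def system_anonymity(adj):
    """Normalized whole-system metric: log(#matchings) / log(t!)."""
    t = len(adj)
    if t < 2:
        return 0.0
    return math.log(count_perfect_matchings(adj)) / math.log(math.factorial(t))

# Perfect anonymity: every input could map to every output (t! matchings)
full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Partial observation: input 0 is known to map to output 0
partial = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
a_full = system_anonymity(full)        # 1.0
a_partial = system_anonymity(partial)  # log(2)/log(6), well below 1
```

Computing the permanent is #P-hard in general, which is why the follow-up work needs divide-and-conquer algorithms and easy-to-compute bounds.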
Outline
Privacy and anonymity
Anonymous communications
Anonymity metrics
Social networks
A social network approach
Most anonymity techniques focus on channels
and one-to-one relationships
Mailing lists, twikis, blogs, online social
networks:
Many-to-many communication (communities)
Shared spaces and collective content (group photo
albums, forums)
Moreover, information about third parties (not in
the community) is also leaked (pictures, stories,
references in posts)
This scenario presents new privacy challenges
“The Economics of Mass Surveillance”
[DanWit06]
Model: people and clubs
Assumptions:
  If one of the club participants is under surveillance, all information shared in this club, and the membership list of the club, becomes known to Eve
  Eve has a limited budget (number of people she can put under surveillance)
Target selection
How best to choose those to be put under
surveillance to maximize returns in terms of
observed clubs, people and their relationships?
How does the lack of information, due to the use of
an anonymizing network, affect the effectiveness of
such target selection?
Data: mail archives of political network 2003-2006
Clubs = mailing lists (373)
People = email addresses posting more than 5 times to
those mailing lists (2338)
Links = number of posts of person over 3 years (3879)
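The target-selection heuristic can be sketched on an invented people-clubs graph (hypothetical random data, not the real mailing-list archive): greedily surveil the person whose memberships add the most not-yet-covered clubs, which matches the slide's "highest degree, excluding spaces already under surveillance" rule.

```python
import random

# Toy bipartite people-clubs graph (hypothetical data)
rng = random.Random(0)
people = [f"p{i}" for i in range(50)]
clubs = [f"c{j}" for j in range(10)]
membership = {p: set(rng.sample(clubs, rng.randint(1, 4))) for p in people}

def select_targets(membership, budget):
    """Greedy target selection: at each step surveil the person who
    covers the most clubs not yet under surveillance."""
    covered, targets = set(), []
    for _ in range(budget):
        best = max(membership, key=lambda p: len(membership[p] - covered))
        if not membership[best] - covered:
            break                      # everything reachable is covered
        targets.append(best)
        covered |= membership[best]
    return targets, covered

targets, covered = select_targets(membership, budget=10)
fraction = len(covered) / len(clubs)   # fraction of clubs under surveillance
```

Even with a tiny budget the greedy strategy covers the communities quickly, which is the slide's point about a small fraction of well-chosen targets controlling all the clubs.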
With and without anonymizing network
Without:
  Eve can observe all traffic flow (not the content) and thus construct the social graph (links people-clubs)
  Eve chooses to put under surveillance the nodes with the highest degrees, excluding all spaces already under surveillance
  Result: surveilling 8% of the people is enough to control 100% of the clubs
With:
  Eve can only observe the aggregated amount of traffic generated by people
  Eve puts under surveillance the nodes that generate the most volume
  Result: 50% of links are known when surveilling 5% of the people. To get 80% of the links the adversary needs to observe 30% of the people (vs. 3%)
The Economics of Covert Community
Detection and Hiding [Nag08]
Adversary tries to uncover communities and learn
community membership
Counter-surveillance techniques that can be
deployed by covert groups (e.g., topological
rewiring)
“Our study confirms results from previous work
showing that invasion of privacy of most people is
cheap and easy”
With certain counter-surveillance strategies: “70% of
the covert community going undetected even when
99% of the network is under direct surveillance.”
Conclusions
Privacy / anonymity is an area of research getting
increasing attention
Technical, legal, sociological, psychological, economic, political,
philosophical aspects
Economics of privacy:
  Crypto: little overhead → lots of security
  Anonymity: lots of overhead → a little bit of security
Privacy invasion is:
  Profitable for businesses
  Increases the power of Government
High-latency applications (email): problems with persistent user behavior
Low-latency applications: insecure towards strong adversaries
Anonymous communications are fragile: if you want to propose a new system:
  Check the literature
  Check known attacks
(Some) Open Questions
Do individuals care enough to pay the price of privacy? Will they in the
future? What is privacy anyway?
Will privacy technology be implemented to re-establish the tradeoffs and
power balances of the pre-information era? Or will society and individuals
redefine privacy and power tradeoffs and adapt their behaviour to
accommodate a reality of privacy invasion and surveillance?
Anonymity metrics:
Do we really understand what is anonymity?
Which is the best way to measure anonymity? Which are the “adequate
levels”?
Current metrics very limited in application (e.g., how to measure privacy
losses due to personal data becoming public?)
Real user behavioral models
Models for adversary knowledge
Solutions for Low-Latency Anonymous Communication?
Solutions for Social Networks?
Thank you!
Recommended bibliography on the subject:
http://www.freehaven.net/anonbib/