20070327_tuesdaysem
802.11 User Fingerprinting
Jeff Pang, Ben Greenstein,
Ramki Gummadi, Srini Seshan, and
David Wetherall
Most slides borrowed from Ben
Location Privacy is at Risk
You: MAC address 00:0E:35:CE:1F:59
“The adversary” (a.k.a. some dude with a laptop), usually < 100 m away
Are pseudonyms enough?
MAC address now: 00:0E:35:CE:1F:59
MAC address later: 00:AA:BB:CC:DD:EE
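A device could adopt such a pseudonym by drawing a fresh random MAC address for each session. The sketch below is an illustration, not something the talk specifies: it assumes a pseudonym is a random unicast, locally administered address (the usual convention for self-assigned MACs), and the helper name `random_pseudonym_mac` is hypothetical.

```python
import secrets


def random_pseudonym_mac() -> str:
    """Return a random MAC address usable as a per-session pseudonym.

    Hypothetical helper: the first octet is forced to be unicast (bit 0
    clear) and locally administered (bit 1 set); the other 46 bits are
    drawn from a cryptographically secure random source.
    """
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] & 0xFC) | 0x02
    return ":".join(f"{b:02X}" for b in octets)


if __name__ == "__main__":
    print("MAC address now:  ", random_pseudonym_mac())
    print("MAC address later:", random_pseudonym_mac())
```

Rotating the address this way changes only the explicit identifier; everything else the device sends is left untouched.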
Implicit Identifiers Remain
Consider one user at SIGCOMM 2004
Visible in an “anonymized” trace
MAC addresses scrubbed
Effectively a pseudonym
Transferred 512 MB via BitTorrent
=> Crappy performance for everyone else
Let’s call him Bob
Can we figure out who Bob is?
Implicit Identifier: SSIDs
SSIDs in Probe Requests
Windows XP and Mac OS X probe for your preferred networks by default
Set of networks advertised in a traffic sample
Determined by a user’s preferred networks list
Bob’s SSID probe: “roofnet”
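As a feature, this identifier is simply the set of SSIDs a device names in its probe requests during one traffic sample. A minimal sketch, assuming frames have already been parsed into dicts with hypothetical `"subtype"` and `"ssid"` keys (a real capture tool would use its own representation):

```python
from typing import Iterable, Mapping


def ssid_feature(frames: Iterable[Mapping[str, str]]) -> frozenset[str]:
    """SSIDs a device names in its probe requests during one traffic sample.

    Frames are hypothetical dicts with "subtype" and "ssid" keys. Broadcast
    probes carry an empty SSID and reveal nothing, so they are skipped, as
    are all non-probe frames.
    """
    return frozenset(
        frame["ssid"]
        for frame in frames
        if frame.get("subtype") == "probe_request" and frame.get("ssid")
    )


sample = [
    {"subtype": "probe_request", "ssid": "roofnet"},
    {"subtype": "probe_request", "ssid": ""},  # broadcast probe
    {"subtype": "data"},
]
print(sorted(ssid_feature(sample)))  # ['roofnet']
```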
What if Bob used pseudonyms?
“roofnet” probe occurred during a different session than the BitTorrent download
Can no longer explicitly associate “roofnet” with poor network etiquette
Can we do it implicitly?
Implicit Identifier: Network Destinations
Set of IP <address, port> pairs in a traffic sample
In the SIGCOMM trace, each is visited by 1.15 users on average
A user is likely to visit a site repeatedly (e.g., an email server)
Bob’s SSH/IMAP server: 159.16.40.45
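The “1.15 users on average” statistic is the mean number of distinct users per destination. The sketch below computes that quantity on toy data; apart from 159.16.40.45, the addresses, ports, and user labels are made up for illustration.

```python
from collections import defaultdict

Dest = tuple[str, int]  # (IP address, port)


def users_per_destination(netdests: dict[str, set[Dest]]) -> float:
    """Average number of distinct users that contact each <address, port> pair.

    netdests maps a (hypothetical) user label to the set of destinations
    seen in that user's traffic. Values close to 1 mean most destinations
    are visited by a single user, which is what makes them identifying.
    """
    visitors: dict[Dest, set[str]] = defaultdict(set)
    for user, dests in netdests.items():
        for dest in dests:
            visitors[dest].add(user)
    if not visitors:
        return 0.0
    return sum(len(users) for users in visitors.values()) / len(visitors)


example = {
    "bob": {("159.16.40.45", 993), ("10.0.0.1", 80)},
    "alice": {("10.0.0.1", 80)},
}
print(users_per_destination(example))  # 1.5
```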
What if network is encrypted?
Can’t see IP addresses through link-layer encryption like WPA
Is Bob safe now?
Implicit Identifier: Broadcast Packet Sizes
Set of 802.11 broadcast packet sizes in a traffic sample
E.g., Windows machines’ NetBIOS naming advertisements; FileMaker and Microsoft Office advertise themselves
In the SIGCOMM trace, only 16% more unique <application, size> tuples than unique sizes, so a size nearly identifies the application
Bob’s broadcast packet sizes: 239, 245, 257
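Two quantities appear on this slide: the feature itself (a set of sizes) and the ratio of unique <application, size> tuples to unique sizes, where the SIGCOMM figure corresponds to a ratio of about 1.16. A small sketch of both, with hypothetical application labels and packet lists:

```python
def bcast_feature(sizes: list[int]) -> frozenset[int]:
    """Set of 802.11 broadcast packet sizes sent in one traffic sample."""
    return frozenset(sizes)


def pairs_per_size(packets: list[tuple[str, int]]) -> float:
    """Unique (application, size) pairs divided by unique sizes.

    The closer this is to 1.0, the more a size alone reveals which
    application sent the packet.
    """
    unique_pairs = set(packets)
    unique_sizes = {size for _, size in packets}
    return len(unique_pairs) / len(unique_sizes) if unique_sizes else 0.0


packets = [("netbios", 239), ("netbios", 239), ("filemaker", 245),
           ("office", 257), ("netbios", 257)]
print(sorted(bcast_feature([s for _, s in packets])))  # [239, 245, 257]
print(round(pairs_per_size(packets), 2))               # 1.33
```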
Implicit Identifier: MAC Protocol Fields
Header bits (e.g., power mgmt., order)
Supported rates
Offered authentication algorithms
Bob’s MAC protocol fields: 11, 5.5, 2, 1 Mbps, WEP, etc.
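One way to turn these fields into a set feature is to bundle each frame’s values into an immutable record and collect the distinct records seen in a sample. The `MacFields` class and its field names are hypothetical; the rates shown are the four 802.11b rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen => hashable, so instances can go in sets
class MacFields:
    """Hypothetical bundle of 802.11 protocol fields seen in one frame."""
    power_mgmt: bool                    # header bit
    order: bool                         # header bit
    supported_rates: tuple[float, ...]  # Mbps, as advertised
    auth_algorithms: tuple[str, ...]    # offered authentication algorithms


def fields_feature(frames: list[MacFields]) -> frozenset[MacFields]:
    """Distinct field combinations a device used during one sample."""
    return frozenset(frames)


bob = MacFields(False, False, (11.0, 5.5, 2.0, 1.0), ("open_system", "shared_key"))
dozing = MacFields(True, False, (11.0, 5.5, 2.0, 1.0), ("open_system", "shared_key"))
print(len(fields_feature([bob, bob, dozing])))  # 2
```

Because these values depend mostly on the driver and card rather than the person, this feature is expected to be weaker on its own than SSIDs or destinations.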
What else do implicit identifiers tell us?
David J. Wetherall
Anonymized 802.11 Traces from SIGCOMM 2004
Search on WiGLE for “djw” in the Seattle area
A pseudonym
Google pinpoints David’s home (to within 200 ft)
Automating Implicit Identifiers
TRAINING: Collect some traffic known to be from Bob
OBSERVATION: Which traffic is from Bob?
Methodology
Simulate “the adversary” using the SIGCOMM and UCSD traces
Split each trace into training data and observation data
Sample = 1 hour of traffic to/from a user
Assume pseudonyms
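A minimal sketch of this setup, assuming the split is chronological at a cutoff hour (the slide only says the trace is split). The packet tuple layout, `SAMPLE_SECONDS`, and function names are illustrative assumptions; the true user label is kept only as ground truth for scoring.

```python
from collections import defaultdict

SAMPLE_SECONDS = 3600  # one sample = 1 hour of traffic to/from a user

Packet = tuple[str, float, bytes]  # (true user label, timestamp in s, frame)
SampleKey = tuple[str, int]        # (true user label, hour index)
Samples = dict[SampleKey, list[bytes]]


def hourly_samples(packets: list[Packet]) -> Samples:
    """Group frames into per-user, per-hour samples.

    The user label is ground truth used only for scoring; the classifier
    must treat every sample as if it carried an unlinkable pseudonym.
    """
    samples: Samples = defaultdict(list)
    for user, ts, frame in packets:
        samples[(user, int(ts // SAMPLE_SECONDS))].append(frame)
    return dict(samples)


def split_trace(samples: Samples, split_hour: int) -> tuple[Samples, Samples]:
    """Earlier hours become training data, later hours observation data."""
    training = {k: v for k, v in samples.items() if k[1] < split_hour}
    observation = {k: v for k, v in samples.items() if k[1] >= split_hour}
    return training, observation


trace = [("bob", 10.0, b"a"), ("bob", 3700.0, b"b"), ("alice", 50.0, b"c")]
train, observe = split_trace(hourly_samples(trace), split_hour=1)
print(sorted(train), sorted(observe))
# [('alice', 0), ('bob', 0)] [('bob', 1)]
```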
Did this traffic sample come from Bob?
Naïve Bayesian Classifier:
We say sample s (with features f_i) is from Bob if
Pr[s from Bob | s has features f_i] > T, for a threshold T
How to convert implicit identifiers into features?
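The decision rule can be sketched with Bayes’ rule under the naive assumption that features are conditionally independent. Here the per-feature likelihoods Pr[f_i | Bob] and Pr[f_i | not Bob] and the prior are taken as given (for example, estimated from training samples); how the talk estimates them is not shown on this slide, and the numbers below are made up.

```python
import math


def posterior_bob(prior: float, lik_bob: list[float], lik_other: list[float]) -> float:
    """Pr[s from Bob | features f_i] under the naive independence assumption.

    lik_bob[i]   = Pr[f_i | s from Bob]
    lik_other[i] = Pr[f_i | s not from Bob]
    Working in log space avoids underflow when many small likelihoods
    are multiplied together.
    """
    log_bob = math.log(prior) + sum(math.log(p) for p in lik_bob)
    log_other = math.log(1.0 - prior) + sum(math.log(p) for p in lik_other)
    top = max(log_bob, log_other)
    num = math.exp(log_bob - top)
    return num / (num + math.exp(log_other - top))


def is_from_bob(prior: float, lik_bob: list[float],
                lik_other: list[float], threshold: float) -> bool:
    """Apply the threshold rule Pr[s from Bob | f_i] > T."""
    return posterior_bob(prior, lik_bob, lik_other) > threshold


# Two features (e.g. ssids, bcast) that are far more likely under "Bob".
p = posterior_bob(0.5, [0.8, 0.9], [0.2, 0.1])
print(round(p, 3), is_from_bob(0.5, [0.8, 0.9], [0.2, 0.1], threshold=0.9))
# 0.973 True
```

Raising T makes the classifier more conservative: fewer of Bob’s samples are claimed, but fewer strangers are mistaken for him.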
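Did This Traffic Sample Come from Bob?
Features:
Set similarity (Jaccard index) between the profile from training and the sample for validation, weighted by frequency so that rare elements count more than common ones
[Figure: profile from training vs. sample for validation; example SSIDs linksys, djw, IR_Guest, SIGCOMM_1 on a rare-to-common scale]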
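A sketch of a frequency-weighted Jaccard index. The slide says only “weighted by frequency”, so the specific weight 1 / popularity (the number of users whose traffic contains an element) is an assumption, and the popularity counts below are invented.

```python
def weighted_jaccard(profile: set[str], sample: set[str],
                     popularity: dict[str, int]) -> float:
    """Jaccard similarity in which each element is weighted by 1 / popularity.

    popularity[x] = number of users whose traffic contains x (assumed >= 1;
    unseen elements default to 1, i.e. maximally rare). Rare elements such
    as a personal SSID dominate the score; common ones contribute little.
    """
    def weight(x: str) -> float:
        return 1.0 / max(popularity.get(x, 1), 1)

    union = profile | sample
    if not union:
        return 0.0
    shared = sum(weight(x) for x in profile & sample)
    return shared / sum(weight(x) for x in union)


profile = {"djw", "linksys", "IR_Guest"}   # from training
sample = {"djw", "linksys", "SIGCOMM_1"}   # to be validated
popularity = {"djw": 1, "IR_Guest": 2, "linksys": 50, "SIGCOMM_1": 40}
print(round(weighted_jaccard(profile, sample, popularity), 2))  # 0.66
```

For the same sets, the plain Jaccard index is 2/4 = 0.5; the weighting raises the score because the shared element djw is rare, while the shared linksys barely counts.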
Individual Feature Accuracy
60% TPR with 99% FPR
Higher FPR, likely because the feature is not user-specific
Useful in combination with other features, to rule out identities
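TPR is the fraction of Bob’s samples the classifier labels as Bob; FPR is the fraction of other users’ samples it wrongly labels as Bob. A small sketch of computing both at a given threshold, with hypothetical scores and labels:

```python
def tpr_fpr(scores: list[float], from_bob: list[bool],
            threshold: float) -> tuple[float, float]:
    """True- and false-positive rates when samples scoring above threshold are labeled Bob."""
    tp = sum(1 for s, y in zip(scores, from_bob) if y and s > threshold)
    fp = sum(1 for s, y in zip(scores, from_bob) if not y and s > threshold)
    positives = sum(from_bob)
    negatives = len(from_bob) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


scores = [0.95, 0.40, 0.90, 0.10, 0.20]
labels = [True, True, False, False, False]
print(tpr_fpr(scores, labels, threshold=0.5))  # (0.5, 0.3333333333333333)
```

Sweeping the threshold traces out the TPR/FPR trade-off that accuracy figures like these summarize.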
Multi-feature Accuracy
Samples from 1 in 4 users are identified >50% of the time with 0.001 FPR
[Figure: accuracy for feature combinations bcast + ssids; bcast + ssids + fields; bcast + ssids + fields + netdests]
Was Bob here today?
Maybe…
Suppose N users are present
Over an 8-hour day, there are 8*N opportunities to misclassify a user’s traffic
Instead, say Bob is present iff multiple samples are classified as his
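Why requiring multiple matches helps can be seen with a binomial tail. This sketch assumes each of the 8*N one-hour samples is independently mislabeled as Bob with probability equal to the FPR; pairing N = 25 with an FPR of 0.001 is illustrative, borrowing numbers that appear elsewhere in these slides.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


users, hours, fpr = 25, 8, 0.001
n = users * hours  # 8*N one-hour samples in a day
print(round(prob_at_least(1, n, fpr), 2))  # 0.18: at least one false sighting
print(round(prob_at_least(2, n, fpr), 3))  # 0.017: at least two false sightings
```

Under these assumptions, requiring two matching samples instead of one cuts the chance of wrongly concluding that Bob was present by roughly a factor of ten.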
Was Bob here today?
In a busy coffee shop with 25 concurrent users, more than half (54%) can be identified with 90% accuracy
4-hour median to detect (4 samples)
27% can be identified with two 9s (99% accuracy)
Conclusion: Pseudonyms Are Insufficient
4 new identifiers: netdests, ssids, fields, bcast
Average user emits highly distinguishing identifiers
Adversary can combine features
Future
Uncover more identifiers (timing, etc.)
Validate on longer/more diverse traces (SSIDs stable in home setting for >= 2 weeks)
Build a better link layer