802.11 User Fingerprinting


Two reasons you can’t trust a wireless network
Ben Greenstein, Jeff Pang, Ramki Gummadi,
Srini Seshan, and David Wetherall
(and some stuff that goes on at
Intel Research Seattle)
Jeff Hightower, Ali Rahimi, Ian Smith,
Josh Smith, Matthai Philipose,
Tanzeem Choudhury, Sunny Consolvo,
Beverly Harrison, Anthony LaMarca
Intel Research Seattle (IRS)

Lab focus is on computing
systems that work in everyday
environments (ubiquitous
computing)

Magic formula:





Evaluate the need for a
technology
Build neat hardware (EE)
Apply magic statistics (ML)
Provide user feedback (HCI)
+ Ben + Wetherall (NET/SYS) =
new focus on mobile
networking and systems
Magic Statistics:
Sensor-based activity recognition


Problem: it’s very hard to understand what
people are doing with vision
Idea: instrument objects/people and use
machine learning on sensor data to
recognize their activities

For eldercare, fitness, social dynamics, …
Neat Hardware:
Mobile Sensing Platform

Platform Features






Built on iMote2
Linux OS
32M RAM
2GB flash storage
ZigBee & Bluetooth radios
10 on-board sensors
3D Accelerometer, 2D Compass, Barometer, Humidity, Visible
light, Infrared light, Temperature
UART, GPIO breakouts for additional sensors
~16 hour battery life
[Photos: MSP body-worn version; MSP sensor brick version]
Neat Hardware:
iBracelet – an RFID tag reader

Bracelet detects nearby tag

Observation: What you use determines what you do
[Figure: activity model over the objects used (kettle, stove, faucet, cup, teabag, milk, sugar), annotated with probabilities between 0.2 and 0.6 and durations tm = 5 min, 2 min and 30 s]
User Feedback:
Ubiquitous Fitness (UbiFit)


Challenge: Use ubiquitous computing to
encourage people to sustain an increased
level of physical activity
Approach:
MSP for automatic activity detection & mobile
phone for manual entry of activities
Phone to provide personal awareness
whenever & wherever the user is
Other Projects:
WiSPs: Smart RFID tags


Problem: Rich sensor networks hit lifetime/size constraints
due to batteries
Idea: Extend long-range RFID to provide sensor
capabilities



Tags are passive, harvest power to compute/communicate
Tags host sensors/actuators (acceleration, light, LEDs, flash)
Custom analog circuit for power, communications
Other Projects:
Software radio/RFID

Build an EPC Gen2 RFID reader w/ USRP



Some hardware tweaks needed
WiSPs let the tag side be changed already
No RFID systems work(?) and many issues to explore


Particularly for a sensor network context
Multi-access, energy, reliability, privacy, …
Other Projects:
Personal Robotics


Problem: robots mostly function in well
structured environments, e.g., factories
Idea: use sensing/ML to enable robots to
function in less structured environments




E-field gripper for manipulation
WiSPs for localization
3D laser range finder
vision
Other Projects:
Ultra-mobile devices


Problem: Desktop interaction doesn’t
work well for small, mobile devices
Idea: looking at this now …
Pedestrian Navigator demo:
Google Earth maps stay
oriented to heading and 3D
image stays level to the ground
[Photo labels: MSB, iMote; an inertial board in the 3-board MSP stack]
Other Projects:
Trustworthy Wireless


Problem: wireless lacks privacy and is
vulnerable to interference
Idea: Randomize/encrypt
communications to exclude attacker
[Diagram: a client (tag, MSP, phone, laptop …) talks to an authorized server (AP, reader …) while a nearby third party tries to eavesdrop or interfere]
Trustworthy wireless topics

Reason 1: Privacy threat


Jeff Pang (CMU)
Reason 2: Vulnerability to interference
Location Privacy is Now at Risk
Your MAC address:
00:0E:35:CE:1F:59
Usually < 100m
“The New You”
“The Adversary”
“The Old You”
The problem
scales
The Privacy Threat Posed by
Wireless Communication is Real
David J. Wetherall
Anonymized 802.11 Traces from SIGCOMM 2004
Search on Wigle for “djw” in the Seattle area
A pseudonym
Google pinpoints David’s home (to within 200 ft)
Problem: Researchers propose using
pseudonyms, but is this enough?
PeepResearch.org
“You”
“The Adversary”
Another Real Example: Some
“Anonymous” Guy BitTorrents 0.5G of
Data at a Conference in 2004
SSID: Roofnet
Consistent card/driver characteristics
IMAP server, SSH server
Broadcast packets with sizes 239, 245, 257
[Figure labels: PeepResearch.org; “You” = “A guy from MIT”; “The Adversary”]
Identifying Features

Network Destinations: Set of IP <address, port> pairs in a traffic sample
SSIDs in Probe Requests: Set of networks advertised in a traffic sample
Broadcast Packet Sizes: Set of 802.11 broadcast packet sizes seen in a traffic sample
MAC Protocol Fields: Header bits (e.g., power mgmt., order), supported rates, offered authentication algorithms
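The four feature types can be pictured as sets pulled out of one traffic sample. The following is a minimal sketch, not the authors' tooling: the `Packet` record and its field names are hypothetical stand-ins for whatever a capture parser would provide, and the addresses are documentation-range placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical, already-parsed packet record (not a real capture API)."""
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    probe_ssid: Optional[str] = None   # SSID named in a probe request, if any
    is_broadcast: bool = False
    size: int = 0
    mac_fields: Optional[tuple] = None  # e.g. (power_mgmt, order, rates, auth_algs)


def extract_features(sample):
    """Collect the four identifying feature sets from one traffic sample."""
    feats = {"netdests": set(), "ssids": set(), "bcast": set(), "fields": set()}
    for p in sample:
        if p.dst_ip is not None and p.dst_port is not None:
            feats["netdests"].add((p.dst_ip, p.dst_port))
        if p.probe_ssid:
            feats["ssids"].add(p.probe_ssid)
        if p.is_broadcast:
            feats["bcast"].add(p.size)
        if p.mac_fields is not None:
            feats["fields"].add(p.mac_fields)
    return feats


# Illustrative sample only; values are made up for the sketch.
sample = [
    Packet(dst_ip="192.0.2.10", dst_port=143),
    Packet(dst_ip="192.0.2.20", dst_port=22),
    Packet(probe_ssid="Roofnet"),
    Packet(is_broadcast=True, size=239),
    Packet(is_broadcast=True, size=245),
    Packet(mac_fields=(1, 0, (1, 2, 5.5, 11), ("open",))),
]
features = extract_features(sample)
```

None of these sets depends on the MAC address, which is why they can survive a switch to pseudonyms.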
Methodology
[Figure: “The adversary” captures traffic with Ethereal; each user's traffic is labeled (Bertha, John, Mary) with its features (SSIDs, etc.); phases: TRAINING, VALIDATION]
Methodology

Simulate using SIGCOMM, UCSD and Home traces
Split trace into training data and validation data
Sample = 1 hour of traffic to/from a user
Ignore MACs for the latter (presume pseudonyms)
[Figure labels: “The adversary”, Ethereal]
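A rough sketch of the split described above, assuming a purely temporal cut (the slides do not say how the trace was divided). The record layout `(timestamp_seconds, user, item)` and the function names are hypothetical.

```python
from collections import defaultdict

SAMPLE_SECONDS = 3600  # "Sample = 1 hour of traffic to/from a user"


def hourly_samples(records):
    """Group (timestamp_seconds, user, item) records into per-user hourly sets."""
    samples = defaultdict(set)
    for ts, user, item in records:
        samples[(user, int(ts // SAMPLE_SECONDS))].add(item)
    return samples


def split_trace(samples, split_hour):
    """Hours before split_hour build labeled profiles (training).

    Later hours become validation samples; the user label is kept only as
    hidden ground truth, since the adversary is assumed to see pseudonyms.
    """
    training = defaultdict(set)
    validation = []
    for (user, hour), items in sorted(samples.items()):
        if hour < split_hour:
            training[user] |= items
        else:
            validation.append((user, items))
    return dict(training), validation


# Illustrative records only.
records = [
    (10, "Bertha", "linksys"),
    (4000, "Bertha", "IR_Guest"),
    (20, "John", "djw"),
    (7300, "John", "djw"),
    (7400, "Bertha", "linksys"),
]
training, validation = split_trace(hourly_samples(records), split_hour=2)
```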
Did This Traffic Sample Come
from User U?
Distance Metric:
Set similarity (Jaccard Index), weighted by frequency:
[Figure: a PROFILE FROM TRAINING compared with a SAMPLE FOR VALIDATION; example SSIDs: linksys, djw, IR_Guest, SIGCOMM_1; scale labels: Rare, Common]
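The exact weighting formula is not preserved in this transcript. The sketch below shows one plausible reading of a frequency-weighted Jaccard index, under the assumption that an item's weight is the inverse of the number of users whose profile contains it, so rare items count for more than common ones.

```python
def weighted_jaccard(profile, sample, weight):
    """Weighted |intersection| / |union| of two sets."""
    union = profile | sample
    if not union:
        return 0.0
    inter = profile & sample
    return sum(weight(x) for x in inter) / sum(weight(x) for x in union)


def rarity_weight(profiles):
    """Assumed weighting: 1 / (number of users whose profile has the item)."""
    counts = {}
    for items in profiles.values():
        for x in items:
            counts[x] = counts.get(x, 0) + 1
    return lambda x: 1.0 / counts.get(x, 1)


# Illustrative profiles only; user names U, V, W are placeholders.
profiles = {
    "U": {"linksys", "djw", "IR_Guest"},
    "V": {"linksys", "SIGCOMM_1"},
    "W": {"linksys"},
}
w = rarity_weight(profiles)
sim_rare = weighted_jaccard(profiles["U"], {"djw", "linksys"}, w)
sim_common = weighted_jaccard(profiles["U"], {"linksys"}, w)
```

With these weights, a sample that shares the rare SSID `djw` with the profile scores well above one that shares only the common `linksys`.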
Did This Traffic Sample Come
from User U?
Naïve Bayesian Classifier:
We say sample s (with features fi) is from user U if LHS > T
[Equation not preserved in this transcript]
We vary T for different True Positive / False Positive Rates
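Since the classifier's formula did not survive extraction, the following is only a generic naive Bayes sketch. It assumes the compared quantity is the posterior probability that the sample is from U, with features treated as conditionally independent. The likelihood values are invented for illustration.

```python
def nb_posterior(feature_likelihoods, prior):
    """Posterior P(U | f_1..f_n) under the naive independence assumption.

    feature_likelihoods: list of (P(f_i | U), P(f_i | not U)) pairs.
    prior: P(U) before seeing any feature.
    """
    p_u, p_not = prior, 1.0 - prior
    for like_u, like_not in feature_likelihoods:
        p_u *= like_u
        p_not *= like_not
    total = p_u + p_not
    return p_u / total if total > 0 else 0.0


def is_from_user(feature_likelihoods, prior, threshold):
    """Decision rule: attribute the sample to U if the posterior exceeds T."""
    return nb_posterior(feature_likelihoods, prior) > threshold


# Made-up likelihoods for two features of one sample.
one_feature = [(0.8, 0.01)]
two_features = [(0.8, 0.01), (0.5, 0.05)]
single = is_from_user(one_feature, prior=0.01, threshold=0.5)
combined = is_from_user(two_features, prior=0.01, threshold=0.5)
```

In this toy case one feature alone does not clear T = 0.5, but two features together do. Raising T lowers the false positive rate at the cost of true positives.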
Receiver Operating
Characteristics (ROCs)
[Plot axes: sensitivity (y) vs. 1 - specificity (x)]


UCSD: 60% TPR with 1% FPR
Perfect classifier would have
90% TPR for ~0% FPR


Higher FPR, likely due to not
being user specific
Useful only in combination
with other features, to rule
out identities
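An ROC curve like the one discussed here comes from sweeping the threshold T and recording the true and false positive rates at each setting. A small sketch with invented scores and labels (not data from the traces):

```python
def roc_points(scores, labels, thresholds):
    """Return (T, TPR, FPR) for each threshold T.

    scores: classifier outputs, one per sample.
    labels: 1 if the sample really is from user U, else 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s > t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s > t)
        points.append((t, tp / positives, fp / negatives))
    return points


# Illustrative values only.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels, thresholds=[0.9, 0.5, 0.2])
```

Here TPR is the sensitivity and FPR is 1 - specificity, the two axes of the ROC plot.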
Combining features helps

In public networks, samples from 1 in 4 users are
identified >50% of the time with 0.999 accuracy
[Plot legend: bcast + ssids; bcast + ssids + fields; bcast + ssids + fields + netdests]
Was User U Here at Time t?

Maybe…


Over an 8 hour day, 8 opportunities to misclassify
a user’s traffic
Instead, say user U is present if multiple
samples are classified as being his
$\mathrm{TPR}_{target} \le \Pr[X \ge belief]$
$\mathrm{FPR}_{target} \ge \Pr[Y \ge belief]$
belief is the number of samples you believe are from U
X is a binomial r.v. with params n = active and $p = tpr_{Q1}$
Y is a binomial r.v. with params n = 8 and $p \approx 1 - (1 - fpr_{Q1})^N$
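The binomial model above can be evaluated directly. The sketch below makes two assumptions: the detection probability should be at least the TPR target and the false alarm probability at most the FPR target, and N is the number of concurrent users. The inputs (60% TPR, 1% FPR, 25 users) are borrowed from figures quoted elsewhere in the talk purely as illustration; this does not reproduce the reported results.

```python
from math import comb


def binom_tail(n, p, k):
    """Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def presence_rates(belief, active, tpr, fpr, n_users, hours=8):
    """Return (Pr[X >= belief], Pr[Y >= belief]) for the model above.

    X counts U's own samples that are correctly attributed (n = active hours).
    Y counts hours in which at least one of n_users other users is
    misattributed to U, with per-hour probability 1 - (1 - fpr)^n_users.
    """
    p_false = 1 - (1 - fpr) ** n_users
    return binom_tail(active, tpr, belief), binom_tail(hours, p_false, belief)


# Illustrative inputs only.
rates = {b: presence_rates(b, active=8, tpr=0.6, fpr=0.01, n_users=25)
         for b in range(1, 9)}
```

Requiring more matching samples (a larger belief) lowers both probabilities, so the adversary picks the smallest belief that still keeps false alarms under the target.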
One 9 of Accuracy…



In a busy coffee shop
with 25 concurrent
users, more than half
(54%) can be
identified with 90%
accuracy
4 sample median to
detect
27% with two 9s.
Conclusion:
Pseudonyms Are Insufficient

4 new identifiers: netdests, ssids, fields, bcast
Average user emits highly distinguishing identifiers
Adversary can combine features

Future




Uncover more identifiers (timing, etc.)
Build a usable 802.11 network (APs and clients) that
protects privacy





Encrypted names/addresses
Hidden resource discovery/binding
Online verification of privacy
Channel hopping to resist interference
Working out next steps now
Trustworthy wireless topics


Reason 1: Privacy threat
Reason 2: Vulnerability to interference

Ramki Gummadi (USC)
Communication in the ISM band is
vulnerable to interference


Increasingly crowded
Un-(under)-regulated
n^2
Interference threats
Malicious
Selfish


Characterize how
802.11 operates with
interference in practice
Improve design to
better tolerate
interference

Unacceptable for a low-power
or narrowband interferer to
bring throughput to
zero
[Plot: Reception Rate vs. SNR (logscale), with the “Problem” region marked]
802.11 Implementation
Vulnerabilities
[Block diagram: Receiver, split into PHY and MAC. Blocks: RF signal, ADC (6-bit samples), AGC (to RF amplifiers), Timing Recovery, Barker Correlator, Demodulator, Descrambler, Preamble Detector / Header CRC-16 Checker, Data (includes beacons). Highlighted blocks = vulnerabilities]


Jam with 1s → SYNC on sender clock lost
Emit burst at frame start → Gain set incorrectly



Even with weak interferer, b/c attenuated disproportionately
Send premature start frame delimiter → packet misinterpreted
Damage consecutive beacons → clients disassociate
Experimental Setup
UDP/TCP traffic between client/wired endpoint through AP
Single active attacker
Can vary power and frequency
Can output arbitrary waveforms
Interferers: Unattenuated PRISM, Attenuated PRISM, 802.15.4
[Diagram: Wired Endpoint (E), Access Point (AP), Wireless Client (C) on 802.11, Interferer (I)]
Characterizing 802.11
Interference

How far (i.e., by changing power) can the
attacker be and still be effective?


E.g., dynamic range selection interference
How much does frequency separation
help?
Dynamic range selection
On-off random patterns (5ms/1ms)
[Plot: Throughput (kbps) and Latency (microseconds) vs. Interferer Power (dBm), with throughput and latency curves under PRISM and under Zigbee]
AGC: V > t, -30dB
Result: ADC over/underflow
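A back-of-the-envelope illustration of the AGC note (V > t, -30dB, ADC over/underflow). It assumes the -30 dB step applies to signal amplitude in the usual 20·log10 sense and that the 6-bit ADC is a simple signed quantizer; neither detail is given in the slides.

```python
def db_to_amplitude(db):
    """Convert a gain in dB to a linear amplitude factor (20*log10 rule)."""
    return 10 ** (db / 20)


def quantize(x, bits=6):
    """Map x (nominal range [-1, 1]) to a signed ADC code, clipping overflow."""
    max_code = 2 ** (bits - 1) - 1          # 31 for a 6-bit converter
    code = round(x * max_code)
    return max(-max_code - 1, min(max_code, code))


full_scale = quantize(1.0)                            # gain set correctly
after_step = quantize(1.0 * db_to_amplitude(-30))     # gain cut by 30 dB
weak_frame = quantize(0.4 * db_to_amplitude(-30))     # weaker frame, same cut
clipped = quantize(2.0)                               # gain set too high
```

Under these assumptions a full-scale frame shrinks to roughly one quantization step once the gain is cut by 30 dB, and a weaker frame disappears into code 0. That is the underflow side of the problem; the `clipped` case shows the overflow side.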
Impact of frequency separation
[Plot: Throughput (kbps) vs. Interferer Power (dBm) for Same Channel, 5MHz, 10MHz and 15MHz Separation]
Rapid channel hopping

With existing hw!



Dwell period is 10ms, switching latency is 250µs
AP exchanges encrypted MD5 seed with clients
AP and clients independently hop to the same
channel
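One way the AP and its clients could reach the same channel independently is to derive each hop from the shared seed and a slot counter. The slides do not give the actual derivation, so the construction below (MD5 over seed plus slot index, reduced modulo the channel count) and the channel list are assumptions made for illustration.

```python
import hashlib

DWELL_MS = 10                      # dwell period from the slide
CHANNELS = list(range(1, 12))      # assumption: 2.4 GHz channels 1-11


def channel_for_slot(seed: bytes, slot: int) -> int:
    """Pick the channel for one dwell slot from the shared seed."""
    digest = hashlib.md5(seed + slot.to_bytes(8, "big")).digest()
    return CHANNELS[int.from_bytes(digest[:4], "big") % len(CHANNELS)]


def channel_at(seed: bytes, elapsed_ms: int) -> int:
    """Channel in use elapsed_ms after the agreed start time."""
    return channel_for_slot(seed, elapsed_ms // DWELL_MS)


# Both sides hold the same seed (exchanged encrypted) and compute locally.
seed = b"example-shared-seed"       # placeholder value
ap_hops = [channel_at(seed, t) for t in range(0, 200, DWELL_MS)]
client_hops = [channel_at(seed, t) for t in range(0, 200, DWELL_MS)]
```

An interferer that does not know the seed cannot predict the next channel. The sketch ignores clock synchronization and the 250µs switching latency, which a real implementation would have to handle.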
Evaluation of channel hopping
[Plot: Throughput (kbps) vs. PRISM Interferer Power (dBm) for CH with UDP traffic, CH with TCP traffic, and no CH with UDP traffic]
Conclusions

Selfish and malicious interferers cause
substantial degradation in commodity NICs


Even weak and narrow-band interferers are
effective
Changing 802.11 parameters does not
mitigate interference, but rapid channel
hopping can