Transcript clock skew

Remote Hardware Fingerprinting: A Statistical Approach
R. Fink ~ May, 2006
Problem
 Identify Specific Machines via Remote Network Fingerprinting
Passive
Networked
Physical properties of the machine
 Use to:
Identify endpoints in a communication
Show that an endpoint participated in a transaction
Show that an endpoint did not participate in a transaction
 Challenges:
What properties?
Similar machines?
Network delay factors?
Timestamps
 TCP Timestamp Option
[Figure: TCP header layout. Fields: Source Port, Destination Port, Sequence Number, Acknowledgement Number, Offset, Reserved, Flags, Window Size, Checksum, Urgent Pointer, Options + Padding. Timestamp option fields: Kind=8, Len=10, TS Value (32), TS Reply (32)]
The 32-bit TS Value counts clock ticks, which are bound to the machine's crystal oscillator circuit
Present in most TCP packets by default (on all Linux systems; Windows can be tricked into sending it)
Best part: independent of network time server corrections!
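As a rough illustration of the option layout on this slide (Kind=8, Len=10, then two 32-bit fields), the sketch below walks a raw TCP options byte string and pulls out the TS Value and TS Reply. The function name `parse_tcp_timestamp` and the sample bytes are hypothetical; capturing the packets themselves is out of scope here.

```python
import struct

TCPOPT_EOL = 0        # end of option list
TCPOPT_NOP = 1        # one-byte padding
TCPOPT_TIMESTAMP = 8  # Kind=8, Len=10, as on the slide


def parse_tcp_timestamp(options: bytes):
    """Return (ts_value, ts_reply) from raw TCP option bytes, or None.

    Hypothetical helper for illustration only.
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if kind == TCPOPT_TIMESTAMP and length == 10:
            # Two unsigned 32-bit integers in network byte order.
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None


# Made-up sample: NOP, NOP, then the 10-byte timestamp option.
sample = bytes([1, 1, 8, 10]) + struct.pack("!II", 123456789, 42)
ts_value, ts_reply = parse_tcp_timestamp(sample)
```

Only `ts_value` matters for skew estimation; `ts_reply` echoes the peer's timestamp.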
Approach
 Passively collect TS values from the observed machine, t_o
IP address identifies the machine during the collection phase
 Record t_o along with the measurer's system time, t_m
 Scatter-plot t_o versus t_m
 Fit a regression line to the scatter plot
The slope gives the clock skew of the observed machine: that is, the amount of drift relative to the measurer per unit time
 Group similar drifts to sort out individual machines
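The steps on this slide can be sketched in a few lines of plain Python: an ordinary least-squares fit of observed timestamps against measurer time, a conversion of the slope to ppm, and a simple grouping of similar skews. This is a minimal sketch, not the author's code. It assumes the observed clock's nominal tick rate (`nominal_hz`) is known, and the 2 ppm grouping tolerance is only a guess based on the "couple ppm" figure in the results; all function names are hypothetical.

```python
import math


def fit_line(tm, to):
    """Least-squares fit of to = b0 + b1 * tm; returns (b1, b0, s_e)."""
    n = len(tm)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean_x = sum(tm) / n
    mean_y = sum(to) / n
    sxx = sum((x - mean_x) ** 2 for x in tm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tm, to))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tm, to))
    s_e = math.sqrt(sse / (n - 2))  # residual standard error
    return b1, b0, s_e


def skew_ppm(b1, nominal_hz):
    """Deviation of the fitted tick rate from the nominal rate, in ppm.

    Assumption: tm is in seconds and to is in ticks at nominal_hz.
    """
    return (b1 / nominal_hz - 1.0) * 1e6


def group_by_skew(skews, tolerance_ppm=2.0):
    """Greedy grouping: sorted skews within tolerance of their neighbour share a group."""
    groups = []
    for label, skew in sorted(skews.items(), key=lambda kv: kv[1]):
        if groups and skew - groups[-1][-1][1] <= tolerance_ppm:
            groups[-1].append((label, skew))
        else:
            groups.append([(label, skew)])
    return [[label for label, _ in g] for g in groups]


# Synthetic data: a 1000 Hz timestamp clock running 50 ppm fast.
tm = [float(i) for i in range(200)]
to = [1000.0 * (1 + 50e-6) * t for t in tm]
b1, b0, s_e = fit_line(tm, to)
estimated = skew_ppm(b1, nominal_hz=1000.0)
```

With real captures the TS values are integers and the measurer times carry network jitter, so `s_e` will be nonzero; that residual error is what drives the sample-size question addressed later in the talk.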
[Figure: scatter plot of t_o versus t_m]
Previous Research
 Kohno, Claffy, Broido
63 Campus Machines
38 days of data (12 hour spans)
 Convex Hull Method of Fit
 Posed, but did not address:
Required sample size
Effect of differing topology
 Ignored Statistical Techniques ~
Using a convex hull technique, instead of a linear regression technique, throws out the whole body of error analysis theory!
Current Work
 Recreated Experiment
4 identical Dell GX-150 machines, one observer
Collected initial data on a fast switch
 Extended the Research
Skew via linear regression algorithm
Error analysis theory to estimate required number of samples
Simulated WAN delay (via Linux Netfilter hacking) in progress
Measured PCI bus with frequency counter to verify the physical link to clock skew
Results
1. PCI bus clock speed is directly related to clock skew
2. Linear regression (in LAN case) uniquely identifies machines to within a couple of parts per million (ppm)
3. Number of samples required grows with the observed timestamp error and the confidence limit, and falls as the collection interval and allowed ppm tolerance increase
• Validated on repeated population subsets
4. Showed clock skew varies with machine temperature
5. In progress – experiments on WAN data
$$ n \approx 2 \cdot 6^{1/3} \cdot 10^{4} \left( \frac{t^{*}_{n-2}\, s_e}{\varepsilon\, b_1\, \Delta t} \right)^{2/3} $$
where
$b_1$ = sample slope
$\varepsilon$ = ppm difference between extremes
$s_e$ = sample $y - \hat{y}$ error
$t^{*}_{n-2}$ = 5% confidence limit
$\Delta t$ = fixed collection interval, secs
$n$ = required number of samples
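To get a feel for the sample-size relation, the sketch below evaluates it numerically. It takes the formula on this slide to read n ≈ 2 · 6^(1/3) · 10^4 · (t* · s_e / (ε · b1 · Δt))^(2/3), which is an assumed reading of the slide. Because t* itself depends on n, the sketch fixes it at 1.96, the large-sample two-sided 5% value, as a simplifying assumption. The function and parameter names are hypothetical.

```python
import math


def required_samples(s_e, b1, eps_ppm, dt, t_crit=1.96):
    """Approximate number of samples needed to pin the slope down to eps_ppm.

    s_e     : residual standard error of the fit (same units as the timestamps)
    b1      : sample slope (timestamp units per second)
    eps_ppm : allowed ppm difference between the extremes of the interval
    dt      : fixed collection interval between samples, in seconds
    t_crit  : t critical value; 1.96 is an assumed large-n approximation
    """
    ratio = (t_crit * s_e) / (eps_ppm * b1 * dt)
    n = 2.0 * 6.0 ** (1.0 / 3.0) * 1e4 * ratio ** (2.0 / 3.0)
    return math.ceil(n)


# Made-up inputs: 1000 Hz timestamp clock, 1-tick residual error,
# 2 ppm tolerance, one sample per second.
n_needed = required_samples(s_e=1.0, b1=1000.0, eps_ppm=2.0, dt=1.0)
```

Under this reading, n scales with the 2/3 power of each ratio term, so an eightfold increase in timestamp error needs roughly four times as many samples.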
Summary
 Highlights
Clock skew is a repeatable way to fingerprint a specific machine
Linear regression, a simple machine learning concept, is readily applied
Statistical error analysis tells us how much to collect
 Lowlights
TCP timestamp options are, well, OPTIONAL ~ can just turn them off
 Future Research
Wireless mobile devices: effect of battery, topology, mobility, clock stepping
Other protocol properties, not just timestamps