Transcript clock skew
Remote Hardware Fingerprinting: A Statistical Approach
R. Fink ~ May, 2006
Problem
Identify Specific Machines via Remote Network Fingerprinting
Passive
Networked
Physical properties of the machine
Use to:
Identify endpoints in a communication
Show that an endpoint participated in a transaction
Show that an endpoint did not participate in a transaction
Challenges:
What properties?
Similar machines?
Network delay factors?
Timestamps
[Figure: TCP header layout: Source Port, Destination Port, Sequence Number, Acknowledgement Number, Offset, Reserved, Flags, Window Size, Checksum, Urgent Pointer, Options + Padding]
TCP Timestamp Option: Kind=8, Len=10, TS Value (32 bits), TS Reply (32 bits)
32-bit TS Value counts clock ticks, which are bound to the machine's oscillator circuit (crystal)
Present in most TCP packets by default (always on Linux; Windows can be tricked into sending it)
Best part: independent of network time server corrections!
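To make the option layout concrete, here is a minimal sketch of pulling the TS Value and TS Reply out of the raw option bytes of a TCP header. The function name `parse_tcp_timestamp` and the example bytes are illustrative assumptions; capturing the packets themselves is out of scope here.

```python
import struct

TCPOPT_EOL = 0        # end of option list
TCPOPT_NOP = 1        # one-byte padding
TCPOPT_TIMESTAMP = 8  # Kind=8, Len=10 as on the slide

def parse_tcp_timestamp(options: bytes):
    """Return (ts_value, ts_reply) from raw TCP option bytes, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option, no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if kind == TCPOPT_TIMESTAMP and length == 10:
            # Two 32-bit unsigned integers in network byte order
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None

# Hypothetical option bytes: NOP, NOP, then the 10-byte timestamp option
example = bytes([1, 1, 8, 10]) + struct.pack("!II", 123456789, 42)
print(parse_tcp_timestamp(example))
```

Only the first value of the pair (the TS Value set by the sender) is needed for skew estimation; the TS Reply echoes the peer's clock.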
Approach
Passively collect TS values from the observed machine, t_o
IP address identifies the machine during the collection phase
Record t_o along with the measurer's system time, t_m
Scatter-plot t_o versus t_m
Fit a regression line to the points
The slope is the clock skew of the observed machine: that is, the amount of drift relative to the measurer per unit time
Group similar drifts to sort out individual machines
[Figure: scatter plot of observed TS values (vertical axis, 0 to 160) against measurer system time (horizontal axis, 0 to 180)]
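The approach above can be sketched as an ordinary least-squares fit of observed TS values against measurer time. This is a minimal illustration, not the code used in the experiments: the names `fit_line` and `skew_ppm`, the nominal tick rate of 1000 Hz, and the synthetic data are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def skew_ppm(t_m, t_o, nominal_hz):
    """Skew of the observed clock relative to the measurer, in ppm.

    t_m: measurer timestamps in seconds; t_o: observed TS values in ticks.
    nominal_hz is the advertised tick rate of the observed clock (assumed known).
    """
    slope, _ = fit_line(t_m, t_o)          # ticks per measurer second
    return (slope / nominal_hz - 1.0) * 1e6

# Synthetic example: a 1000 Hz timestamp clock running 50 ppm fast
t_m = [float(s) for s in range(600)]
t_o = [1000.0 * (1 + 50e-6) * s for s in t_m]
print(round(skew_ppm(t_m, t_o, nominal_hz=1000.0), 3))
```

Centering the sums on the means keeps the arithmetic stable when timestamps are large, which is typical for long captures.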
Previous Research
Kohno, Claffy, Broido
63 Campus Machines
38 days of data (12 hour spans)
Convex Hull Method of Fit
Posed, but did not address:
Required sample size
Effect of differing topology
Ignored Statistical Techniques ~ using a convex hull technique, instead of a linear regression technique, throws out the whole body of error analysis theory!
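For contrast with regression, one common reading of a hull-style fit is the line that stays on or above every sample while minimizing the total vertical distance to the samples. That optimum is the upper convex hull edge spanning the mean x value. The sketch below assumes that reading; it is not a reproduction of the earlier study's code, and the names `upper_hull` and `hull_fit` are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    """Upper convex hull, left to right (monotone chain)."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or below the chord to p
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def hull_fit(xs, ys):
    """Line on or above all points with minimal summed vertical distance."""
    hull = upper_hull(list(zip(xs, ys)))
    mean_x = sum(xs) / len(xs)
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x1 > x0 and x0 <= mean_x <= x1:
            slope = (y1 - y0) / (x1 - x0)
            return slope, y0 - slope * x0
    raise ValueError("need at least two distinct x values")

# Synthetic samples: a true line y = 2x with non-negative delays subtracted
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.5, 3.7, 5.3, 8.0]
print(hull_fit(xs, ys))
```

The fit is determined by two extreme samples only, so it offers no residual-based standard error, which is the objection raised on the slide.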
Current Work
Recreated Experiment
4 identical Dell GX-150
machines, one observer
Collected initial data on fast
switch
Extended the Research
Skew via linear regression
algorithm
Error analysis theory to estimate
required number of samples
Simulated WAN delay (via Linux
Netfilter hacking) in progress
Measured PCI bus with
frequency counter to verify the
physical link to clock skew
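A frequency-counter reading can be put on the same scale as a regression skew by expressing its deviation from the nominal frequency in ppm. The helper below is a small sketch; the function name and the example frequencies are assumptions, not measured values from the experiment.

```python
def ppm_offset(measured_hz, nominal_hz):
    """Deviation of a measured oscillator frequency from nominal, in ppm."""
    if nominal_hz <= 0:
        raise ValueError("nominal frequency must be positive")
    return (measured_hz - nominal_hz) / nominal_hz * 1e6

# Hypothetical reading: a nominal 1 MHz reference measured 50 Hz high
print(ppm_offset(1_000_050.0, 1_000_000.0))
```

If the timestamp clock is derived from the measured oscillator, this figure and the regression slope's ppm skew should move together across machines.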
Results
1. PCI bus clock speed is directly
related to clock skew
2. Linear regression (in the LAN case) uniquely identifies machines to within a couple of parts per million (ppm)
3. Number of samples required is
directly proportional to observed
timestamp error and confidence
interval, inversely proportional to
collection interval and allowed ppm
tolerance
• Validated on repeated population
subsets
4. Showed clock skew varies with
machine temperature
5. In progress – experiments on WAN
data
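Since skews are resolved to within a couple of ppm, traces can be grouped into candidate machines by chaining together estimates that lie within a tolerance of each other. This is a minimal sketch of one such grouping; the function name, the 2 ppm default tolerance, and the trace labels are illustrative assumptions.

```python
def group_by_skew(skews_ppm, tolerance_ppm=2.0):
    """Group trace labels whose skew estimates chain within tolerance_ppm.

    skews_ppm maps a trace label to its estimated skew in ppm.
    Returns a list of groups, each a list of labels, ordered by skew.
    """
    groups = []
    for label, skew in sorted(skews_ppm.items(), key=lambda kv: kv[1]):
        # Start a new group when the gap to the previous skew is too large
        if groups and skew - groups[-1][-1][1] <= tolerance_ppm:
            groups[-1].append((label, skew))
        else:
            groups.append([(label, skew)])
    return [[label for label, _ in group] for group in groups]

# Hypothetical skew estimates from four capture sessions
estimates = {"trace-a": 41.2, "trace-b": -12.5, "trace-c": 41.9, "trace-d": 87.0}
print(group_by_skew(estimates))
```

Chaining on sorted values is the simplest choice; it can merge machines whose skews happen to form a closely spaced run, so the tolerance should reflect the confidence interval of each estimate.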
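\[
n \ge 2 \cdot 6^{1/3} \cdot 10^{4} \left( \frac{t^{*}_{n-2}\, s_e}{b_1\, \varepsilon\, \Delta t} \right)^{2/3}
\]
where
b_1 = sample slope
\varepsilon = ppm difference between extremes
s_e = sample (y - \hat{y}) error
t^{*}_{n-2} = 5% confidence limit
\Delta t = fixed collection interval, secs
n = required number of samples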
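A sample-size estimate of this kind can be derived from standard regression theory. For n samples spaced dt seconds apart, the standard error of the slope is approximately s_e * sqrt(12) / (dt * n^(3/2)). Requiring the full width of the confidence interval, 2 * t* * SE, to be no more than the ppm tolerance times the slope gives a bound on n. The sketch below evaluates that bound. The function names and the example numbers are assumptions. Using a fixed t* = 1.96 is also an assumption: it is the large-sample two-sided 5% value and sidesteps the dependence of t* on n.

```python
import math

def ci_width_ppm(n, s_e, b1, dt, t_star=1.96):
    """Approximate full confidence-interval width of the slope, in ppm."""
    std_err = s_e * math.sqrt(12.0) / (dt * n ** 1.5)
    return 2.0 * t_star * std_err / b1 * 1e6

def required_samples(s_e, b1, ppm_tol, dt, t_star=1.96):
    """Smallest n (approximately) whose slope interval fits within ppm_tol.

    s_e: residual standard error (ticks), b1: sample slope (ticks/sec),
    ppm_tol: allowed ppm difference between interval extremes,
    dt: fixed collection interval in seconds.
    """
    if min(s_e, b1, ppm_tol, dt, t_star) <= 0:
        raise ValueError("all inputs must be positive")
    rhs = 2.0 * math.sqrt(12.0) * t_star * s_e * 1e6 / (b1 * ppm_tol * dt)
    return max(3, math.ceil(rhs ** (2.0 / 3.0)))

# Hypothetical inputs: 1-tick residual error, 1000 ticks/sec, 1 ppm, 1 s spacing
n = required_samples(s_e=1.0, b1=1000.0, ppm_tol=1.0, dt=1.0)
print(n, ci_width_ppm(n, s_e=1.0, b1=1000.0, dt=1.0))
```

Because n enters as n^(3/2), halving the tolerance costs roughly 1.6 times as many samples rather than 4 times, which is what makes ppm-level discrimination practical.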
Summary
Highlights
Clock skew is a repeatable way to fingerprint a specific
machine
Linear regression, a simple machine learning concept, is
readily applied
Statistical error analysis tells us how much to collect
Lowlights
TCP timestamp options are, well, OPTIONAL ~ can just
turn them off
Future Research
Wireless mobile devices: effect of battery, topology,
mobility, clock stepping
Other protocol properties, not just timestamps