Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis
Yu-Chung Cheng
John Bellardo, Mikhail Afanasyev, Patrick Verkaik,
Jennifer Chiang, Peter Benko
Alex C. Snoeren, Geoff Voelker, Stefan Savage
Department of Computer Science & Engineering
University of California, San Diego
18.07.2015
Yu-Chung Cheng/Qualcomm CR&D
1
The promise of Enterprise 802.11?
A familiar story...
Employee: “The wireless is being flaky.”
Support: “Flaky how?”
Employee: “Well, my connections got dropped earlier and now things seem very sloooow.”
Support: “OK, we will take a look”
Employee: “Wait, wait … it’s ok now”
Support: “Mmm… well let us know if you have any more problems.”
Now what?
What are the problems?
Contention with nearby wireless devices?
Bad AP channel assignments?
Microwave ovens?
Congestion in the Internet?
Bad interaction between TCP and 802.11?
Rogue access points?
Poor choice of APs (weak signal)?
Incompatible user software/hardware?
802.11 DoS attack?!
…
Network admins are not paid enough
to figure this out…
Why is this hard to understand?
RF domain defies traditional networking intuition
Wireless topology not well-modeled as a graph
Asymmetry is common for all characteristics
Packet loss, bandwidth, interference, etc.
Variability in all characteristics caused by:
Distance/mobility, orientation, temperature, RF workload, etc
Automatic management: MAC, rate control, access point selection
Huge inter-vendor variation
Scale – lots of different RF domains
Mobility management is complex
The undeclared layer 2.5…
L2 (assoc, scan, etc.), ARP, DHCP, registration, etc.
Goal: What’s going on in my network?
Real-time diagnosis of wireless network problems
In a production 802.11 network
Identify components of delay at physical, link, network and transport layers
Deconstruct full end-to-end behavior
Interactions between environment, 802.11 PHY/MAC, TCP/UDP
Ultimately: understand the most important sources of performance problems and opportunities for improvement
New CSE building at UCSD
150k square feet
4 floors + basement
>500 occupants
150 faculty/staff
350 students
Building-wide WiFi
40 access points
802.11b/g
Channel 1, 6, 11
10-100 active clients at any time
Daily traffic ~10 GB
UCSD passive monitor system
Overlays existing WiFi
Series of passive sniffers
Blanket deployment for best coverage
48 sensor pods (192 radios)
4 radios per pod (cover all channels in use)
Captures/timestamps all 802.11 activity (including physical errors)
Streams back to centralized server (>6TB storage)
Jigsaw system
Constructs single view of all 802.11 activity
Unifies frame views from all radios
Transitive synchronization across all views (max dispersion ~10us; 80% within 5us)
Reconstructs discrete L2, L3 and L4 state
Inference of unseen events and host state (vantage point limitations) via protocol behavior
Designed to make it easy to add analysis modules
Physical fingerprints, contention inference, DHCP analysis, etc.
Easy to measure cross-layer interactions
Yu-Chung Cheng, John Bellardo, Peter Benko, Alex C. Snoeren, Geoffrey M.
Voelker, and Stefan Savage, Jigsaw: Solving the Puzzle of Enterprise 802.11
Analysis, SIGCOMM 2006
Trace synchronization and unification
Sniffers label packets w/ local timestamp (TSF)
Need a global clock
Estimate the offset between TSF and the global clock for
each sniffer
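One way to picture the offset estimate is to compare the TSF timestamps of frames that two sniffers both captured. The following is a minimal sketch, not Jigsaw's actual code: it assumes frames can be matched by a content key, that one sniffer's clock stands in for the global clock, and that the median difference is a reasonable estimator. The function names and sample values are hypothetical.

```python
from statistics import median

def estimate_offset_us(ref_frames, other_frames):
    """Estimate (other TSF - reference TSF) in microseconds.

    Each argument maps a frame key (for example a hash of the frame
    bytes) to the local TSF timestamp at which that sniffer saw it.
    """
    common = ref_frames.keys() & other_frames.keys()
    if not common:
        raise ValueError("no common frames; offset cannot be estimated")
    # The median tolerates a few outliers, e.g. mismatched retransmissions.
    return median(other_frames[k] - ref_frames[k] for k in common)

def to_reference_clock(tsf_us, offset_us):
    """Map a timestamp from the other sniffer onto the reference clock."""
    return tsf_us - offset_us

# Hypothetical captures: three frames heard by both sniffers.
ref = {"f1": 1_000, "f2": 2_500, "f3": 4_000}
other = {"f1": 6_003, "f2": 7_500, "f3": 9_001, "f9": 12_000}
offset = estimate_offset_us(ref, other)
```

With these sample values the per-frame differences are 5003, 5000 and 5001 us, so the estimated offset is 5001 us.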
Part of a Jigsaw trace (L1/L2)
[Figure: traces from multiple monitors, synchronized on a common time axis, for Client 1 and Client 2. Legend: received frames; received with CRC error; HW corrupted.]
Jigsaw in Action
Physical layer inference
Link layer modeling
Transport layer flow reconstruction
End-to-end cross-layer diagnosis
Media access problems
Mobility management overhead
Hidden terminal interference
Co-channel interference from other transmitters
For sender s and receiver r, estimate conditional probability of loss given simultaneous transmission by interferer i
Current finding: hidden terminals not such a big deal (some exceptions)
[Figure: sender s, receiver r and interferer i]
Normal: s sends data, r sends ACK
Hidden-terminal: s sends data, r’s reception is interfered by i
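The conditional loss probability can be estimated by counting, over all transmissions from s to r that overlapped a transmission by i, how many were lost. A minimal sketch, assuming each observed transmission has already been reduced to an "ACK seen" flag plus the set of stations transmitting at the same time; this record layout and the function name are hypothetical.

```python
def loss_given_interferer(transmissions, interferer):
    """Estimate P(loss | interferer transmitting simultaneously).

    transmissions: list of (acked, overlapping) pairs for one s -> r link,
    where acked is True if the ACK was observed and overlapping is the
    set of station ids that transmitted during the frame.
    Returns None when the interferer never overlapped.
    """
    outcomes = [acked for acked, others in transmissions if interferer in others]
    if not outcomes:
        return None
    lost = sum(1 for acked in outcomes if not acked)
    return lost / len(outcomes)

# Hypothetical observations for one sender/receiver pair.
observed = [
    (True, set()),
    (False, {"i"}),
    (False, {"i"}),
    (True, {"i"}),
    (True, {"j"}),
    (False, set()),
]
p_loss_i = loss_given_interferer(observed, "i")
```

Comparing this value against the loss rate when nobody else is transmitting indicates whether i actually behaves as a hidden terminal for this link.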
Broadband interference
[Figure: broadband interference, with annotated periods at ~9 am and 12-2 pm]
Interference fingerprints
Microwave oven: magnetron driven by half-wave voltage doubler @ 60Hz
Automatically detect and tag “microwave-like” physical interference
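A half-wave driven magnetron radiates during roughly half of each 60 Hz cycle, so physical errors it causes should cluster at particular phases of that cycle. The sketch below shows one possible way to test for this, not necessarily the detector Jigsaw uses: it folds error timestamps onto the 60 Hz cycle and measures how concentrated the phases are. The function name, the synthetic data and any decision threshold are assumptions.

```python
import cmath
import math

def periodicity_strength(event_times_s, freq_hz=60.0):
    """Mean resultant length of event phases at freq_hz.

    Returns a value near 0 when events are spread evenly over the cycle
    and 1 when every event falls at the same phase.
    """
    if not event_times_s:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * freq_hz * t) for t in event_times_s)
    return abs(total) / len(event_times_s)

cycle_s = 1 / 60.0
# Synthetic errors confined to the first half of each cycle (oven-like).
oven_like = [c * cycle_s + k * cycle_s / 20 for c in range(100) for k in range(10)]
# Synthetic errors spread evenly over the whole cycle (not oven-like).
spread = [c * cycle_s + k * cycle_s / 20 for c in range(100) for k in range(20)]

oven_score = periodicity_strength(oven_like)
spread_score = periodicity_strength(spread)
```

Errors confined to half of the cycle score roughly 0.64 here, while evenly spread errors score close to 0, so a threshold somewhere between the two could be used to tag a burst as microwave-like.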
Link layer contention: a challenge to measure
Three kinds of network events
Directly observable: packet sent (easy)
Directly inferable: packet received (harder)
Indirectly inferable: packet delayed by contention (surprisingly tricky)
Key issues
Need to know input and output at each AP
Need to model internal state of AP
Model
Infer time at which packet is queued on AP (via wireline analysis)
Ethernet serialization delay
AP bus overhead (2 I/O)
AP processing overhead
Determine if previous packet had cleared AP (via wireless analysis)
Head-of-line blocking (delay attributable to queuing)
No head-of-line blocking (delay attributable to contention/MAC)
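The two steps of the model can be sketched as follows. This is an illustrative reading of the slide rather than the authors' implementation: the function names are hypothetical, the wired timestamp is assumed to mark the start of the frame on the Ethernet, and the bus and processing overheads are placeholder parameters that would have to be calibrated per AP model.

```python
def infer_queue_time_us(wire_time_us, frame_bytes, link_mbps=100.0,
                        bus_overhead_us=0.0, processing_us=0.0):
    """Estimate when a packet seen on the wire is queued at the AP radio."""
    serialization_us = frame_bytes * 8 / link_mbps  # Mbps equals bits per us
    return wire_time_us + serialization_us + bus_overhead_us + processing_us

def classify_delay(queued_at_us, prev_cleared_at_us, tx_start_us):
    """Split one packet's AP delay into queuing and contention/MAC parts."""
    # The packet reaches the head of the queue once the previous one clears.
    head_of_line_us = max(queued_at_us, prev_cleared_at_us)
    queuing_us = head_of_line_us - queued_at_us
    access_us = tx_start_us - head_of_line_us
    return {
        "hol_blocked": queuing_us > 0,
        "queuing_us": queuing_us,
        "access_us": access_us,
    }

# Hypothetical 1500-byte packet on 100 Mbps Ethernet with made-up overheads.
queued = infer_queue_time_us(0, 1500, link_mbps=100.0,
                             bus_overhead_us=20, processing_us=30)
not_blocked = classify_delay(queued, prev_cleared_at_us=100, tx_start_us=400)
blocked = classify_delay(queued, prev_cleared_at_us=300, tx_start_us=400)
```

In the first case the previous packet had already cleared the AP, so the whole 230 us wait is attributed to contention/MAC; in the second, 130 us of the wait is attributed to queuing behind the previous packet.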
[Figure legend: Directly observed; Inferred/Modeled]
Access delay (Dacc) at an AP
[Figure: timeline of the components of access delay]
Distributed Inter-Frame Space (50us)
Mandatory backoff for last pkt: 0-15 slot times (20us ea)
Contention during DIFS convolved with pkt backoff
Contention beyond backoff
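The timing values on this slide bound the access delay of a transmission that meets no contention. A small sketch of that arithmetic, assuming only the initial backoff window of 0-15 slots applies (no retries) and that the backoff value actually drawn is not observable, so only a lower bound on contention can be derived; the function names are hypothetical.

```python
DIFS_US = 50             # Distributed Inter-Frame Space, from the slide
SLOT_US = 20             # slot time, from the slide
MAX_BACKOFF_SLOTS = 15   # backoff is 0-15 slots, from the slide

def uncontended_access_bounds_us():
    """Smallest and largest access delay with no contention at all."""
    return DIFS_US, DIFS_US + MAX_BACKOFF_SLOTS * SLOT_US

def min_contention_us(observed_access_delay_us):
    """Lower bound on the delay that must be due to contention."""
    _, upper_us = uncontended_access_bounds_us()
    return max(0, observed_access_delay_us - upper_us)

bounds = uncontended_access_bounds_us()
```

An uncontended packet therefore waits between 50 us and 350 us; an observed access delay of 900 us implies at least 550 us of contention, while 200 us is still consistent with no contention.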
End-to-end cross-layer diagnoses
Media access problems
Mobility overhead
Pathologies
802.11b faster than 802.11g
Significant unsuccessful effort over 12 months by
IT groups (and vendor) in understanding problem
Issue
Avaya AP only attempts one retry for 802.11g
frames in “protection mode”
High-rate transmissions more sensitive to noise
Export many more losses to IP -> TCP backoff
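To see why a single retry matters, consider how often a frame is lost on every attempt and therefore surfaces as a loss at the IP layer. The sketch below assumes each attempt fails independently with the same probability, which is a simplification; the 20% per-attempt loss rate and the seven-retry comparison point are hypothetical values chosen for illustration.

```python
def exported_loss(p_attempt_loss, retries):
    """Probability that the first attempt and every retry all fail."""
    return p_attempt_loss ** (retries + 1)

p = 0.2                                   # hypothetical per-attempt loss rate
one_retry = exported_loss(p, retries=1)   # the behavior described on the slide
seven_retries = exported_loss(p, retries=7)  # hypothetical comparison point
```

Under these assumptions one retry exports about 4% of frames as losses to IP, whereas seven retries would export fewer than one in 100,000, which is why TCP sees far more loss and backs off.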
Pathologies (2)
Big L2 retry delay (> 10ms) Why?
Broadcast frames have > 50ms avg delay
Why?
Same reason
If any client requests power-save mode then AP
must buffer broadcast frames until beacon is sent
Pending frame exchange is postponed until
broadcast burst is completed
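An average delay above 50 ms is about what one would expect if buffered broadcast frames wait for the next beacon. A rough sketch, assuming a 102.4 ms beacon interval (a common default, not stated in the slides), buffered frames released at every beacon, and arrivals spread evenly over the interval:

```python
BEACON_INTERVAL_US = 102_400  # assumed: 100 time units of 1024 us each

def wait_until_next_beacon_us(arrival_us, beacon_interval_us=BEACON_INTERVAL_US):
    """Buffering time for a broadcast frame, with beacons at multiples of the interval."""
    return (-arrival_us) % beacon_interval_us

# Broadcast frames arriving every 100 us across one beacon interval.
arrivals = range(0, BEACON_INTERVAL_US, 100)
waits = [wait_until_next_beacon_us(t) for t in arrivals]
mean_wait_ms = sum(waits) / len(waits) / 1000
```

The mean wait comes out at roughly half the beacon interval, about 51 ms, before counting the time needed to send the burst itself.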
Pathologies (3)
802.11g protection mode
Used when 802.11b clients are present
802.11g client sends a pilot CTS-to-Self
frame (slow) before data
Overhead is about 100% air time
Issue:
We still have many 11b clients
But most 11b traffic is bursty, so there is no need to use
protection all the time
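The "about 100%" figure can be checked with approximate air-time arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement: it uses commonly cited 802.11b/g PHY timings (long DSSS preamble of 192 us, 20 us OFDM preamble and header, 4 us OFDM symbols, 6 us signal extension, 10 us SIFS), a hypothetical 1536-byte data frame at 54 Mbps, and a 14-byte CTS-to-self sent at 2 Mbps. ACKs, DIFS and backoff are ignored.

```python
import math

SIFS_US = 10

def ofdm_airtime_us(frame_bytes, rate_mbps=54):
    """Approximate 802.11g OFDM frame duration in the 2.4 GHz band."""
    bits_per_symbol = rate_mbps * 4                 # each symbol lasts 4 us
    symbols = math.ceil((16 + 8 * frame_bytes + 6) / bits_per_symbol)
    return 20 + 4 * symbols + 6                     # preamble + data + extension

def dsss_airtime_us(frame_bytes, rate_mbps=2, preamble_us=192):
    """Approximate 802.11b frame duration with a long preamble."""
    return preamble_us + 8 * frame_bytes / rate_mbps

data_us = ofdm_airtime_us(1536)          # hypothetical data frame size
cts_us = dsss_airtime_us(14) + SIFS_US   # CTS-to-self plus the gap before data
overhead = cts_us / data_us
```

With these numbers the data frame takes about 254 us and the CTS-to-self exchange about 258 us, so protection roughly doubles the air time per frame. A CTS sent at 1 Mbps would cost even more.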
Pathologies (4)
Lots of “vendor” hacks
Do not respect CSMA
Bursts packets in a row
Early retransmission
Do not wait for the full ACK time
Do not respect protection mode
Do not do exponential back-off (linear)
Announce very large transmission duration
Could mount a DoS attack, but it does not work in practice
Do not increment sequence numbers
…
TCP diagnoses breakdown
Major causes: slow receiver, AP retry bug, protection mode
Mobility management overhead
Around 30% of time is spent in mobility management (DHCP, ARP, association, etc.)
Pathologies (5)
Large startup delays (10s of secs)
Client requests DHCP lease for private
address space (192.168/16)
Wireless management system (Vernier) grants address with short timeout and won’t refresh
Client has to do two DHCP transactions with
long timeouts between
Startup delays breakdown
[Figure: startup delay breakdown; axis: Delay (seconds)]
Major causes: (Gratuitous) ARPs + Scans
Where to next?
Real-time system for automated detection
and evaluation of poor network
performance
Identifies problem flows and isolates
potential causes of poor performance
City-wide network monitoring
Currently deployed in a Bay-area metropolitan
network
Future: explore deployment and protocol
fixes
Q&A
Live traffic monitoring and more information at
http://sysnet.ucsd.edu/wireless/
Synchronization
Create a virtual global clock
To keep unification working
Critical evidence for analysis
If A and B are transmitting at the same time they could interfere
If A starts transmitting after B has started then A can’t hear B
Require fine time-scales (10-50us)
NTP is >100 usec accuracy
802.11 HW clocks (TSF) have 100PPM stability
[Figure: TSF diff of two sniffers; axes: TSF diff (us) vs. Time (s)]
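The 100 PPM stability figure implies how often offsets must be refreshed to stay within a microsecond-scale error budget. A small worked example, assuming the worst case in which two sniffer clocks drift in opposite directions and nothing else contributes error; the function name and the 10 us budget are illustrative.

```python
def max_resync_interval_s(error_budget_us, stability_ppm=100):
    """Longest time two clocks can run before exceeding the error budget."""
    # 1 PPM is 1 us of drift per second; two clocks may drift apart at twice that.
    relative_drift_us_per_s = 2 * stability_ppm
    return error_budget_us / relative_drift_us_per_s

interval_s = max_resync_interval_s(10)
```

Under these assumptions, holding two sniffers within 10 us of each other requires re-estimating their offset at least every 50 ms, which is why a one-time calibration is not enough.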
Trace unification (ideal)
[Figure: ideal trace unification along a time axis]
Trace unification (reality)
[Figure: Jigsaw unified trace along a time axis, made up of JFrame 1 through JFrame 5]
Challenge: sync at large-scale
[Figure: nodes 1-4 with timing labels To, ∆t1, ∆t2]
How to bootstrap?
Goal: estimate the offset between TSF and the global clock
for each sniffer
Time reference from one sniffer to the other
Sync across channels
Dual radios on same sniffer slaved to same clock
Manage TSF clock skews
Continuously re-adjust offsets when unifying
frames
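Passing a time reference from one sniffer to the next can be pictured as a graph traversal: sniffers that hear a common frame are neighbors, and offsets accumulate along the path from a chosen root. The sketch below illustrates that idea only; the breadth-first order, the data layout and the names are assumptions, and it omits the continuous skew correction mentioned on the slide.

```python
from collections import deque

def propagate_offsets(pairwise, root):
    """Derive each sniffer's offset relative to root's clock.

    pairwise maps (a, b) to TSF_b - TSF_a in microseconds, measured on
    frames that sniffers a and b both captured.
    """
    neighbors = {}
    for (a, b), diff in pairwise.items():
        neighbors.setdefault(a, []).append((b, diff))
        neighbors.setdefault(b, []).append((a, -diff))

    offsets = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt, diff in neighbors.get(current, []):
            if nxt not in offsets:
                # Offsets add up along the chain of shared frames.
                offsets[nxt] = offsets[current] + diff
                queue.append(nxt)
    return offsets

# Hypothetical pairwise measurements between four sniffers.
pairwise = {(1, 2): 500, (2, 3): -120, (4, 3): 40}
offsets = propagate_offsets(pairwise, root=1)
```

Sniffer 4 never shares a frame with sniffer 1 here, yet it still receives an offset through sniffers 2 and 3. Sniffers with no path to the root are simply absent from the result.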
Jigsaw syncs 99% frames < 10us
Measure sync. quality by max dispersion per JFrame
10 us is an important threshold
802.11 back-off time is 20 us
802.11 inter frame time is 50 us
Sufficient to infer many 802.11 events
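The dispersion metric can be computed directly from the synchronized timestamps of the instances merged into each unified frame. A minimal sketch, assuming each unified frame is represented simply as a list of per-sniffer timestamps already mapped to the global clock; the names and sample values are hypothetical.

```python
def max_dispersion_us(instance_times_us):
    """Spread between the earliest and latest timestamp of one unified frame."""
    return max(instance_times_us) - min(instance_times_us)

def fraction_within(jframes, threshold_us=10):
    """Share of unified frames whose dispersion is below the threshold."""
    dispersions = [max_dispersion_us(times) for times in jframes]
    return sum(d < threshold_us for d in dispersions) / len(dispersions)

# Hypothetical unified frames, each seen by two or three sniffers.
jframes = [[100, 103, 101], [250, 262], [400, 404, 409]]
quality = fraction_within(jframes)
```

In this toy data two of the three frames have a dispersion below 10 us. Staying under that threshold keeps timing error smaller than the 20 us back-off time, so events separated by a slot can still be ordered.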
Sensor pods
Pod = pair of monitors
Separated ~1 meter
>35dB separation at 2.4GHz
Monitor = Soekris 4826-50
266MHz 586 class CPU
128MB RAM, 64MB Flash
100Mbps Ethernet
Dual Atheros a/b/g radios
Power-over-Ethernet (semi-std)
Jigdump software
Captures/timestamps all 802.11 activity (including physical errors)
Streams back to centralized server (>6TB storage)