Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis

Yu-Chung Cheng
John Bellardo, Mikhail Afanasyev, Patrick Verkaik,
Jennifer Chiang, Peter Benko
Alex C. Snoeren, Geoff Voelker, Stefan Savage
Department of Computer Science & Engineering
University of California, San Diego
18.07.2015
Yu-Chung Cheng/Qualcomm CR&D
The promise of Enterprise 802.11?
A familiar story...
Employee: “The wireless is being flaky.”
Support: “Flaky how?”
Employee: “Well, my connections got dropped earlier and now things seem very sloooow.”
Support: “OK, we will take a look.”
Employee: “Wait, wait … it’s ok now.”
Support: “Mmm… well, let us know if you have any more problems.”
Now what?
What are the problems?
 Contention with nearby wireless devices?
 Bad AP channel assignments?
 Microwave ovens?
 Congestion in the Internet?
 Bad interaction between TCP and 802.11?
 Rogue access points?
 Poor choice of APs (weak signal)?
 Incompatible user software/hardware?
 802.11 DoS attack?!
 …
Network admins are not paid enough to figure this out…
Why is this hard to understand?
 RF domain defies traditional networking intuition
 Wireless topology not well-modeled as a graph
 Asymmetry is common for all characteristics
 Packet loss, bandwidth, interference, etc.
 Variability in all characteristics caused by:
 Distance/mobility, orientation, temperature, RF workload, etc.
 Automatic management: MAC, rate control, access point selection
 Huge inter-vendor variation
 Scale – lots of different RF domains
 Mobility management is complex
 The undeclared layer 2.5…
 L2 (assoc, scan, etc.), ARP, DHCP, registration, etc.
Goal: What’s going on in my network?
 Real-time diagnosis of wireless network problems
 In a production 802.11 network
 Identify components of delay at physical, link, network and transport layers
 Deconstruct full end-to-end behavior
 Interactions between environment, 802.11 PHY/MAC, TCP/UDP
 Ultimately: understand the most important sources of performance problems and opportunities for improvement
New CSE building at UCSD
 150k square feet
 4 floors + basement
 >500 occupants
 150 faculty/staff
 350 students
 Building-wide WiFi
 40 access points
 802.11b/g
 Channel 1, 6, 11
 10-100 active clients at any time
 Daily traffic ~10 GB
UCSD passive monitor system
 Overlays existing WiFi
 Series of passive sniffers
 Blanket deployment for best coverage
 48 sensor pods (192 radios)
 4 radios per pod (cover all channels in use)
 Captures/timestamps all 802.11 activity (including physical errors)
 Stream back to centralized server (>6TB storage)
Jigsaw system
 Constructs single view of all 802.11 activity
 Unifies frame views from all radios
 Transitive synchronization across all views (max dispersion ~10us; 80% within 5us)
 Reconstructs discrete L2, L3 and L4 state
 Inference of unseen events and host state (vantage point limitations) via protocol behavior
 Designed to make it easy to add analysis modules
 Physical fingerprints, contention inference, DHCP analysis, etc.
 Easy to measure cross-layer interactions
Yu-Chung Cheng, John Bellardo, Peter Benko, Alex C. Snoeren, Geoffrey M.
Voelker, and Stefan Savage, Jigsaw: Solving the Puzzle of Enterprise 802.11
Analysis, SIGCOMM 2006
Trace synchronization and unification
 Sniffers label packets w/ local timestamp (TSF)
 Need a global clock
 Estimate the offset between TSF and the global clock for each sniffer
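The offset estimate for a pair of sniffers can be illustrated with a minimal sketch. It assumes that both sniffers timestamped some of the same frames and that taking the median of the TSF differences is an acceptable estimator; the slides do not say which estimator Jigsaw uses, and the names `estimate_offset`, `sniffer_a` and `sniffer_b` are hypothetical.

```python
from statistics import median


def estimate_offset(tsf_a, tsf_b):
    """Estimate offset_us such that tsf_a ~= tsf_b + offset_us.

    tsf_a, tsf_b: dicts mapping a frame identifier to the local TSF
    timestamp (microseconds) at which each sniffer recorded that frame.
    Only frames recorded by both sniffers contribute to the estimate.
    """
    shared = tsf_a.keys() & tsf_b.keys()
    if not shared:
        raise ValueError("no frames heard by both sniffers")
    # The median tolerates an occasional badly timestamped frame.
    return median(tsf_a[f] - tsf_b[f] for f in shared)


# Hypothetical timestamps: sniffer B's clock runs about 600 us behind A's.
sniffer_a = {"f1": 1_000, "f2": 2_500, "f3": 4_000}
sniffer_b = {"f1": 400, "f2": 1_901, "f3": 3_400, "f9": 9_000}
offset = estimate_offset(sniffer_a, sniffer_b)
print(offset)
```

Adding `offset` to a TSF value from sniffer B expresses it on sniffer A's clock.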
Part of a Jigsaw trace (L1/L2)
[Figure: synchronized traces from multiple monitors plotted against time for Client 1 and Client 2. Legend: received frames; received, CRC error; HW corrupted.]
Jigsaw in Action
 Physical layer inference
 Link layer modeling
 Transport layer flow reconstruction
 End-to-end cross-layer diagnosis
 Media access problems
 Mobility management overhead
Hidden terminal interference
 Co-channel interference from other transmitters
 For sender s and receiver r, estimate conditional probability of loss given simultaneous transmission by interferer i
 Current finding: hidden terminals not such a big deal (some exceptions)
[Figure: sender s, receiver r and interferer i. Normal: s sends data, r sends ACK. Hidden-terminal: s sends data, r’s reception is interfered with by i.]
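The conditional loss probability described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the Jigsaw implementation: a frame counts as lost when no ACK was observed, "simultaneous" means the two transmissions overlap in time on the unified trace, and the function and field names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the two time intervals share any instant."""
    return a_start < b_end and b_start < a_end


def loss_given_interferer(frames, interferer_tx):
    """Return (P(loss | i overlaps), P(loss | i silent)) for frames s -> r.

    frames: list of (start_us, end_us, acked) for transmissions from s to r.
    interferer_tx: list of (start_us, end_us) for transmissions by i.
    A probability is None when there are no samples for that condition.
    """
    counts = {True: [0, 0], False: [0, 0]}  # overlap? -> [losses, total]
    for start, end, acked in frames:
        hit = any(overlaps(start, end, s, e) for s, e in interferer_tx)
        counts[hit][1] += 1
        if not acked:
            counts[hit][0] += 1

    def ratio(losses, total):
        return losses / total if total else None

    return ratio(*counts[True]), ratio(*counts[False])


# Hypothetical trace excerpt (times in microseconds).
frames = [(0, 100, True), (200, 300, False), (400, 500, False),
          (600, 700, True), (800, 900, True)]
interferer = [(250, 280), (450, 520), (650, 660)]
p_overlap, p_clear = loss_given_interferer(frames, interferer)
print(p_overlap, p_clear)
```

Comparing the two probabilities indicates whether i's transmissions actually hurt reception at r; a large gap would suggest a hidden terminal.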
Broadband interference
[Figure: broadband interference over the course of a day, with annotations at ~9 am and 12-2 pm]
Interference fingerprints
 Microwave oven: magnetron driven by half-wave voltage doubler @ 60Hz
 Automatically detect and tag “microwave-like” physical interference
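One way to tag "microwave-like" interference is to test whether physical-error events are phase-locked to the 60Hz mains cycle, since a half-wave driven magnetron radiates during only part of each cycle. The sketch below is an assumption about how such a detector could look, not the method used in Jigsaw; the function names and the 0.3 threshold are hypothetical.

```python
import cmath
import math


def periodicity_strength(event_times_s, freq_hz=60.0):
    """Mean resultant length of event phases folded at freq_hz.

    Returns a value in [0, 1]: near 0 when events are spread evenly over
    the cycle, larger when they cluster in one part of each cycle.
    """
    if not event_times_s:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * freq_hz * t) for t in event_times_s)
    return abs(total) / len(event_times_s)


def looks_like_microwave(event_times_s, freq_hz=60.0, threshold=0.3):
    """Tag a burst of physical errors as microwave-like (threshold is illustrative)."""
    return periodicity_strength(event_times_s, freq_hz) >= threshold


period = 1 / 60.0
# Synthetic events confined to the first half of every 60 Hz cycle.
half_cycle = [c * period + f * period
              for c in range(100) for f in (0.05, 0.15, 0.25, 0.35, 0.45)]
# Synthetic events spread evenly across the whole cycle.
spread = [c * period + (k / 10 + 0.05) * period
          for c in range(100) for k in range(10)]
print(looks_like_microwave(half_cycle), looks_like_microwave(spread))
```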
Link layer contention: a challenge to measure
 Three kinds of network events
 Directly observable: packet sent (easy)
 Directly inferable: packet received (harder)
 Indirectly inferable: packet delayed by contention (surprisingly tricky)
 Key issues
 Need to know input and output at each AP
 Need to model internal state of AP
Model
 Infer time at which packet is queued on AP (via wireline analysis)
 Ethernet serialization delay
 AP bus overhead (2 I/O)
 AP processing overhead
 Determine if previous packet had cleared AP (via wireless analysis)
 Head-of-line blocking (delay attributable to queuing)
 No head-of-line blocking (delay attributable to contention/MAC)
[Figure legend: Directly observed; Inferred/Modeled]
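The two steps of this model can be sketched in a few lines. The sketch assumes the wired-side timestamp marks the start of the frame on the Ethernet link, and it leaves the AP bus and processing overheads as parameters because the slides give no values for them; all function and parameter names are hypothetical.

```python
def estimate_queue_time(wire_seen_us, frame_bytes, link_mbps=100.0,
                        bus_us=0.0, processing_us=0.0):
    """Estimate when a packet is queued on the AP (wireline analysis).

    bus_us and processing_us are placeholders for the AP bus and
    processing overheads, which would have to be measured per AP model.
    """
    serialization_us = frame_bytes * 8 / link_mbps  # bits / (Mbit/s) = us
    return wire_seen_us + serialization_us + bus_us + processing_us


def attribute_delay(queued_us, prev_cleared_us, tx_start_us):
    """Split the AP delay into (queuing_us, access_us).

    If the previous packet had not cleared the AP when this one was queued,
    the wait until it clears is head-of-line blocking (queuing); the rest
    is attributed to contention/MAC.
    """
    if prev_cleared_us > queued_us:
        return prev_cleared_us - queued_us, tx_start_us - prev_cleared_us
    return 0.0, tx_start_us - queued_us


queued = estimate_queue_time(1000.0, 1500)       # 1500-byte frame at 100 Mbps
blocked = attribute_delay(queued, 1500.0, 1700.0)
unblocked = attribute_delay(queued, 900.0, 1400.0)
print(queued, blocked, unblocked)
```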
Access delay (Dacc) at an AP
[Figure labels: Distributed Inter-Frame Space (50us); mandatory backoff for last pkt, 0-15 slot times (20us ea); contention beyond backoff; contention during DIFS convolved with pkt backoff]
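Using the figures on this slide (DIFS of 50us, backoff of 0-15 slots at 20us each), the contention component of an observed access delay can be bounded even though the actual backoff draw is unknown. The sketch below is a simplified illustration of that arithmetic, not the convolution-based model the slide refers to; `contention_bounds` is a hypothetical name.

```python
DIFS_US = 50            # Distributed Inter-Frame Space, from the slide
SLOT_US = 20            # slot time, from the slide
MAX_BACKOFF_SLOTS = 15  # backoff is 0-15 slots, from the slide


def contention_bounds(access_delay_us):
    """Return (low, high) bounds in microseconds on delay due to contention.

    The uncontended part of the access delay is DIFS plus a backoff of
    0 to 15 slots, so it lies between 50 us and 350 us.
    """
    idle_min = DIFS_US
    idle_max = DIFS_US + MAX_BACKOFF_SLOTS * SLOT_US
    low = max(0, access_delay_us - idle_max)
    high = max(0, access_delay_us - idle_min)
    return low, high


print(contention_bounds(1000))  # long access delay: contention is certain
print(contention_bounds(200))   # short access delay: possibly no contention
```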
End-to-end cross-layer diagnoses
 Media access problems
 Mobility overhead
Pathologies
 802.11b faster than 802.11g
 Significant unsuccessful effort over 12 months by IT groups (and vendor) in understanding problem
 Issue
 Avaya AP only attempts one retry for 802.11g frames in “protection mode”
 High-rate transmissions more sensitive to noise
 Export many more losses to IP -> TCP backoff
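A short calculation shows why a low retry limit exports more losses to IP. It assumes each transmission attempt fails independently with the same probability, which real channels only approximate; the 20% per-attempt loss and the retry limit of 7 are hypothetical comparison points, not measurements from the talk.

```python
def residual_loss(per_attempt_loss, retries):
    """Probability that a frame is still lost after the first attempt plus retries.

    Assumes independent attempts with identical loss probability.
    """
    return per_attempt_loss ** (retries + 1)


one_retry = residual_loss(0.2, 1)     # behavior described on the slide
seven_retries = residual_loss(0.2, 7)  # hypothetical larger retry limit
print(one_retry, seven_retries)
```

Under these assumptions a single retry leaves roughly 4% of frames lost at the link layer, which is high enough for TCP to back off repeatedly.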
Pathologies (2)
 Big L2 retry delay (> 10ms). Why?
 Broadcast frames have > 50ms avg delay. Why?
 Same reason
 If any client requests power-save mode, then the AP must buffer broadcast frames until the beacon is sent
 Pending frame exchange is postponed until broadcast burst is completed
Pathologies (3)
 802.11g protection mode
 Used when 802.11b clients are present
 802.11g client sends a pilot CTS-to-Self frame (slow) before data
 Overhead is about 100% air time
 Issue:
 We still have many 11b clients
 But most 11b traffic is bursty, so there is no need to use protection all the time
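The "about 100% air time" figure can be checked with a rough airtime estimate. The timing constants below are assumptions, not values from the slides: a 14-byte CTS sent at 2 Mbps with a 192us long preamble and a 10us SIFS, and a 1500-byte payload sent at 54 Mbps with 28 bytes of MAC overhead, a 20us OFDM preamble, 4us symbols and a 6us signal extension. Different rates or preamble settings change the ratio.

```python
import math


def cts_to_self_us(rate_mbps=2.0, plcp_us=192.0, cts_bytes=14, sifs_us=10.0):
    """Airtime of a CTS-to-Self at a slow 802.11b rate, plus the SIFS gap."""
    return plcp_us + cts_bytes * 8 / rate_mbps + sifs_us


def ofdm_data_us(payload_bytes=1500, rate_mbps=54.0, mac_overhead_bytes=28):
    """Airtime of one OFDM data frame (assumed 802.11g timing constants)."""
    bits = 16 + 8 * (payload_bytes + mac_overhead_bytes) + 6  # service + data + tail
    bits_per_symbol = rate_mbps * 4                           # 4 us per symbol
    symbols = math.ceil(bits / bits_per_symbol)
    return 20.0 + 4.0 * symbols + 6.0  # preamble + symbols + signal extension


cts = cts_to_self_us()
data = ofdm_data_us()
overhead_ratio = cts / data
print(cts, data, round(overhead_ratio, 2))
```

Under these assumptions the protection frame costs about as much air time as the data frame it protects, which is consistent with the slide's estimate.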
Pathologies (4)
 Lots of “vendor” hacks
 Do not respect CSMA
 Burst packets in a row
 Early retransmission
 Do not wait for the full ACK time
 Do not respect protection mode
 Do not do exponential back-off (linear)
 Announce very large transmission duration
 Could mount a DoS, but it does not work in practice
 Do not increment sequence numbers
 …
TCP diagnoses breakdown
[Figure: breakdown of TCP diagnoses]
Major causes: slow receiver, AP retry bug, protection mode
Mobility management overhead
Around 30% of time is spent in mobility management (DHCP, ARP, association, etc.)
Pathologies (5)
 Large startup delays (tens of seconds)
 Client requests DHCP lease for private address space (192.168/16)
 Wireless management system (Vernier) grants address with short timeout and won’t refresh
 Client has to do two DHCP transactions with long timeouts between
Startup delays breakdown
[Figure: breakdown of startup delays; axis: Delay (seconds)]
Major contributors: (gratuitous) ARPs + scans
Where to next?
 Real-time system for automated detection and evaluation of poor network performance
 Identifies problem flows and isolates potential causes of poor performance
 City-wide network monitoring
 Currently deployed in a Bay-area metropolitan network
 Future: explore deployment and protocol fixes
Q&A
Live traffic monitoring and more information at
http://sysnet.ucsd.edu/wireless/
Synchronization
 Create a virtual global clock
 To keep unification working
 Critical evidence for analysis
 If A and B are transmitting at the same time they could interfere
 If A starts transmitting after B has started then A can’t hear B
 Require fine time-scales (10-50us)
 NTP is >100 usec accuracy
 802.11 HW clocks (TSF) have 100PPM stability
[Figure: TSF diff of two sniffers; axes: Time (s) vs. TSF diff (us)]
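The 100PPM stability figure translates directly into how quickly an uncorrected TSF clock drifts out of a timing budget. The sketch below is simple arithmetic for a single clock measured against an ideal reference; two free-running clocks could diverge from each other up to twice as fast. The function names are hypothetical.

```python
def drift_us(elapsed_s, stability_ppm=100.0):
    """Worst-case drift in microseconds after elapsed_s seconds.

    1 ppm corresponds to 1 microsecond of drift per second.
    """
    return elapsed_s * stability_ppm


def max_resync_interval_s(budget_us, stability_ppm=100.0):
    """Longest interval over which the drift stays within budget_us."""
    return budget_us / stability_ppm


print(drift_us(1.0))              # drift after one second
print(max_resync_interval_s(10))  # interval that keeps drift within 10 us
```

A 10us budget is used up after about a tenth of a second, which is why offsets have to be re-adjusted continuously rather than set once.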
Trace unification (ideal)
[Figure: trace unification in the ideal case, plotted against time]
Trace unification (reality)
[Figure: per-sniffer traces merged over time into the Jigsaw unified trace, JFrame 1 through JFrame 5]
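Unification can be pictured as grouping observations of the same transmission from different sniffers into one JFrame. The sketch below is a simplified assumption of how that grouping might work: observations already carry global-clock timestamps, frames are matched by a content digest, and copies within a 10us window are treated as the same transmission. The function name, record layout and window are hypothetical.

```python
from collections import defaultdict


def unify(observations, window_us=10):
    """Group (sniffer_id, global_time_us, digest) records into JFrames.

    Records with the same digest whose timestamps fall within window_us of
    the earliest record in the group are merged; identical content seen much
    later (for example a retransmission) starts a new JFrame.
    """
    by_digest = defaultdict(list)
    for sniffer, t, digest in observations:
        by_digest[digest].append((t, sniffer))

    jframes = []
    for digest, seen in by_digest.items():
        seen.sort()
        group = [seen[0]]
        for t, sniffer in seen[1:]:
            if t - group[0][0] > window_us:
                jframes.append(_make_jframe(digest, group))
                group = []
            group.append((t, sniffer))
        jframes.append(_make_jframe(digest, group))
    jframes.sort(key=lambda j: j["time_us"])
    return jframes


def _make_jframe(digest, group):
    """Build one JFrame from a list of (time_us, sniffer) observations."""
    return {"digest": digest,
            "time_us": group[0][0],
            "sniffers": sorted(s for _, s in group)}


obs = [("m1", 100, "A"), ("m2", 103, "A"), ("m3", 98, "A"),
       ("m1", 500, "B"), ("m2", 2000, "A")]
jframes = unify(obs)
for jf in jframes:
    print(jf)
```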
Challenge: sync at large-scale
[Figure: sniffers 1-4 on a time axis, with reference time To and offsets ∆t1, ∆t2]
 Goal: estimate the offset between TSF and the global clock for each sniffer
 How to bootstrap?
 Time reference from one sniffer to the other
 Sync across channels
 Dual radios on same sniffer slaved to same clock
 Manage TSF clock skews
 Continuously re-adjust offsets when unifying frames
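Passing a time reference from one sniffer to the next can be sketched as a breadth-first walk over pairwise offsets. This is an illustration of transitive synchronization under simplifying assumptions (static offsets, no skew correction), not the bootstrap procedure Jigsaw uses; `propagate_offsets` and the sniffer names are hypothetical.

```python
from collections import deque


def propagate_offsets(pair_offsets, reference):
    """Compute, per sniffer, the offset to add to its TSF to get reference time.

    pair_offsets: {(a, b): off_us} meaning tsf_a ~= tsf_b + off_us for a pair
    of sniffers that heard common frames. Sniffers that are not connected to
    the reference through any chain of pairs are left out of the result.
    """
    neighbors = {}
    for (a, b), off in pair_offsets.items():
        neighbors.setdefault(a, []).append((b, off))   # going a -> b adds off
        neighbors.setdefault(b, []).append((a, -off))  # going b -> a subtracts it

    to_ref = {reference: 0}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        for other, delta in neighbors.get(current, []):
            if other not in to_ref:
                to_ref[other] = to_ref[current] + delta
                queue.append(other)
    return to_ref


# Hypothetical chain of sniffers: s1 hears s2, s2 hears s3, s3 hears s4.
pairs = {("s1", "s2"): 600, ("s2", "s3"): -250, ("s3", "s4"): 40}
offsets = propagate_offsets(pairs, reference="s1")
print(offsets)
```

Because errors accumulate along the chain, a real system would also need the continuous re-adjustment mentioned on the slide.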
Jigsaw syncs 99% of frames to within 10us
 Measure sync quality by max dispersion per JFrame
 10 us is an important threshold
 802.11 back-off time is 20 us
 802.11 inter frame time is 50 us
 Sufficient to infer many 802.11 events
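The quality metric on this slide, maximum dispersion per JFrame, is straightforward to compute once each JFrame carries the synchronized timestamps reported by the sniffers that heard it. The sketch below assumes dispersion means the spread between the earliest and latest timestamp; the function names and sample values are hypothetical.

```python
def max_dispersion_us(timestamps_us):
    """Spread between the earliest and latest timestamp of one JFrame."""
    return max(timestamps_us) - min(timestamps_us)


def fraction_within(jframe_timestamps, threshold_us=10):
    """Fraction of JFrames whose dispersion is below threshold_us."""
    good = sum(1 for ts in jframe_timestamps
               if max_dispersion_us(ts) < threshold_us)
    return good / len(jframe_timestamps)


# Hypothetical per-JFrame timestamps (microseconds on the global clock).
sample = [[100, 103, 98], [500], [900, 925], [40, 44]]
print(fraction_within(sample))
```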
Sensor pods
 Pod = pair of monitors
 Separated ~1 meter
 >35dB separation at 2.4GHz
 Monitor = Soekris 4826-50
 266MHz 586 class CPU
 128MB RAM, 64MB Flash
 100Mbps Ethernet
 Dual Atheros a/b/g radios
 Power-over-Ethernet (semi-std)
 Jigdump software
 Captures/timestamps all 802.11 activity (including physical errors)
 Stream back to centralized server (>6TB storage)