Fault and Performance Diagnosis

Wireless Mesh Network Management - Fault and Performance Diagnosis: A Survey
Vijay P Gabale (CSE, IIT Bombay)
MTech Seminar under the guidance of Prof. Bhaskaran Raman
Agenda
• Overview
• Motivation
• Enterprise vs. Long-Distance Networks
• Techniques
• Fault Diagnosis - Examples
• Future scope
• Conclusion
Wireless Network Woes!
• My machine says: wireless connection unavailable.
• Why is the network performance so low?
• Is someone interfering with my transmissions?
• Do we have complete coverage in all the buildings?
• I wonder if someone has sneakily installed an unauthorized
access point.
Wireless Network Anomalies
• RF holes
• Interference
• Hidden terminal
• Rogue Access Points
Which anomaly was the cause of undesired network
performance?
Challenges
• Quantification of possible causes
• Attribution of a performance problem to a specific root
cause i.e. recognizing a fault
• Network management to proactively deal with likely faults
• Avoiding personal visits to nodes on long-distance links
Effects
• System downtime
• Loss of productivity (loss of faith)
• Recovery cost
Figure: Number of wireless-related complaints logged by the IT department of a major US corporation. Source: [4]
Enterprise Network
• Comprises a dense deployment of access points & clients in
a university or corporate building
• Challenges
• RF holes
• Interference
• Hidden terminals
• Rogue Access Points
• Solution space : Characterizing & then analyzing entire
wireless behaviour, Online and Offline diagnosis
Long-Distance Network
• Comprises point-to-point links of several kilometers that build a
multi-hop mesh network
• Challenges
• Physical visits are costly
• Remote locations could sometimes become inaccessible
• Lack of trained personnel
• Poor power quality
• Solution space : In-node recovery or inference techniques,
Independent control mechanisms
Diagnostic Questions!
• What is the per packet signal strength at every node – RF holes
• How many concurrent receptions are there – Hidden Terminal
• How is the noise level varying over time – Interference
• Is there any foreign node wandering in the network – Rogue AP
• Is the remote node working? What is the software or hardware
status of the node? – Primary link failure or Software-Hardware
failures
Existing Techniques
• Offline data collection & analysis : [1], [6]
• Online anomaly detection : [2]
• Simulation : [3]
• Daemon running as a part of the node : [4]
• Software & Hardware redundancies : [5], [7]
Offline data collection & analysis
• Steps :
• Dense deployment of monitors
• Synchronization & unification at a central server
• Inference techniques
• Example : Jigsaw[1], MacWild[6]
• Fault Diagnosis :
• Pr (Loss due to Interference | Concurrent Transmissions)
• Over-protective 802.11g clients and access points
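As a rough illustration of the conditional probability named above, the following Python sketch estimates Pr(loss | concurrent transmission) from a merged trace. The `FrameRecord` schema, its field names, and the overlap rule are assumptions made for this example; they are not taken from Jigsaw[1] or MacWild[6].

```python
from dataclasses import dataclass


@dataclass
class FrameRecord:
    """One transmission in a unified trace (hypothetical schema)."""
    start_us: int   # transmission start time, microseconds
    end_us: int     # transmission end time, microseconds
    lost: bool      # True if no acknowledgement was observed


def overlaps(a: FrameRecord, b: FrameRecord) -> bool:
    """Two frames are concurrent if their airtime intervals intersect."""
    return a.start_us < b.end_us and b.start_us < a.end_us


def loss_given_concurrency(frames):
    """Estimate Pr(loss | concurrent transmission) as a simple ratio.

    Returns None when the trace contains no concurrent frames.
    """
    concurrent = 0
    concurrent_lost = 0
    for i, frame in enumerate(frames):
        if any(overlaps(frame, other)
               for j, other in enumerate(frames) if j != i):
            concurrent += 1
            concurrent_lost += int(frame.lost)
    return concurrent_lost / concurrent if concurrent else None


trace = [
    FrameRecord(0, 100, True),     # overlaps the next frame and is lost
    FrameRecord(50, 150, False),   # overlaps the previous frame, delivered
    FrameRecord(200, 300, False),  # no overlap
    FrameRecord(400, 500, True),   # lost, but not due to concurrency
]
print(loss_given_concurrency(trace))
```

The pairwise scan is quadratic in the trace length, which is acceptable for a sketch; a real trace would more likely be sorted by start time and swept once.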
Offline data collection & analysis - Framework
[Figure: monitoring framework with a central server. Source: MacWild[6]]
Online anomaly detection
• Steps :
• Deploy multiple monitors
• Sample physical layer parameters
• Dynamic interference engine
• Example : Mojo[2]
• Fault Diagnosis : Threshold for Hidden Terminal, Capture
Effect, Non 802.11 interference
Simulation
• Steps :
• Traces to drive simulation
• Deviation of observed behavior from expected behavior
• Decision trees to make distinction between possible faults
• Example : Troubleshooting Wireless Mesh Networks[3]
• Fault Diagnosis : External noise, Packet dropping,
Misbehaving clients
Simulation (contd…) Decision Tree
• If simSent – realSent > ThreshSentDiff : CW misuse
• Else if simNoise – realNoise > ThreshNoiseDiff : External Noise
• Else if simLoss – realLoss > ThreshLossDiff : Packet dropping
• Otherwise : Normal
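The decision tree above can be sketched as a short Python function. This is a minimal illustration: the dictionary keys and the threshold values are hypothetical. The comparison uses the absolute difference between simulated and observed values, which is an assumption, since the slide writes a plain difference.

```python
def classify_fault(sim, real, thresh):
    """Walk the decision tree: compare simulated and observed metrics.

    sim, real and thresh are dicts with the keys "sent", "noise" and
    "loss" (hypothetical names). Absolute differences are an assumption.
    """
    if abs(sim["sent"] - real["sent"]) > thresh["sent"]:
        return "CW misuse"
    if abs(sim["noise"] - real["noise"]) > thresh["noise"]:
        return "External noise"
    if abs(sim["loss"] - real["loss"]) > thresh["loss"]:
        return "Packet dropping"
    return "Normal"


# Illustrative thresholds only; real values would come from calibration.
thresholds = {"sent": 50, "noise": 5.0, "loss": 0.10}
simulated = {"sent": 1000, "noise": -95.0, "loss": 0.02}
observed = {"sent": 1005, "noise": -82.0, "loss": 0.03}
print(classify_fault(simulated, observed, thresholds))
```

The checks run in the order the slide lists them, so a node that both misuses the contention window and drops packets is reported only as CW misuse.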
Daemon running as a part of the node
• An application resides on the client side
• Takes reactive or proactive actions in response to an event
• Example : Client Conduit technique[4]
• Fault Detection : Rogue APs, RF holes
Software & Hardware redundancies
• Experiences of software & hardware failures
• Techniques :
• Software & hardware watchdogs
• Independent control mechanisms
• Tracking & predicting health of a node
• Example : Beyond Pilots[5], Fault Diagnosis[7]
• Fault Diagnosis : Erratic power conditions, Primary link
failure, Non 802.11 interference, Antenna misalignment
Problem : Intermittent Connectivity
• Symptoms : Irregular changes in connectivity or total
failure
• Causes : Weak RF signal, Lack of signal, unpredictable
ambiance, obstructions
• Parameter : Received signal strength
How to tackle total failure? How to track a mobile node?
Remedy : Client Conduit
• It is a mechanism to allow disconnected users to convey
messages to system and network administrators.
Problem : Rogue Access Point
• What is Rogue Access Point?
• Security holes, unwanted RF interference and network load.
• Access Point Database
• Location, MAC, Channel
Remedy : Client Conduit
• Is MAC registered?
• Is AP at expected location?
• Is AP advertising expected SSID?
• Is AP on expected channel?
• Any "No" : Rogue AP detected; a "Yes" moves on to the next check
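The chain of checks above could be expressed as follows. This is a hedged sketch, not the Client Conduit implementation from [4]: the `ApEntry` record, the dictionary-based database, and the 10 m location tolerance are all assumptions introduced for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ApEntry:
    """One access point record (hypothetical schema)."""
    mac: str
    location: tuple   # (x, y) position in metres
    ssid: str
    channel: int


def is_rogue(observed, database, max_distance_m=10.0):
    """Return True as soon as any check in the chain answers 'No'."""
    expected = database.get(observed.mac.lower())
    if expected is None:                      # MAC not registered
        return True
    if math.dist(observed.location, expected.location) > max_distance_m:
        return True                           # not at the expected location
    if observed.ssid != expected.ssid:        # unexpected SSID
        return True
    return observed.channel != expected.channel


# Illustrative database keyed by lower-case MAC address.
ap_database = {
    "00:11:22:33:44:55": ApEntry("00:11:22:33:44:55", (0.0, 0.0), "corp", 6),
}
sighting = ApEntry("00:11:22:33:44:55", (2.0, 1.0), "corp", 11)
print(is_rogue(sighting, ap_database))
```

Because every failed check leads to the same outcome, the order of the checks affects only how early the function returns, not its result.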
Problem & Remedy : Hidden Terminal
• Symptoms : Degraded performance, lower throughput
• Causes : One transmitter not able to hear other
transmissions to the same receiver, heterogeneous transmit
powers
• Remedy : Quantify number of concurrent transmissions
• Around 40%
• Capture Effect : Around 5%
Problem & Remedy : Non 802.11 Interference
• Symptoms : Retransmissions at the MAC layer, No
concurrent transmissions detected
• Quantify noise level
• Moving window average
• Threshold
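A moving window average with a threshold, as listed above, might look like the sketch below. The window size, the threshold of -85 dBm, and the sample values are illustrative assumptions. Averaging dBm readings directly is also a simplification; a stricter version would average in linear power units.

```python
from collections import deque


def noisy_samples(noise_dbm, window=3, threshold_dbm=-85.0):
    """Return indices where the moving average of noise exceeds the threshold.

    No index is reported until a full window of samples is available.
    """
    recent = deque(maxlen=window)   # holds the last `window` samples
    flagged = []
    for index, sample in enumerate(noise_dbm):
        recent.append(sample)
        if len(recent) == window and sum(recent) / window > threshold_dbm:
            flagged.append(index)
    return flagged


# Noise floor near -95 dBm with a burst of interference at -70 dBm.
readings = [-95, -95, -95, -70, -70, -70, -95]
print(noisy_samples(readings))
```

A larger window smooths out short spikes at the cost of slower detection, so the window size and threshold have to be tuned together.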
Problem : Connectivity problems over Long-Distance Links
• Symptoms : Remote node NOT Reachable
• Causes: IP address misconfiguration, routing
misconfiguration, power shutdown at remote node, a board
failure, malfunctioning wireless card
• Solution :
• Link Local IP addressing
• SMS backchannel
Solution : Troubleshooting a Link
• Does link-local IP addressing work?
  • Yes : Log in & fix configuration problem
  • No : Send SMS query and get the result
    • Power unavailable : Wait for power
    • Power available, router is up : Get status report (signal strength, noise)
    • Power available, router down : Reboot or visit & replace
• Outcomes : Visit not required, or visit may be required
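One possible reading of the troubleshooting flow above is sketched below. The branch order is an interpretation of the flowchart, and the `sms_status` dictionary with its `power` and `router_up` keys is a hypothetical stand-in for whatever the SMS backchannel actually reports.

```python
def troubleshoot_link(link_local_ok, sms_status=None):
    """Suggest the next action for an unreachable remote node.

    link_local_ok: True if the node answers on its link-local IP address.
    sms_status: reply to the SMS query, or None if not yet sent
                (hypothetical keys: "power", "router_up").
    """
    if link_local_ok:
        # The node is reachable, so the fault is likely a misconfiguration.
        return "Log in & fix configuration problem"
    if sms_status is None:
        return "Send SMS query and get the result"
    if not sms_status["power"]:
        return "Wait for power"
    if sms_status["router_up"]:
        return "Get status report: signal strength, noise"
    return "Reboot or visit & replace"


print(troubleshoot_link(False, {"power": True, "router_up": False}))
```

The point of the ordering is to try the remote checks first, so that a physical visit is considered only after link-local access and the SMS query have been exhausted.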
Problem & Remedy : Software and Hardware Failures
• Symptoms : Node suddenly goes down, node does not
respond on trying to connect over the primary link
• Causes : Damaged power supplies or router boards,
damaged CF cards, low voltages leave router in wedged state,
battery problems
• Techniques : Software and hardware watchdogs, power
controllers, Low Voltage Disconnect, read only boot loader
Future Scope
• Comprehensive Network Monitoring & Inference Tool
• Quantify Performance Improvement
• User Friendly GUIs
• Automatic Recovery
Conclusion
• Classification of techniques for fault diagnosis
• Enterprise as well as Long-Distance Mesh Networks
• Faults: Connectivity, Hidden Terminal, Interference,
Hardware Failures
• Need for ‘Complete Monitoring & Inference’ Suite to Detect
Root Level Causes
Appendix – Comparison Table
References of the Survey
[1] Yu-Chung Cheng, John Bellardo, and Peter Benko. Jigsaw: Solving the Puzzle of
Enterprise 802.11 Analysis. SIGCOMM’06.
[2] Anmol Sheth, Christian Doerr, Dirk Grunwald, Richard Han, and Douglas Sicker.
Mojo: A Distributed Physical Layer Anomaly Detection System for 802.11 WLANs.
MOBISYS’06.
[3] Lili Qiu, Paramvir Bahl, Ananth Rao, and Lidong Zhou. Troubleshooting Wireless
Mesh Networks. SIGCOMM’06.
[4] Atul Adya, Paramvir Bahl, Ranveer Chandra, and Lili Qiu. Architecture and
Techniques for Diagnosing Faults in IEEE 802.11 Infrastructure Networks. MOBICOM’04.
[5] Sonesh Surana, Rabin Patra, Sergiu Nedevschi, and Manuel Ramos. Beyond Pilots:
Keeping Rural Wireless Networks Alive. To appear in USENIX NSDI’08.
[6] Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Analyzing the
MAC-Level Behavior of Wireless Networks in the Wild. SIGCOMM’06.
[7] Sonesh Surana, Rabin Patra, and Eric Brewer. Simplifying Fault Diagnosis in
Locally Managed Rural WiFi Networks. SIGCOMM NSDR’07.
[8] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, and Peter Benko.
Automating Cross-Layer Diagnosis of Enterprise Wireless Networks. SIGCOMM’07.
[9] Kameswari Chebrolu, Bhaskaran Raman, and Sayandeep Sen. Long-Distance
802.11b Links: Performance Measurements and Experience. MOBICOM’06.