Break-2046- Finding the Root Causes of Network Anomalies

Break-2046
Finding the Root Causes of Network Anomalies
Best Practices in Problem Solving
Agenda
• Troubleshooting Methodologies
• Information Collection Tools
• A Sample Problem: Bad VoIP Call
• Optimizing the Methodology
• Total Network Visibility
Troubleshooting Methodologies
What graduates a junior-level engineer to a senior-level engineer is their troubleshooting methodology
Bad Methodology
“Do something to try to fix the problem”
• Reboot the server
• Change the network settings
• Replace hardware
• Re-install the OS
Bad Methodology
Why this doesn’t work:
• The problem might be made worse
• The root cause of the problem might be masked by the “fix”, making it harder to solve
• If the problem goes away, you’ll never know what actually caused it. This means that the problem is likely to happen again.
Good Methodology
Collect Information
Create Hypothesis
Test Hypothesis
Implement Fix
Document Fix
Undo changes
Collect Information
Sample User collected Information:
• Occurrence: “Is this the first time this has happened?”
• Frequency: “How often does this problem happen?”
• Scope: “Does this affect other users in your area?”
• Change (local): “Did anything change on your PC?”
Collect Information
Sample Engineer collected Information:
• Change (global): “Did anything change on the network?”
• Reachability: “Does the device respond to a PING?”
• Errors: “Are we dropping packets in the network?”
• Loading: “Are the network links performing well?”
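The "Errors" and "Loading" questions above usually come down to arithmetic on interface counters sampled at two points in time. A minimal sketch, assuming counters such as octets and drops have already been collected (for example via SNMP) and ignoring counter wrap; the function and parameter names are illustrative, not from any particular tool:

```python
def link_utilization(octets_start, octets_end, interval_s, speed_bps):
    """Percent utilization of one direction of a link over a polling interval."""
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    bits = (octets_end - octets_start) * 8  # octets are bytes; links are rated in bits
    return 100.0 * bits / (interval_s * speed_bps)


def drop_rate(dropped, forwarded):
    """Percent of packets dropped out of all packets offered to the interface."""
    total = dropped + forwarded
    return 100.0 * dropped / total if total else 0.0


# Made-up sample: 37.5 MB moved in 5 minutes on a 100 Mbps link.
print(link_utilization(0, 37_500_000, 300, 100_000_000))  # 1.0
```

Because both values are computed over an interval, a long polling interval can average away a short burst of loss or congestion.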
Create Hypothesis
• Analyze all collected facts
• Collect additional facts if required
• Avoid politics
Test Hypothesis
• Ensure that testing is nonintrusive: don’t create additional problems with the testing methodology
• Determine what comprises a successful test
Implement Fix
• Schedule the fix to be implemented with users (if required)
• Have a back-out plan if the fix involves critical systems
Document Fix
Good documentation makes your life easier:
– Prevents 2am wake-up calls from other engineers trying to solve the same problem
– You don’t have to memorize every fix
– Other team members & management start to realize your value when you share your knowledge
Information Collection Tools
• Network Performance Monitoring solutions
• Network Instrumentation & Testing solutions
• Application Performance Monitoring solutions
• Packet Analysis solutions
Using the wrong tool can be misleading and slow down the resolution
Sample problem: VoIP Call
Using a packet analyzer to solve a call quality problem
Results of VoIP Call
Latency: 127ms
Jitter: 87ms
Packet loss: 8.2%
[Diagram: packet analyzer and the actual VoIP call, on a network of devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
You have confirmation that there is a problem, but no idea which device or link caused the packet loss
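The packet analyzer's numbers only confirm that the call was bad. A minimal sketch of that confirmation step, comparing the measured values from the slide against planning targets. The targets here (150 ms latency, 30 ms jitter, 1% loss) are commonly cited rules of thumb and are an assumption, not figures from this presentation:

```python
# Assumed planning targets (not from the slides); adjust to your codec and SLA.
TARGETS = {"latency_ms": 150.0, "jitter_ms": 30.0, "loss_pct": 1.0}


def failed_metrics(measured, targets=TARGETS):
    """Return the names of metrics whose measured value exceeds its target."""
    return [name for name, limit in targets.items() if measured[name] > limit]


# Values reported by the packet analyzer for the sample call.
call = {"latency_ms": 127.0, "jitter_ms": 87.0, "loss_pct": 8.2}
print(failed_metrics(call))  # ['jitter_ms', 'loss_pct']
```

The result names which metrics are out of range, but says nothing about where on the path the damage happened.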
Sample problem: VoIP Call
Solving the root cause of the problem
Poor Quality VoIP Call
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
Step 1:
Identify the involved endpoints and where they are connected into the network
Sample problem: VoIP Call
Solving the root cause of the problem
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
Step 2:
Identify the full layer-2 path through the network from the first phone to the second phone
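Step 2 can be pictured as a path search over a list of links. The sketch below is illustrative only: the link list is hypothetical (the real wiring of the diagram is not recoverable from this transcript), and breadth-first search is a stand-in. In a live network the actual layer-2 path is determined by the switches' forwarding tables and spanning tree state, not by shortest hop count.

```python
from collections import deque

# Hypothetical links; device names are borrowed from the diagram, wiring is invented.
LINKS = [("x41", "A"), ("A", "B"), ("B", "D"), ("D", "E"), ("E", "x52"),
         ("A", "C"), ("C", "D")]


def build_graph(links):
    """Turn a list of undirected links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph, src, dst):
    """Breadth-first search; returns one shortest hop-by-hop path or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph.get(node, ())):  # sorted for repeatable output
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(build_graph(LINKS), "x41", "x52"))
# ['x41', 'A', 'B', 'D', 'E', 'x52']
```

Every device and link on the returned path becomes a candidate for the health and interface checks in the following steps.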
Sample problem: VoIP Call
Solving the root cause of the problem
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
Step 3:
Check that the health (CPU & memory) of the involved switches and routers is at acceptable levels
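A minimal sketch of the Step 3 check, assuming CPU and memory percentages have already been collected for each device on the path. The 80% and 85% limits and the sample readings are assumptions for illustration; "acceptable levels" depend on the platform.

```python
def unhealthy_devices(path_devices, readings, cpu_limit=80.0, mem_limit=85.0):
    """List devices on the path whose CPU or memory exceeds the assumed limits."""
    flagged = []
    for dev in path_devices:
        r = readings.get(dev)
        if r is None:
            # A device we cannot read is itself worth investigating.
            flagged.append((dev, ["no data"]))
            continue
        reasons = []
        if r["cpu_pct"] > cpu_limit:
            reasons.append("cpu")
        if r["mem_pct"] > mem_limit:
            reasons.append("memory")
        if reasons:
            flagged.append((dev, reasons))
    return flagged


# Made-up readings for three devices on a path.
readings = {"A": {"cpu_pct": 12, "mem_pct": 40},
            "B": {"cpu_pct": 95, "mem_pct": 90}}
print(unhealthy_devices(["A", "B", "D"], readings))
# [('B', ['cpu', 'memory']), ('D', ['no data'])]
```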
Sample problem: VoIP Call
Solving the root cause of the problem
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
Step 4:
Investigate involved interfaces for:
• VLAN assignment
• DiffServ/QoS tagging
• Queuing configuration
• 802.1p priority settings
• Duplex mismatches
• Cable faults
• Half-duplex operation
• Broadcast storms
• Incorrect speed settings
• Over-subscription
TRANSIENT PROBLEM WARNING:
If the error condition is no longer occurring when this investigation is performed, you may not catch the problem
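One of the Step 4 checks, duplex mismatch, can often be inferred from error counters: late collisions tend to appear on the half-duplex side, while FCS (CRC) errors tend to appear on the full-duplex side. The sketch below encodes that heuristic only. The function name and inputs are illustrative, and FCS errors alone can also indicate a cable fault, so the result is a hint rather than a diagnosis.

```python
def duplex_hint(duplex, late_collisions, fcs_errors):
    """Return a plain-English hint from an interface's duplex setting and counters."""
    if duplex == "half" and late_collisions > 0:
        return "possible duplex mismatch: this looks like the half-duplex side"
    if duplex == "full" and fcs_errors > 0:
        return "possible duplex mismatch or cable fault: FCS errors on a full-duplex port"
    return "no duplex mismatch indicators"


print(duplex_hint("half", late_collisions=42, fcs_errors=0))
print(duplex_hint("full", late_collisions=0, fcs_errors=310))
```

Counters are cumulative, so comparing two samples taken some minutes apart shows whether the errors are still occurring, which matters for transient problems.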
Optimizing the Methodology
In a perfect world, you want:
• Monitoring of every switch, router, and link in the entire infrastructure
• Monitoring for both configuration and performance
• Automatic layer-2 mapping from any IP endpoint to any other IP endpoint
• Ability to “dig deep” and look at numerous error counters
• Problems identified in plain English for rapid remediation
This is what PathSolutions does
How PathSolutions Works
Automated Network Intelligence Gathering
Install PathSolutions
All switches and routers are automatically queried for information
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53]
Result:
One location is able to monitor all links in the entire network for performance and errors
Total Network Visibility™
• Broad: All ports on all routers & switches
• Deep: 18 different error counters collected and analyzed
• Network Prescription engine provides plain-English descriptions of errors:
“This interface has a duplex mismatch, and this is the full-duplex side of that duplex mismatch”
Path Analysis Report
[Diagram: network devices A through I with endpoints x41, x42, x43, x51, x52, and x53, annotated with the following findings]
• 7:56am: 18% loss from duplex mismatch
• 12:02pm: 12% loss from cabling fault
• 11:32am: 100% transmit utilization, 15% loss from discards, latency & jitter penalty incurred