Interop_Theater_Root_Causx - Interop Conference Presentations

Root-Cause Network Troubleshooting
Optimizing the Process
Tim Titus
CTO, PathSolutions
Sample Problem
VoIP call quality problem
User complains about missing parts of a conversation between x41 and x53 at 12:04pm
[Diagram: phones x41, x52, and x53]
Packet Capture
Wireshark results (actual VoIP call):
Latency: 127ms
Jitter: 87ms
Packet loss: 8.2%
You have confirmation that there is a problem, but no idea which device or link caused the packet loss.
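The loss and jitter figures on this slide are derived from per-packet sequence numbers and timing. As a minimal sketch of that arithmetic, assuming each received RTP packet has been reduced to a hypothetical `(sequence number, send time, receive time)` tuple in seconds, the following uses the running jitter estimate from RFC 3550. It is not Wireshark's implementation, and it ignores sequence-number wraparound.

```python
def rtp_stats(packets):
    """Return (loss_pct, jitter_s) for received packets in arrival order.

    packets: list of (seq, sent_s, received_s) tuples (hypothetical format).
    """
    seqs = [seq for seq, _, _ in packets]
    expected = max(seqs) - min(seqs) + 1          # packets the sender emitted
    received = len(set(seqs))                     # ignore duplicates
    loss_pct = 100.0 * (expected - received) / expected

    jitter = 0.0
    prev_transit = None
    for _, sent, arrived in packets:
        transit = arrived - sent
        if prev_transit is not None:
            # RFC 3550: J = J + (|D| - J) / 16
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
    return loss_pct, jitter


# One packet (seq 3) never arrived out of five sent: 20% loss.
loss_pct, jitter_s = rtp_stats([
    (1, 0.00, 0.01),
    (2, 0.02, 0.03),
    (4, 0.06, 0.07),
    (5, 0.08, 0.09),
])
```

As the slide notes, numbers like these confirm the symptom at one capture point only; they say nothing about which hop dropped the packet.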
Application Performance Monitoring
Results (synthetic VoIP call):
Latency: 127ms
Jitter: 87ms
Packet loss: 8.2%
You have knowledge of the experience across the network, but no understanding of the source or cause of the problem.
NetFlow Collectors
Flow Collector (actual VoIP call): Flow from 192.168.1.12 to 10.0.1.18 at 2:45pm
You have knowledge of a flow across the network, but no awareness of any problem.
SNMP Collectors
SNMP Collector (actual VoIP call): 23% WAN utilization at 12:05pm
You have data about conditions on some parts of the network, but no analysis of the problem or correlation to events.
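A utilization figure like the one on this slide is normally computed from two readings of an interface octet counter. The sketch below shows that calculation under stated assumptions: 32-bit counters that may wrap once between polls, a 300-second poll interval, and a 10 Mb/s WAN link. The function names and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, tolerating a single wrap."""
    return (current - previous) % modulus


def utilization_pct(prev_octets, curr_octets, interval_s, if_speed_bps):
    """Average link utilization over the poll interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# 86,250,000 octets in 300 s on a 10 Mb/s link works out to 23%.
wan_util = utilization_pct(1_000_000, 87_250_000, 300, 10_000_000)
```

An average over a whole poll interval can hide short bursts, which is one reason a utilization number alone does not explain a call-quality complaint.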
Finding the Root-Cause
Step 1:
Locate where the involved endpoints connect to the network
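One way to carry out Step 1 is to search each switch's MAC forwarding table for the phone's address and ignore ports that are inter-switch uplinks, since every switch on the path learns the address. This is a minimal sketch; the table layout, switch names, ports, and MAC addresses are all hypothetical.

```python
def locate_endpoint(mac, fdb_tables, uplink_ports):
    """Return (switch, port) where mac is learned on an edge port, else None.

    fdb_tables:   {switch: {mac: port}} forwarding tables (hypothetical layout)
    uplink_ports: set of (switch, port) pairs that connect to other switches
    """
    for switch, table in fdb_tables.items():
        port = table.get(mac)
        if port is not None and (switch, port) not in uplink_ports:
            return switch, port
    return None


fdb_tables = {
    "sw-a": {"00:11:22:33:44:41": "Gi0/3", "00:11:22:33:44:53": "Gi0/24"},
    "sw-b": {"00:11:22:33:44:41": "Gi0/1", "00:11:22:33:44:53": "Gi0/7"},
}
uplink_ports = {("sw-a", "Gi0/24"), ("sw-b", "Gi0/1")}

x41_location = locate_endpoint("00:11:22:33:44:41", fdb_tables, uplink_ports)
x53_location = locate_endpoint("00:11:22:33:44:53", fdb_tables, uplink_ports)
```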
Finding the Root-Cause
Step 2:
Identify the full layer-2 path through the network from the first phone to the second phone
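Step 2 can be sketched as a graph search over the active links. The example below assumes the link list already excludes ports blocked by spanning tree, so the topology is loop-free and a breadth-first search returns the only path. The device names and topology are hypothetical, not taken from the slides' diagram.

```python
from collections import deque


def layer2_path(links, start, goal):
    """Breadth-first search over undirected links; returns a node list or None."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


links = [
    ("x41", "sw-a"), ("sw-a", "core"), ("core", "sw-b"),
    ("sw-b", "x53"), ("core", "sw-c"), ("sw-c", "x52"),
]
call_path = layer2_path(links, "x41", "x53")
```

Every device and link on the returned path becomes a candidate for the checks in the following steps.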
Finding the Root-Cause
Step 3:
Investigate involved switch and router health (CPU & Memory) for acceptable levels
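Step 3 amounts to a threshold check on each device along the path. A minimal sketch follows; the 80% CPU and 85% memory limits are illustrative assumptions, since the slides do not say what counts as acceptable, and the device names and readings are hypothetical.

```python
def unhealthy_devices(health, cpu_limit=80.0, mem_limit=85.0):
    """Return plain-text findings for devices above the given limits.

    health: {device: {"cpu_pct": float, "mem_pct": float}} (hypothetical layout)
    """
    findings = []
    for device, stats in health.items():
        if stats["cpu_pct"] > cpu_limit:
            findings.append(
                f"{device}: CPU at {stats['cpu_pct']:.0f}% exceeds {cpu_limit:.0f}%"
            )
        if stats["mem_pct"] > mem_limit:
            findings.append(
                f"{device}: memory at {stats['mem_pct']:.0f}% exceeds {mem_limit:.0f}%"
            )
    return findings


health = {
    "sw-a": {"cpu_pct": 12.0, "mem_pct": 40.0},
    "core": {"cpu_pct": 91.0, "mem_pct": 60.0},
}
health_findings = unhealthy_devices(health)
```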
Finding the Root-Cause
Step 4:
Investigate involved interfaces for:
• VLAN assignment
• DiffServ/QoS tagging
• Queuing configuration
• 802.1p Priority settings
• Duplex mismatches
• Cable faults
• Half-duplex operation
• Broadcast storms
• Incorrect speed settings
• Over-subscription
TRANSIENT PROBLEM WARNING:
If the error condition is no longer occurring when this investigation is performed, you may not catch the problem
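Several of the Step 4 checks come down to comparing the settings on the two ends of each link. The sketch below covers three of them (half-duplex operation, duplex mismatch, and speed or VLAN disagreement). The interface records are hypothetical, and the remaining checks, such as QoS tagging and cable faults, need other data and are not shown.

```python
def check_link(end_a, end_b):
    """Compare both ends of one link and return plain-text findings."""
    findings = []
    for end in (end_a, end_b):
        if end["duplex"] == "half":
            findings.append(f"{end['name']} is running half-duplex")
    if end_a["duplex"] != end_b["duplex"]:
        findings.append(
            f"Duplex mismatch between {end_a['name']} and {end_b['name']}"
        )
    if end_a["speed_mbps"] != end_b["speed_mbps"]:
        findings.append(
            f"Speed mismatch between {end_a['name']} and {end_b['name']}"
        )
    if end_a["vlan"] != end_b["vlan"]:
        findings.append(
            f"VLAN mismatch between {end_a['name']} and {end_b['name']}"
        )
    return findings


uplink = {"name": "sw-a Gi0/24", "duplex": "full", "speed_mbps": 1000, "vlan": 10}
downlink = {"name": "core Gi1/1", "duplex": "half", "speed_mbps": 1000, "vlan": 10}
link_findings = check_link(uplink, downlink)
```

A check like this reads the current configuration only, so it is subject to the transient-problem warning above: a condition that has already cleared will not show up.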
Optimizing the Methodology
In a perfect world you want:
• Tracking of:
  - Every switch, router, and link in the entire infrastructure
  - All error counters, performance and configuration info
• At any time of the day
• Automatic layer-1, 2, and 3 mapping from any IP to any IP
• Problems identified in plain English for rapid remediation
This is what PathSolutions TotalView does
How TotalView Works
Install PathSolutions
All switches and routers are queried for information
Result:
One location is able to monitor all devices and links in the entire network for performance and errors
Total Network Visibility®
• Broad: All ports on all routers & switches
• Continuous: Health collected every 5 minutes
• Deep: 18 different error counters collected and analyzed
• Heuristics Engine provides plain-English prescription of faults:
  “This interface is dropping 8% of its packets due to a cable fault”
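To illustrate the idea of turning error counters into a plain-English statement, the sketch below takes the counter increases for one polling interval, computes the loss percentage, and names the dominant error type. The counter names, the 1% threshold, and the counter-to-cause mapping are illustrative assumptions only; this is not TotalView's heuristics engine, whose rules the slides do not describe.

```python
# Illustrative mapping from a dominant error counter to a likely cause.
LIKELY_CAUSE = {
    "crc_errors": "a cable fault",
    "late_collisions": "a duplex mismatch",
    "alignment_errors": "alignment errors",
}


def describe_interface(name, good_packets, error_deltas, threshold_pct=1.0):
    """Return a plain-English finding, or None if loss is below the threshold.

    good_packets: packets received without error during the interval
    error_deltas: {counter_name: increase during the interval}
    """
    errors = sum(error_deltas.values())
    total = good_packets + errors
    if total == 0:
        return None
    loss_pct = 100.0 * errors / total
    if loss_pct < threshold_pct:
        return None
    dominant = max(error_deltas, key=error_deltas.get)
    cause = LIKELY_CAUSE.get(dominant, "an unclassified error")
    return f"{name} is dropping {loss_pct:.0f}% of its packets due to {cause}"


message = describe_interface("Gi0/3", 920, {"crc_errors": 80, "late_collisions": 0})
# message == "Gi0/3 is dropping 8% of its packets due to a cable fault"
```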
Results Within 12 Minutes
Establish Baseline of Network Health
12% Loss from Alignment Errors
28% Loss from Duplex mismatch
7% Loss from cabling fault
11% Loss from Collisions
Results Within 12 Minutes
Immediately start fixing problems
12% Loss from Alignment Errors
28% Loss from Duplex mismatch
7% Loss from cabling fault
11% Loss from Collisions
Path Analysis Report
Root-cause troubleshoot all elements along a path
12:02pm: 8% Loss from Duplex mismatch
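A path report of this kind can be sketched as a lookup into stored history: for every interface on the path, find the polling interval that contains the time of the complaint. The example assumes 5-minute intervals, as stated earlier in the slides, with each history entry stamped with the start of its interval. The interface names, the date, and the data layout are hypothetical.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)  # 5-minute collection cycle


def path_findings(history, path, event_time):
    """Return (interface, interval_start, finding) for intervals covering event_time.

    history: {interface: [(interval_start, finding_or_None), ...]}
    """
    report = []
    for interface in path:
        for interval_start, finding in history.get(interface, []):
            covers = interval_start <= event_time < interval_start + POLL_INTERVAL
            if covers and finding is not None:
                report.append((interface, interval_start, finding))
    return report


day = datetime(2024, 1, 1)  # arbitrary date for the example
history = {
    "sw-a Gi0/24": [
        (day.replace(hour=12, minute=2), "8% loss from duplex mismatch"),
        (day.replace(hour=12, minute=7), None),
    ],
    "core Gi1/1": [(day.replace(hour=12, minute=2), None)],
}
path = ["sw-a Gi0/3", "sw-a Gi0/24", "core Gi1/1"]

# The complaint was at 12:04pm, which falls inside the 12:02pm interval.
report = path_findings(history, path, day.replace(hour=12, minute=4))
```

Because the lookup works on stored history rather than live state, it still finds the fault after the condition has cleared.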
Don’t turtle your network
Total Network Visibility®