WAN - Fluke Networks
© BV
Fluke Networks: Network SuperVision
How We Got Here! (A Brief History)
Focus on Value, Quality, and Reliability
Drawing on a 50-year Fluke heritage of the most reliable tools available
Innovation: “Firsts” in most categories entered
1993: Handheld Network Analyzer
1995: Digital Cable Tester
1996: Handheld Fast Ethernet Analyzer
1998: Out-of-the-Box Network Management
1999: Digital Cat 6 Cable Tester
2000: PC Support/Help Desk Test Tool
2000: Full Line-Rate Gigabit Analysis
2000: Portable & Distributed Protocol Analysis
2000: Integrated Network Analyzer
2002: WorkGroup Analyzer
2002: Handheld Wireless Analyzer
2003: WAN OC3/12 Analyzer
Fluke Networks Today
Annual Sales Over $150M
Over 500 Employees Worldwide
Direct Sales, Support, and Service in 22 Countries
47% of Sales Outside the US
91 of the Fortune™ 100 use our Solutions
Over 100,000 Network Testers Shipped To Date
Fluke Networks Products
Handheld Network Testers: Turn your staff into powerful problem solvers
Cable and Fiber Testing: Verification and troubleshooting of cable plant
Network Analysis: Portable and distributed solutions for optimization and troubleshooting
What is Network Availability?
The term Network Availability encompasses several aspects of providing information services.
It estimates how much downtime you can expect over a given period of time.
Network Device vs. LAN/WAN Availability
High Availability: highly reliable networks, such as those used for emergency services
Fundamentals of Network Availability
To calculate Network Availability, use the following variables and equation:
MTBF (Mean Time Between Failures)
MTTR (Mean Time To Repair)
The Equation
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
Now let's look at these in more detail…
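A minimal Python sketch of the equation, for plugging in your own numbers. The function name `availability` is our own choice, and the inputs are the example figures (in minutes) from the worked example later in this section; MTBF and MTTR only need to be in the same unit.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR).

    MTBF and MTTR must use the same time unit (hours, minutes, ...).
    """
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Example values in minutes (the same figures as the worked example).
a = availability(473_364, 52_596)
print(f"{a:.5f} -> {a * 100:.3f}%")  # 0.90000 -> 90.000%
```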
Fundamentals of Network Availability: MTBF
MTBF = Mean Time Between Failures
It describes the average number of hours between failures for a specific device
To calculate the MTBF of a device, you have to calculate the MTBF of its components
Manufacturers typically determine MTBF numbers internally and make them available to their customers
MTBF numbers for a device include both hardware and software, although each is calculated differently
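One common way to combine component figures, assuming every component is required for the device to work (a series system) and failure rates are constant, is to add the component failure rates (1/MTBF each) and invert the total. This is a textbook simplification, not necessarily how any given manufacturer computes its numbers; the component names and MTBF values below are hypothetical.

```python
def series_mtbf(component_mtbfs: list[float]) -> float:
    """MTBF of a device whose components are all required (series system).

    Assumes constant (exponential) failure rates, so the device failure
    rate is the sum of the component failure rates (1 / MTBF each).
    """
    if not component_mtbfs or any(m <= 0 for m in component_mtbfs):
        raise ValueError("need at least one positive component MTBF")
    total_failure_rate = sum(1.0 / m for m in component_mtbfs)
    return 1.0 / total_failure_rate


# Hypothetical component MTBFs in hours (not real vendor figures).
parts = {"power supply": 100_000, "supervisor card": 200_000, "line card": 200_000}
print(round(series_mtbf(list(parts.values()))))  # 50000
```

The device MTBF comes out lower than that of its weakest component: in a series system, every added part reduces MTBF.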
Fundamentals of Network Availability: MTTR
MTTR = Mean Time To Repair
It describes the average amount of time, in hours, that elapses between a network failing and the network being restored
There are three phases to fixing a network problem:
» Detection: time to notification of network failure
» Diagnosis: time to discover the root cause of the failure
» Repair: time to correct the root cause and verify the fix
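Because MTTR is an average over incidents, and each incident's restore time is the sum of the three phases, shortening detection or diagnosis lowers MTTR directly. A minimal sketch, assuming a hypothetical incident log in which each entry records (detection, diagnosis, repair) in hours:

```python
from statistics import mean


def mttr_hours(incidents: list[tuple[float, float, float]]) -> float:
    """Mean time to repair, in hours.

    Each incident is (detection, diagnosis, repair) in hours; its restore
    time is the sum of the three phases, and MTTR is the mean over incidents.
    """
    return mean(detect + diagnose + repair for detect, diagnose, repair in incidents)


# Hypothetical incident log: (detection, diagnosis, repair) in hours.
log = [(0.5, 2.0, 1.0), (0.25, 4.0, 0.75), (1.0, 1.0, 0.5)]
print(round(mttr_hours(log), 2))  # 3.67
```

In this made-up log, diagnosis accounts for 7 of the 11 total hours, which is the kind of breakdown that tells you which phase to attack first.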
The Availability Equation
An example, with all units in minutes: MTBF = 473,364 and MTTR = 52,596
$$\text{Availability} = \frac{473{,}364}{473{,}364 + 52{,}596} = \frac{473{,}364}{525{,}960} = 0.90000$$
$$0.90000 \times 100 = 90\%$$
So the network has 90% availability or 1 ‘9’ of availability
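The "number of 9's" is how many leading nines the availability has, which works out to -log10(1 - availability). A short Python sketch (the function name `count_nines` is ours), applied to the worked example:

```python
import math


def count_nines(availability: float) -> float:
    """Number of 9's for an availability fraction: -log10(1 - availability)."""
    if not 0 <= availability < 1:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1 - availability)


a = 473_364 / (473_364 + 52_596)    # the worked example: 0.9
print(round(count_nines(a), 6))      # 1.0
print(round(count_nines(0.999), 6))  # 3.0
```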
Use the chart on the next slide to see how adding 9’s
changes the availability of a network
Availability and the number of 9’s
Nines  Availability  Uptime (min/year)  Downtime (min/year)  Annual Downtime
1      90.000%       473,364            52,596               36.5 days
2      99.000%       520,700.4          5,259.6              3.65 days
3      99.900%       525,434            525.96               8.8 hours
4      99.990%       525,907.4          52.596               53 minutes
5      99.999%       525,954.7          5.2596               5.3 minutes
6      99.9999%      525,959.5          0.52596              32 seconds
(Based on a 525,960-minute year, i.e., 365.25 days.)
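Each row of the table follows from two numbers: a 525,960-minute year and an unavailability of 10^-n for n nines. The sketch below regenerates the uptime and downtime columns under that assumption (the helper name `downtime_row` is ours):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, the year the table uses


def downtime_row(nines: int) -> tuple[float, float, float]:
    """(availability %, uptime min/year, downtime min/year) for n nines."""
    unavailability = 10.0 ** -nines
    downtime = MINUTES_PER_YEAR * unavailability
    return 100 * (1 - unavailability), MINUTES_PER_YEAR - downtime, downtime


# One printed row per number of nines, matching the table's columns.
for n in range(1, 7):
    pct, up, down = downtime_row(n)
    print(f"{n}  {pct:.4f}%  {up:,.1f}  {down:,.5f}")
```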
Availability and the number of 9’s
As the previous table shows, each additional 9 cuts annual downtime by a factor of 10.
It is said that after the 2nd 9, each additional 9 costs twice as much.
So while each 9 costs double the previous 9, it buys a tenfold reduction in downtime.
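To make the rule of thumb concrete, here is a toy model in Python: cost doubles with each 9 beyond the second while annual downtime drops tenfold. The relative cost of 1.0 for a two-nines design is an arbitrary, hypothetical baseline, not a real price.

```python
MINUTES_PER_YEAR = 525_960


def nine_tradeoff(max_nines: int = 6, base_cost: float = 1.0):
    """Toy model of the rule of thumb: beyond two 9's, each extra 9 doubles
    the relative cost and cuts annual downtime by a factor of 10.
    base_cost is a hypothetical relative cost for a two-nines design."""
    rows = []
    for n in range(2, max_nines + 1):
        cost = base_cost * 2 ** (n - 2)
        downtime_min = MINUTES_PER_YEAR * 10.0 ** -n
        rows.append((n, cost, downtime_min))
    return rows


for n, cost, downtime in nine_tradeoff():
    print(f"{n} nines: relative cost x{cost:g}, downtime {downtime:,.2f} min/year")
```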
To realize gains in availability, we have to modify the only variable within our direct control…
Minimize MTTR!
Strategy for Minimizing MTTR
Reduce the time to detection
Monitor your network
Use notifications
Reduce the time to diagnose
Document the network (Automation is quicker and more efficient)
Use effective real-time troubleshooting tools
Train your staff and/or implement some form of knowledge base
Reduce the time to repair and verify
Have spares and extra parts available
Use effective tools to verify resolution
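Detection time is the easiest phase to reason about numerically. The sketch below models a hypothetical polling monitor (not the behavior of any particular NMS) that probes a device every `poll_interval` seconds and raises an alarm after `failures_required` consecutive failed probes. It shows how the polling interval and alarm threshold bound the time to detection.

```python
import math


def detection_delay(outage_start: float, poll_interval: float,
                    failures_required: int = 1, first_poll: float = 0.0) -> float:
    """Seconds from an outage starting until a hypothetical poller alarms.

    The poller probes at first_poll + k * poll_interval (k = 0, 1, 2, ...),
    each probe is assumed to complete instantly, and an alarm is raised after
    'failures_required' consecutive failed probes.
    """
    # Index of the first probe at or after the moment the outage begins.
    k = max(math.ceil((outage_start - first_poll) / poll_interval), 0)
    first_failed_probe = first_poll + k * poll_interval
    alarm_time = first_failed_probe + (failures_required - 1) * poll_interval
    return alarm_time - outage_start


print(detection_delay(10, 60, failures_required=3))  # 170.0
print(detection_delay(10, 15, failures_required=3))  # 35.0
```

In this model, shortening the interval from 60 s to 15 s cuts the delay for the same outage from 170 s to 35 s. Event notifications such as SNMP traps can shorten it further because they do not wait for the next poll.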
Challenges We Face in Reducing MTTR!
Why is the network slow? (root cause) Is it the network, the application, or the configuration?
Switches/VLANs: How do you “see” inside switched networks?
Voice & Video over IP: How do you fix QoS issues?
WAN: Need to verify service level agreements?
Wireless: Need to maintain security and verify connectivity?
What Have We Used In The Past?
Network Management Systems are “Component Management / Monitoring Tools” (not very flexible, but still necessary)
Traditional Toolkit
Protocol analyzers: are they limited today?
In a switched environment, the collision domain is reduced to a single station! (an analyzer on a switch port cannot see unicast traffic between other stations)
Monitoring network traffic is only possible at one point with one analyzer!
Full-Duplex Gigabit Analysis – with a PC?
Is this really enough to analyze a modern network?
Possibilities with a traditional “Toolkit”
Network Management System
It will alert me if something goes wrong, for example, if a router interface goes down or if a server is not reachable from the NMS.
Generally speaking, this is pretty useful!
But is the NMS always telling the whole truth?
Possibilities with a traditional “Toolkit”
Does the NMS really show all of our problems?
What can it show us when users complain about a slow network?
What happens if the whole screen goes red?
Where is the root cause?
What could it be?
And how do you find it?
Possibilities with a traditional “Toolkit”
Protocol Analyzer
Where should we install it?
What will we see when we connect it to a switch?
Will it be able to show us why the “network is slow”?
If the root cause is an application problem and we knew exactly where to plug in, probably yes!
But what about network problems?
And what about configuration problems?
Network Analysis Techniques: Fill the Gap!
Network Management (OpenView, Spectrum, VitalSuite, etc.)
Management of enterprise networks
Component level, manually configured -> not very flexible, “Trap collectors”
Network Monitoring / Documenting (RMON, RMON2, MIB-II)
Statistical data over time from software and hardware agents, switches, routers, servers, and probes (LAN and WAN)
Essential for trending, alarms, reporting, and documenting
Network Analysis (Real-Time Information, Active Discovery)
SNMP provides inventory, mapping, path analysis, and real-time stats
Essential for visibility into switched and routed networks, accounting, and security
Packet Capture (sniffers)
Provides capture, decode, and expert analysis of trace files
Essential for visibility into application problems
The combination and integration of these four network analysis techniques makes the OptiView Network Analysis Solution uniquely powerful!
The Modern “Toolkit”, Fluke Networks
Network Inspector Console
Network Inspector Agent (local or distributed)
Handheld Field Tools (portable): OneTouch, NetTool, LinkRunner, WaveRunner (Wireless)
OptiView Protocol Expert (portable or distributed)
Link Analyzer (distributed)
Integrated & Wireless Network Analyzer (portable)
Workgroup Analyzer, WGA (distributed)
OC3/OC12 WAN Analyzer (distributed)
[Diagram labels: WGA distributed (two units), INA portable]
Case Studies for Minimizing MTTR
Let's take a look at some case studies that show how the Fluke Networks Distributed Analysis Solution can help reduce MTTR:
Slow response
Print failure
VoIP problems
WAN-based DoS (Denial of Service) attack
Bandwidth hog
Problem #1
A user is complaining of very slow response times from an application server.
Why is response time slow?
Is it the network, the application, or the configuration?
A client is complaining of a slow response from an application server! I bet you’ve never heard that one before!
[Diagram: Network Inspector Console (somewhere in the network); Network Inspector Agents (one agent per VLAN/subnet)]
A client is getting slow responses from an application server!
The Agent is in the same VLAN/subnet as the problematic client.
What else is in this VLAN/subnet?
What is the load on the uplink to the switch? (Green -> no overload)
There is no entry for the client or the server in the problem log -> there is no Layer 3 configuration problem.
So, let’s first have a look at the switch port where the server is connected...
Unfortunately, the port where the client is connected does not have trending turned on...
No unusual utilization trend in the past hour (compared with the previously recorded data)...
No lower-layer errors from the network side.
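Utilization trends like this are typically derived from SNMP interface counters. A minimal sketch of the usual calculation, assuming two samples of a 32-bit octet counter (such as ifInOctets from IF-MIB) taken a known number of seconds apart on a full-duplex link. The sample values are hypothetical, and the threshold the tool uses to show green is not modeled here.

```python
COUNTER32_MODULUS = 2 ** 32  # use 2 ** 64 for 64-bit counters such as ifHCInOctets


def counter_delta(old: int, new: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, tolerating a single wrap."""
    return (new - old) % modulus


def utilization_pct(old_octets: int, new_octets: int, seconds: float,
                    if_speed_bps: int) -> float:
    """Percent utilization of one direction of a full-duplex interface:
    delta octets * 8 bits / (interval seconds * interface speed) * 100."""
    bits = counter_delta(old_octets, new_octets) * 8
    return 100.0 * bits / (seconds * if_speed_bps)


# Hypothetical ifInOctets samples 300 s apart on a 100 Mb/s uplink;
# the counter wrapped past 2**32 between the two polls.
print(round(utilization_pct(4_294_000_000, 2_000_000_000, 300, 100_000_000), 2))  # 53.36
```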
The root cause of the problem does not seem to be the network on the server side (up to Layer 3).
What else could it be?
An OptiView Integrated Network Analyzer can be anywhere on the network! (portable)
The same goes for OptiView Workgroup Analyzers! (distributed)
Let’s have a look, in real time and with either analyzer, at the switch that the problematic client is connected to.
Remember from the Console that our client is connected to Slot 1, Port 5...
High utilization doesn’t seem to be the root cause of the problem in this case...
No errors on this interface since the INA started running...
There are quite a few collisions on this port, though!
Why collisions? There is only one device connected to this switch port!
Ah, the switch port has autonegotiated down to 10 Mb/s half duplex with the client PC!
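The reasoning on this slide can be written down as a simple rule. Collisions cannot occur on a full-duplex link, so collisions on a port with a single attached device point to a port running half duplex, and a negotiated speed below what the client should support points to a bad autonegotiation result. A sketch in Python; the `PortStatus` fields, the expected 100 Mb/s speed, and the collision count are hypothetical and not taken from the analyzer.

```python
from dataclasses import dataclass


@dataclass
class PortStatus:
    """Hypothetical snapshot of one switch port; field names are our own."""
    name: str
    speed_mbps: int
    duplex: str            # "full" or "half"
    collisions: int
    attached_devices: int


def port_findings(port: PortStatus, expected_speed_mbps: int = 100) -> list[str]:
    """Flag the symptoms from this case: low negotiated speed, half duplex,
    and collisions on a port with only one device attached."""
    findings = []
    if port.speed_mbps < expected_speed_mbps:
        findings.append(f"{port.name}: negotiated {port.speed_mbps} Mb/s, "
                        f"expected {expected_speed_mbps} Mb/s")
    if port.duplex == "half":
        findings.append(f"{port.name}: running half duplex")
        if port.collisions > 0 and port.attached_devices == 1:
            findings.append(f"{port.name}: {port.collisions} collisions "
                            "with a single attached device")
    return findings


for line in port_findings(PortStatus("Slot 1 Port 5", 10, "half", 1234, 1)):
    print(line)
```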
Problem #2
A new Novell server has just been installed and mapped to a printer connected to a Sun server. Each time a print job is sent to the printer, it fails! Surprisingly enough, the Novell people are blaming Sun and vice versa!
[Diagram: Novell Server, Sun Server, Ethernet Switch, Network Printer]
Double-click the printer to get more detail about its configuration!
Details include the IP address, subnet mask, MAC address, IPX name, and IPX addresses. They also include the switch port that the device is plugged into (here, WebNetSwitch Port 9)!
Before we go to the switch stats, let’s take a look at our real-time problem log!
There isn’t an entry for any of the devices in question!
Now let’s go to the switch where the printer is connected!
There has been no excessive utilization or abnormal traffic types on the switch port!
What about any errors on the port?
There have been no errors reported on this switch port!
Could it be time to use our protocol analysis tools to solve this problem?
YES!!
Let’s go to the capture menu to build a customized capture filter!
Let’s build a capture filter for all traffic to and from the printer in question!
After building the capture filter, we simply hit “Start Capture” and then re-send the print job!
After capturing the print job, press “Stop Capture” and then “View Capture”!
The print job is decoded and presented to us in our Protocol Expert application!
Let’s go to our “Expert” system and see what it has to say!
My “Expert” system has logged 7 Transport Layer issues.
All symptoms seem to be the same: “TCP Fast Retransmission”. Let’s double-click on the symptom and have a closer look!
Let’s right-click the reference frame that is responsible for the retransmissions and go back to our decode view!
In frame 114 we notice that the TCP checksum is incorrect, so the receiving side discards the segment and we don’t get an ACK.
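The check the Expert system reports here is the standard TCP checksum: the one's-complement sum of 16-bit words over an IPv4 pseudo-header (source and destination address, protocol 6, TCP length) plus the TCP header and data, as described in RFC 793 and RFC 1071. A minimal Python sketch for IPv4; the addresses and ports in the example are made up.

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 one's-complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 6, TCP length."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 6, tcp_length))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for bytes 16-17 of the TCP header (which must be zero in 'segment')."""
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A received segment is valid when the sum, checksum field included, is 0xFFFF."""
    total = ones_complement_sum16(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF


# Hypothetical 20-byte SYN header (ports and addresses are made up), checksum zeroed.
SRC, DST = "192.168.1.10", "192.168.1.20"
hdr = bytearray(struct.pack("!HHIIBBHHH", 40000, 12345, 1, 0, 5 << 4, 0x02, 65535, 0, 0))
hdr[16:18] = struct.pack("!H", tcp_checksum(SRC, DST, bytes(hdr)))
good = bytes(hdr)
hdr[16] ^= 0xFF  # corrupt the checksum, as a broken TCP stack might
bad = bytes(hdr)
print(tcp_checksum_ok(SRC, DST, good), tcp_checksum_ok(SRC, DST, bad))  # True False
```

One caveat worth knowing: a capture taken on the sending host itself can show bad checksums when the NIC fills them in (checksum offload). A capture from a SPAN port, as in this case, sees the segment as it actually appears on the wire.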
Conclusion:
Upon closer review, we notice that this exact incorrect TCP checksum shows up over and over, causing the other side to keep retransmitting (creating an endless cycle).
We conclude that the TCP stack on the Novell server is broken.
Using the same trace file, how do we prove that it’s not a switch or network problem?
The FCS (Frame Check Sequence) for the Ethernet frame itself is fine, so the frame was not corrupted on the wire!
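The FCS is a CRC-32 computed over the whole frame (destination MAC through payload), so a frame whose FCS checks out was not corrupted on the link on its way to the capture point. Combined with a bad TCP checksum, that places the error in the sender's TCP stack rather than on the wire. A minimal sketch of the FCS check in Python, using `zlib.crc32` (the same CRC-32 that Ethernet uses) with the result appended least significant byte first; the example frame is hypothetical.

```python
import struct
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Append an Ethernet FCS: CRC-32 of the frame, least significant byte first."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over everything but the last 4 bytes and compare."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


# Hypothetical minimum-size frame: broadcast dst, made-up src, IPv4 EtherType, zero payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + bytes(46)
wire = add_fcs(frame)
print(fcs_ok(wire))             # True

damaged = bytearray(wire)
damaged[20] ^= 0x01             # flip one payload bit "in transit"
print(fcs_ok(bytes(damaged)))   # False
```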
Let’s Review
Viewed details about printer configuration!
Configured SPAN port for packet capture!
Captured print job!
Viewed trace and resolved problem!
Accomplished all of this from Console without leaving my seat!
Reduced MTTR!
Problem #3
A large banking customer that supports VoIP for multiple remote sites is receiving user complaints of dropped calls and calls with only one-sided audio!
[Diagram: Link Analyzer & Multi-Port Tap, Call Manager Cluster, IP WAN, PSTN]
From the decode view, hit the “Q” button to drill into the VoIP QoS information!
Hit the “All Calls” tab to view all calls, both completed and still active!
The “User R Factor” tab gives us an instant QoS measurement for all calls that were captured!
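Assuming the analyzer's R factor follows the ITU-T G.107 E-model scale (0 to 100), it can be mapped to an estimated MOS with the conversion formula from that recommendation. A sketch in Python; the function name is ours and the sample R values are hypothetical.

```python
def mos_from_r(r: float) -> float:
    """Estimated MOS from an E-model R factor (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (93.2, 80.0, 60.0):
    print(r, round(mos_from_r(r), 2))
```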
The “Jitter” tab gives us the number of calls that exceeded 50 ms of jitter!
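Jitter figures for RTP streams are normally the interarrival jitter estimator defined in RFC 3550. A minimal Python sketch, assuming send and arrival times already in milliseconds (RFC 3550 works in RTP timestamp units, so real RTP timestamps would first need converting with the codec clock rate), with the slide's 50 ms threshold applied per call. The call IDs and jitter values in the example are hypothetical.

```python
def interarrival_jitter(send_ms: list[float], arrival_ms: list[float]) -> float:
    """RFC 3550 interarrival jitter estimate, in milliseconds."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D(i-1, i): change in transit time between consecutive packets.
        d = (arrival_ms[i] - arrival_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16
    return jitter


def calls_over_threshold(jitter_by_call: dict[str, float],
                         threshold_ms: float = 50.0) -> list[str]:
    """IDs of calls whose jitter exceeded the threshold."""
    return [call for call, j in jitter_by_call.items() if j > threshold_ms]


# Packets sent every 20 ms; a constant network delay gives zero jitter.
print(interarrival_jitter([0, 20, 40, 60], [5, 25, 45, 65]))   # 0.0
print(calls_over_threshold({"call-1": 12.5, "call-2": 63.0}))  # ['call-2']
```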
The “Dropped Packets” tab gives us an instant view of all calls that experienced dropped packets!
The “Setup Time” view gives us the average time it took for all calls to establish their RTP streams!
Double-click on any individual call to get details about that particular call!
Call detail information on an individual call!
Where’s the Problem?
Is it the network? No: Distributed analysis verifies that the links are clean and utilization is not an issue.
Is it the application or configuration? Yes: Ongoing analysis clearly demonstrates that VoIP QoS degrades during periods of high call volume.
Conclusion:
The Call Manager cluster seemed to be choking when supporting more than 100 simultaneous calls from remote sites. The long-term solution was to use a distributed Call Manager cluster architecture. The short-term fix was to install a Cache Engine to assist with traffic flow.
Problem #4
Users at a remote site are complaining that a particular server is unavailable.
No Physical Layer Errors!
WOW! Look at how much Telnet is being seen!
No Problems!
Make sure “All VC’s” is selected so we do not exclude any traffic sources!
Highlight the Telnet protocol and then drill into Top Conversations to see the top contributor!
Look at the strange IP addresses, all in incremental order!
Let’s capture any Telnet traffic to this server and gather some evidence!
Set up a Telnet filter for “Any” IP address to/from our server 192.168.101.21, then start your capture!
Then hit “View Capture”.
The trace file shows us lots of TCP connection attempts with no acknowledgement to complete the TCP three-way handshake. Also note that the source ports from these questionable IPs look fishy!
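The pattern in this trace (many SYNs, no completed handshakes, sequential source addresses) can be counted mechanically. A simplified Python sketch: it treats a connection as completed when the client sends a bare ACK on the same address/port 4-tuple after its SYN, and it does not check sequence numbers. The packet tuples and the 10.0.0.x source addresses are hypothetical; only the server address and Telnet port 23 come from this example.

```python
from collections import defaultdict


def half_open_by_server(packets):
    """Count SYNs per (server, port) that never completed the 3-way handshake.

    Each packet is a hypothetical (src_ip, src_port, dst_ip, dst_port, flags)
    tuple, with flags given as a set such as {"SYN"} or {"SYN", "ACK"}.
    Simplification: a client's bare ACK on the same 4-tuple after its SYN
    counts as completing the handshake; sequence numbers are not checked.
    """
    attempts, completed = set(), set()
    for src, sport, dst, dport, flags in packets:
        key = (src, sport, dst, dport)
        if flags == {"SYN"}:
            attempts.add(key)
        elif flags == {"ACK"} and key in attempts:
            completed.add(key)
    counts = defaultdict(int)
    for _client, _cport, server, sport in attempts - completed:
        counts[(server, sport)] += 1
    return dict(counts)


SERVER = "192.168.101.21"
# 50 SYNs from sequential (hypothetical) source addresses to Telnet (port 23).
packets = [(f"10.0.0.{i}", 1024 + i, SERVER, 23, {"SYN"}) for i in range(1, 51)]
# One legitimate client that completes the handshake.
packets += [
    ("192.168.1.5", 50000, SERVER, 23, {"SYN"}),
    (SERVER, 23, "192.168.1.5", 50000, {"SYN", "ACK"}),
    ("192.168.1.5", 50000, SERVER, 23, {"ACK"}),
]
print(half_open_by_server(packets))  # {('192.168.101.21', 23): 50}
```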
Conclusion:
A hacker has found an open port on your server and is launching a denial-of-service SYN attack.
Close down the open TCP port and/or block that IP address range from accessing your server. Contact your ISP if applicable.
Problem #5
Utilization on a WAN link is higher than normal. Have we recently introduced a new business application, or is this traffic recreational?
Note that “All VC’s” is selected!
Utilization is unusually high
Streaming video seems to be our top application (not a business app).
Let’s find the culprits!
Let’s drill into one of these clients and see which VC it is using!
It looks like an Internet server is streaming to several users on the 192.168.105 network!
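The "Top Conversations" view boils down to summing bytes per conversation for one application and sorting. A sketch with hypothetical flow records: the 203.0.113.10 server (a documentation address range) and all byte counts are made up.

```python
from collections import Counter


def top_talkers(flows, application: str, n: int = 3):
    """Return the n (src, dst) pairs with the most bytes for one application.

    Each flow is a hypothetical (src, dst, application, bytes) tuple.
    """
    totals = Counter()
    for src, dst, app, nbytes in flows:
        if app == application:
            totals[(src, dst)] += nbytes
    return totals.most_common(n)


flows = [
    ("203.0.113.10", "192.168.105.21", "Streaming Video", 48_000_000),
    ("203.0.113.10", "192.168.105.34", "Streaming Video", 51_000_000),
    ("192.168.101.5", "192.168.105.21", "HTTP", 2_000_000),
    ("203.0.113.10", "192.168.105.21", "Streaming Video", 10_000_000),
]
print(top_talkers(flows, "Streaming Video", n=2))
```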
This host is on the VC to Brazil. This checks out, as we know that the 192.168.105 network is in our Brazil office.
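Mapping a host to a site is a subnet-membership test. A sketch using Python's standard `ipaddress` module; it assumes the "192.168.105 network" is a /24, and the second entry in the site table is a hypothetical placeholder for the rest of your address plan.

```python
import ipaddress

# Site address plan; the /24 mask and the "HQ (hypothetical)" entry are assumptions.
SITE_SUBNETS = {
    "Brazil": ipaddress.ip_network("192.168.105.0/24"),
    "HQ (hypothetical)": ipaddress.ip_network("192.168.101.0/24"),
}


def site_for(host: str) -> str:
    """Return the site whose subnet contains host, or 'unknown'."""
    addr = ipaddress.ip_address(host)
    for site, subnet in SITE_SUBNETS.items():
        if addr in subnet:
            return site
    return "unknown"


print(site_for("192.168.105.34"))  # Brazil
print(site_for("203.0.113.10"))    # unknown
```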
Conclusion:
Users are using an unauthorized, high-bandwidth application. After viewing a trace file of conversations, it appears that the “World Cup” soccer match is being webcast on the Internet.
Users should be told to stop using the application, or the port number for MS streaming video can be blocked on our firewall.
From Protocol Analysis to Integrated Network and Service Management Solutions
Understanding the power of a complete solution!
OptiView Network Analysis Solution
OptiView: Inspector Console
OptiView: Integrated Network Analyzer (INA)
OptiView: WorkGroup Analyzer (WGA)
OptiView: Protocol Expert
OptiView: Link Analyzer (LA)
OptiView: WAN Analyzer
Integration With Your Systems
• Use existing data sources
– SNMP Agents
– RMON Agents
– Protocol Analyzers
• Provide data to existing systems
– Network Management
– RMON Consoles
– Protocol Analyzers
What’s New & What’s Coming in 2003/2004
OptiFiber (Certifying OTDR): all-in-one fiber certification/troubleshooting platform. Available Now!
SONET OC3/12 WAN Analyzer (supports ATM, POS, & MPLS). Available Now!
T1/V.35 WAN Analyzer (supports Frame Relay, PPP, & Cisco’s HDLC). Coming Soon!
DS3/HSSI WAN Analyzer (supports Frame Relay, PPP, & Cisco’s HDLC). Coming Soon!