Network Operations and Management
NOC Services and Applications
AFNOG 2002
Brian Longwe
*some slides based on the netmgt talks in NTW T2-99 by Abha Ahuja and NTW T4-98 by Scott Bradner
What is a NOC?
Network Operations Centre
Monitors and manages a service provider’s network
• Fault monitoring and management
• Network status and operational statistics
• Information about current, historical and planned availability of systems
• Engineers can coordinate their work through the NOC
Network Management - What is it?
“In order to operate a reliable service, the network must be managed according to a determined discipline, using a coherent structure of information management.”
Geoff Huston, ISP Survival Guide
Network Management - Components
Parts of Network Management
• Fault management
• Configuration/Change management
• Performance management
• Security management
• Accounting management
Fault Management
Identify the fault
• Regular polling of network elements
Isolate the fault
• Diagnosis of the network components
Respond to the fault
• Allocate resources to resolve the fault
• Priority scheduling
• Technical/management escalation
Resolve the fault
• notification
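The "identify" step above can be sketched as a small function that turns raw poll results into fault events. This is only an illustration: the function name `detect_transitions`, the event labels, and the rule of waiting for three consecutive missed polls are assumptions, not taken from any tool named in these slides.

```python
from typing import Dict, List, Tuple


def detect_transitions(
    history: Dict[str, List[bool]], down_after: int = 3
) -> List[Tuple[str, str]]:
    """Return (node, event) pairs from per-node poll results, oldest first.

    A node is reported DOWN only after `down_after` consecutive failed
    polls (an assumed policy), so a single lost packet raises no alarm.
    """
    events: List[Tuple[str, str]] = []
    for node, results in history.items():
        misses = 0
        is_down = False
        for ok in results:
            if ok:
                # A successful poll ends any outage in progress.
                if is_down:
                    events.append((node, "RECOVERED"))
                misses = 0
                is_down = False
            else:
                misses += 1
                if misses == down_after and not is_down:
                    events.append((node, "DOWN"))
                    is_down = True
    return events
```

Each DOWN event would normally open a ticket and each RECOVERED event would update it; how that is wired up depends on the monitoring and ticket systems in use.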
Fault Management - systems
reporting mechanism
• link to NOC
• notify on-call personnel
setup & control alarm procedures
repair/recovery procedures
ticket system
Fault Management - Fault Detection
Who notices a problem with the network?
• Network Operations Center w/ 24x7 operations staff
– open trouble ticket to track problem
– preliminary troubleshooting
– Assign engineer to problem or escalate ticket status
• Customer call
• Other ISPs
Fault Management - Fault Detection (cont.)
How can you tell if there is a problem with the network?
• Network Monitoring Tools
– common utilities: ping, traceroute, SNMP
– Monitoring Systems: NOCol, Big Brother, NetSaint, NMIS, HP OpenView, etc.
• Report state or unreachability
– detect node down
– routing problems
Exercise: Big Brother
Download Big Brother Source from http://t2-noc.ws.afnog.org/downloads.htm
Follow instructions on http://t2-noc.ws.afnog.org/bigbrother-setup-notes.txt
Set up bb-hosts to monitor:
80.248.70.x tableA.t2.ws.afnog.org
80.248.72.192 t2-noc.ws.afnog.org # smtp ssh BBDISPLAY BBPAGER BBNET
80.248.72.254 noc.ws.afnog.org # smtp ssh http dns
page backbone Backbone Routers
group-compress <H3><I>Backbone Routers</I></H3>
80.248.72.254 gw-bb.ws.afnog.org # testip
80.248.72.251 t1-gw.ws.afnog.org # testip
80.248.72.252 t2-gw.ws.afnog.org # testip
80.248.72.253 t3-gw.ws.afnog.org # testip
page routers T2 Routers
group <H3><I>T2 Routers</I></H3>
80.248.70.1 Table A # testip noconn
80.248.70.2 Table B # testip noconn
80.248.70.3 Table C # testip noconn
etc...
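To check a bb-hosts file before loading it, the host lines can be read back with a few lines of Python. This is a rough sketch that handles only the line shapes shown on this slide (IP address, name, then tags after `#`). It is not Big Brother's own parser, and `HostEntry` and `parse_bb_hosts` are hypothetical names.

```python
import ipaddress
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    ip: str
    name: str
    tags: List[str]


def parse_bb_hosts(text: str) -> List[HostEntry]:
    """Extract host entries; directive lines (page, group, ...) are skipped."""
    entries: List[HostEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Everything after '#' is treated as the list of test tags.
        left, _, right = line.partition("#")
        fields = left.split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            # Not a literal IP address: a directive line, or a
            # placeholder such as 80.248.70.x that still needs filling in.
            continue
        entries.append(HostEntry(fields[0], " ".join(fields[1:]), right.split()))
    return entries
```

Printing the result makes it easy to spot a host that was left without any tests, or a placeholder address that was never filled in.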
Fault Management - Ticket System
Very Important!
Need mechanism to track:
• failures
• current status of outage
• carrier tickets
Fault Management - Ticket System
system provides for:
• short term memory & communication
• scheduling and work assignment
• referrals and dispatching
• oversight
• statistical analysis
• long term accountability
Fault Management - Ticket Usage
create a ticket on ALL calls
create a ticket on ALL problems
create a ticket for ALL scheduled events
copy of ticket mailed to reporter and mailing list(s)
all milestones in resolution of problem maintain the same ticket #
ticket stays "open" until problem resolved according to problem reporter
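Two of the rules above (one ticket number for the whole life of a problem, and the ticket staying open until the reporter agrees it is fixed) can be sketched as a small data structure. The `Ticket` class and its field names are illustrative assumptions and do not reflect how WebTTS or any other ticket system stores its data.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_next_number = count(1)  # hypothetical source of ticket numbers


@dataclass
class Ticket:
    subject: str
    reporter: str
    number: int = field(default_factory=lambda: next(_next_number))
    status: str = "open"
    log: List[str] = field(default_factory=list)

    def update(self, who: str, note: str) -> None:
        """Record a milestone; the ticket number never changes."""
        self.log.append(f"{who}: {note}")

    def resolve(self, confirmed_by: str) -> bool:
        """Close the ticket only when the original reporter confirms."""
        if confirmed_by != self.reporter:
            self.update(confirmed_by, "proposed resolution, awaiting reporter")
            return False
        self.status = "resolved"
        self.update(confirmed_by, "confirmed resolved")
        return True
```

An engineer calling `resolve()` only logs a proposed resolution; the status changes once the reporter confirms it.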
Fault Management - Ticket Example
Sample opening ticket
Subject: Fix sshd on T1 instructor machines
Serial Number: 6
Area: none
Queue: afnog-noc
Requestors: [email protected]
Owner: inst
Status: resolved
Last User Contact: Mon May 7 17:02:21 2001 (30 hr ago)
Current Priority: 1
Final Priority: 1
Due: No date assigned
Last Action: Mon May 7 17:02:21 2001 (30 hr ago)
Created: Sat May 5 17:08:08 2001 (3 day ago)
Fault Management - Ticket Example
Sample progress ticket
TT0000033975 has been MODIFIED. Here are the fields that have been changed:
CopyOfTime :5
TTC Temp :0
Ticket information log : [email protected] said ...
While I was investigating this, Debbie from UUNet called (via Merit main
number) to tell us they were seeing it down. She can be reached at
xxx-xxxx. The UUNet ticket is xxxxx..
Fault Management - Ticket Example
Sample closing ticket
• includes previous ticket contents plus resolution
Users on the laptop station minihub are not getting correct DHCP responses. No
gateway or DNS entries are returned.
Thanks, - Hervey
-- CUSTOMER INFORMATION --------------------
'inst' (AFNOG Instructors)
-------------------------------------
There have been several issues. First, the Cisco config-switch was set so the box
would forget it's config on a power cycle (and we've had a few). Second, I made a typo
when I cleaned up a DNS file. Things *should* be working now (famous last words).
Resolving this till I hear otherwise.
GJ
---------------------------------------------------------------
>otherwise.
>GJ
Many thanks!
- Hervey
Exercise: Ticket System
• Download WebTTS Source from http://t2-noc.ws.afnog.org/downloads.htm
• Follow instructions on http://t2-noc.ws.afnog.org/webtts-setup-notes.txt
• Create 2-3 users within ticket system
• Create tickets to track network occurrences as they occur - network failures will be provided ;-)
Fault Management - typical failures
• Node unpingable
• no IP connectivity to router
• possible reasons:
– serial link down: call telco
– router down/hardware problem: call engineer
– routing problem: troubleshoot with traceroute, routeviews machine
Performance Management
A consistent level of network performance
Data collection
– interface stats
– throughput
– error rates
– usage
– percent availability
Data analysis for performance metrics and trends
Establishment of performance thresholds
Capacity planning and deployment
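Two of the items above, percent availability and performance thresholds, reduce to simple arithmetic once the data has been collected. The sketch below assumes availability is measured as the share of successful polls, and that a threshold is breached when a minimum number of samples exceed it. Both function names and both definitions are illustrative choices, not a standard.

```python
from typing import List


def percent_availability(poll_results: List[bool]) -> float:
    """Share of successful polls, as a percentage."""
    if not poll_results:
        raise ValueError("no polls recorded")
    return 100.0 * sum(poll_results) / len(poll_results)


def over_threshold(samples: List[float], threshold: float, min_count: int) -> bool:
    """True when at least `min_count` samples exceed `threshold`.

    Requiring several high samples (an assumed policy) avoids reacting
    to a single short spike.
    """
    return sum(1 for s in samples if s > threshold) >= min_count
```

For example, link utilisation samples taken every five minutes could be passed to `over_threshold` with a threshold of 0.8 to flag links that are candidates for a capacity upgrade.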
Importance of Network Statistics
Accounting
Troubleshooting
Long-term trend analysis
Capacity Planning
Two different types
• active measurement
• passive measurement
Management Tools have statistical functionality
Performance Management Tools
netflow
• cflowd (http://www.caida.org/Tools/Cflowd)
• collects flow information from Cisco routers
• AS to AS information
• src and destination IP and port information
• useful for accounting and statistics
• how much of my traffic is port 80?
• how much of my traffic goes to AS237?
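Questions like the two above come down to summing bytes over flow records that match a condition. The sketch below assumes the flow records have already been exported and loaded into memory. The `Flow` record and its field names are hypothetical and are not cflowd's data format.

```python
from collections import Counter
from typing import Callable, Hashable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    src_as: int
    dst_as: int
    dst_port: int
    octets: int


def share_of_bytes(flows: List[Flow], predicate: Callable[[Flow], bool]) -> float:
    """Fraction of all bytes carried by flows that satisfy `predicate`."""
    total = sum(f.octets for f in flows)
    if total == 0:
        return 0.0
    return sum(f.octets for f in flows if predicate(f)) / total


def top_by_bytes(
    flows: List[Flow], key: Callable[[Flow], Hashable], n: int = 5
) -> List[Tuple[Hashable, int]]:
    """Top `n` groups by total bytes, e.g. a 'top five' list."""
    totals: Counter = Counter()
    for f in flows:
        totals[key(f)] += f.octets
    return totals.most_common(n)
```

With this, `share_of_bytes(flows, lambda f: f.dst_port == 80)` answers the port 80 question and `share_of_bytes(flows, lambda f: f.dst_as == 237)` answers the AS237 one.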
Netflow examples
Top ten lists (or top five)
##### Top 5 AS's based on number of bytes #######
srcAS   dstAS   pkts        bytes
6461    237     4473872     3808572766
237     237     22977795    3180337999
3549    237     6457673     2816009078
2548    237     5215912     2457515319
##### Top 5 Nets based on number of bytes ######
Net Matrix
----------
number of net entries: 931777
SRCNET/MASK         DSTNET/MASK         PKTS      BYTES
165.123.0.0/16      35.8.0.0/13         745858    1036296098
207.126.96.0/19     198.108.98.0/24     708205    907577874
206.183.224.0/19    198.108.16.0/22     740218    861538792
35.8.0.0/13         128.32.0.0/16       671980    467274801
##### Top 10 Ports #######
        input                     output
port    packets     bytes         packets     bytes
119     10863322    2808194019    5712783     427304556
80      36073210    862839291     17312202    1387817094
20      1079075     1100961902    614910      62754268
7648    1146864     419882753     1147081     414663212
25      1532439     97294492      2158042     722584770
Exercise: Cricket
Load Track 2 Cricket Page from http://t2-noc.ws.afnog.org/downloads.htm
Observe the various characteristics that are being monitored by the system.
Security Management: Dos & Don’ts
Don’t leave things that are likely to be interesting to mice lying on the kitchen table overnight
Plug the holes that mice are using to get into the house
Don’t provide places within the house for mice to build nests
Set traps along walls where you often see mice out of the corner of your eye
Check the traps daily to rebait them and to dispose of squashed mice. Full traps don’t catch mice, and they smell
Avoid using commercial bait-and-kill poisons. Traditional snap traps are best.
Get a cat!
Security Management - Tools
security tools
• COPS - host configuration checker (www.cert.org)
• swatch - email reports of activity on machine
• TCP Wrappers - log connections, restrict access
• ssh/skey - crypto authentication and communications
• Tripwire - monitor changes to system files
Keep up to date with security information
• bug reports
– CERT advisories mailing list: http://www.cert.org./contact_cert/certmaillist.html
• bug fixes
• intruder alerts
Security Management – Good Practice
reporting procedure for security events
• e.g. break-ins
• abuse email address for customers to report complaints ([email protected])
control internal and external gateways
• control firewalls (external and internal)
security log management
• centralised logging host
Configuration Management
Maintaining information relating to the design of the network and its current configuration
Monitor Network State
• Record of network topology
– Static: what is deployed, where it is deployed, how it is attached
– Dynamic: operational status of the network elements
Configuration Management
SNMP driven display
[Figure: SNMP-driven network map display; node labels include wjh12, mghgw, generali, husc6, harvard, talcott, wjhgw1, harvisr, huelings, geo, pitirium, nnhvd, nngw, oitgw1, sphgw1, lmagw1, dfch, tch]
Configuration Management
Operational Control of network
Start/stop individual components
Alter configuration of devices
Load and save config versions
Hardware/Software upgrades
Methods of access
• SNMPGet / SNMPSet
• Out-of-Band access
Configuration Management
inventory management
• database of network elements
• history of changes & problems
directory maintenance
• all hosts & applications
• nameserver database
host and service naming coordination
• "Information is not information if you can't find it"
What is SNMP?
Simple Network Management Protocol
query - response system
• can obtain status from a device
• standard queries
• enterprise specific
uses database defined in MIB
• management information base
What do we use SNMP for?
query routers for:
• in and out bytes per second
• CPU load
• uptime
• BGP peer session status
query hosts for:
• network status
• Message queues
• Web traffic
• Squid proxy load
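SNMP interface byte counters are running totals, so "bytes per second" has to be derived from two successive samples. A minimal sketch, assuming a 32-bit counter that wraps back to zero after 2^32 - 1; the function name `counter_rate` is illustrative:

```python
def counter_rate(prev: int, curr: int, seconds: float, bits: int = 32) -> float:
    """Average units per second between two samples of a wrapping counter.

    Assumes the counter wrapped at most once between the samples. A
    device reboot also resets the counter, and this sketch cannot tell
    a reset from a wrap.
    """
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    modulus = 1 << bits
    # Modular subtraction gives the right delta even across a wrap.
    delta = (curr - prev) % modulus
    return delta / seconds
```

On fast links a 32-bit counter can wrap more than once within a polling interval, which is one reason to poll often or to use 64-bit counters where the device offers them.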
SNMP Network Management Tools
MRTG http://www.ee-staff.ethz.ch/~oetiker/webtools/mrtg/
RRDtool http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool/
Cricket http://cricket.sourceforge.net/
HP OpenView
Benefits
– simple to use and configure
– quickly determine spikes/drops in traffic
– can display almost any data that can be collected via SNMP
MRTG
Traffic Analysis for Hssi1/0/0
System: msu.mich.net in
Maintainer:
Interface: Hssi1/0/0 (2)
IP: hssi1-0-0.msu.mich.net (198.108.22.102)
Max Speed: 5630.6 kBytes/s (propPointToPointSerial)
Accounting Management
What do you account for?
• Use of the network and the services it provides
Types of accounting data
• RADIUS/TACACS accounting data from access servers
• Interface statistics
• Protocol statistics
Accounting Data affects Business Models
• Bill on usage?
• Flat-rate billing?
NOC Practical
network monitor - NOCOL
Observe network status
• Create a “problem”
• Observe change in status
• “resolve” the problem
Statistics?
NOC Practical
Ticket System - WebRT
• Overview
• Create tickets
– As customer
– As engineer
• Review tickets as engineer
• Take/Assign tickets
Exercises
Rows A, C, E, G and I become the NOC
Rows B, D, F, H and J become the customers
Customers send in fault notifications, automatically creating tickets
Engineers take/give tickets and resolve or escalate
Changeover … repeat
<during this, there are network failures that must be detected and fixed>
Exercise
Ticket Flow
Customers (rows B, D, F, H, J)
• Create tickets by sending in email to [email protected]
• Receive updates on progress of ticket status
• Receive notice that ticket has been closed when resolution is complete
NOC (rows A, C, E, G, I)
• Use Ticket System web interface http://noc.ws.afnog.org/cgibin/webrt.cgi
• Assign tickets
• Update tickets
• Escalate tickets
• Resolve tickets
[Diagram labels: "First Level"; "2nd Tier: Monitoring,"]
How do I manage my network?
Which tools should I use? What do I really need?
• Keep it simple!
• Need to consider engineers working remotely
• Don’t want to spend too much time maintaining the tool (it should be helping you!)
• Different tools for NOC and engineers
• Different tools for statistics
• RELIABILITY!
References
http://www.merit.edu/ipma/docs/isp.html
http://www.nanog.org
http://www.caida.org
http://www.nlanr.net
http://www.cisco.com
http://www.amazing.com/internet/
http://www.isp-resource.com/
http://www.merit.edu/ipma
http://www.ripe.net
More Tools!
http://www.caida.org/Tools/
• OC3Mon/Coral
http://www.merit.edu/~ipma
• RouteTracker
• IRRj
• ASExplorer
http://www.geektools.com/
http://www.merit.edu/ipma/tools/other.html
ASExplorer
Route Flap Stats
Looking Glass Tools
http://www.merit.edu/~ipma/tools/lookingglass.html
route-views.oregon-ix.net>show ip bgp 35.0.0.0
BGP routing table entry for 35.0.0.0/8, version 56135569
Paths: (17 available, best #12)
11537 237
198.32.8.252 from 198.32.8.252
Origin incomplete, localpref 100, valid, external
Community: 11537:900 11537:950
2914 5696 237
129.250.0.3 (inaccessible) from 129.250.0.3
Origin IGP, metric 0, localpref 100, valid, external
Community: 2914:420
2914 5696 237
129.250.0.1 (inaccessible) from 129.250.0.1
Origin IGP, metric 0, localpref 100, valid, external
Community: 2914:420
3561 237 237 237
204.70.4.89 from 204.70.4.89
Origin IGP, localpref 100, valid, external
267 1225 237
204.42.253.253 from 204.42.253.253
Origin IGP, localpref 100, valid, external
Community: 267:1225 1225:237
More Looking Glass Tools
Traceroute servers
http://www.merit.edu/ipma/tools/trace.html
Query: trace
Addr: www.isoc.org
Translating "www.isoc.org"...domain server (206.205.242.132) [OK]
Type escape sequence to abort.
Tracing the route to info.isoc.org (198.6.250.9)
1 iad1-core2-fa5-0-0.atlas.digex.net (165.117.129.2) 0 msec 0 msec 4 msec
2 dca5-core2-s5-0-0.atlas.digex.net (165.117.53.41) 0 msec 4 msec 0 msec
3 dca5-core1-fa5-1-0.atlas.digex.net (165.117.56.117) 4 msec 0 msec 4 msec
4 Hssi3-1-0.BR1.DCA1.ALTER.NET (209.116.159.98) 0 msec 0 msec 4 msec
5 101.ATM2-0.XR1.DCA1.ALTER.NET (146.188.160.226) [AS 701] 4 msec 0 msec 4 msec
6 195.ATM7-0.XR1.TCO1.ALTER.NET (146.188.160.102) [AS 701] 4 msec 0 msec 0 msec
7 193.ATM8-0-0.GW1.TCO1.ALTER.NET (146.188.160.33) [AS 701] 4 msec 4 msec 4 msec
8 charlie.isoc.org (198.6.250.1) [AS 701] 8 msec 8 msec 8 msec
9 info.isoc.org (198.6.250.9) [AS 701] 8 msec * 12 msec
SNMP Tool references
• MON - http://www.kernel.org/software/mon/
• NOCol - ftp://ftp.navya.com/pub/vikas/nocol.tar.gz
• Sysmon - ftp://puck.nether.net/pub/jared
• Rover - http://www.merit.edu/~rover
• Concord - http://www.concord.com
• http://www.merit.net/~netscarf