performance management


Network Management: Accounting and Performance Strategies
by Benoit Claise - CCIE No. 2686; Ralf Wolter
Publisher: Cisco Press
Pub Date: June 20, 2007
Print ISBN-10: 1-58705-198-2
Print ISBN-13: 978-1-58705-198-2
Pages: 672
PERFORMANCE MANAGEMENT
Understanding the Need for Performance Management
What is performance management?
Why do networks require performance management?
Which problems do performance management solutions solve?
What aspects make up performance monitoring (data collection, data
analysis, reporting, billing, and so on)?
Defining Performance Management
ITU-T definition (M.3400 and X.700, Definitions
of the OSI Network Management Responsibilities):
• Performance Management provides functions
to evaluate and report upon the behavior of
telecommunication equipment and the
effectiveness of the network or network
element.
• Its role is to gather and analyze statistical
data for the purpose of monitoring and
correcting the behavior and effectiveness of
the network, network elements, or other
equipment and to aid in planning,
provisioning, maintenance and the
measurement of quality.
Performance management includes functions to:
• gather statistical information
• maintain and examine logs of system state histories
• determine system performance under natural and artificial conditions
• alter system modes of operation for conducting performance management activities
TMF definition:
This process manages the SLAs and reports service performance to the customer.
The TMF defines performance and SLA management
in the context of assurance.
The assurance process is responsible for the
execution of proactive and reactive maintenance
activities to ensure that services provided to
customers are continuously available and perform to SLA or
quality of service (QoS) performance levels.
It performs continuous resource status and
performance monitoring to detect possible failures
proactively, and it collects performance data and
analyzes it to identify potential problems to resolve
them without affecting the customer.
Related documents are
• TMF 701, Performance Reporting Concepts &
Definitions;
• TMF GB917, SLA Management Handbook,
which also refers to ITU M.3010; and
• the FAB model of the eTOM.
Figure 1-4. Performance Management Architecture
Figure 1-5. Network Management Building Blocks
Purposes of Performance
Various performance scenarios:
• Baselining
• Service Monitoring
• Network Performance Monitoring
• Device Performance Monitoring
Device Performance Monitoring


Network Element Performance Monitoring
From a device perspective, we are mainly interested in device "health" data, such as overall
throughput, per-(sub)interface utilization, response time, CPU load, memory consumption, errors,
and so forth.
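As an illustration of per-interface utilization, the following minimal sketch derives a utilization percentage from two readings of an interface octet counter. The function name and parameters are hypothetical and do not come from the book. The sketch assumes a counter that wraps at 2**counter_bits and wraps at most once between two polls.

```python
def interface_utilization(octets_start, octets_end, interval_s, speed_bps,
                          counter_bits=32):
    """Return link utilization in percent between two octet-counter readings.

    Assumption: the counter wraps at 2**counter_bits and wraps at most once
    between the two readings.
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction absorbs a single counter wrap between polls.
    delta_octets = (octets_end - octets_start) % modulus
    bits = delta_octets * 8
    return 100.0 * bits / (interval_s * speed_bps)


if __name__ == "__main__":
    # 37,500,000 octets transferred in 300 s on a 10 Mbit/s link.
    print(interface_utilization(1_000, 37_501_000, 300, 10_000_000))
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case a shorter interval or a 64-bit counter (counter_bits=64) would be needed.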
System and Server Performance Monitoring
Low-level service monitoring components:
 - System: hardware and operating system (OS)
 - Network card(s)
 - CPU: overall and per system process
 - Hard drive disks, disk clusters
 - Fan(s)
 - Power supply
 - Temperature
 - OS processes: check if running; restart if necessary
 - System uptime
High-level service monitoring components:
 - Application processes: check if running; restart if necessary
 - Server response time per application
 - Optional: Quality of service per application: monitor resources (memory, CPU, network bandwidth) per CoS definition
 - Uptime per application
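The "check if running; restart if necessary" items above can be sketched as a small watchdog step. This is only an illustration: ensure_running and its callback parameters are hypothetical names, and the actual process check and restart mechanism depend on the operating system, so they are passed in as functions.

```python
def ensure_running(name, is_running, restart):
    """Check one process and restart it if it is down.

    is_running(name) -> bool and restart(name) are supplied by the caller
    (hypothetical hooks; how they work is platform specific).
    Returns "running", "restarted", or "failed".
    """
    if is_running(name):
        return "running"
    restart(name)
    # Verify the restart instead of assuming it worked.
    return "restarted" if is_running(name) else "failed"


if __name__ == "__main__":
    # In-memory stand-in for a real process table.
    state = {"httpd": False}
    result = ensure_running(
        "httpd",
        is_running=lambda n: state[n],
        restart=lambda n: state.update({n: True}),
    )
    print(result)
```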
Figure 1-23. Catalyst 6500 NAM ART Measurement
A practical approach is to measure the server performance with the Cisco IP SLA or Cisco NAM
card for the Catalyst switch. The NAM leverages the ART MIB and provides a useful set of
performance statistics if located in the switch that connects to the server farm.
Network Performance Monitoring:
• Transmission efficiency
• Jitter (delay variation)
• Network delay
• Network throughput/capacity
• Packet loss
• Utilization (device, network)
• Network response time
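Three of the metrics listed above (network delay, jitter, and packet loss) can be derived from a series of probe results. The sketch below is one simple way to do so, not the method of any particular tool. The function name is hypothetical, a lost probe is represented as None, and jitter is taken as the mean absolute difference between consecutive received samples, which is a simplification.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe round-trip times in milliseconds; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_delay = mean(received) if received else None
    # Simplification: differences are taken between consecutive *received*
    # samples, so a lost probe in between is skipped over.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else None
    return {"loss_pct": loss_pct, "avg_delay_ms": avg_delay, "jitter_ms": jitter}


if __name__ == "__main__":
    print(summarize_probes([10, 12, None, 11, 15]))
```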
Service Monitoring
From a service perspective, here are significant parameters to monitor:
• Key Quality Indicators (KQI)
• Jitter (delay variation)
• Mean Opinion Score (MOS) in the case of voice
• Key Performance Indicators (KPI)
• Packet loss
• Service delay
• Service availability
Service meaning
Service: A generic definition by Merriam-Webster declares: "A facility supplying some public
demand...." More specifically, related to IT, we define a service as a function providing
network connectivity or network functionality, such as the Network File System,
Network Information Service (NIS), Domain Name System (DNS), DHCP, FTP, news,
finger, NTP, and so on.
Service level: The definition of a certain level of quality (related to specific metrics) in
the network with the objective of making the network more predictable and reliable.
Service level agreement (SLA): A contract between the service provider and the customer that
describes the guaranteed performance level of the network or service. Another way of
expressing it is "An SLA is the formalization of the quality of the service in a
contract between the Customer and the Service Provider."
Service level management: The continuously running cycle of measuring traffic metrics,
comparing those metrics to stated goals (such as for performance), and ensuring that the
service level meets or exceeds the agreed-upon service levels.
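The compare step of the service level management cycle (measured metrics against stated goals) can be sketched as follows. The function name, the metric names, and the ("max"/"min", limit) target format are assumptions made for this illustration only.

```python
def check_service_level(measured, targets):
    """Compare measured metrics with targets and return the violations.

    targets maps a metric name to (kind, limit), where kind is "max"
    (value must not exceed limit) or "min" (value must not fall below it).
    A metric with no measurement is reported as "missing".
    """
    violations = []
    for metric, (kind, limit) in targets.items():
        value = measured.get(metric)
        if value is None:
            violations.append((metric, "missing"))
        elif kind == "max" and value > limit:
            violations.append((metric, value))
        elif kind == "min" and value < limit:
            violations.append((metric, value))
    return violations


if __name__ == "__main__":
    targets = {"latency_ms": ("max", 50), "availability_pct": ("min", 99.98)}
    measured = {"latency_ms": 42, "availability_pct": 99.9}
    print(check_service_level(measured, targets))
```

Running this check at every measurement interval, and acting on any violation it returns, approximates the continuous cycle described in the definition.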
Table 1-8 provides some generic SLA examples.
Table 1-8. Generic SLAs
Class        SLAs                                  Application
Premium      Availability: 99.98/99.998 percent    Broadcast video
             Latency: 50 ms maximum                Traditional voice
             Packet delivery: 100 percent
             Jitter: 2 ms maximum
Optimized    Availability: 99.98/99.998 percent    Compressed video
             Latency: 50 ms maximum                Voice over IP
             Packet delivery: 100 percent          Mixed application
             Jitter: 10 ms maximum                 Virtual private network
Best effort  Availability: 99.98 percent           Internet data
             Latency: 50 ms maximum
             Packet delivery: 99.95 percent
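To make the availability figures in Table 1-8 more tangible, the sketch below converts an availability percentage into a downtime budget. The function name and the 365-day default period are assumptions for illustration. Over a 365-day year, 99.98 percent corresponds to roughly 105 minutes of permitted downtime, and 99.998 percent to roughly 10.5 minutes.

```python
def allowed_downtime_minutes(availability_pct, period_days=365):
    """Return the downtime budget in minutes for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for pct in (99.98, 99.998):
        print(pct, round(allowed_downtime_minutes(pct), 2))
```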
Baselining
Baselining is the process of studying the
network, collecting relevant information,
storing it, and making the results available for
later analysis.
A general baseline includes all areas of the
network, such as a connectivity diagram,
inventory details, device configurations,
software versions, device utilization, link
bandwidth, and so on.
Baselining tasks include the following:
• Gather device inventory information (physical as well as logical). This can be
collected via SNMP or directly from the command-line interface (CLI), for
example, show version, show module, show run, show config all, and others.
• Gather statistics (device-, network-, and service-related) at regular intervals.
• Document the physical and logical network, and create network maps.
• Identify the protocols on your network, including
 - Ethernet, Token Ring, ATM
 - Routing (RIP, OSPF, EIGRP, BGP, and so on)
 - Legacy voice encapsulated in IP (VoIP)
 - IP telephony
 - QoS (RSVP)
 - Multicast
 - MPLS/VPN
 - Frame Relay
 - DLSW
• Identify the applications on your network, including
 - Web servers
 - Mainframe-based applications (IBM SNA)
 - Peer-to-peer applications (Kazaa, Morpheus, Grokster, Gnutella, Skype, and so on)
 - Backup programs
 - Instant messaging
• Monitor and study statistics over time and traffic flows.
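Statistics gathered at regular intervals become useful once they are condensed into a baseline that later readings can be compared against. The sketch below shows one simple approach, assuming a mean and standard deviation summary and a "k standard deviations" rule. The function names and the default k=3.0 are illustrative choices, not something the source prescribes.

```python
from statistics import mean, pstdev


def build_baseline(samples):
    """Condense periodic samples (for example, CPU utilization) into a baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return {"mean": mean(samples), "stdev": pstdev(samples), "peak": max(samples)}


def deviates(value, baseline, k=3.0):
    """Return True if value lies more than k standard deviations from the mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


if __name__ == "__main__":
    cpu_pct = [20, 22, 21, 19, 23]  # made-up sample data
    baseline = build_baseline(cpu_pct)
    print(baseline)
    print(deviates(40, baseline), deviates(22, baseline))
```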
From a performance baselining perspective, we are primarily interested in
performance-related subtasks:
• Collect network device-specific details:
 - CPU utilization
 - Memory details (free system memory, amount of flash memory, RAM, etc.)
 - Link utilization (ingress and egress traffic)
 - Traffic per class of service
 - Dropped packets
 - Erroneous packets
• Gather server- and (optionally) client-related details:
 - CPU utilization
 - Memory (main memory, virtual memory)
 - Disk space
 - Operating system process status
 - Service and application process status
• Gather service-related information:
 - Round-trip time
 - Packet loss
 - Delay variation (jitter)
 - MOS (if applicable)
Both performance monitoring and accounting management gather usage data used as input for
various management applications.
Performance management is one example of
a management area that benefits from
performance monitoring and accounting, but
also actively modifies the network and its
behavior.
Without performance monitoring, you operate the network blindfolded.
Without accounting, you can hardly identify the cause of bottlenecks
and outages identified by performance management.
Figure 1-6. Complementary Solution
The intersection between the two areas is typically the network monitoring part.
This is a generic term for any data collection tasks that are common between accounting
management and performance management.