Performance Targets


NAPUS
Performance & Availability Reporting
June 2012
John Sherwood
What is NAPUS?
• Network Availability, Performance, and User
Support Working Group (NAPUS-WG)
– Formed under CANARIE Technical Committee at
CANHEIT 2011
• July 4, 2011: inaugural NAPUS meeting
– Goal: "To enable national consistency across Canada for
measuring network availability and performance ..."
• Chairs: Andree Toonk (BCNET) and JF Amiot
(Cybera)
NAPUS Sub-Committee
• Sept 13, 2011 meeting set up a sub-committee "... to
work on a set of best practices and
recommendations for Availability and Performance
reporting"
• Two reports commissioned:
– “Network Availability and Performance Monitoring and
Reporting”
– “Reporting and tracking Multi-domain Lightpath service
issues”
• Reports were received, approved, and sent to
NAPUS, the Tech Committee, and the OAC in March/April 2012
Network Availability and Performance
Monitoring and Reporting
Andree Toonk
Jean-Francois Amiot
Jun Jian
Gerry Miller
John Sherwood
Thomas Tam
Goal of the Report
• “... to provide definitions and guidelines for
measuring and reporting network operational
status in a standardized way”
• Attempt to report Availability or Performance in a
single number
– e.g. “99.97% availability during March”
What is Availability?
• A service, such as a network, is engineered to
certain design criteria.
• The service is “available” if it meets those design
criteria.
What is Performance?
• Wikipedia says: “Network performance refers to
the service quality of a telecommunications
product as seen by the customer.”
(http://en.wikipedia.org/wiki/Network_performance)
– i.e. Performance is in the eye of the beholder
• An attempt to quantify performance with a single
formula was abandoned as unworkable
Availability vs. Performance
• Availability is quantifiable and measurable.
• Performance is much more subjective.
• Therefore, NAPUS decided to focus their effort on
Availability.
Step one: Availability of What?
• Define “Service”
– “...an entity with well defined endpoints, characteristic
parameters, and performance criteria”
• A Service could be a network, a web server, or
some other definable entity.
Step two: Endpoints
• For a web server, there is only one
• For a network there are two endpoints
– must be accurately defined
– typically unidirectional
Step three: Define Parameters
• Characteristic parameters define how well a
service behaves
• Some possibilities for networks:
– BER (bit error rate), mostly useful for layer 1 links
– latency
– jitter
– packet loss, measured at layer 2 or 3
– bandwidth
Step four: Performance Targets
• Each parameter should have a performance target
• May have a secondary (“degraded”) target
• Service is considered “available” if it meets all of
its targets
• Availability is “unknown” if data is missing
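
To make steps one through four concrete, here is a minimal sketch in Python of how a service, its characteristic parameters, and its normal and degraded targets might be represented. This is an illustration only, not code from the report; all class and field names (Target, Service, higher_is_better, etc.) are assumptions.

# Hypothetical sketch only -- the report defines concepts, not code.
from dataclasses import dataclass, field
from typing import Optional, List, Dict

@dataclass
class Target:
    """One characteristic parameter with a performance target and an
    optional secondary ("degraded") target."""
    name: str                               # e.g. "latency_msec"
    limit: float                            # normal performance target
    degraded_limit: Optional[float] = None  # secondary ("degraded") target
    higher_is_better: bool = False          # True for e.g. a delivery ratio

    def check(self, value: Optional[float]) -> str:
        """Classify a single measurement against this target."""
        if value is None:
            return "unknown"                # missing data
        meets = value >= self.limit if self.higher_is_better else value <= self.limit
        if meets:
            return "operational"
        if self.degraded_limit is not None:
            degraded = (value >= self.degraded_limit if self.higher_is_better
                        else value <= self.degraded_limit)
            if degraded:
                return "degraded"
        return "unavailable"

@dataclass
class Service:
    """An entity with well-defined endpoints, characteristic parameters,
    and performance criteria."""
    title: str
    endpoint1: str
    endpoint2: str
    targets: List[Target] = field(default_factory=list)

    def status(self, measurements: Dict[str, float]) -> str:
        """Status at a moment in time, following the report's rules:
        "unknown" if any data is missing, otherwise the worst result wins."""
        results = [t.check(measurements.get(t.name)) for t in self.targets]
        if "unknown" in results:
            return "unknown"
        for state in ("unavailable", "degraded"):
            if state in results:
                return state
        return "operational"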
Example
Service title: IP Transport, NBnet to CANARIE Halifax
Endpoint 1: NBnet perfSONAR station
Endpoint 2: CANARIE Halifax perfSONAR station
Latency performance target: ≤ 10.0 msec
Latency performance target (degraded service): > 10.0 and ≤ 35.0 msec
(to allow for failure of the Fredericton-Halifax link and rerouting of
traffic through Montreal)
IP Successful Delivery, long term: ≥ 0.9995 in any 24-hour period
IP Successful Delivery, short term: ≥ 0.998 in any 10-minute period
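
Using the hypothetical classes sketched earlier, this example could be written out as follows (thresholds copied from the slide; field names remain illustrative):

nbnet_halifax = Service(
    title="IP Transport, NBnet to CANARIE Halifax",
    endpoint1="NBnet perfSONAR station",
    endpoint2="CANARIE Halifax perfSONAR station",
    targets=[
        Target("latency_msec", limit=10.0, degraded_limit=35.0),
        Target("ip_delivery_24h", limit=0.9995, higher_is_better=True),
        Target("ip_delivery_10min", limit=0.998, higher_is_better=True),
    ],
)

# 22 msec latency exceeds the normal target but meets the degraded one,
# so the service as a whole is "degraded" at this moment.
print(nbnet_halifax.status({"latency_msec": 22.0,
                            "ip_delivery_24h": 0.9998,
                            "ip_delivery_10min": 0.999}))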
Availability Definitions
Operational: A service is considered "operational" if it meets all of its
performance targets for a non-degraded service. This is the status at
a moment in time.
Degraded: A service is considered "degraded" if it meets all of its
performance targets, except that one or more of the targets it meets are
defined as degraded targets (e.g. longer than normal latency, but still
usable).
Unavailable: A service is considered "unavailable" if it fails to meet one
or more of its operational or degraded performance targets.
Availability: The fraction of time over a defined window during which a
service is considered to be "operational". This is the status over time
rather than at a particular moment.
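
As a rough illustration of the "Availability" definition (an assumption about how the number would be computed, not text from the report), availability over a window is the fraction of sampling intervals whose status was "operational":

def availability(statuses):
    """Fraction of equally spaced samples (e.g. 5-minute intervals)
    that were "operational" over the reporting window."""
    if not statuses:
        return None
    return sum(1 for s in statuses if s == "operational") / len(statuses)

# March has 8928 five-minute intervals; 8925 "operational" samples
# give 0.99966, which would be reported as the earlier example's
# "99.97% availability during March".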
Sample “Core” Network
Meta Service
• The sample network is too complex to define as a single service
• So, define a "meta service" as a set of simpler services, e.g. {S1, S2, ... Sn}
• Then, 5-minute measures from each service are aggregated and time-sorted
Meta Service States
Operational: All of the most recent results from all services are
"operational"
Degraded: Any of the most recent results from any of the services are
"degraded", and all others are "operational"
Unavailable: One or more of the most recent results from any of the
services are "unavailable"
Unknown: Any of the most recent results from any of the services are
"unknown"
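
A minimal sketch of the meta-service rules in the table above, continuing the hypothetical Python examples. The precedence of "unknown" over "unavailable" when both occur at once is an assumption, since the table does not say which wins.

def meta_status(latest):
    """Combine the most recent status from each member service
    {S1, S2, ..., Sn} into a single meta-service status."""
    states = set(latest.values())
    if "unknown" in states:
        return "unknown"
    if "unavailable" in states:
        return "unavailable"
    if "degraded" in states:
        return "degraded"
    return "operational"

# e.g. {"S1": "operational", "S2": "degraded", "S3": "operational"}
# yields "degraded" for the meta service.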
Mtl->Wpg Latency
Mtl->Hfx Latency
Mtl->Hfx Latency June 7
Possible explanations
• Traffic burst
– but 10 msec @ 2 Gbps is 20 Mbit, or more than 1500
normal Ethernet packets! (see the quick check after this list)
• Measurement error
– perfSONAR station load
– clock error (is ntp hiccupping?)
– ...??
• Router queuing (packet has low priority)
• Packet re-routing
• Maybe it is real
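
The arithmetic behind the traffic-burst objection, as a quick check. The 2 Gbps figure comes from the slide; the 1500-byte packet size is the usual Ethernet MTU.

link_bps = 2e9            # 2 Gbps link
extra_delay_s = 0.010     # the observed 10 msec latency increase
queued_bits = link_bps * extra_delay_s       # 20,000,000 bits = 20 Mbit
packets = queued_bits / (1500 * 8)           # ~1667 full-size Ethernet frames
print(f"{queued_bits/1e6:.0f} Mbit, about {packets:.0f} packets queued")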
Recommendations
• CANARIE maintain perfSONAR at each core
router, IX, etc.
• Each ORAN measure IP Transport availability
• CANARIE and each ORAN report monthly on
network availability
• These reports be published
perfSONAR Workshop
• Cybera will host the free Internet2 perfSONAR
workshop on October 1 at Summit 2012