Practical use of Ethernet OAM Joerg Ammon, Brocade
Joerg Ammon ([email protected])
Systems Engineer Service Provider
May 2011
© 2011 Brocade Communications Systems, Inc. Company Proprietary Information
Overview
• A variety of Operations, Administration, and
Management (OAM) protocols and tools were
developed in recent years for MPLS, IP, and Ethernet
networks.
• These tools let an operator proactively manage networks and verify customer Service Level Agreements (SLAs).
• This session reviews the OAM tools available in MPLS/IP/Ethernet networks at various layers of the stack and reviews best practices for choosing the right OAM protocol to use in a network.
OAM Tools
Scope of this presentation
• OAM tools across network elements
• The scope is the network plane only (not the management plane)
[Figure: OAM&P, with the management plane (NMS, EMS) and the network plane (network elements)]
OAM Layering
• OAM is layered…
  • Service Layer OAM
  • Network Layer OAM
  • Transport Layer OAM
• ... and hierarchical
  • For example, the service layer for Operator A is the transport layer for the service provider
• Each layer supports its own OAM mechanisms
  • Operator A has an MPLS network and uses MPLS OAM tools
  • Operator B has an Ethernet network and uses Ethernet OAM tools
[Figure: OAM layers. Customer Location 1 and Customer Location 2 are connected through the Operator A (MPLS) and Operator B (Ethernet) networks of the service provider. Service OAM runs end to end, MPLS OAM runs in Operator A, Ethernet OAM runs in Operator B, and Link OAM runs on individual links.]
OAM Tools
Each layer has its own best-suited OAM tools
• VPN: VRF Ping and Traceroute (Layer 3 VPN); 802.1ag CFM and Y.1731 PM for VPLS/VLL (Layer 2 VPN)
• IP: Ping and Traceroute; BFD for OSPF and IS-IS
• MPLS: LSP Ping and Traceroute; BFD for RSVP-TE LSPs
• Layer 2: Layer 2 Trace; Port Loop Detection; UDLD; Single-link LACP Keep-alive; 802.1ag CFM/Y.1731 PM; 802.3ah EFM OAM
Business Problem
• Fault detection, verification, and isolation at every level
• Proactive detection of service degradation
• Performance Monitoring (PM) and SLA verification
Brocade Solution
• Standards-based, end-to-end OAM
• Comprehensive/scalable MPLS, IP, and Ethernet OAM tools
Layer 2 OAM
+ Layer 2 VPN CFM/PM: 802.1ag CFM, Y.1731 PM
IEEE 802.1ag CFM
Connectivity Fault Management (CFM)
• Facilitates
  • Path discovery
  • Fault detection
  • Fault verification and isolation
  • Fault notification
  • Fault recovery
• Supports
  • Continuity Check Messages (CCMs)
  • LinkTrace
  • Loopback messages
Brocade Implementation
• Support for minimum CCM timers (3.3 ms) using hardware offload
• Supported CCM intervals: 3.3 ms, 10 ms, 100 ms, 1 s, 10 s, 1 min, 10 min
[Figure: nested CFM domains between customer location 1 and customer location 2 across the Operator A and Operator B networks: Customer CFM, Service Provider CFM, Operator A CFM, and Operator B CFM, each bounded by MEPs with MIPs in between.]
IEEE 802.1ag CFM
Terminology
• MD (Maintenance Domain)
  • The part of a network for which faults in Layer 2 connectivity can be managed
• MD Level
  • An integer from 0 to 7 in a field in a CFM PDU that is used, along with the VLAN ID, to identify which MIPs/MEPs would be interested in the contents of a CFM PDU
• MA (Maintenance Association)
  • A set of MEPs established to verify the integrity of a single service instance (a VLAN or a VPLS)
• ME (Maintenance Entity)
  • A point-to-point relationship between two MEPs within a single MA
• MEP (Maintenance End Point)
  • A Maintenance Point (MP) at the edge of a domain that actively sources CFM messages
  • Two types: up (inward*) MEP or down (outward) MEP
• MIP (Maintenance Intermediate Point)
  • A maintenance point internal to a domain that only responds when triggered by certain CFM messages
(*): “inward” with respect to the device
[Figure: Customer MA, Service Provider MA, Operator A MA, and Operator B MA nested between customer locations 1 and 2, each made up of MEs between MEPs (up and down) with MIPs in between. MD levels shown: 5 (7, 6, or 5), 3 (4 or 3), and 1 (2, 1, or 0).]
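To make the MD level field more concrete, here is a minimal Python sketch that decodes the four-byte common header that IEEE 802.1ag places at the start of every CFM PDU (MD level in the top three bits of the first byte, version in the low five bits, then opcode, flags, and first TLV offset). The function and class names are illustrative assumptions and are not taken from any Brocade software.

```python
import struct
from typing import NamedTuple

# OpCode values assigned by IEEE 802.1ag for the messages discussed here.
OPCODES = {1: "CCM", 2: "LBR", 3: "LBM", 4: "LTR", 5: "LTM"}


class CfmHeader(NamedTuple):
    md_level: int          # 0..7
    version: int           # 0 for 802.1ag
    opcode: str            # symbolic name, or "unknown(n)"
    flags: int             # meaning depends on the opcode
    first_tlv_offset: int  # bytes from the end of this field to the first TLV


def parse_cfm_header(pdu: bytes) -> CfmHeader:
    """Decode the CFM common header (hypothetical helper for illustration)."""
    if len(pdu) < 4:
        raise ValueError("CFM common header needs at least 4 bytes")
    first, opcode, flags, tlv_offset = struct.unpack("!BBBB", pdu[:4])
    return CfmHeader(
        md_level=first >> 5,      # top 3 bits
        version=first & 0x1F,     # low 5 bits
        opcode=OPCODES.get(opcode, f"unknown({opcode})"),
        flags=flags,
        first_tlv_offset=tlv_offset,
    )


# A made-up CCM header at MD level 7 (the level used for CUST_1 in this deck).
hdr = parse_cfm_header(bytes([0xE0, 0x01, 0x05, 0x46]))
print(hdr.md_level, hdr.opcode)
```

A maintenance point would compare `md_level` with its own configured level (together with the VLAN ID) to decide whether the PDU is of interest.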
IEEE 802.1ag CFM
Continuity Check, LinkTrace, and Loopback Messages
• Continuity Check Message (CCM)
  • A periodic hello message multicast by a MEP within the maintenance domain
• LinkTrace Message (LTM)
  • A multicast message used by a source MEP to trace the path to other MEPs and MIPs in the same domain
  • All reachable MIPs and MEPs respond with a unicast LinkTrace Reply (LTR)
  • The originating MEP can then determine the MAC addresses of all MIPs and MEPs belonging to the same Maintenance Domain
• Loopback Message (LBM)
  • Used to verify the connectivity between a MEP and a peer MEP or MIP
  • A loopback message is initiated by a MEP with a destination MAC address set to the desired destination MEP or MIP (unicast)
  • The receiving MIP or MEP responds to the loopback message with a unicast Loopback Reply (LBR)
  • A loopback message helps a MEP identify the precise location of a fault along a given path
[Figure: three message flows between MEPs. Periodic CCMs are multicast between MEPs; an LTM is multicast by a MEP and answered with unicast LTRs by the MIP and the far-end MEP; a unicast LBM is answered with an LBR.]
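How a MEP turns periodic CCMs into fault detection can be sketched as follows. IEEE 802.1ag declares loss of continuity when no CCM arrives from a remote MEP for 3.5 times the CCM interval; the class and method names below are hypothetical, and time is passed in explicitly so the sketch stays deterministic.

```python
from typing import Dict, Iterable, List

# Loss of continuity is declared after 3.5 CCM intervals without a CCM.
LOSS_MULTIPLIER = 3.5


class CcmMonitor:
    """Tracks the last CCM seen from each expected remote MEP (illustrative)."""

    def __init__(self, interval_s: float, remote_meps: Iterable[int]) -> None:
        self.timeout_s = LOSS_MULTIPLIER * interval_s
        # Every remote MEP starts as "last seen at time 0".
        self.last_seen: Dict[int, float] = {mep: 0.0 for mep in remote_meps}

    def ccm_received(self, mep_id: int, now_s: float) -> None:
        # CCMs from MEPs that are not configured as remote MEPs are ignored here.
        if mep_id in self.last_seen:
            self.last_seen[mep_id] = now_s

    def lost_meps(self, now_s: float) -> List[int]:
        """Remote MEPs whose CCMs have been missing for longer than the timeout."""
        return sorted(
            mep for mep, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


# A 10-second CCM interval gives a 35-second loss timeout.
monitor = CcmMonitor(interval_s=10.0, remote_meps=[2])
monitor.ccm_received(2, now_s=10.0)
print(monitor.lost_meps(now_s=40.0))  # 30 s of silence: still within the timeout
print(monitor.lost_meps(now_s=46.0))  # 36 s of silence: remote MEP 2 is reported
```

With the 3.3 ms interval mentioned earlier, the same rule yields detection in roughly 12 ms, which is why hardware offload matters at that rate.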
Hierarchical Fault Detection
Example: fault in Operator B network (an MPLS network)
• The customer detects the fault using Continuity Check and locates it using Link Trace
• Operator A detects the fault using Continuity Check and locates it using Link Trace
• Operator B detects the fault using Continuity Check, but isolates it using MPLS OAM (see MPLS OAM section)
• A service provider (not shown) would detect this fault in a similar way using Continuity Check and Link Trace from CPEs (Customer Premises Equipment)
Steps shown in the figure:
1: Customer Continuity Check detects end-to-end fault
2: Customer Link Traces isolate fault past customer MIPs
3: Operator A’s Continuity Check detects end-to-end fault
4: Operator A Link Traces isolate fault inside Operator B’s network
5: Operator B’s Continuity Check detects service fault
[Figure: Customer Network (Site 1), Operator A (Location A1), Operator B (MPLS network with PE, P, and PE routers carrying VPLS/VLL), Operator A (Location A2), Customer Network (Site 2). MIPs and MEPs sit at the VPLS/VLL endpoints. The fault is localized inside Operator B.]
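The Link Trace steps above amount to comparing the maintenance points expected along a path with the ones that actually answered. A minimal sketch of that comparison follows; the hop labels and the function name are hypothetical and only mirror the CE/PE layout used in this deck.

```python
from typing import List, Optional, Set, Tuple


def localize_fault(
    expected_path: List[str], responders: Set[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last hop that answered, first hop that did not).

    expected_path lists the MIPs/MEPs in order from the originating MEP;
    responders is the set of hops that returned a LinkTrace Reply.
    The second element is None when every hop answered.
    """
    last_ok: Optional[str] = None
    for hop in expected_path:
        if hop not in responders:
            return last_ok, hop
        last_ok = hop
    return last_ok, None


# Customer-level view: MIPs at the VLL endpoints, remote MEP on CE2.
path = ["MIP@PE1", "MIP@PE2", "MEP@CE2"]

# Only PE1 answered, so the fault lies between PE1 and PE2.
print(localize_fault(path, {"MIP@PE1"}))

# Every hop answered, so this domain sees no fault.
print(localize_fault(path, set(path)))
```

Each domain runs the same comparison at its own MD level, which is how the fault is narrowed from the customer view down to the operator that owns the failed segment.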
IEEE 802.1ag Configuration Example
To verify end-to-end connectivity between CE1 and CE2
[Figure: CE1 (port 1/1) connects to PE1 (port 1/1); a VLL crosses the MPLS network between PE1 and PE2; PE2 (port 2/1) connects to CE2 (port 2/1). Level 7 maintenance points are marked on these ports.]
Configure a down MEP on CE1
CE1(config)#cfm-enable
CE1(config-cfm)#domain-name CUST_1 level 7
CE1(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
CE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
CE1(config-cfm-md-CUST_1-ma-ma_5)#mep 1 down vlan 30 port ethe 1/1
CE1(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 2 to 2
Create a VLL instance (PE1)
PE1(config)#router mpls
PE1(config-mpls)#vll pe1-to-pe2 30
PE1(config-mpls-vll)#vll-peer 1.1.1.2
PE1(config-mpls-vll)#untagged ethe 1/1
PE1(config-mpls-vll)#vlan 30
PE1(config-mpls-vll-vlan)#tagged ethe 1/1
Create a VLL instance (PE2)
PE2(config)#router mpls
PE2(config-mpls)#vll pe2-to-pe1 30
PE2(config-mpls-vll)#vll-peer 1.1.1.1
PE2(config-mpls-vll)#untagged ethe 2/1
PE2(config-mpls-vll)#vlan 30
PE2(config-mpls-vll-vlan)#tagged ethe 2/1
Configure CFM on PE1
PE1(config)#cfm-enable
PE1(config-cfm)#domain-name CUST_1 level 7
PE1(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
PE1(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL endpoint.
Configure CFM on PE2
PE2(config)#cfm-enable
PE2(config-cfm)#domain-name CUST_1 level 7
PE2(config-cfm-md-CUST_1)#ma-name ma_5 vll-id 30 priority 3
PE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
In the above configuration, a MIP is created by default on the VLL endpoint.
Configure a down MEP on CE2
CE2(config)#cfm-enable
CE2(config-cfm)#domain-name CUST_1 level 7
CE2(config-cfm-md-CUST_1)#ma-name ma_5 vlan-id 30 priority 3
CE2(config-cfm-md-CUST_1-ma-ma_5)#ccm-interval 10-second
CE2(config-cfm-md-CUST_1-ma-ma_5)#mep 2 down vlan 30 port ethe 2/1
CE2(config-cfm-md-CUST_1-ma-ma_5)#remote-mep 1 to 1
LSP ping and LSP traceroute would be used inside the MPLS network to detect and diagnose LSP failures.
ITU-T Y.1731 Performance Management
• Standards-based performance management for Ethernet networks
• Interoperates in a multivendor environment
• Supports high-precision, on-demand measurement of round-trip SLA parameters
  • Frame Delay (FD)
  • Frame Delay Variation (FDV)
• Measurements done between MEPs
Benefits
• SLA monitoring and verification
Applicability
• Aggregation, metro, and core networks
• Delay-sensitive applications, such as voice
• Differentiated services with SLA guarantees
Brocade differentiation
• Hardware-based time-stamping mechanism
• Measurements with microsecond granularity
• Y.1731 PM for VPLS/VLL
[Figure: ETH-DM between MEPs on two Brocade MLX routers, measuring Frame Delay and Frame Delay Variation. MEP: Maintenance End Point. ETH-DM: Ethernet Delay Measurement.]
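The round-trip measurement can be illustrated with the two-way frame delay formula that Y.1731 defines for ETH-DM: the initiator subtracts the responder's processing time from the total elapsed time, using the four timestamps carried in the DMM/DMR exchange. The sketch below is an assumption-labelled illustration: the Python parameter names are shortened forms of the Y.1731 field names, and the timestamp values are made up.

```python
def two_way_frame_delay_ns(
    tx_timestamp_f: int,  # initiator clock: DMM transmitted
    rx_timestamp_f: int,  # responder clock: DMM received
    tx_timestamp_b: int,  # responder clock: DMR transmitted
    rx_time_b: int,       # initiator clock: DMR received
) -> int:
    """Two-way frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf).

    Each difference uses timestamps from a single clock, so the two
    devices do not need synchronized clocks for the round-trip value.
    """
    elapsed_at_initiator = rx_time_b - tx_timestamp_f
    processing_at_responder = tx_timestamp_b - rx_timestamp_f
    return elapsed_at_initiator - processing_at_responder


# Made-up timestamps in nanoseconds; the responder clock is offset by about
# 500 s from the initiator clock to show that the offset cancels out.
delay_ns = two_way_frame_delay_ns(
    tx_timestamp_f=1_000,
    rx_timestamp_f=500_000_016_000,
    tx_timestamp_b=500_000_018_000,
    rx_time_b=35_131,
)
print(f"{delay_ns / 1000:.3f} us")
```

Hardware time-stamping matters here because any software latency between the wire and the timestamp would be counted as frame delay.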
ITU-T Y.1731 Performance Management
Example
NetIron# cfm delay_measurement domain md2 ma ma2 src-mep 3 target-mep 2
Y1731: Sending 10 delay_measurement to 0012.f2f7.3931, timeout 1000 msec
Type Control-c to abort
Reply from 0012.f2f7.3931: time= 32.131 us
Reply from 0012.f2f7.3931: time= 31.637 us
Reply from 0012.f2f7.3931: time= 32.566 us
Reply from 0012.f2f7.3931: time= 34.052 us
Reply from 0012.f2f7.3931: time= 33.376 us
Reply from 0012.f2f7.3931: time= 31.501 us
Reply from 0012.f2f7.3931: time= 33.016 us
Reply from 0012.f2f7.3931: time= 32.537 us
Reply from 0012.f2f7.3931: time= 32.492 us
Reply from 0012.f2f7.3931: time= 32.552 us
sent = 10 number = 10 A total of 10 delay measurement replies received.
Success rate is 100 percent (10/10)
====================================================================
Round Trip Frame Delay Time : min = 31.501 us avg = 32.586 us max = 34.052 us
Round Trip Frame Delay Variation : min = 45 ns avg = 839 ns max = 1.875 us
====================================================================
[Figure: ETH-DM between MEP 3 and MEP 2 on two Brocade MLX routers]
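The summary lines in the output above can be reproduced from the ten replies. The sketch below assumes that frame delay variation is the absolute difference between consecutive delay samples and that averages are truncated to whole nanoseconds; both are assumptions that happen to match the printed figures, not documented behavior of the command.

```python
from typing import List, Tuple

# Round-trip delays from the ten replies above, converted to nanoseconds.
samples_ns: List[int] = [
    32131, 31637, 32566, 34052, 33376,
    31501, 33016, 32537, 32492, 32552,
]


def summarize(values: List[int]) -> Tuple[int, int, int]:
    """Return (min, avg, max), with the average truncated to an integer."""
    return min(values), sum(values) // len(values), max(values)


# Assumed FDV definition: |delay[i+1] - delay[i]| for consecutive samples.
fdv_ns = [abs(b - a) for a, b in zip(samples_ns, samples_ns[1:])]

delay_stats = summarize(samples_ns)
fdv_stats = summarize(fdv_ns)
print("frame delay (ns): min/avg/max =", delay_stats)
print("frame delay variation (ns): min/avg/max =", fdv_stats)
```

Ten delay samples yield nine variation values, which is why the variation average is taken over nine entries.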
Link OAM
IEEE 802.3ah Ethernet in the First Mile (EFM) OAM
• Supports point-to-point (single) link OAM
• Monitors individual links and supports troubleshooting them
• Standards-based for Ethernet networks
• Interoperates in a multivendor environment
• Supports
  • Fault detection and notification (alarms)
  • Discovery
  • Remote failure indication
  • Loopback testing
[Figure: two devices running 802.3ah OAM across a single link]
NetIron#show link-oam info detail ethernet 1/1
OAM information for Ethernet port: 1/1
link-oam mode: active
link status: up
oam status: up
Local information
multiplexer action: forward
parse action: forward
stable: satisfied
state: up
loopback state: disabled
dying-gasp: false
critical-event: false
link-fault: false
Remote information
multiplexer action: forward
parse action: forward
stable: satisfied
loopback support: disabled
dying-gasp: false
critical-event: false
link-fault: false
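The dying-gasp, critical-event, and link-fault fields in this output correspond to bits in the Flags field that every 802.3ah OAMPDU carries, alongside the local and remote discovery status bits. The following sketch decodes that field using the bit positions from IEEE 802.3 Clause 57; the function name and the sample values are illustrative assumptions.

```python
from typing import Dict

# Bit positions in the 16-bit OAMPDU Flags field (IEEE 802.3 Clause 57).
OAM_FLAG_BITS = {
    "link-fault": 0,
    "dying-gasp": 1,
    "critical-event": 2,
    "local-evaluating": 3,
    "local-stable": 4,
    "remote-evaluating": 5,
    "remote-stable": 6,
}


def decode_oam_flags(flags: int) -> Dict[str, bool]:
    """Map each named flag to True/False for the given Flags value."""
    return {name: bool((flags >> bit) & 1) for name, bit in OAM_FLAG_BITS.items()}


# Made-up value for a healthy link: discovery is stable on both ends
# and no fault flags are raised.
healthy = decode_oam_flags(0x0050)
print(healthy)

# Made-up value for a peer that is losing power: dying gasp is set.
failing = decode_oam_flags(0x0052)
print([name for name, is_set in failing.items() if is_set])
```

Because these flags ride on ordinary OAMPDUs, a remote failure indication reaches the peer without any separate signaling channel.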
Layer 2 OAM
Summary
Layer 2 Trace
• Intended application: Layer 2 network troubleshooting, detection of misconfiguration
• Supports: Layer 2 topology discovery, Layer 2 loop detection
• Generation: Manual
• Standard: No
Port Loop Detection
• Intended application: Layer 2 network troubleshooting, detection of misconfiguration
• Supports: Layer 2 loop detection
• Generation: Automatic
• Standard: No
UDLD
• Intended application: Link keep-alive
• Supports: Single-link keep-alive
• Generation: Automatic
• Standard: No
Single-Link Keep-Alive
• Intended application: Link keep-alive
• Supports: Single-link keep-alive
• Generation: Automatic
• Standard: Yes
802.1ag CFM
• Intended application: Service verification
• Supports: Layer 2 Connectivity Check, Link Trace, Loopback
• Generation: CC: auto; LT, LB: manual
• Standard: Yes
Y.1731 PM
• Intended application: Performance (SLA) verification
• Supports: One-way delay and delay variation
• Generation: Manual
• Standard: Yes
802.3ah EFM OAM
• Intended application: Customer access verification
• Supports: Single-link OAM: Fault Detection, Discovery, Loopback, and so on
• Generation: Auto, Manual (LB)
• Standard: Yes
Remember: OAM is layered and hierarchical
(service OAM for an operator is transport OAM for a service provider)
MPLS OAM
LSP Ping and LSP Traceroute
MPLS OAM tools
• LSP Ping and LSP Traceroute provide OAM functionality
for MPLS networks based on RFC 4379.
• LSP Ping and LSP Traceroute tools provide a
mechanism to detect MPLS data plane failure.
• MPLS echo requests follow the same data path that normal
MPLS packets would traverse.
• LSP Ping is used to detect data plane failure and to
check the consistency between the data plane and the
control plane.
• LSP Traceroute is used to isolate the data plane failure
to a particular router and to provide LSP path tracing.
LSP Ping
• The basic idea is to verify that packets that belong to a particular Forwarding Equivalence Class (FEC) actually end their MPLS path on a Label Switching Router (LSR) that is an egress for that FEC.
• LDP LSP Ping and RSVP LSP Ping are supported.
[Figure: MPLS network with PE (LER), P (LSR), and PE (LER). The Echo Request travels along the LSP and the Echo Reply is returned to the sender.]
LDP LSP Ping
NetIron# ping mpls ldp 22.22.22.22
Send 5 80-byte MPLS Echo Requests for LDP FEC 22.22.22.22/32, timeout 5000 msec
Type Control-c to abort
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/1 ms.
Syntax: ping mpls ldp <ip-address | ip-address/mask-length> ... options
RSVP LSP Ping
NetIron# ping mpls rsvp lsp toxmr2frr-18
Send 5 92-byte MPLS Echo Requests over RSVP LSP toxmr2frr-18, timeout 5000 msec
Type Control-c to abort
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max=0/1/5 ms.
Syntax: ping mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options
LSP Traceroute
• With LSP traceroute, an echo request packet is sent to the control plane of each transit LSR, which confirms that it is a transit LSR for this path.
• Transit LSRs return echo replies.
• LDP LSP Traceroute and RSVP LSP Traceroute are supported.
[Figure: MPLS network with PE (LER), P (LSR), and PE (LER). Echo Requests are sent along the LSP and Echo Replies are returned by each LSR.]
LDP LSP Traceroute
NetIron# traceroute mpls ldp 22.22.22.22
Trace LDP LSP to 22.22.22.22/32, timeout 5000 msec, TTL 1 to 30
Type Control-c to abort
1 10ms 22.22.22.22 return code 3(Egress)
Syntax: traceroute mpls ldp <ip-address | ip-address/mask-length> ... options
RSVP LSP Traceroute
NetIron# traceroute mpls rsvp lsp toxmr2frr-18
Trace RSVP LSP toxmr2frr-18, timeout 5000 msec, TTL 1 to 30
Type Control-c to abort
1 1ms 22.22.22.22 return code 3(Egress)
Syntax: traceroute mpls rsvp lsp <lsp-name> | session <tunnel-source-address> <tunnel-destination-address> <tunnel-id> ... options
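The "TTL 1 to 30" and "return code 3(Egress)" parts of the output above reflect how LSP traceroute walks the path: echo requests are sent with increasing TTL until the egress answers. The following is a simulation only, with no packets sent; the hop list, the function name, and the transit address are hypothetical, while return codes 3 and 8 are the RFC 4379 values for an egress reply and a label-switched (transit) reply.

```python
from typing import List, Tuple

# RFC 4379 return codes used in this sketch (descriptions shortened).
RETURN_CODES = {3: "Egress", 8: "Label switched"}


def lsp_traceroute(lsp_hops: List[str], max_ttl: int = 30) -> List[Tuple[int, str, int]]:
    """Simulate LSP traceroute over a known list of downstream routers.

    lsp_hops holds the routers after the ingress, in order, with the
    egress last. Each TTL value expires at the hop of the same index,
    which replies with code 8 (transit) or code 3 (egress).
    """
    results = []
    for ttl in range(1, max_ttl + 1):
        if ttl > len(lsp_hops):
            break  # no router left to answer
        hop = lsp_hops[ttl - 1]
        code = 3 if ttl == len(lsp_hops) else 8
        results.append((ttl, hop, code))
        if code == 3:
            break  # the egress confirmed the FEC, so the trace is complete
    return results


# Single-hop LSP, as in the output above: the first reply is from the egress.
print(lsp_traceroute(["22.22.22.22"]))

# Hypothetical two-hop LSP with one transit LSR in front of the egress.
for ttl, hop, code in lsp_traceroute(["10.0.0.2", "22.22.22.22"]):
    print(ttl, hop, f"return code {code}({RETURN_CODES[code]})")
```

In a real trace, a missing or unexpected reply at some TTL is what isolates the data plane failure to a particular router.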
MPLS OAM
Summary
LSP Ping
• Intended application: To detect data plane failure and to check the consistency between the data plane and the control plane
• Supports: Connectivity verification
• Generation: Manual
• Standard: Yes
LSP Traceroute
• Intended application: To isolate the data plane failure to a particular router and to provide LSP path tracing
• Supports: Connectivity troubleshooting, fault localization
• Generation: Manual
• Standard: Yes
BFD for RSVP-TE LSPs
• Intended application: Fast data plane failure detection for RSVP LSPs
• Supports: Fast data plane failure detection (link may be up, but data path is down)
• Generation: Automatic
• Standard: Yes
Observation
ICMP
Operates at
Ping
Layer 3
Specification
RFC792
Published
Sept 1981
RFC1208
(RFC 1983)
March 1991
(Aug 1996)
July 1983
CFM
Layer 2
802.1ag
Dec 2007
26 years of work
for going down one layer of OAM
Thank You