Router Anatomy - Institute for Systems Research

Anatomy of an IP Router
Vahid Tabatabaee
Fall 2007
ENTS689L: Packet Processing and Switching
References
• Panos C. Lekkas, Network Processors: Architectures, Protocols, and Platforms, McGraw-Hill.
• James Aweya, “IP Router Architectures: An Overview”, Nortel Networks, Ottawa, Canada.
• Florian Brodersen and Alexander Klinetschek, “Anatomy of a High Performance IP Router”, Communication Network Seminar 2003/04, Hasso-Plattner-Institute, University of Potsdam, Jan. 2004.
• Steve Kohalmi and Tim Hale, “Anatomy of an IP Service Edge Switch”, Quarry Technologies, 2002.
• Cisco Systems, CRS-1 router documents.
Basic IP Router Components
• Network Interfaces
• Processing Modules
• Buffering Modules
• Interconnection Unit (switch fabric)
• The processing and buffering modules may be replicated either fully or partially on the network interfaces.
[Figure labels: path computation, routing table maintenance; transfer packets between ingress and egress; interface (line) cards: packet forwarding, packet processing, may cache routing table]
Basic Functions of a Router
• Route Processing (routing protocols: OSPF, RIP, …): slow path or control plane
  - Path computation
  - Routing table maintenance
  - Reachability propagation
• Packet Forwarding: fast path or data plane
Packet Forwarding
• IP Packet Validation
  - Version number
  - Header length field
  - Checksum
• Destination IP address parsing and table lookup
  - Local delivery in the network
  - Unicast delivery to an output port
  - Multicast delivery to a set of output ports
• Packet Lifetime Control
  - Adjust the time-to-live (TTL) field
  - A packet with a positive TTL is delivered to a local address
  - A packet delivered to output ports has its TTL decremented and rechecked before forwarding
• Packet Fragmentation
  - Check whether the packet size is larger than the MTU of the network
  - If so, fragment the packet
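The validation and lifetime-control steps above can be sketched in Python. This is only an illustration under simplifying assumptions: the function names `ipv4_checksum`, `validate_and_decrement_ttl` and `build_header` are hypothetical, fragmentation and the table lookup are left out, and the checksum is recomputed from scratch rather than updated incrementally as a real fast path would likely do.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def validate_and_decrement_ttl(header: bytes):
    """Return the updated header to forward, or None if the packet is dropped."""
    if len(header) < 20:
        return None
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return None                          # bad version or header length
    header = header[: ihl * 4]
    if ipv4_checksum(header) != 0:           # a valid header checks to zero
        return None
    ttl = header[8]
    if ttl <= 1:                             # TTL would reach zero: do not forward
        return None
    out = bytearray(header)
    out[8] = ttl - 1
    out[10:12] = b"\x00\x00"                 # clear the checksum, then recompute it
    out[10:12] = struct.pack("!H", ipv4_checksum(bytes(out)))
    return bytes(out)

def build_header(ttl: int) -> bytes:
    """Hypothetical test helper: 20-byte header, 10.0.0.1 -> 10.0.0.2."""
    raw = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, ttl, 6, 0,
                                bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    raw[10:12] = struct.pack("!H", ipv4_checksum(bytes(raw)))
    return bytes(raw)
```

Calling `validate_and_decrement_ttl(build_header(64))` returns a header whose TTL is 63 and whose checksum is again valid, while a header with TTL 1 or a corrupted field is dropped.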
First Generation of Routers
• Similar to a typical computer layout.
• All functionality is implemented in software.
• Single CPU, single memory, single bus!
Problems with first generation routers
• Processing speed is limited by the single CPU.
• The CPU must process all packets destined to it as well as those that are passing through it.
• Major packet processing tasks such as table lookups are memory-intensive operations and cannot be made faster by simple processor upgrades.
• Software implementation is inefficient, since forwarding is a small set of operations repeated on all packets.
• Slow path and fast path are implemented on the same CPU. Therefore, the slow path can influence the fast path.
• The routing table size has grown from 20,000 entries in 1994 to 260,000 entries today (2007).
• Moving data from one interface to another can be time consuming and often takes longer than the packet processing itself.
Source http://bgp.potaroo.net/
Problems with first generation routers
• The routing table lookup speed cannot be improved if we use traditional memories.
• The conventional bus structure for the interconnection is very inefficient.
  - Every packet has to cross the bus at least twice.
  - The whole packet (not just the header) is transferred.
How fast should a router be?
• An OC-48 link data rate: 2.488 Gbps
• Packet rate is more important than the data rate.
• The bottleneck is set by the minimum packet size, which depends on the technology.
• E.g. Packet-over-SONET (PoS): 40-byte IP packet + 6 bytes of PPP/HDLC overhead, at a payload rate of 2.405 Gbps:
  2.405 Gbps / (8 x 46) = 6.53 MPPS
• The aggregate packet rate for a 16-port system:
  16 x 6.53 = 104.48 MPPS
• One decision every 9.57 ns.
• SDRAM access time is about 10 ns for sequential locations and in practice around 20-50 ns.
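The arithmetic on this slide can be reproduced with a few lines of Python. The constant and function names below are illustrative, and the 2.405 Gbps payload rate, the 46-byte minimum packet and the 16 ports are taken directly from the slide.

```python
def packets_per_second(payload_bps: float, packet_bytes: int) -> float:
    """Worst-case packet rate when every packet has the minimum size."""
    return payload_bps / (8 * packet_bytes)

OC48_PAYLOAD_BPS = 2.405e9        # OC-48 payload rate used on the slide
MIN_PACKET_BYTES = 40 + 6         # 40-byte IP packet + 6-byte PPP/HDLC overhead
PORTS = 16

per_port_pps = packets_per_second(OC48_PAYLOAD_BPS, MIN_PACKET_BYTES)
aggregate_pps = PORTS * per_port_pps
decision_time_ns = 1e9 / aggregate_pps   # time budget per forwarding decision

print(f"per port:  {per_port_pps / 1e6:.2f} MPPS")
print(f"aggregate: {aggregate_pps / 1e6:.2f} MPPS")
print(f"budget:    {decision_time_ns:.2f} ns per decision")
```

Without intermediate rounding the aggregate comes out slightly above the slide's 104.48 MPPS, because the slide rounds the per-port rate to 6.53 before multiplying. Either way the budget is just under 10 ns, which is below the practical SDRAM access time quoted above.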
What is the solution?
• Take advantage of parallelism:
  - NICs became more intelligent and took over most packet forwarding.
  - ASICs in the NICs (line cards) handle packet classification and forwarding.
  - Most packets do not go to the CPU card (control card).
• Switching interface:
  - Use a switching element to pass packets between line cards directly and simultaneously.
Modern Switch Based Architecture
• Classification and forwarding decisions are made in the line cards.
• High-speed interconnection mechanism (switching) between the line cards.
  - This provides a fast data path.
• A standard CPU (RISC processor) is used for the control plane (slow path).
• Hardware and/or software implementation of classification and forwarding in the line card.
Functional Blocks in Modern Switches
• The PHY Interface
  - Responsible for transmitting and receiving information
  - Converts the bit stream from digital form to an analog signal and vice versa
• Switch Fabric
  - The router has a bus or a backplane
  - The switch fabric reads a packet from an input port and routes it to the output port
• Packet processing
  - Fast path (data path): handles all operations that are executed in real time on packets (e.g. framing/parsing, classification, modification, compression/encryption, queueing)
  - Slow path (control path): operations executed outside the real-time packet flow (e.g. address resolution, route calculation, update of the routing table, …)
• Host processing
  - Network management, configuring devices, diagnostics
  - Implemented in software on a CPU
Line cards in Modern Switches
• The line card handles packet processing such as:
  - Classification
  - Forwarding
  - Traffic policing and shaping
  - Monitoring and statistics
Data Path Diagram
[Figure: data path through a line card and a switch card, ingress and egress. Line card blocks: Optics, CDR & Serdes, Framer/Mapper, Network Processor, Traffic Manager, Switch Interface. Switch card blocks: Switching Element, Scheduling Element. Label: Packet Processing Units.]
Source: Light Reading Report
Data Path Functions
Ingress Line Card
• Network Processor
  - Parse
  - Identify flow
  - Determine egress port
  - Mark QoS parameters
  - Append TM or SF header
• Ingress Traffic Manager
  - Police
  - Manage congestion (WRED)
  - Queue packets in class-based VOQs
  - Segment packets into switch cells
Switch Fabric
  - Queues cells in class-based VOQs
  - Flow control TM per class-based VOQ
  - Schedule class-based VOQs to egress ports
Egress Line Card
• Egress Traffic Manager
  - Reassemble cells into packets
  - Shape outgoing traffic
  - Schedule egress traffic
[Figure labels: incoming packets, WRED, discard, segmentation + header, TM scheduler, SF flow control, SF arbiter, reassemble, class-based queueing of outgoing packets, egress scheduler & shaper]
Switch Card to Line Card Connection
• This connection must pass through the backplane.
• A serdes (serializer-deserializer) is used for this connection.
  - Each serdes signal runs over two wires and two pins (differential signaling).
  - The line speed is usually around 3.125 Gbps.
  - The link runs a line code (8b/10b encoding).
  - The actual data rate is therefore around 2.5 Gbps.
  - There are attempts to provide 10 Gbps serdes.
How many serdes do we need?
• How fast should the connection between the switch card and the line card be?
• The line speed is not enough.
  - Switch fabric throughput is less than 100% due to contention.
  - The network processor, traffic manager and switch fabric add their own headers.
  - There is also a cell tax (fragmentation overhead).
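A rough way to see the header overhead and the cell tax is to count the bytes that cross the backplane for one packet. The sketch below is a simplified model, not a description of any particular fabric: it assumes every cell carries both a switch fabric header and a traffic manager header and that the last cell is padded to full size. The default sizes (52-byte cell payload, 8-byte SF header, 12-byte TM header) are the ones used in the speedup example later in these slides, and the function names are hypothetical.

```python
import math

def fabric_bytes(packet_bytes: int, cell_payload: int = 52,
                 sf_header: int = 8, tm_header: int = 12) -> int:
    """Bytes sent to the fabric for one packet under the assumed cell format."""
    cells = math.ceil(packet_bytes / cell_payload)   # last cell is padded
    return cells * (cell_payload + sf_header + tm_header)

def expansion(packet_bytes: int) -> float:
    """Ratio of fabric bytes to packet bytes (1.0 would mean no overhead)."""
    return fabric_bytes(packet_bytes) / packet_bytes

for size in (46, 52, 53, 1500):
    print(size, fabric_bytes(size), round(expansion(size), 2))
```

The worst case is a packet one byte longer than the cell payload: it needs two cells, so almost half of what is sent is padding and headers. This is one reason the fabric connection has to run faster than the line.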
[Figure: line card (line card elements, traffic manager, switch interface) connected to the switch card (switching element, scheduling element). Rates marked in the figure: RL at the line, RTM at the traffic manager, RSF at the switch fabric. Overheads marked along the path: traffic manager header, switch fabric header, fragmentation (cell tax), speedup.]
• Effective speedup = RSF/RTM
• In commercial systems, speedup usually refers to RSF/RL.
• A higher speedup factor:
  - Increases system design complexity.
  - Increases power consumption.
  - Creates signal integrity issues.
• The required speedup factor is around 2.
Redundancy
• We have spare switch cards and control cards in the system.
• The redundancy models:
  - Passive redundancy (N:1): one inactive switch card in the system starts to work after a failure.
  - Passive redundancy (1:1, N:N): for each active switch card, there is one inactive card.
  - Load-sharing redundancy (N-1): all cards are active, and when a failure happens performance degrades gracefully.
  - Active redundancy (1+1): two sets of fabrics carry the same traffic.
Source: www.idt.com/content/switchblock.jpg
Example
• In a 16-port 10 Gbps switch with 2X speedup and N:N redundancy, how many 2.5 Gbps serdes do we need?
• We need 20 Gbps of active and 20 Gbps of redundant data rate for each line card.
• This means 16 serdes for each line card (40 Gbps / 2.5 Gbps).
• For 16 line cards we need 256 serdes in this system.
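The serdes count in this example follows from three multiplications and one division, sketched here in Python. The function name and the `redundancy_factor` parameter are illustrative; a factor of 2 models N:N redundancy, where every active link has an inactive twin. The usable serdes rate is derived from the 3.125 Gbps line speed and 8b/10b coding mentioned earlier.

```python
import math

def serdes_per_line_card(line_rate_gbps: float, speedup: float,
                         redundancy_factor: float, serdes_gbps: float) -> int:
    """Number of serdes one line card needs toward the switch cards."""
    required_gbps = line_rate_gbps * speedup * redundancy_factor
    return math.ceil(required_gbps / serdes_gbps)

RAW_SERDES_GBPS = 3.125
usable_gbps = RAW_SERDES_GBPS * 8 / 10      # 8b/10b: 8 data bits per 10 line bits

per_card = serdes_per_line_card(line_rate_gbps=10, speedup=2,
                                redundancy_factor=2, serdes_gbps=usable_gbps)
total = 16 * per_card                        # 16 line cards

print(usable_gbps, per_card, total)
```

With these inputs the sketch gives 16 serdes per line card and 256 in total, matching the slide.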
Example
• What is the effective speedup of this system for 40-byte IP packets if the traffic manager header size is 12 bytes, the switch fabric header size is 8 bytes and the payload size of the cell is 52 bytes?
• Solution: The earlier OC-48 packet-rate example shows that there can be 6.53 MPPS (40-byte packets) on an OC-48 line.
• Similarly, on an OC-192 there can be up to 9.622 Gbps / (8 x 46) = 26.15 MPPS.
• Each packet is encapsulated in one cell, since 40 + 6 < 52.
• The maximum number of cells that a line card can send over its 8 active 2.5 Gbps serdes is
  (8 x 2.5 Gbps) / ((52 + 8 + 12) x 8) = 34.722 million cells per second
• The effective speedup is
  Speedup = 34.722 / 26.15 = 1.33
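The same calculation as a short Python sketch. All numbers come from the example; the variable names are illustrative, and the 20 Gbps fabric rate corresponds to the eight active 2.5 Gbps serdes per line card from the previous example.

```python
OC192_PAYLOAD_GBPS = 9.622
PACKET_BYTES = 40 + 6              # IP packet + PPP/HDLC overhead
CELL_BYTES = 52 + 8 + 12           # cell payload + SF header + TM header
FABRIC_GBPS = 8 * 2.5              # active serdes capacity per line card

# Worst-case arrival rate of minimum-size packets, in millions per second.
packet_rate_mpps = OC192_PAYLOAD_GBPS * 1e3 / (8 * PACKET_BYTES)

# Each packet fits in one cell, so the fabric must carry one cell per packet.
assert PACKET_BYTES <= 52
cell_rate_mcps = FABRIC_GBPS * 1e3 / (8 * CELL_BYTES)

effective_speedup = cell_rate_mcps / packet_rate_mpps
print(f"{packet_rate_mpps:.2f} MPPS, {cell_rate_mcps:.2f} Mcells/s, "
      f"speedup {effective_speedup:.2f}")
```

So the nominal 2X speedup shrinks to roughly 1.33 for minimum-size packets once headers and cell padding are accounted for.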
Traces Per Serdes
• Typical LVDS speed is 1.25 Gbps
• For 2.5 Gbps we need 2 channels
• LVDS is differential, i.e. 2 traces per channel
• LVDS is unidirectional, i.e. one set of channels per direction for full duplex
• Full-duplex 2.5 Gbps using LVDS requires 8 traces
• In the previous example we will have 256 x 8 = 2048 traces on the backplane.
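The trace count can be checked with a small Python helper. The function name and its parameters are illustrative; the 1.25 Gbps LVDS channel rate and the 256 serdes come from the slides.

```python
import math

def traces_per_link(link_gbps: float, lvds_channel_gbps: float = 1.25,
                    full_duplex: bool = True) -> int:
    """Backplane traces needed for one serdes link built from LVDS channels."""
    channels = math.ceil(link_gbps / lvds_channel_gbps)  # channels per direction
    traces = channels * 2                                # differential pair each
    return traces * (2 if full_duplex else 1)            # one set per direction

per_serdes = traces_per_link(2.5)
backplane_traces = 256 * per_serdes
print(per_serdes, backplane_traces)
```

This yields 8 traces per serdes and 2048 traces in total for the 16-port example.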
A sample Router (Cisco CRS-1)
The line card chassis
8 service cards and 8 physical layer interface module cards
16 slot Single-Shelf system
[Figure labels: Physical Layer Interface Module, Switching Card, Routing Processor (control plane)]
• The distributed route processor (DRP) is an optional component that provides enhanced routing capabilities.
• The DRP contains two symmetric multiprocessors (SMPs), each of which performs routing functions.
• Processor-intensive tasks (such as BGP speakers and IS-IS) can be offloaded from the route processors (RPs) to the DRPs.
Multishelf Systems
2 to 72 line card shelves
1 to 8 fabric card shelves
How to handle packet processing?
• Off-the-shelf CPU
  - This usually would be a RISC processor.
  - In low-end systems it could be a CISC processor.
• ASIC
  - Specialized high-performance ASIC to handle packet processing.
  - Ideal approach for companies such as IBM and Intel, since they are manufacturers of integrated circuits.
Off-the-shelf CPU Systems
• Packet processing is implemented in software running on the CPU.
• Modifications, upgrades and debugging are accomplished by simple software updates and downloads.
  - Update time is much shorter, which is good for both user and developer.
• Not very efficient: many clock cycles are spent on tasks not related to packet processing.
• The fastest off-the-shelf CPU can handle about 1 gigabit per second.
• The trend is toward deeper packet processing (more on this later).
Memory Bottleneck
• The pipelined architecture of CPUs enables them to execute billions of instructions per second.
• However, to keep the pipeline full they must continuously fetch data from memory and store it back.
• This can be done with a very sophisticated multi-level hierarchy of different memory technologies and interleaved memory banks.
• This comes at prohibitive cost, design complexity and power consumption.
• Hence a typical processor pipeline often ends up empty, which reduces the system throughput.
• The statistics of network traffic are completely different from those of local traffic on a computer bus. Network traffic does not have the same spatial and temporal locality properties. Hence, the typical processor’s cache systems are not effective.
Sub-optimal Instruction Set
• Packet processing requires an instruction set with specific bit-level operations.
• These instructions should execute at wire speed.
• These instructions are not available as standard instructions of an off-the-shelf CPU.
• Hence, we have to assemble multiple standard instructions to perform the intended functionality.
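To illustrate what "assembling multiple standard instructions" means, the Python sketch below extracts a few IPv4 header fields using only shifts and masks, the kind of generic operations a standard CPU offers. Each field costs at least a shift and an AND, whereas a processor with dedicated bit-field instructions could do it in one step. The function name is hypothetical and only the first two 32-bit header words are parsed.

```python
def parse_first_words(header: bytes) -> dict:
    """Extract IPv4 fields from the first 8 header bytes with shifts and masks."""
    word0 = int.from_bytes(header[0:4], "big")   # version, IHL, DSCP, ECN, length
    word1 = int.from_bytes(header[4:8], "big")   # identification, flags, offset
    return {
        "version": word0 >> 28,
        "ihl": (word0 >> 24) & 0xF,
        "dscp": (word0 >> 18) & 0x3F,
        "ecn": (word0 >> 16) & 0x3,
        "total_length": word0 & 0xFFFF,
        "flags": (word1 >> 13) & 0x7,
        "fragment_offset": word1 & 0x1FFF,
    }

# Example: IPv4, 20-byte header, DSCP 46, total length 40, Don't Fragment set.
fields = parse_first_words(bytes([0x45, 0xB8, 0x00, 0x28, 0x12, 0x34, 0x40, 0x00]))
print(fields)
```

Doing this for every field of every packet at wire speed is where the instruction count adds up.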
Packet processing with ASIC
• An ASIC typically delivers higher performance.
• An ASIC is not programmable:
  - Adding new functionality → new design
  - Adding a new protocol → new design
  - New design → costly for both the vendor and the user
• ASIC design is very time consuming:
  - The design cycle takes 12 to 18 months.
  - If we need some modification, we may need to recode the whole design.
  - Many start-up failures are due to time delay.
ASIC development is costly
• Expensive and time-consuming to change.
• Testing an ASIC requires designing a system around it.
• Expensive development tools (design and verification).
• Requires ASIC designers (much more expensive).
• Tape-out of a chip costs around a million dollars.
So is there a middle ground solution?
• Can we have a technology that:
  - Has the flexibility of programmable processors
  - Has the high speed of ASICs
• The solution is called a Network Processor!
• Network processors are programmable like a CPU, but their performance is close to that of an ASIC.
Network Processors value proposition
• Shorter time to market:
  - Instead of 18 months, it takes about 6 months to complete the development cycle of the packet processing part.
• Longer time in market:
  - New features can be embedded into a deployed network-processor-based product.
  - Increased time in market reduces the cost of product ownership over the life of the product.
• Just-in-time delivery of new features:
  - We can modify the design and add new features in the field without penalizing the customer.
• Greater focus on other issues of business management:
  - Most functions are already coded in a standard way.
  - Developers can focus on differentiating features.
Packet Processing Stages
1. Remove Link Layer Headers and Decryption:
   - Ethernet
   - PPP (Point-to-Point Protocol) frame
   - PPP over ATM
   - PPP over Ethernet over ATM
2. Identify Ingress Subscriber:
   - Extract information about the owner of the packet from the link layer protocol header.
3. Filtering:
   - Permit or deny specific traffic flows, based on various attributes of the IP and higher-layer headers.
4. Traffic Classification:
   - Allow different traffic management, QoS, security and routing policies to be applied to different types of flows.
5. Traffic Metering, Marking & Policing:
   - Control the Peak and Committed Information Rates.
   - Determine the PHB in the DiffServ model (changing the priority).
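Metering against a committed rate (stage 5) is commonly built on a token bucket. The Python sketch below is a minimal single-rate version under stated assumptions: the class name `TokenBucket` is hypothetical, time is passed in explicitly so the behaviour is deterministic, and a non-conforming packet is simply reported, leaving it to the caller to drop it or re-mark it to a lower priority. A policer for both peak and committed rates would typically combine two such buckets.

```python
class TokenBucket:
    """Single-rate token bucket meter (illustrative, not a standard's exact algorithm)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.burst = burst_bytes            # bucket depth in bytes
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the previous update (seconds)

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Refill for the elapsed time, then try to pay for the packet."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True                     # in profile: forward unchanged
        return False                        # out of profile: drop or re-mark

meter = TokenBucket(rate_bps=8000, burst_bytes=1500)   # 1000 bytes per second
print(meter.conforms(0.0, 1500), meter.conforms(0.0, 100), meter.conforms(0.5, 500))
```

In this run the first packet uses the whole burst allowance, the second arrives immediately and is out of profile, and the third conforms because half a second of refill has restored 500 bytes of tokens.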
Packet Processing Stages
6. Custom Routing Policies:
   - Direct some traffic through specific paths (Internet, VPN, specific destination).
   - A Virtual Private Routed Network allows users to network in privacy over their own routed network using their own private addresses.
   - Send suspicious traffic to explicit locations for special processing.
7. NAT & NAPT (Network Address [Port] Translation):
   - Address translation at the source if the user is using a private address space.
   - Static one-to-one with NAT and dynamic many-to-one with NAPT.
8. Route Table Look-up:
   - Best matching prefix look-up on the destination IP address.
9. Enforcing the PHB / Per-Flow (Link Sharing):
   - Priority, WRR, WFQ scheduling, WRED (weighted random early detection).
10. Egress Side Processing (QoS, filtering, encryption, NAT, egress subscriber identification, traffic classification, link sharing)
11. Statistics Collection
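The best matching prefix look-up of stage 8 can be illustrated with a small Python sketch. It keeps one hash table per prefix length and probes them from longest to shortest, which is easy to read but far from what line-rate hardware does (tries, TCAMs and similar structures are the usual choices). The class name `RouteTable`, the next-hop strings and the IPv4-only masking are assumptions made for the example.

```python
import ipaddress

class RouteTable:
    """Longest-prefix match over IPv4 routes (illustrative sketch)."""

    def __init__(self):
        self.routes = {}   # prefix length -> {network address as int: next hop}

    def add(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        self.routes.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        for length in sorted(self.routes, reverse=True):     # longest prefix first
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.routes[length].get(addr & mask)
            if hop is not None:
                return hop
        return None                                           # no matching route

table = RouteTable()
table.add("0.0.0.0/0", "default-gw")
table.add("10.0.0.0/8", "port-1")
table.add("10.1.0.0/16", "port-2")
print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.lookup("8.8.8.8"))
```

The address 10.1.2.3 matches all three routes, and the /16 wins because it is the most specific. Each lookup may touch several tables at unrelated memory locations, which is the access pattern that makes this stage memory bound.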
Deep Packet Processing
• In deep packet processing we need to look at the contents of the packet, not just the header.
• Why do we need deep packet processing?
  - Deep packet inspection for firewalls and intrusion detection systems
  - Shaping or discarding P2P traffic
  - Server load balancing: distribution of traffic among servers based on the web destination
  - Network monitoring and analysis
Packet Processing Implementation issues
• We need multiple table look-ups for each packet.
• Access to the whole packet, not just the IP header, is necessary.
• There can be tens of thousands of simultaneously active subscribers comprising millions of application flows.
• On a fully loaded Gigabit Ethernet connection, about 1.5 million packets per second must be processed.
• Modern general-purpose processors are optimized for numeric computation rather than for processing packets.
• Memory read and write speeds become bottlenecks.
• Caching and high-speed memory burst capability do not help, since packet processing requires:
  - Large tables
  - Short entries
  - Random access queries
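The figure of about 1.5 million packets per second on Gigabit Ethernet corresponds to a link filled with minimum-size frames. A short Python check, using the standard Ethernet minimum frame of 64 bytes plus the 8-byte preamble/start delimiter and the 12-byte inter-frame gap (the variable names are illustrative):

```python
LINK_BPS = 1e9            # Gigabit Ethernet
MIN_FRAME_BYTES = 64      # minimum Ethernet frame, including FCS
PREAMBLE_SFD_BYTES = 8    # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12 # mandatory idle time between frames

wire_bytes = MIN_FRAME_BYTES + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
max_pps = LINK_BPS / (8 * wire_bytes)
per_packet_ns = 1e9 / max_pps

print(f"{max_pps / 1e6:.3f} Mpps, {per_packet_ns:.0f} ns per packet")
```

That leaves well under a microsecond per packet for all the table look-ups listed above, on a single port.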
How do network processors do this?
• Specialized circuitry and micro-engines perform all generic packet processing functions.
• They also usually embed a major programmable module, usually a tailor-made RISC CPU (and sometimes more than one).
• Real-time operating system
• Handshake communication with other parts of the system
Network Processor Categories
• Platform network processors' objectives:
  - Handle most packet processing functions
  - Minimize the number of components and the hardware cost
  - Optimize the trade-off between performance and flexibility
  - Accelerate the software development cycle
• Peripheral network processors
  - Designed to optimize a specific function
  - Compressor chips
  - IP security
The other side of the argument
• Every single task can be done at wire speed.
• What about multiple tasks at the same time?
• What is a realistic scenario to consider?
• The challenge of benchmarking
• Programming complexity