Transcript Week 2

TDC 460
Advanced Ethernet Topics
1
Outline
• 802.1D - Spanning Tree Algorithm and Protocol (STP)
• 802.1w - Rapid STP
• 802.1s - per VLAN STP
• 802.3x - Full Duplex Flow Control
• 802.3ad - Link Aggregation
• 802.1Q - VLAN and VLAN Trunking
• 802.1p - Quality of Service (QoS)
2
Spanning Tree Algorithm and Protocol
(STP)
• Specified in IEEE 802.1D
• A link management protocol that transforms a
loop topology (could be multiple loops) into a
loop-free topology.
• STP forces redundant paths into stand-by
paths, and provides a fault tolerant scheme.
• STP is transparent to end stations.
3
Redundant Topology
Server/host X
Router Y
Segment 1
Segment 2
– Redundant topology eliminates single points of failure
– Redundant topology causes (1) broadcast storms, (2) multiple
frame copies, and (3) MAC address table instability problems
4
Broadcast Storms
Server/host X
Router Y
Segment 1
Switch A
Broadcast
Switch B
Segment 2
Bridges continue to propagate broadcast traffic over
and over
5
Multiple Frame Copies
Server/host X
Unicast
Router Y
Segment 1
Unicast
Bridge A
Unicast
Bridge B
Segment 2
• Host X sends a unicast frame to Router Y
• Router Y MAC Address has not been learned by
either bridge yet
• Router Y will receive two copies of the same frame
6
MAC Database Instability
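[Figure: Host X and Router Y on separate segments (Segment 1, Segment 2), joined in parallel by Bridge A and Bridge B; on each bridge, Port 0 faces Host X and Port 1 faces Router Y; unicast frames shown on both paths.]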
• Host X sends a unicast frame to Router Y
• Router Y MAC Address has not been learned by either bridge yet
• Bridge A and B learn Host X MAC address on port 0
• Frame to Router Y is flooded
• Bridge A and B incorrectly learn Host X MAC address on port 1
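The sequence above can be reproduced with a minimal sketch of two learning bridges. The `Bridge` class and its two-port model are illustrative assumptions, not something defined in the slides; the sketch only shows how the second copy of the frame overwrites the correct table entry.

```python
class Bridge:
    """Toy two-port learning bridge (ports 0 and 1); a hypothetical model."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}  # MAC address -> port where it was last seen

    def receive(self, src, dst, in_port):
        """Learn the source address, then return the list of output ports."""
        self.mac_table[src] = in_port
        if dst in self.mac_table:
            out_port = self.mac_table[dst]
            return [] if out_port == in_port else [out_port]
        # Unknown destination: flood out of every other port.
        return [port for port in (0, 1) if port != in_port]


bridge_a, bridge_b = Bridge("A"), Bridge("B")

# Host X's frame arrives on port 0 of both bridges; Router Y is unknown,
# so both bridges flood the frame out of port 1.
flood_a = bridge_a.receive("X", "Y", in_port=0)
flood_b = bridge_b.receive("X", "Y", in_port=0)
table_after_first_copy = dict(bridge_a.mac_table)

# Each flooded copy travels over the far segment and enters the other
# bridge on port 1, so Host X is relearned on the wrong port.
bridge_a.receive("X", "Y", in_port=1)
bridge_b.receive("X", "Y", in_port=1)

print(table_after_first_copy)  # Host X learned on port 0
print(bridge_a.mac_table)      # Host X now mapped to port 1
```

Without a blocked port, the two copies keep circulating and the entry for Host X keeps flipping between ports.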
7
The Solution
Blocking certain ports to transform loop topology into tree
topology
WS1
Segment 1
B2
B1
blocked port
Segment 2
8
STP Algorithm
• An ID/priority is assigned to each bridge. The ID/priority can be
assigned by the network administrator. If two switches have
the same ID/priority, the MAC address is used to distinguish
them. (Lower number means higher priority.)
• Each port is assigned a cost. It is usually derived from the bit rate (i.e.,
speed) of the port: the faster the port, the lower the cost.
• A root bridge is selected. It is the bridge with the smallest
priority number.
• Each bridge (except root) determines its root port, which is
the port with the least cost path to the root bridge (RP). (Two
paths tie? Then use port with lowest ID.)
• Each LAN segment determines its designated port, which is
the port with the least cost path to the root bridge (DP).
• Remaining ports are put in the blocking state.
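The first steps of the algorithm (root election and least-cost path to the root) can be sketched as follows. This is a simplified illustration under stated assumptions: the three-bridge topology, the MAC addresses, and the function names `elect_root` and `root_path_costs` are hypothetical, each link is given a single cost, and the port-ID tie-break and designated-port selection are not modeled.

```python
import heapq


def elect_root(bridges):
    """bridges: dict name -> (priority, mac). Lowest tuple wins the election."""
    return min(bridges, key=lambda name: bridges[name])


def root_path_costs(links, root):
    """links: list of (bridge_a, bridge_b, cost).

    Returns dict bridge -> (cost to root, neighbor toward root),
    computed with Dijkstra's shortest-path algorithm.
    """
    adjacency = {}
    for a, b, cost in links:
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        for neighbor, link_cost in adjacency.get(node, []):
            new_cost = cost + link_cost
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(heap, (new_cost, neighbor))
    return best


# Hypothetical example: priorities as in the slides' style, made-up MACs.
bridges = {
    "BR1": (100, "00:00:00:00:00:01"),
    "BR2": (200, "00:00:00:00:00:02"),
    "BR3": (300, "00:00:00:00:00:03"),
}
links = [("BR1", "BR2", 4), ("BR2", "BR3", 4), ("BR1", "BR3", 19)]

root = elect_root(bridges)
costs = root_path_costs(links, root)
print(root)
print(costs)
```

In this example BR3 reaches the root more cheaply through BR2 (4 + 4) than over its direct 19-cost link, so its root port would be the one facing BR2, and the direct BR1-BR3 link is the redundant path that ends up with a blocked port.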
9
Spanning-Tree Protocol
Port/Path Cost
Link Speed    Cost (revised IEEE spec)
----------    ------------------------
10 Gbps       2
1 Gbps        4
100 Mbps      19
10 Mbps       100
Ref: IEEE 802.1D p. 109
10
Example (before STP)
[Figure: bridges BR1 (P=100, root), BR2 (P=200), BR3 (P=300), BR4 (P=400), and BR5 (P=500) interconnect Segments 1 through 5; each bridge port has a cost of 2 or 4.]
11
Example (after STP)
[Figure: the same topology after STP. BR1 (P=100) remains the root; BR2, BR3, BR4, and BR5 each have a root port (RP), and five ports are marked as designated ports (DP); port costs are 2 or 4.]
12
Bridge Protocol Data Unit (BPDU)
• All bridges regularly exchange information via
a special frame called BPDU.
• Three types of BPDU packets:
– Configuration (spanning tree computation)
– Topology Change Notification
– Topology Change Notification Ack
• BPDUs are exchanged every 2 secs by default
13
Bridge Protocol Data Unit (BPDU)
• BPDU contains:
– The bridge ID that the transmitting bridge believes
to be the root.
– The path cost to the root from the transmitting
port.
– The ID of transmitting port.
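A bridge decides which BPDU is "better" by comparing these fields in order. The sketch below is illustrative only: `ConfigBPDU` and `best_bpdu` are hypothetical names, the values are made up, and the `sender_id` field (the transmitting bridge's own ID, which is not listed above) is included as an assumption because it is the usual tie-breaker between equal-cost offers.

```python
from typing import NamedTuple


class ConfigBPDU(NamedTuple):
    """Fields are ordered so that plain tuple comparison ranks BPDUs."""
    root_id: tuple        # (priority, MAC) of the bridge believed to be root
    root_path_cost: int   # cost to the root from the transmitting port
    sender_id: tuple      # (priority, MAC) of the transmitting bridge (assumed field)
    port_id: int          # ID of the transmitting port


def best_bpdu(bpdus):
    """Lowest root ID wins, then lowest cost, then lowest sender ID, then port ID."""
    return min(bpdus)


offer_a = ConfigBPDU((100, "00:00:00:00:00:01"), 4, (200, "00:00:00:00:00:02"), 1)
offer_b = ConfigBPDU((100, "00:00:00:00:00:01"), 19, (300, "00:00:00:00:00:03"), 1)
offer_c = ConfigBPDU((200, "00:00:00:00:00:02"), 0, (200, "00:00:00:00:00:02"), 2)

winner = best_bpdu([offer_a, offer_b, offer_c])
print(winner)
```

Here `offer_c` loses because it advertises a root with a higher priority number, and `offer_a` beats `offer_b` on path cost.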
14
STP Control Address as the Destination
802.3 Header
802.2 Header
802.1D BPDU
15
802.1D Protocol Stack
Protocol information of
STP. What are the STP
timers?
STP (802.1D)
Logical Link Control (802.2)
802.3
Physical Layer
What is LLC?
What are the DA and SA
of BPDU?
LLC: it is designed as an interface between the MAC layer and upper-layer protocols.
However, it is not used for IP packets; it is used for layer-2 control and
management frames.
16
Port States
Blocking state: no user data
sent or received, but BPDUs
are still received.
Listening state: switch
processes BPDUs and awaits
info to return to blocking.
Learning state: doesn’t
forward user data, but does
learn MAC addresses.
Forwarding state: normal
operation.
Disabled state: not a part
of STP but can be set by
network admin.
17
Notes on STP Ports
• A port can be manually configured as an enabled
port or a disabled port. A disabled port does not
accept BPDUs, but could still accept management
frames.
• An enabled port is placed by STP into either the
forwarding state or the blocking state; the
listening and learning states are transient states.
• A port in the blocking state still receives
BPDUs, but does not accept or forward data frames.
• All ports on the root switch are in the forwarding
state.
• All ports connected to end stations are in the
forwarding state.
18
STP Timers
• Aging timer - the number of seconds a MAC-address will be
kept in the forwarding database after having received a packet
from this MAC address.
• Forward delay timer - the time spent in each of the Listening
and Learning states before the Forwarding state is entered.
• Hello timer – The interval between hello packets sent out by
the Root Bridge and the Designated Bridges. Hello packets are
used to communicate information about the topology of the
entire bridged LAN.
• Maximum message age timer - If the last seen (received)
hello packet is older than this timer, it is treated as a topology
change (link failure), and the spanning tree is recalculated.
19
STP Timer
Timer              Default Value   Range
-----              -------------   -----
Aging Time         300             10 - 1,000,000
Hello Time         2               1 - 10
Max [Message] Age  20              6 - 40
Forward Delay      15              4 - 30
Times in seconds
20
How long is the failover time?
WS1
Link failure
B2
B1
blocked
If there is a link failure, how
long does it take to transform a
port from the blocked state to
the forward state? Too Long!
WS2
21
Fail-over Time Estimate
Max Age
Timer
Instantly
Forward Delay
Timer
Forward Delay
Timer
Enabled state
Max Age Timer:
time to detect a link
failure.
In the case of Loss
of Signal (LOS)
failure, the device
can detect the
failure immediately
without using the
Max Age timer.
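The estimate can be worked through as simple arithmetic: detection time (Max Age, or none for a loss-of-signal failure) plus two Forward Delay periods for the Listening and Learning states. The function name `stp_failover_seconds` is a hypothetical helper; the defaults are the standard timer defaults of 20 s and 15 s.

```python
def stp_failover_seconds(max_age=20, forward_delay=15, loss_of_signal=False):
    """Estimate the time for a blocked port to reach the forwarding state.

    Assumes the default STP timers unless overridden. With loss of signal
    the failure is detected immediately, so Max Age is not waited for.
    """
    detection = 0 if loss_of_signal else max_age
    # One Forward Delay in Listening plus one in Learning.
    return detection + 2 * forward_delay


print(stp_failover_seconds())                     # indirect failure
print(stp_failover_seconds(loss_of_signal=True))  # loss of signal on the local link
```

With the default timers this gives 50 seconds for a failure that must be detected through Max Age expiry and 30 seconds when the port sees loss of signal directly.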
22
STP Configuration/Demo
SW03
192.168.1.3
fa0/20
fa0/19
Linux-05
172.26.1.5
SW01
192.168.1.1
blocked
SW02
192.168.1.2
Linux-14
172.26.1.14
Q1: which switch is the root? Why?
Q2: if the link on fa0/20 is unplugged, what is the fail-over time?
Q3: if the link is plugged back, what is fall-back time?
Q4: what is the relationship of the fail-over time and fall-back time to the STP timers?
23
Problems with STP
• Long failover time: 45-60 seconds
• When there is a network failure, STP must
be recalculated for the whole network.
During the recalculation, all ports are in the
blocked state which is a total network
outage.
• General recommendation: do not use it.
STP problem is more often observed in an IP over
ATM network (RFC 1483/2684) where one could
accidentally create a virtual link to form a loop.
24
Possible Solutions to STP
• Proprietary implementation: Cisco Uplink
Fast
• Other proprietary implementation:
– Key concept: keep the topology simple and use
local intelligence to change a port from
blocking to forwarding without going through
the learning process.
• New standard: Rapid Spanning Tree
Algorithm and Protocol RSTP (802.1w)
25
RSTP Port States
• The STP port states Disabled, Blocking, and Listening
have been replaced with a single Discarding state
• The STP port states Learning and Forwarding
remain the same
26
RSTP Port Roles
• Root – a forwarding port that is the best port
from non-root bridge to root bridge
• Designated – a forwarding port for every LAN
segment
• Alternate – an alternate path to the root
bridge
• Backup – a backup/redundant path to a
segment where another bridge port already
connects
• Disabled – not strictly part of STP
27
RSTP - BPDU
• With STP, a non-root switch would only generate
BPDUs when it received one on its root port. In fact,
a switch is simply relaying BPDUs rather than actually
generating them.
• This is not the case anymore with RSTP. A switch now
sends a BPDU with its current information every
<hello-time> seconds (2 by default), even if it does
not receive any from the root switch.
28
RSTP – Fast Failure Detection
• On a given port, if hellos are not received for three
consecutive times, protocol information can be
immediately aged out (or if max_age expires).
• BPDUs are now used as a keep-alive mechanism
between switches. A switch considers that it has lost
connectivity to its direct neighboring root or
designated switch if it misses three BPDUs in a row.
• If a switch fails to receive BPDUs from a neighbor, it
is certain that the connection to that neighbor has
been lost, as opposed to 802.1D where the problem
could have been anywhere on the path to the root.
• Physical link failures are detected even
faster.
29
RSTP Failover Time
When a link failure is detected (3 × HelloTime), the port
role is changed immediately. After that, the port is put in
the forwarding state immediately.
B3
Link failure
B2
blocked
B1
B4
If the failure is due to loss of signal (LOS), the detection time is << 1 sec.
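As a rough worked example, the RSTP detection time is just the number of missed hellos times the hello interval. The helper name `rstp_detection_seconds` is hypothetical, and the sketch assumes the default 2-second hello time and ignores any additional time for the port role handshake.

```python
def rstp_detection_seconds(hello_time=2, missed_hellos=3):
    """Time to declare a neighbor lost after missing consecutive BPDUs."""
    return missed_hellos * hello_time


detection = rstp_detection_seconds()
print(detection)  # seconds until the failure is detected with default timers
```

With the defaults this is 6 seconds, compared with the 20-second Max Age wait in 802.1D before the two Forward Delay periods even begin.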
30
Flow Control (CSMA/CD)
• If a receiver has more data than it can handle,
incoming frames will be lost.
• The flow control process is for a receiver to inform
the sender to slow down.
• In a CSMA/CD network, collision is the built-in
mechanism to slow down the process.
– If there are many stations on a shared media network
trying to send data, the network will see many collisions,
which prevents the network from overloading. This is
called saturation.
– If a station receives data faster than it can handle, the
station could create collisions (pretending to send) and the
sender will slow down. This is called back pressure.
31
Flow Control
Switched Half-duplex Network
1. Server transmits at 100 Mbps.
2. Client receives data at 10 Mbps.
3. Switch buffer overflows.
4. Switch generates artificial collisions.
5. Server slows down.
100Mbps
10Mbps
32
Flow Control (Full Duplex)
• A full-duplex connection is basically a point-to-point configuration: switch-to-switch, switch-to-station, or station-to-station.
• The link carries separate transmit and receive
channels. There is no contention for the use
of shared media, so there are no collisions.
• In addition to BER (bit error rate), the primary
cause of frame loss is buffer overflow at the
receiver end. So we need to do flow control.
33
IEEE 802.3x Flow Control
• A new frame, PAUSE, is specified in 802.3x to
slow down the transmitter temporarily.
– It is similar to XOFF function in dial-up modems
Payload
(data)
Destination Address: a special address, 01-80-C2-00-00-01. Frames sent to this
address are blocked by all switches and are not forwarded. The address is
recognized by stations and switches implementing the new MAC
control layer (802.3x) and ignored by others.
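To make the frame layout concrete, here is a minimal sketch that builds a PAUSE frame as raw bytes. The function names and the source MAC are hypothetical. The sketch relies on details not shown on the slide: the MAC Control EtherType 0x8808, the PAUSE opcode 0x0001, and a 16-bit pause time counted in quanta of 512 bit times. The preamble and FCS are left out, and the frame is padded to the 60-byte minimum.

```python
import struct

PAUSE_DST = bytes.fromhex("0180C2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame (no preamble, no FCS), padded to 60 bytes."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    return (header + payload).ljust(60, b"\x00")


def pause_seconds(pause_quanta: int, link_bps: int) -> float:
    """Convert a pause time in quanta (512 bit times each) to seconds."""
    return pause_quanta * 512 / link_bps


frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)
print(frame.hex())
print(pause_seconds(0xFFFF, 100_000_000))  # longest pause on a 100 Mbps link
```

A pause time of 0 tells the sender to resume immediately, which is how a receiver cancels an earlier PAUSE.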
34
Flow Control (client)
1. The client does not have the
capability to handle the data
received from the switch. The
cause is usually at the upper
layer, instead of the MAC
layer.
2. The MAC layer sends the
PAUSE frame to the switch.
3. The switch stops sending
frames to the client. Note that
the PAUSE frame is not
forwarded to anyone.
35
Flow Control (switch)
1. When the switch stops
sending frames to the client,
the frames are kept in the
switch buffer. As a result, the
buffer eventually overflows.
2. The switch sends the PAUSE
frame to the server when the
switch buffer overflows.
3. The server stops transmission.
36
Data Re-transmission
• How and when does the sender resume data
transmission?
• The PAUSE frame specifies the time to wait.
• After the time to wait, the sender resumes transmission.
• The receiver can send a new PAUSE frame and reset the
timer.
• If the timer=0, the sender resumes transmission
immediately.
• Many vendors suggest leaving this turned off.
37
Link Aggregation (802.3ad/ax)
100BaseTX
links
Speed = 4 × 100 Mbps = 400 Mbps
Normally, RSTP would block certain ports and only one
physical link is active.
In the case of link aggregation, all links are active and they are
bundled as a single logical link.
38
Link Aggregation
• Multiple physical links are combined to form a fat logical
link. Many vendors support four links, and some up to 8
links, i.e., 8 times the speed.
• It provides load balancing by dividing data flows evenly
over different links.
• In the event of one link failure, it takes less than a second
to recover from it.
• Some NICs support Link Aggregation, allowing multiple
parallel links to a server.
• All packets associated with a given “conversation” are
transmitted on the same link to prevent mis-ordering
39
Link Aggregation Layers
40
How does Link Aggregation work?
Different data flows go to different physical ports where each flow
is identified by its source MAC address (default) or its destination
MAC address. Same flow goes to the same physical port.
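One simple way to picture this mapping is a deterministic function from a MAC address to a link index. The sketch below is only an illustration: `select_link` is a hypothetical name, and the modulo rule is an assumption, since the actual distribution function is chosen by each vendor.

```python
def select_link(mac: str, num_links: int) -> int:
    """Map a MAC address to one physical link of the aggregate.

    Assumption: the address is treated as an integer and reduced
    modulo the number of links; real switches use vendor-specific hashes.
    """
    mac_value = int(mac.replace(":", "").replace("-", ""), 16)
    return mac_value % num_links


link_for_a = select_link("00:11:22:33:44:55", 4)
link_for_b = select_link("00:11:22:33:44:56", 4)
print(link_for_a, link_for_b)
```

Because the same address always maps to the same link, frames of one flow stay in order, but a single flow can never use more than one physical link's bandwidth.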
STA-A
1000BaseT
4x100BaseTX
1000BaseT
STA-B
41
How does Link Aggregation work?
Different data flows go to different physical ports where each flow
is identified by its source MAC address (default) or its destination
MAC address. Same flow goes to the same physical port.
1000BaseT
4x100BaseTX
100BaseTX
What is the aggregated
throughput to/from the server?
100BaseTX
42
Link Aggregation
• Just because you are combining two 100 Mbps
links doesn’t mean you will get a 200 Mbps
aggregated link
• Link aggregation works well, but is not as good
as a fatter pipe
43
VLAN is a technology to resolve a
problem.
What is the PROBLEM that VLAN is
trying to address?
44
Collision Domain
One collision domain and two segments
hub
WS1
WS2
Segment 1
hub
WS3
WS4
Segment 2
45
Broadcast Domain
One broadcast domain and
two collision domains
bridge
hub
WS1
WS2
Collision Domain
1
hub
WS3
WS4
Collision Domain
2
46
Dividing a Broadcast Domain
(old way)
router
switch
WS1
WS2
Broadcast Domain
1
IP Subnet 1
switch
WS4
WS3
Broadcast Domain
2
IP Subnet 2
47
Dividing a Broadcast Domain
(new way: use switch instead of router)
switch
switch
WS1
WS2
VLAN 1
switch
WS4
WS3
VLAN 2
48
What is VLAN?
VLAN is a networking technology that divides a network
segment (broadcast domain) into multiple logical segments
without rewiring the hardware
VLAN-1
One broadcast domain
VLAN-2
VLAN-3
Multiple broadcast domains
49
VLAN Benefits
• More bandwidth
• No physical limitations
• Broadcast and multicast containment
• Flexibility
• Ease of resource sharing
• Performance
• Quality of Service (QoS)
• Security
50
How does VLAN work?
server1 WS11
VLAN-1
WS12
server2 WS21
WS22
VLAN-2
All stations are physically connected to the same switch, but:
WS21 and WS22 cannot access Server1.
WS11 and WS12 cannot access Server2.
51
MAC Forwarding Table
Each VLAN has its own MAC forwarding table.
P9
P1
MAC10
MAC1 MAC2
P2
P8
P3
P4
MAC20 MAC3
VLAN-1
P1 MAC1
P2 MAC2
P9 MAC10
MAC4
VLAN-2
P3 MAC3
P4 MAC4
P8 MAC20
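The per-VLAN tables above can be modeled as a dictionary keyed by VLAN ID. This is a minimal sketch using the port and MAC labels from the slide; the names `forwarding_tables` and `lookup_port` are hypothetical.

```python
# One forwarding table per VLAN: VLAN ID -> {MAC address -> port}.
forwarding_tables = {
    1: {"MAC1": "P1", "MAC2": "P2", "MAC10": "P9"},
    2: {"MAC3": "P3", "MAC4": "P4", "MAC20": "P8"},
}


def lookup_port(vlan_id, dst_mac):
    """Look up the output port inside the frame's own VLAN only.

    Returns None when the address is unknown in that VLAN; a switch
    would then flood the frame, but only to ports of the same VLAN.
    """
    return forwarding_tables.get(vlan_id, {}).get(dst_mac)


print(lookup_port(1, "MAC10"))  # known in VLAN-1
print(lookup_port(2, "MAC10"))  # not visible from VLAN-2
```

Since a lookup never crosses into another VLAN's table, stations in VLAN-2 cannot reach MAC10 even though it sits on the same switch.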
52
VLAN Trunking Protocol (VTP)
• But what if you want to access one device
from multiple VLANs using only one
port?
• You can use the VLAN Trunking Protocol
designed by Cisco and available in pretty much
all their routers
• VTP is a layer 2 protocol
53
VLAN Trunking
a physical port in multiple VLANs
Internet
VLAN 1
VLAN 2
VLAN 3
54
VLAN Trunking Application
shared server
trunk
WS11
WS12
VLAN-1
192.168.1.0
WS21
Server IP:
192.168.1.10
192.168.2.10
WS22
VLAN-2
192.168.2.0
Note: only ONE port into server. VLAN trunking allows you to share a
device using one port.
55
MAC Forwarding Table
Each VLAN has its own MAC forwarding table.
P1
MAC1 MAC2
P2
P8
MAC10
P3
P4
MAC3
MAC4
VLAN-1
VLAN-2
P1 MAC1
P2 MAC2
P8 MAC10
P3 MAC3
P4 MAC4
P8 MAC10
56
One-Armed Router
(inter-VLAN communication)
trunk
192.168.1.1
192.168.2.1
VLAN 1
192.168.1.10
VLAN 2
192.168.1.11
192.168.2.10
192.168.2.11
Normally, devices on different VLANs cannot communicate with each other; a router is needed for inter-VLAN communication. But how can both VLANs access the router? Use VTP.
57
VLAN on Multiple Switches
Switch 1
Switch 2
single physical link
WS11
WS21
WS12
WS22
When Switch 1 gets a frame from its end stations, Switch 1 knows the VLAN of the end
station (source) and knows how to forward the frame.
When Switch 2 gets a frame from Switch 1, how does Switch 2 know the VLAN of the
frame (destination)? You cannot assume a mapping between MAC address and VLAN.
Now what do we do?
58
VLAN Tagging
• IEEE 802.1Q standard (similar to Cisco’s VTP)
• Used for sharing a physical Ethernet link or
device by multiple logical networks
• A four-byte field is inserted into MAC frame
between source address and Type field
• This field is inserted by one switch and then
removed by another switch, so individual
workstations never see the tag
59
802.1Q Tagged Frame
• Tag Protocol Identifier (2 bytes) – contains the value hex
8100; identifies this frame as being a tagged frame
• User priority (3 bits) – indicates frame priority; values of
0 to 7; 0 means best effort, 1 is lowest priority, 7 is highest
• Canonical Format Indicator (1 bit) – 0 indicates canonical form (Ethernet), 1 indicates non-canonical (reversed
address) form (token ring)
• VLAN ID (12 bits) – specifies the VLAN to which the frame
belongs
• Some ISPs add a second tag to internal traffic
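The field layout above can be sketched as bit packing on a four-byte tag. The function names `build_tag`, `parse_tag`, and `insert_tag` are hypothetical, and the sample frame is made up; the sketch ignores the FCS, which a real switch recomputes after tagging.

```python
import struct

TPID = 0x8100  # Tag Protocol Identifier for 802.1Q


def build_tag(priority: int, cfi: int, vlan_id: int) -> bytes:
    """Pack priority (3 bits), CFI (1 bit), and VLAN ID (12 bits) after the TPID."""
    if not (0 <= priority <= 7 and cfi in (0, 1) and 0 <= vlan_id <= 0x0FFF):
        raise ValueError("field out of range")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID, tci)


def parse_tag(tag: bytes):
    """Return (priority, cfi, vlan_id) from a four-byte tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF


def insert_tag(frame: bytes, tag: bytes) -> bytes:
    """Insert the tag between the source address and the Type field."""
    return frame[:12] + tag + frame[12:]


tag = build_tag(priority=5, cfi=0, vlan_id=100)
untagged = b"\xff" * 6 + b"\xaa" * 6 + b"\x08\x00" + b"data"
tagged = insert_tag(untagged, tag)
print(tag.hex(), parse_tag(tag))
```

The egress switch reverses `insert_tag` by removing bytes 12 to 15, which is why end stations never see the tag.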
60
Tagged MAC Frame
61
VLAN Tagging
ingress switch
VLAN Tag added
by incoming
port
VLAN Tag
stripped by
forwarding port
Inter-Switch
Link carries
VLAN identifier
egress switch
62
VLAN Tagging (cont.)
ingress switch
VLAN Tag added
by incoming
port
egress switch
VLAN Tag
stripped by
forwarding port
tagged frames
63
In Class Discussion (A)
Is it always a one-to-one mapping
between VLAN and IP subnet?
Internet
VLAN trunk
192.168.1.254
VLAN 1
192.168.1.10/24
VLAN 2
192.168.1.11/24
192.168.1.101/24 192.168.1.102/24
Q1: is there any problem with this network configuration?
Q2: What is the solution to the problem?
64
In Class Discussion (B)
Is it always a one-to-one mapping
between VLAN and IP subnet?
no VLAN
configuration
192.168.1.10/24
Internet
192.168.1.254
192.168.2.254
192.168.1.11/24
192.168.2.10/24
192.168.2.11/24
Q: is there any problem with this network configuration?
65
Quality of Service (QoS)
[Figure: incoming data frames (DS: data, priority = 0) and voice frames (VS: voice, priority ≠ 0) are sorted into different priority queues, labeled p0 through p7.]
Needs for QoS
• Voice traffic: sensitive to delay but less
sensitive to errors
• Data traffic: sensitive to errors but not
sensitive to delay
• Voice traffic should have higher priority
than data traffic.
• Video stream traffic: priority lower than
voice but higher than data.
67
802.1Q and 802.1p
3 bits for priority: how many queues?
68
Multiple Priority Queues in Switch
Voice Service (VS): 1st priority (p=100)
Gold Service (Data) (GS): 2nd priority (p=001)
Best Effort (Data) (BS): no priority (p=000)
if [frames in the 1st priority queue]
    process Voice frames
else if [frames in the 2nd priority queue]
    process Gold Service Data frames
else
    process frames with Best Effort
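The pseudocode above describes strict priority scheduling, which can be sketched with three FIFO queues. The class name `StrictPriorityScheduler` and the frame labels are hypothetical; the mapping of priority bits to queues follows the slide (100 for voice, 001 for gold, anything else treated as best effort).

```python
from collections import deque


class StrictPriorityScheduler:
    """Always serve the highest-priority non-empty queue first."""

    def __init__(self):
        self.voice = deque()        # 1st priority (p=100)
        self.gold = deque()         # 2nd priority (p=001)
        self.best_effort = deque()  # no priority (p=000 and everything else)

    def enqueue(self, frame, priority_bits):
        if priority_bits == 0b100:
            self.voice.append(frame)
        elif priority_bits == 0b001:
            self.gold.append(frame)
        else:
            self.best_effort.append(frame)

    def dequeue(self):
        """Return the next frame to transmit, or None if all queues are empty."""
        for queue in (self.voice, self.gold, self.best_effort):
            if queue:
                return queue.popleft()
        return None


scheduler = StrictPriorityScheduler()
for frame, bits in [("BS1", 0b000), ("GS1", 0b001), ("VS1", 0b100), ("BS2", 0b000)]:
    scheduler.enqueue(frame, bits)

order = [scheduler.dequeue() for _ in range(5)]
print(order)
```

One consequence of this design is that a steady stream of voice frames can starve the lower queues, which is why some switches use weighted schedulers instead.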
69
Summary
Each standard represents a new technology which is
to address a problem. Describe the problem(s) and
the solution of each standard.
Standard   Problem/need                             Solution
--------   ------------                             --------
802.1D     loop topology                            tree topology
802.1w     slow fail-over time                      local decision for fast fail-over
802.1s     no VLAN for STP                          per VLAN STP
802.3x     flow control for full duplex             PAUSE frame
802.3ad    more bandwidth and higher reliability    aggregation of multiple physical links
802.1Q     VLAN trunking                            VLAN Tagging: VLAN ID
802.1p     QoS                                      VLAN Tagging: priority bits
70