Design and Availability


Reliable Network/Service Infrastructures
Availability, Reliability and Survivability

Availability
• The expected ratio of system uptime to total elapsed time
• Empirical factor: A = MTBF / (MTBF + MTTR)
• Probabilistic: MTBF is the expected time between failures; MTTR is the expected time to recover

Reliability
• The probability that the system remains available (does not fail) over a certain period of time
• Empirical factor: R(t) = e^(−λt), where λ = 1/MTBF and t is the time interval
• Probabilistic: based on the expected time between failures

Survivability
• The capability of the system to continue operating and fulfill its mission, fully or at a limited scale, during a failure
• Non-probabilistic: assumes explicit failures of different span and magnitude
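Both empirical factors above can be computed directly from MTBF and MTTR; a minimal sketch in Python (the router figures are the ones reused in the network availability calculation later in this deck):

```python
import math

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def reliability(t_hours: float, mtbf_hours: float) -> float:
    """R(t) = e^(-lambda * t), with failure rate lambda = 1 / MTBF."""
    lam = 1.0 / mtbf_hours
    return math.exp(-lam * t_hours)

a = availability(16000, 24)     # ~0.9985 for MTBF=16000h, MTTR=24h
r = reliability(8760, 16000)    # probability of running one year without failure
```

Note that availability tells you the long-run fraction of time the system is up, while reliability tells you the chance of surviving a given interval with no failure at all; a system can have high availability yet modest one-year reliability.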
What Is “High Availability”?
• The ability to define, achieve, and sustain “target availability objectives” across services and/or technologies supported in the network that align with the objectives of the business (e.g., 99.9%, 99.99%, 99.999%)

Availability    Downtime per Year (24x7x365)
99.000%         3 days, 15 hours, 36 minutes
99.500%         1 day, 19 hours, 48 minutes
99.900%         8 hours, 46 minutes
99.950%         4 hours, 23 minutes
99.990%         53 minutes
99.999%         5 minutes
99.9999%        30 seconds
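The downtime column follows directly from the availability percentage over a 525,600-minute year; a small sketch that reproduces the table rows:

```python
def downtime_per_year(availability_pct: float) -> str:
    """Convert an availability percentage to yearly downtime (24x7x365)."""
    minutes = (1 - availability_pct / 100) * 365 * 24 * 60  # 525600 min/year
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{int(days)}d {int(hours)}h {round(mins)}m"

print(downtime_per_year(99.0))    # 3d 15h 36m
print(downtime_per_year(99.9))    # 0d 8h 46m
print(downtime_per_year(99.999))  # 0d 0h 5m
```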
Leading Causes of Downtime
• Change management
• Process consistency
• Communications links
• Hardware failure
• Design
• Environmental issues
• Natural disasters

[Chart: downtime by cause — Telco/ISP 35%, Human Error 31%, Power Failure 14%, Hardware Failure 12%, Unresolved 8%]

SOURCE: Graph Data: The Yankee Group, The Road to a Five Nines Network, Feb 2004.
Link/Circuit Diversity

[Diagram: three enterprise-to-service-provider connectivity designs, compared in increasing order of link/circuit diversity. But what is beyond this?]
Network Point of Presence/Data Center
• Cable management
• Power: Diversity/UPS
• HVAC
• Hardware placement
• Physical security
• Labeling
• Environmental control systems
Network Design
Network Complexity
• Technology can increase MTBF
• People, process, and politics can increase complexity
• Complexity, in turn, decreases MTBF and increases MTTR
Network Design
Primary Design Considerations
• Hierarchical
• Modular and consistent
• Scalable
• Manageable
• Reduced failure domain (Layer 2/Layer 3)
• Interoperability
• Performance
• Availability
• Security
Examples of Hardware Reliability
(Reliability Block Diagrams)
• Hardware reliability = 99.938% with 4-hour MTTR (325 minutes/year downtime)
• Hardware reliability = 99.961% with 4-hour MTTR (204 minutes/year downtime)
• Hardware reliability = 99.9999% with 4-hour MTTR (30 seconds/year downtime)
Network Availability Calculation
R1
R2
R3
R4
1
Router Availability R1, R2, R3 and R4
16000/(16000+24) = 0.9985
Can Include Hardware + Software
Components
Router R1, R2, R3 and R4
MTBF = 16000 Hours
MTTR = 24 Hours
3
Availability of R1, R2 in Parallel with R3, R4
= 1 - ((1-0.997)(1 - 0.997)) = 0.99999104
4
2
Availability of R1, R2 and R3, R4 in
Series = (0.99850.9985) = 0.997006
Network Availability = 99.999%
Only Base on Device Availability
Values; Link Availability Not Included
10
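The series/parallel combination rules above generalize to any topology of this kind; a minimal sketch that reproduces the slide's numbers:

```python
def series_availability(*availabilities: float) -> float:
    """Components in series: ALL must work, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel_availability(*availabilities: float) -> float:
    """Redundant paths: ANY one suffices, so unavailabilities multiply."""
    unavail = 1.0
    for a in availabilities:
        unavail *= (1.0 - a)
    return 1.0 - unavail

a_router = 16000 / (16000 + 24)                    # step 1: ~0.998502
a_path = series_availability(a_router, a_router)   # step 2: ~0.997007
a_net = parallel_availability(a_path, a_path)      # step 3: ~0.9999910
```

Note how two fairly ordinary routers in series drag the path availability down, while putting two such paths in parallel lifts the network to roughly five nines: redundancy buys back what serialization costs.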
High Availability - Layered Approach
• Application-level resiliency: Global Server Load Balancing and positioning; gateways, gatekeepers, SIP servers, DB servers
• Protocol-level resiliency: NSF/SSO, HSRP, VRRP, GLBP, IP Event Dampening, Graceful Restart (GR) for BGP, IS-IS, OSPF, and EIGRP, OER, BGP multipath, fast polling, MARP, incremental SPF
• Transport/link-level resiliency: circuits, SONET APS, RPR, DWDM, EtherChannel, 802.1D, 802.1w, 802.1s, PVST+, PortFast, BPDU guard, PAgP, LACP, UDLD, StackWise technology, PPP
• Device-level resiliency: redundant processors (RP), switch fabric, line cards, ports, power; CoPP, ISSU, Config Rollback
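As one concrete instance of the protocol-level mechanisms listed above, here is a minimal sketch of HSRP first-hop gateway redundancy on Cisco IOS; the interface names and addresses are hypothetical:

```
! Router A: active gateway (higher priority, reclaims the role after recovery)
interface GigabitEthernet0/1
 ip address 10.0.0.2 255.255.255.0
 standby 1 ip 10.0.0.1
 standby 1 priority 110
 standby 1 preempt

! Router B: standby gateway (default priority 100)
interface GigabitEthernet0/1
 ip address 10.0.0.3 255.255.255.0
 standby 1 ip 10.0.0.1
```

Hosts on the segment use the virtual address 10.0.0.1 as their default gateway; if Router A fails, Router B takes over the virtual IP and MAC, so no client reconfiguration is needed.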