Design and Availability


Reliable Network/Service Infrastructures
Availability, Reliability and Survivability

Availability
• The expected ratio of system uptime to total elapsed time
• Empirical factor: A = MTBF / (MTBF + MTTR)
• Probabilistic: MTBF is the expected time between failures; MTTR is the expected time to recover

Reliability
• The probability that the system remains available (does not fail) over a certain period of time
• Empirical factor: R(t) = e^(−λt), where λ = 1/MTBF and t is the time interval
• Probabilistic: based on the expected time between failures

Survivability
• The capability of the system to continue operating and fulfill its mission, fully or at a limited scale, during a failure
• Non-probabilistic: assumes explicit failures of different span and magnitude
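Both empirical factors above can be computed directly from MTBF and MTTR; a minimal sketch in Python (the router figures are the ones reused in the network availability calculation later in this deck):

```python
import math

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def reliability(t_hours: float, mtbf_hours: float) -> float:
    """R(t) = e^(-lambda * t), with failure rate lambda = 1 / MTBF."""
    lam = 1.0 / mtbf_hours
    return math.exp(-lam * t_hours)

a = availability(16000, 24)     # ~0.9985 for MTBF=16000h, MTTR=24h
r = reliability(8760, 16000)    # probability of running one year without failure
```

Note that availability tells you the long-run fraction of time the system is up, while reliability tells you the chance of surviving a given interval with no failure at all; a system can have high availability yet modest one-year reliability.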
What Is “High Availability”?
• The ability to define, achieve, and sustain “target availability objectives” across services and/or technologies supported in the network that align with the objectives of the business (e.g., 99.9%, 99.99%, 99.999%)

Availability    Downtime per Year (24x7x365)
99.000%         3 days, 15 hours, 36 minutes
99.500%         1 day, 19 hours, 48 minutes
99.900%         8 hours, 46 minutes
99.950%         4 hours, 23 minutes
99.990%         53 minutes
99.999%         5 minutes
99.9999%        30 seconds
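The downtime column follows directly from the availability percentage over a 525,600-minute year; a small sketch that reproduces the table rows:

```python
def downtime_per_year(availability_pct: float) -> str:
    """Convert an availability percentage to yearly downtime (24x7x365)."""
    minutes = (1 - availability_pct / 100) * 365 * 24 * 60  # 525600 min/year
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{int(days)}d {int(hours)}h {round(mins)}m"

print(downtime_per_year(99.0))    # 3d 15h 36m
print(downtime_per_year(99.9))    # 0d 8h 46m
print(downtime_per_year(99.999))  # 0d 0h 5m
```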
Leading Causes of Downtime
• Change management
• Process consistency
• Communications links
• Hardware failure
• Design
• Environmental issues
• Natural disasters

[Chart: downtime by cause — Telco/ISP 35%, Human Error 31%, Power Failure 14%, Hardware Failure 12%, Unresolved 8%]

SOURCE: Graph Data: The Yankee Group, The Road to a Five Nines Network, Feb 2004.
Link/Circuit Diversity

[Diagram: three enterprise-to-service-provider connectivity designs, compared in increasing order of link/circuit diversity. But what is beyond this?]
Network Point of Presence/Data Center
• Cable management
• Power: Diversity/UPS
• HVAC
• Hardware placement
• Physical security
• Labeling
• Environmental control systems
Network Design
Network Complexity
• Technology can increase MTBF
• People, process, and politics can increase complexity
• Complexity, in turn, decreases MTBF and increases MTTR
Network Design
Primary Design Considerations
• Hierarchical
• Modular and consistent
• Scalable
• Manageable
• Reduced failure domain (Layer 2/Layer 3)
• Interoperability
• Performance
• Availability
• Security
Examples of Hardware Reliability
(Reliability Block Diagrams)
• Hardware reliability = 99.938% with 4-hour MTTR (325 minutes/year downtime)
• Hardware reliability = 99.961% with 4-hour MTTR (204 minutes/year downtime)
• Hardware reliability = 99.9999% with 4-hour MTTR (30 seconds/year downtime)
Network Availability Calculation
R1
R2
R3
R4
1
Router Availability R1, R2, R3 and R4
16000/(16000+24) = 0.9985
Can Include Hardware + Software
Components
Router R1, R2, R3 and R4
MTBF = 16000 Hours
MTTR = 24 Hours
3
Availability of R1, R2 in Parallel with R3, R4
= 1 - ((1-0.997)(1 - 0.997)) = 0.99999104
4
2
Availability of R1, R2 and R3, R4 in
Series = (0.99850.9985) = 0.997006
Network Availability = 99.999%
Only Base on Device Availability
Values; Link Availability Not Included
10
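The series/parallel combination rules above generalize to any topology of this kind; a minimal sketch that reproduces the slide's numbers:

```python
def series_availability(*availabilities: float) -> float:
    """Components in series: ALL must work, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel_availability(*availabilities: float) -> float:
    """Redundant paths: ANY one suffices, so unavailabilities multiply."""
    unavail = 1.0
    for a in availabilities:
        unavail *= (1.0 - a)
    return 1.0 - unavail

a_router = 16000 / (16000 + 24)                    # step 1: ~0.998502
a_path = series_availability(a_router, a_router)   # step 2: ~0.997007
a_net = parallel_availability(a_path, a_path)      # step 3: ~0.9999910
```

Note how two fairly ordinary routers in series drag the path availability down, while putting two such paths in parallel lifts the network to roughly five nines: redundancy buys back what serialization costs.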
High Availability - Layered Approach
• Application-level resiliency: Global Server Load Balancing and positioning; gateways, gatekeepers, SIP servers, DB servers
• Protocol-level resiliency: NSF/SSO, HSRP, VRRP, GLBP, IP Event Dampening, Graceful Restart (GR) for BGP, IS-IS, OSPF, and EIGRP, OER, BGP multipath, fast polling, MARP, incremental SPF
• Transport/link-level resiliency: circuits, SONET APS, RPR, DWDM, EtherChannel, 802.1D, 802.1w, 802.1s, PVST+, PortFast, BPDU guard, PAgP, LACP, UDLD, StackWise technology, PPP
• Device-level resiliency: redundant processors (RP), switch fabric, line cards, ports, power; CoPP, ISSU, Config Rollback
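As one concrete instance of the protocol-level mechanisms listed above, here is a minimal sketch of HSRP first-hop gateway redundancy on Cisco IOS; the interface names and addresses are hypothetical:

```
! Router A: active gateway (higher priority, reclaims the role after recovery)
interface GigabitEthernet0/1
 ip address 10.0.0.2 255.255.255.0
 standby 1 ip 10.0.0.1
 standby 1 priority 110
 standby 1 preempt

! Router B: standby gateway (default priority 100)
interface GigabitEthernet0/1
 ip address 10.0.0.3 255.255.255.0
 standby 1 ip 10.0.0.1
```

Hosts on the segment use the virtual address 10.0.0.1 as their default gateway; if Router A fails, Router B takes over the virtual IP and MAC, so no client reconfiguration is needed.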