Challenges and Chances in Network Reliability


Zhaobo Zhang
Huawei Technologies (USA)
2014-09-11




Background of IP Network
System Reliability
Causes of unreliable network
Potential Directions

Fast growing
 • Computers/mobile devices; ISPs (regional, backbones); IXPs
Primary source of information sharing & communication
Various applications
 • Data, voice, video conferencing, P2P
High demands
 • QoS, reliability, efficiency
[Figure: map of the 2010 Internet from The Opte Project by Barrett Lyon, which seeks to make an accurate representation of the Internet using visual graphics. Scale labels: hundreds, thousands, millions, billions.]

Metrics

Quality of service
 • Connectivity, E2E delay, E2E packet loss rate
Network topology, service level agreement
 • Availability = MTBF/(MTBF+MTTR)
 • MTBF: Mean Time Between Failures; MTTR: Mean Time To Repair
 • e.g. 99.999% availability means about 5.26 minutes of downtime per year
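The availability formula above can be turned into annual downtime with a few lines of Python. This is a minimal sketch; the function names are illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("three nines", 0.999),
                     ("four nines", 0.9999),
                     ("five nines", 0.99999)]:
        print(f"{label}: {annual_downtime_minutes(a):.2f} min/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the step from four to five nines is so expensive in practice.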

Verification

Through fault insertion test and field data
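As a rough illustration of how field data feeds the metrics, MTBF and MTTR can be estimated from an observation window and a list of outage records. The `Outage` record and its single field are a hypothetical schema, not one taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Outage:
    """One failure record from field data (hypothetical schema)."""
    repair_hours: float  # time from failure until service was restored


def estimate_mtbf_mttr(observed_hours: float,
                       outages: List[Outage]) -> Tuple[float, float]:
    """Return (MTBF, MTTR) in hours for one observation window."""
    if not outages:
        raise ValueError("need at least one outage to estimate MTBF/MTTR")
    downtime = sum(o.repair_hours for o in outages)
    uptime = observed_hours - downtime
    n = len(outages)
    # MTBF: average operating time per failure; MTTR: average repair time.
    return uptime / n, downtime / n
```

With these estimates, MTBF/(MTBF+MTTR) reduces to uptime divided by the observation window, as expected.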





IP connectivity errors
 • Unstable transmission, overflow throughput, delay, network security threat, IP resource management
Network mis-configuration
 • Network topology loop, non-optimal path, duplex mismatch, protocol unawareness
Software
 • Version/patch conflict, logic mis-configuration, device driver bugs
Environment
 • Cable/fiber cut, device damage, electrical noise, power outage
Hardware
 • Power/clock, logic aging, RAM failure, soft error

Reliability-aware hardware design


Redundancy: RAM, link, NPU, board
Built-in smart logic
 • Monitor misbehavior (e.g. delay increase) and raise early alerts
 • Monitor traffic, balance traffic/heat to slow aging, and auto-reroute to avoid defective logic
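One way to picture the "monitor misbehavior, early alert" idea is a software sketch that compares each delay sample against a smoothed baseline. The class name, the exponential smoothing, and the default parameters are all assumptions made for illustration; the talk does not specify how the monitoring logic works.

```python
class DelayMonitor:
    """Flags delay samples that exceed a smoothed baseline by a factor."""

    def __init__(self, alpha: float = 0.1, threshold: float = 1.5,
                 warmup: int = 5):
        self.alpha = alpha          # smoothing weight for new samples
        self.threshold = threshold  # alert when delay > threshold * baseline
        self.warmup = warmup        # samples to observe before alerting
        self.baseline = None
        self.samples = 0

    def observe(self, delay_ms: float) -> bool:
        """Record one sample; return True if it should raise an early alert."""
        self.samples += 1
        if self.baseline is None:
            self.baseline = delay_ms
            return False
        alert = (self.samples > self.warmup
                 and delay_ms > self.threshold * self.baseline)
        if not alert:
            # Only normal samples update the baseline, so a sustained
            # degradation does not quietly become the new "normal".
            self.baseline = ((1 - self.alpha) * self.baseline
                             + self.alpha * delay_ms)
        return alert
```

A hardware implementation would use counters and comparators rather than floating point, but the decision rule is the same.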
[Figure: NPU and RAM blocks with built-in smart logic; blocks drawn in orange are spares.]
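The benefit of spares can be estimated with the standard series/parallel availability formulas. This sketch assumes unit failures are independent and that switchover to a spare is instantaneous and perfect, which real designs only approximate; the function names are illustrative.

```python
import math
from typing import Iterable


def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant copies where one working copy suffices."""
    if units < 1:
        raise ValueError("units must be >= 1")
    return 1.0 - (1.0 - unit_availability) ** units


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every stage must work."""
    return math.prod(availabilities)


if __name__ == "__main__":
    single = series_availability([0.99, 0.99])          # NPU and RAM, no spares
    spared = series_availability([
        parallel_availability(0.99, 2),                  # NPU plus one spare
        parallel_availability(0.99, 2),                  # RAM plus one spare
    ])
    print(f"no spares: {single:.4f}, with spares: {spared:.6f}")
```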

Data mining & automated process

Learn from historical data to provide guidance for current/next-generation design, verification introduction, and debug
Data
R&D
• Design spec
• Verification list
• Fault database
• FIT result
• FMEA
O&M
• Field-return data
• Field failure cases
CM
• Failure cases
• Test & component stats
Wikipedia: I know everything!
Google: I have everything!
Facebook: I know everybody
Internet: Without me you all nothing!
Electricity: keep talking bitches.


2% Global energy usage
Big data, big network, big infrastructure, BIG power
Power consumption control
Low power design
 • Dynamic control: sleep mode, turn off SerDes, MAC


Thermal control
Heat is an enemy of devices
 • As a rule of thumb, every 10 degrees Celsius of temperature rise roughly doubles the rate of chemical reactions.
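The 10-degree rule of thumb above translates directly into an acceleration factor. The sketch below is only that rule written as code, with hypothetical function names; it is an approximation, not a calibrated aging model for any specific device.

```python
def acceleration_factor(delta_t_celsius: float,
                        doubling_step: float = 10.0) -> float:
    """Reaction-rate multiplier for a temperature rise, doubling every 10 C."""
    return 2.0 ** (delta_t_celsius / doubling_step)


def relative_lifetime(delta_t_celsius: float) -> float:
    """Lifetime relative to the reference temperature, if aging tracks the rate."""
    return 1.0 / acceleration_factor(delta_t_celsius)


if __name__ == "__main__":
    for dt in (10, 20, 30):
        print(f"+{dt} C: rate x{acceleration_factor(dt):.0f}, "
              f"lifetime x{relative_lifetime(dt):.3f}")
```

Under this rule, running 20 degrees hotter quadruples the reaction rate, which is why balancing heat across a board can noticeably slow aging.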


Fault-tolerant control layer design/testing

SDN & OpenFlow
 • Decouple network control and forwarding functions
 • Directly programmable network control
 • The controller performs design validation as part of configuring the network, and that design validation eliminates manual errors
[Figure: SDN architecture. Application Layer (business applications), SDN Control Layer (network services), Infrastructure Layer.]
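As an illustration of controller-side design validation, the sketch below checks a proposed next-hop table for forwarding loops before it would be pushed to devices. The data model (one next hop per node for a given destination) and both function names are assumptions for this example; they are not part of OpenFlow or of any particular controller's API.

```python
from typing import Dict, Iterable, List, Optional


def find_forwarding_loop(next_hop: Dict[str, str], source: str,
                         destination: str) -> Optional[List[str]]:
    """Follow next hops from source; return the looping path, or None."""
    path = [source]
    seen = {source}
    node = source
    while node != destination:
        node = next_hop.get(node)
        if node is None:
            # No route from this node: a black hole, not a loop.
            # A fuller validator would report this case separately.
            return None
        path.append(node)
        if node in seen:
            return path
        seen.add(node)
    return None


def validate_before_push(next_hop: Dict[str, str], sources: Iterable[str],
                         destination: str) -> List[str]:
    """Return human-readable errors; an empty list means no loops were found."""
    errors = []
    for src in sources:
        loop = find_forwarding_loop(next_hop, src, destination)
        if loop is not None:
            errors.append(f"loop from {src}: " + " -> ".join(loop))
    return errors
```

A controller built this way would refuse to install a configuration when `validate_before_push` returns any errors, catching the "network topology loop" class of mis-configuration before it reaches the network.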