Challenges and Chances in Network Reliability
Zhaobo Zhang
Huawei Technologies (USA)
2014-09-11
Background of IP Network
System Reliability
Causes of unreliable network
Potential Directions
Fast growing
Primary source of information sharing & communication
Various applications
Computers/mobile devices; ISPs (regional, backbone); IXPs
Data, voice, video conferencing, P2P
High demands
QoS, reliability, efficiency
Scale: hundreds, thousands, millions, billions
[Figure: the 2010 Internet, from The Opte Project by Barrett Lyon, which seeks to make an accurate representation of the Internet using visual graphics]
Metrics
Quality of service
connectivity, E2E delay, E2E packet loss rate
Network topology, service level agreement
Availability = MTBF/(MTBF+MTTR)
MTBF: Mean Time Between Failures; MTTR: Mean Time To Repair
e.g. 99.999% ("five nines") means an annual downtime of about 5.26 minutes
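The availability formula above can be turned into a quick downtime calculator. This is a minimal sketch, not part of the original slides: the function names and the 365-day year are assumptions made for illustration. Under that assumption, 99.999% availability works out to roughly 5.26 minutes of downtime per year.

```python
# Minimal sketch: availability from MTBF/MTTR and the implied annual downtime.
# Function names and the 365-day year are illustrative assumptions.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in (0.999, 0.9999, 0.99999):
        print(f"{nines:.5f} -> {annual_downtime_minutes(nines):.2f} min/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why MTTR matters as much as MTBF when chasing high availability.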
Verification
Through fault-insertion testing and field data
IP connectivity errors
unstable transmission, overflow throughput, delay, network security threat, IP resource management
Network mis-configuration
network topology loop, non-optimal path, duplex mismatch, protocol unawareness
Software
version/patch conflict; logic mis-configuration; device driver bugs
Environment
Cable/fiber cut/device damage; electrical noise, power outage
Hardware
power/clock, logic aging, RAM failure, soft error
Reliability-aware hardware design
Redundancy: RAM, link, NPU, board
Built in smart logic
Monitor misbehavior (e.g. delay increase), early alert
Monitor traffic, balance traffic/heat to slow aging, auto-reroute to avoid defective logic
[Figure: board with NPU and RAM blocks plus built-in smart logic; orange blocks are spares]
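The "monitor misbehavior, early alert" idea can be illustrated in software, even though the slides propose it as built-in hardware logic. The sketch below is an assumption-laden illustration, not the author's design: the class name `DelayMonitor`, the exponentially weighted moving average (EWMA) baseline, and the parameter values are all hypothetical choices.

```python
# Minimal sketch of early alerting on a delay increase.
# DelayMonitor, the EWMA baseline, and all parameter values are
# hypothetical choices for illustration only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayMonitor:
    alpha: float = 0.1       # EWMA smoothing factor (assumed)
    threshold: float = 1.5   # alert when a sample exceeds threshold * baseline (assumed)
    warmup: int = 5          # samples to observe before alerting (assumed)
    baseline: Optional[float] = None
    seen: int = 0

    def observe(self, delay_ms: float) -> bool:
        """Feed one delay sample; return True if it should raise an early alert."""
        if self.baseline is None:
            self.baseline = delay_ms
            self.seen = 1
            return False
        alert = self.seen >= self.warmup and delay_ms > self.threshold * self.baseline
        if not alert:
            # Only normal samples update the baseline, so a fault does not
            # slowly become the new "normal".
            self.baseline = (1 - self.alpha) * self.baseline + self.alpha * delay_ms
        self.seen += 1
        return alert


if __name__ == "__main__":
    monitor = DelayMonitor()
    for sample in [10, 10, 10, 10, 10, 10, 20, 11]:
        print(sample, monitor.observe(sample))
```

An alert from such a monitor could then trigger the rerouting or spare activation described above; that hand-off is outside the scope of this sketch.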
Data mining & automated process
Learn from historical data to provide guidance for current/next-generation design, verification introduction, and debug
Data sources
R&D
• Design spec
• Verification list
• Fault database
• FIT result
• FMEA
O&M
• Field-return data
• Field failure cases
CM
• Failure cases
• Test & component stats
Wikipedia: I know everything!
Google: I have everything!
Facebook: I know everybody
Internet: Without me you are all nothing!
Electricity: keep talking bitches.
2% Global energy usage
Big data, big network, big infrastructure, BIG power
Power consumption control
Low power design
Dynamic control: sleep mode, turn off SerDes, MAC
Thermal control
Heat is an enemy of devices
As a rule of thumb, every 10 degrees Celsius of temperature rise roughly doubles the rate of chemical reactions, and with it the rate of aging.
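The 10-degree rule can be written as an acceleration factor of 2^(ΔT/10). The sketch below assumes exactly that rule of thumb; it is a simplification of Arrhenius-type behavior, real components have their own activation energies, and the function name is hypothetical.

```python
# Minimal sketch of the "doubles every 10 degrees C" rule of thumb.
# This is an approximation, not a device-specific aging model;
# aging_acceleration is a hypothetical helper name.

def aging_acceleration(delta_t_celsius: float, doubling_step: float = 10.0) -> float:
    """Factor by which reaction (aging) rate changes for a temperature change."""
    return 2.0 ** (delta_t_celsius / doubling_step)


if __name__ == "__main__":
    for delta in (-10, 0, 10, 20, 30):
        print(f"{delta:+d} C -> x{aging_acceleration(delta):.2f}")
```

Read the other way, cooling a device by 10 degrees roughly halves the aging rate under this rule, which is the motivation for thermal control.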
Fault-tolerant control layer design/testing
SDN & OpenFlow
Application Layer: business applications
Decouple network control and forwarding functions
Directly programmable network control
The controller performs design validation as part of configuring the network, and that design validation eliminates manual errors
SDN Control Layer: network services
Infrastructure Layer
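To make "design validation as part of configuring the network" concrete, here is a minimal sketch of one check a controller could run before pushing forwarding rules: detecting loops and black holes toward a destination. It does not use any real SDN controller API. The function name, the next-hop dictionary format, and the node names are assumptions for illustration.

```python
# Minimal sketch of pre-deployment validation of forwarding state.
# validate_next_hops and the {node: next_hop} format are hypothetical;
# no real controller API is used.

def validate_next_hops(next_hop: dict[str, str], dst: str) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for src in next_hop:
        visited = []
        node = src
        while node != dst:
            if node in visited:
                loop = visited[visited.index(node):] + [node]
                problems.append(f"loop from {src}: " + " -> ".join(loop))
                break
            if node not in next_hop:
                problems.append(f"black hole from {src} at {node}")
                break
            visited.append(node)
            node = next_hop[node]
    return problems


if __name__ == "__main__":
    print(validate_next_hops({"A": "B", "B": "C"}, dst="C"))  # no problems
    print(validate_next_hops({"A": "B", "B": "A"}, dst="C"))  # loops reported
```

A controller would refuse to install a configuration for which this list is non-empty, catching the topology-loop class of mis-configuration before it reaches the infrastructure layer.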