Design and Implementation of TWAREN Hybrid Network Management System
National Center for High-Performance Computing
Speakers: Ming-Chang Liang & Li-Chi Ku
Outline
• Introduction
• Motivation
• Issues
• Design
• Implementation
• Future work
INTRODUCTION
About TWAREN
• The TWAREN (TaiWan Advanced Research & Education Network) backbone was completed at the end of 2003 and entered operation and service at the beginning of 2004.
• In its initial phase, IP routing was the main service provided.
• Network management relied on the programs that came with the purchased network equipment, including CIC, Webtop, CW2K, HP OpenView, HP NNM and other solutions.
Initial phase of TWAREN
[Figure: topology of the initial TWAREN backbone. Labeled sites: Taipei, Hsinchu, Taichung, Tainan, MOECC, NTU, NCCU, ASCC, NDHU, NCU, CCU, NHLTC, NCTU, NTHU, NCHU, NTTU, NYSU. Equipment: C6509, C7609, GSR. Link types: 10GE, STM-64/OC-192, STM-16/OC-48, GE. One additional label reads EBT10GE.]
Initial phase of NMS
[Figure: architecture of the initial NMS. Components: WebTop, Remedy Help Desk, Notification, Gateway, Cisco Info Center, ISM, Probe, CW2K (DFM), NNM, CTM, NAM. Interfaces and protocols: CLI, API, SMTP, HTTP, FTP, DNS, Trap, PING, Polling. Monitored devices: 12416, 7609, 3750, 2522, 2600, 15454, 15600.]
Phase 2 of TWAREN
• At the end of 2006, TWAREN was upgraded with more protection methods and better availability; the result is called TWAREN phase 2.
• Tens of optical switches and hundreds of lightpaths became the foundation of the layer 2 VLAN services and the layer 3 IP routing services.
• In 2008, tens of VPLS switches were added to provide an additional multipoint VPLS VPN service.
• Layer 1 lightpaths can be protected by SNCP, layer 2 VLANs by spanning tree recalculation, and layer 2 VPLS by fast reroute.
• These improvements turned TWAREN phase 2 into a true hybrid network that provides multiple layers of service with high availability.
Architecture of TWAREN phase 2
[Figure: architecture of TWAREN phase 2. Labeled sites: Taipei, Hsinchu, Taichung, Tainan, NCHC, NCCU, ASCC, NTU, NIU, NDHU, NCU, MOEcc, NCNU, NHLTC, NCTU, NTHU, NCHU, NTTU, NSYSU, NCKU, CCU. Equipment: 15454, 15600, 6509, 7609, 7609C, 12816, 3750. Link types: STM64, STM16, 10GE, GE.]
MOTIVATION
Why a new NMS?
• The architecture of TWAREN phase 2 became increasingly complicated.
• Because TWAREN phase 2 has more protection methods, a single hardware or circuit failure no longer interrupts the service delivered to end users.
• The initial NMS was no longer adequate for the hybrid network, because with it the correlation between failures and affected services is hard to determine or predict.
Requirements for new NMS
• Automatically determine the correlation between failures, affected services, affected customers and severity level on this highly protected network.
• Provide a single integrated visual user interface.
• Use an integrated database, logs, message flows and exchange protocols.
• After several surveys, we decided to develop a new NMS suitable for monitoring all services provided by TWAREN phase 2.
ISSUES
Uncertainty of SNMP implementation
• SNMP trap and MIB implementations differ even among equipment of the same brand.
• The SNMP OIDs or their return values may change after an OS upgrade on the same equipment, and such changes are usually hard to discover beforehand.
• Therefore, the system must be designed so that these changes can be accommodated with minimal modifications.
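One way to keep such changes cheap is to hold the OIDs and value decodings in a data table keyed by device model and OS version, so that an OS upgrade only needs a new table entry rather than a code change. The following is a minimal sketch in Python, not the system's actual design; the profile keys, version labels, OIDs and value mappings are all hypothetical placeholders.

```python
# Hypothetical SNMP profiles: (model, OS version) -> metric -> (OID, value decoder).
# Every OID, version label and value mapping below is an illustrative placeholder.
DEFAULT = "*"

PROFILES = {
    ("7609", DEFAULT): {
        "fan_status": ("1.3.6.1.4.1.99999.1.1", {1: "ok", 2: "failed"}),
    },
    ("7609", "v2"): {
        # Assumed example: after upgrading to "v2" both the OID and the
        # meaning of the returned values differ from the default profile.
        "fan_status": ("1.3.6.1.4.1.99999.2.1", {0: "ok", 1: "failed"}),
    },
}


def lookup(model, os_version, metric):
    """Return (oid, decoder) for a metric, preferring a version-specific profile."""
    for key in ((model, os_version), (model, DEFAULT)):
        profile = PROFILES.get(key, {})
        if metric in profile:
            return profile[metric]
    raise KeyError(f"no profile for {model} {os_version} {metric}")


def decode(model, os_version, metric, raw_value):
    """Translate a raw SNMP return value into a normalized status string."""
    _oid, decoder = lookup(model, os_version, metric)
    return decoder.get(raw_value, "unknown")
```

With this layout, the collectors only call `lookup` and `decode`; accommodating a new OS release means adding one entry to `PROFILES`.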
Lack of skilled programmers
• Our programmers are also the members of the operations team.
• We are not professional programmers and do not share a common programming language.
• The system must be partially available and operational early in its development so that it can evolve along with real needs.
• A unified communication standard between modules is therefore necessary.
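A common message envelope is one way to let modules written in different languages interoperate. The sketch below assumes JSON text as the exchange format; the field names (`source`, `kind`, `target`, `time`, `payload`) are hypothetical and are not taken from the TWAREN NMS.

```python
import json
import time

# Hypothetical envelope fields that every module would agree on.
REQUIRED_FIELDS = ("source", "kind", "target", "time", "payload")


def make_message(source, kind, target, payload, now=None):
    """Serialize one inter-module message as JSON text."""
    message = {
        "source": source,    # name of the sending module
        "kind": kind,        # e.g. "trap", "poll_result", "alarm"
        "target": target,    # identifier of the monitored object
        "time": time.time() if now is None else now,
        "payload": payload,  # module-specific content
    }
    return json.dumps(message, sort_keys=True)


def parse_message(text):
    """Parse JSON text and reject messages that lack a required field."""
    message = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in message]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return message
```

Because the envelope is plain JSON, a collector written in one language and an analyzer written in another only need to agree on these few fields.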
Huge historical data and computing
• To minimize the false positive and false negative rates, baseline thresholds are of much better quality when generated dynamically from historical data.
• Therefore, we need to store sufficiently large historical data sets and to retrieve them very efficiently when calculating those thresholds.
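As an illustration of a dynamically generated baseline, the sketch below derives a lower and upper threshold from historical samples as the mean plus or minus k standard deviations. The slides do not state which statistic the system uses, so both the method and the default `k = 3` are assumptions.

```python
from statistics import mean, pstdev


def baseline(history, k=3.0):
    """Return (low, high) thresholds as mean +/- k standard deviations.

    `history` holds past measurements of one metric, for example the RTT
    samples recorded at the same hour on previous days (an assumption).
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center + k * spread


def is_anomalous(value, history, k=3.0):
    """True when a new measurement falls outside the dynamic baseline."""
    low, high = baseline(history, k)
    return value < low or value > high
```

A larger `k` lowers the false positive rate at the cost of more false negatives, which is the trade-off the thresholds have to balance.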
Automatically determine affected services and customers
• TWAREN phase 2 inherently guards against a single hardware or circuit failure, so such a failure is less likely to affect actual service provisioning.
• An intelligent management system that can determine which services a failure affects will reduce management cost.
DESIGN
1st Stage System Architecture
[Figure: 1st stage system architecture. Data sources: Traps, MIBs, Syslogs, Net flows, Telnet/SSH, TL1, Mirror. Modules: Monitor Objs, Control API, Data Collectors, Fault Detection, Fault Location, Threshold Analyzer, Auto Action, Report System, GUI & Ticket System. Databases: Current Status DB, Threshold DB, Long Term DB, Case/Action DB. Other labels: Interactive, Passive.]
Relationship of Data Tables
Basic Data Tables: Component, People, Location, Unit, Vendor, etc.
Relationship Tables: Circuit, VLAN Services, VPLS Services, ONS Light Path, ONS Cross Connection, etc.
Basic Data Tables
Component Data Table
Component_ID  Parent_C_ID  Name
1             0            TN7609P
12            1            Slot_1
2             0            TP15454
16            2            Slot_3
135           12           Port_9

Vendor Data Table
ID  Name
1   CHT
2   APBT
3   RingLine

People Data Table
ID  Name  Phone       Address  Service_Time  Service_WeekDay
1   John  0939123123  xxxxxxx  8-17          1,3,5
2   Mary  0958123123  xxxxxxx  ALL           ALL

Location Data Table
ID  Name   Address
1   MOEcc  xxxxx
2   NTU    xxxxx

Unit Data Table
ID  Name
1   NCKU
18  THU
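The Component Data Table encodes the equipment hierarchy through Parent_C_ID. The sketch below walks that column to build the full path of a component, using the sample rows from the slide. It assumes that a Parent_C_ID of 0 marks a top-level device, which the sample data suggests but the slides do not state; the function name is hypothetical.

```python
# Sample rows from the Component Data Table: Component_ID -> (Parent_C_ID, Name).
COMPONENTS = {
    1: (0, "TN7609P"),
    12: (1, "Slot_1"),
    2: (0, "TP15454"),
    16: (2, "Slot_3"),
    135: (12, "Port_9"),
}


def component_path(component_id, components=COMPONENTS):
    """Return the 'device/slot/port' path of a component.

    Assumes Parent_C_ID == 0 means the component has no parent.
    """
    names = []
    seen = set()
    current = component_id
    while current != 0:
        if current in seen:
            raise ValueError("cycle in component hierarchy")
        seen.add(current)
        parent_id, name = components[current]
        names.append(name)
        current = parent_id
    return "/".join(reversed(names))
```

For example, `component_path(135)` yields `TN7609P/Slot_1/Port_9`, so a fault reported on component 135 can be located down to the device and slot.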
Relationship Data Tables
Circuit Data Table
ID  Name                 Vendor  Identify  From_CID  To_CID  Bandwidth
1   Taipei_Tainan_STM64  1       8D543267  13        35      STM64
2   NCHU_NCNU_10GE       2       ST16987   23        67      10GE

ONS Topology Link Table
NodeA  NodeB  PortA  PortB
12     45     1467   2346
16     32     2312   3421

ONS Light Path Table
LP  PortFrom  PortTo  SNCP_LP  CRS_Trace        Size
2   2312      2345    0        359,556,522,475  4
98  3434      4455    99       482,541,335      16
99  3434      4455    98       482,469,541,335  16

ONS Cross Connection Table
CRS  PortA  PortB  SNCP_CRS  ChannelA  ChannelB  Size
482  1744   1756   0         5         13        4
21   3321   3343   24        17        33        16
24   3546   4534   21        1         17        16
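These relationship tables are what make automatic impact analysis possible. The sketch below uses the sample rows of the ONS Light Path Table to classify each lightpath when one cross connection fails. It rests on two assumptions read from the sample data rather than stated in the slides: CRS_Trace lists the cross connections a lightpath traverses, and SNCP_LP holds the ID of its protection partner, with 0 meaning unprotected. The status labels are hypothetical.

```python
# Sample rows from the ONS Light Path Table.
LIGHT_PATHS = {
    2: {"sncp_lp": 0, "crs_trace": [359, 556, 522, 475]},
    98: {"sncp_lp": 99, "crs_trace": [482, 541, 335]},
    99: {"sncp_lp": 98, "crs_trace": [482, 469, 541, 335]},
}


def lightpath_status(failed_crs, light_paths=LIGHT_PATHS):
    """Classify every lightpath after a single cross-connection failure.

    "ok"        : the lightpath does not traverse the failed cross connection
    "protected" : it does, but its SNCP partner is still intact
    "down"      : it does, and it has no intact SNCP partner
    """
    broken = {
        lp for lp, row in light_paths.items() if failed_crs in row["crs_trace"]
    }
    status = {}
    for lp, row in light_paths.items():
        if lp not in broken:
            status[lp] = "ok"
            continue
        partner = row["sncp_lp"]
        partner_intact = partner in light_paths and partner not in broken
        status[lp] = "protected" if partner_intact else "down"
    return status
```

A failure of cross connection 469 only touches lightpath 99, whose partner 98 survives, while a failure of 482 hits both 98 and 99 and therefore interrupts the service they carry.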
IMPLEMENTATION
Current monitor objects
• Trap monitor: used interfaces, BGP, etc.
• Environment of equipment room: temperature (auto threshold), voltage
• Status of equipment: temperature, CPU, RAM, fans, power supply
• BGP peering with other networks: status, number of exchanged routes (auto threshold), utilization analysis
• Performance monitor: end-to-end RTT (auto threshold), end-to-end packet loss rate (auto threshold), end-to-end availability
• Throughput: backbone (auto threshold), designated interfaces
• Top N: bytes, flows, packets
• Routes monitor: customer routes (exact comparison)
• VPLS VPN: throughput on the CE side, MAC addresses of each VPN
• Optical network: current topology of lightpaths
• VLAN: current topology of VLANs
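The exact comparison used by the routes monitor can be pictured as a set difference between the prefixes a customer is expected to announce and the prefixes currently observed. The following is a minimal sketch under that assumption; the function name and the documentation-range prefixes are illustrative only.

```python
import ipaddress


def compare_routes(expected, observed):
    """Compare two prefix lists exactly and report the differences."""
    expected_set = {ipaddress.ip_network(prefix) for prefix in expected}
    observed_set = {ipaddress.ip_network(prefix) for prefix in observed}
    return {
        # Prefixes that should be announced but are not seen.
        "missing": sorted(str(net) for net in expected_set - observed_set),
        # Prefixes that are seen but were never registered.
        "unexpected": sorted(str(net) for net in observed_set - expected_set),
    }
```

Unlike the auto-threshold monitors, any non-empty `missing` or `unexpected` list is a finding, since the comparison tolerates no deviation.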
Future work
• Combine all developed monitor objects into a single integrated visual user interface.
• Enhance the monitoring of the optical, VPLS and VLAN networks.
• Automatically determine the fault location, root cause and affected scope.
• Minimize the false positive and false negative rates.