Global Observatory for Advanced
Network Operations
APAN Hawaii meeting, January 2004
Yoshinori Kitatsuji, Jin Tanaka &
Kazunori Konishi
APAN Tokyo XP
Outline
Background
Lessons learned from high performance
experiments
Comparison between APAN JP NOC pages
and Abilene NOC/Observatory
Discussion on Global Observatory
Background
High-performance demonstrations are now held throughout the year.
Some demonstrations have begun to be run without prior notification.
Know-how for provisioning and troubleshooting has been accumulated by network
engineers.
Operators struggle with the deployment of new services and technologies.
Measurement for new items such as IGP stability
Performance experiment such as SLAC, etc.
Conventional services should be maintained
Advanced technology and know-how need to be shared with operators to support
new, more advanced services.
Introducing new tools with advanced functions enables operators to tackle new
services easily.
Viewing the collected data leads, as a result, to smooth and stable operation.
Experiences of High Performance
Experiment Support
Osaka University
HD over IPv6 transmission to NPACI, SC2003 and OptIPuter
• Osaka Univ., JGNv6, Tokyo XP, Abilene and SCInet/SDSC, 100+ Mbit/s
University of Washington
HD over IP transmission at APAN Busan meeting, Aug 2003
• Tyco DC, IEEAF-WIDE, Tokyo XP, JGN, Genkai XP and KOREN, 200+ Mbit/s
National Institute of Advanced Industrial Science and Technology Japan
SC2003 Bandwidth Challenge
• Tokyo XP/SINET Abilene and SCInet, 3.8Gbit/s
University of Tokyo
SC2003 Bandwidth Challenge
• Tokyo Univ., Tokyo XP, WIDE, Tyco DC(Seattle), NTT Communications, Abilene, SCInet,
7.8Gbit/s
SURFNET/TransLIGHT/APAN Tokyo XP
Performance testing has just started.
The bottleneck is 1 Gbit/s between Japan and the Netherlands.
Lessons Learned from Experiments
Importance of understanding the availability on the path
Performance test should be performed hop by hop.
Resource sharing technique is required.
Effects of multiple TCP connections and rate control at source stations.
A dynamic rate control mechanism is one of the next key technology items.
The VLAN ID assignment policy for HPREN connections has not been
discussed yet.
It is anticipated that connections among three continents will accelerate
VLAN ID consumption.
A VLAN for Tokyo XP/TransLIGHT/SURFnet has already been assigned.
The importance of managing used ports and assigned VLANs was
pointed out.
Much time was spent establishing the VLAN between Tokyo and
the Netherlands.
It seems that few people can administer the VLAN, and the connected
networks are not clearly identified.
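The port and VLAN bookkeeping called for above could be kept in a small shared registry. The following is a minimal sketch only: the class, its fields, the VLAN ID 3500 and the port name are hypothetical and are not actual Tokyo XP assignments. The only outside fact it relies on is that IEEE 802.1Q leaves 4094 usable VLAN IDs (1 to 4094).

```python
from dataclasses import dataclass, field


@dataclass
class VlanRegistry:
    """Tracks which VLAN IDs are assigned, to whom, and on which ports."""
    # vlan_id -> (owner, set of "switch:port" strings); all values are illustrative
    assignments: dict = field(default_factory=dict)

    def assign(self, vlan_id, owner, ports):
        if not 1 <= vlan_id <= 4094:  # usable IEEE 802.1Q VLAN ID range
            raise ValueError(f"VLAN ID {vlan_id} out of range")
        if vlan_id in self.assignments:
            current_owner = self.assignments[vlan_id][0]
            raise ValueError(f"VLAN {vlan_id} already assigned to {current_owner}")
        self.assignments[vlan_id] = (owner, set(ports))

    def ports_in_use(self):
        """Return every port that carries at least one assigned VLAN."""
        used = set()
        for _owner, ports in self.assignments.values():
            used |= ports
        return used

    def remaining_ids(self):
        """Number of VLAN IDs still unassigned out of the 4094 usable ones."""
        return 4094 - len(self.assignments)


registry = VlanRegistry()
# Hypothetical ID and port for the VLAN mentioned on this slide.
registry.assign(3500, "Tokyo XP/TransLIGHT/SURFnet", ["tokyo-sw1:ge-0/0/1"])
print(registry.ports_in_use())
print(registry.remaining_ids())
```

Rejecting a second assignment of the same ID is the point of the sketch: it makes the question of who administers a VLAN answerable from one place.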
Comparison between APAN-JP NOC page
and Abilene NOC/Observatory-1
Network Monitoring
Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
View current alerts on network | - | Alertmon Network Monitor | ☆☆☆ | ☆
View geographical network usage data | - | Animated Traffic Map | ☆☆ | ☆☆
Show traffic graph of router interface | All Links Traffic Graphs by MRTG | RRD Connector Graphs (5-minute avg) | ☆☆ | ☆☆
Show high-performance traffic graph of router interface | http://mrtg.jp.apan.net/cricket/router-interfaces/ & Traffic Report by RRD | RRD Connector Graphs (1-minute avg) | ☆☆☆ | ☆☆☆
Aggregate traffic graphs of the whole network or routers | - | Aggregate Traffic on Abilene | ☆ | ☆☆
View router CPU utilization | CPU Utilization by MRTG | Contained in Visible Network Toolset | ☆☆☆ | ☆☆
View router Memory utilization | Memory Utilization by MRTG | Contained in Visible Network Toolset | ☆☆ | ☆☆
View router Temperature measurement | Temperature Measure by MRTG | Contained in Visible Network Toolset | ☆ | ☆☆
Comparison between APAN-JP NOC page
and Abilene NOC/Observatory-2
Network Monitoring
Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
Collect & analyze XML Network data | - | Visible Network Toolset | ☆☆☆ | ☆☆☆
Archive text reports of daily aggregate statistics | - | Daily Connector Statistics | ☆☆ | ☆☆
View recent outage and availability reports | - | Weekly Reports | ☆☆ | ☆☆☆
View daily reports generated from the Netflow data collected from routers | - | Netflow Reports | ☆☆☆ | ☆☆
View errors and discards on the link | Error Packets & Discard Frame | Animated Traffic Map | ☆☆ | ☆☆
Show daily snapshots of the BGP table and BGP events | - | Ixia IxTraffic BGP tables | ☆☆ | ☆☆
Show time-series graph of the number of BGP routes per peer | Number of BGP routes | - | ☆☆☆ | ☆☆
View the VLAN information | VLAN View | - | ☆☆ | ☆☆☆
View traffic of the network displayed on the map
Excels at showing the traffic usage of the whole network. APAN JP is considering introducing
this tool in the near future.
Requirements
• Hardware: Traffic Graph Server (Ready)
• Library: GD Library (Ready)
• Software: http://tseg.uits.iu.edu/dist/wxmap/index.html
Collect and analyze XML Network data
Useful for remote troubleshooting from other networks.
Requirements
• Know-how of JUNOScript
• Hardware: WWW Server with huge hard disk (Ready)
• Software: http://sourceforge.net/projects/visiblebackbone/
View daily reports generated from the Netflow data collected from routers
Useful for routing troubleshooting, traffic analysis and DoS detection. APAN JP is considering
implementing this with cflowd.
Requirements
• Software: cflowd http://www.caida.org/tools/measurement/cflowd/
• Router configuration
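As a rough illustration of how daily flow reports can support traffic analysis and DoS detection, the sketch below sums bytes per destination over a list of flow records. The record fields are hypothetical and do not reproduce the cflowd export format; the addresses come from the reserved documentation ranges.

```python
from collections import Counter


def top_destinations(flows, n=3):
    """Sum bytes per destination address over exported flow records."""
    totals = Counter()
    for flow in flows:
        totals[flow["dst"]] += flow["bytes"]
    return totals.most_common(n)


# Illustrative records only; field names are assumptions, not cflowd's schema.
flows = [
    {"src": "192.0.2.10", "dst": "198.51.100.5", "bytes": 1200},
    {"src": "192.0.2.11", "dst": "198.51.100.5", "bytes": 900},
    {"src": "192.0.2.12", "dst": "203.0.113.7", "bytes": 300},
]
print(top_destinations(flows))
```

A destination that suddenly dominates this ranking, with many distinct sources, is the kind of pattern a daily report would surface for closer inspection.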
Comparison between APAN-JP NOC page
and Abilene NOC/Observatory-3
Operation Tools
Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
Get the result of operational commands to a router from the web page | Traceroute Service | Core Node Router Proxy | ☆☆☆ | ☆☆☆
Visualize the tree structure of a mroute |  | Multicast Route Viewer | ☆☆ | ☆☆
Operation manual searching engine | A necessary manual can be searched with a keyword from about 150 manuals (secure page) | - | ☆☆☆ | ☆☆
Show the DC power usage polled from power controllers | ? | Rack Power Draws | ☆☆ | ☆☆☆
Ticket system for managing problem and maintenance | Request Tracker (secure page) | Helpdesk System | ☆ | ☆☆☆
Check the allocation of IP address | Automatically check by icmp (secure page) | ? | ☆ | ☆☆
Get the result of operational commands to a router from the web page
It allows APAN participants to execute “show” commands on multiple routers in APAN
Tokyo XP via a web interface.
Enables a more complete diagnosis of a problem without contacting the NOC.
Requirements
• Hardware: WWW server (Ready)
• Software: routerproxy http://sourceforge.net/projects/routerproxy/
• Router configuration
Show the DC power usage polled from power controllers
Shows an overview of power usage per rack.
No need to measure the power usage of each rack on a regular schedule.
Requirements
• Hardware: WWW Server (Ready), SNMP-capable AC/DC power controller
• Software: NET-SNMP (Ready)
Comparison between APAN-JP NOC page
and Abilene NOC/Observatory-4
Advanced Services
Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
Monitor various aspects of multicast at the router level | - | MANTRA Multicast Monitoring | ☆☆ | ☆☆☆
Test multicast connectivity between multiple hosts | - | Multicast Beacon Server | ☆☆☆ | ☆☆☆
Display the amount of IPv6 traffic to and from the tunnel router | - | IPv6 Traffic Graphs | ☆☆ | ☆☆
Determine the Provider Independent (PI) address based on the location | - | Provider Independent Addressing | ☆ | ☆☆
Test of multicast connectivity between multiple hosts
Provides measurement data for the current multicast traffic in a group by RTP.
Requirements
• Hardware: WWW Server
• Perl Module: Net-RTP-0.4, Net::Domain
• Software: Multicast Beacon http://dast.nlanr.net/Projects/Beacon/index.html
Comparison between APAN-JP NOC page
and Abilene NOC/Observatory-5
Observatory
Function | APAN-JP NOC | Abilene NOC/Observatory | Our impression: useful for troubleshooting | Our impression: useful for operation
Provide one way latency information for each link | - | Abilene Latency Tables | ☆☆☆ | ☆☆
Provide one way latency statistics | - | Abilene Worst Ten Performing Latency Measurements | ☆☆ | ☆☆
Provide Throughput information for each link | - | Abilene Throughput Tables | ☆☆☆ | ☆☆☆
Provide Throughput statistics | - | Abilene Worst Ten Performing Throughput Measurements | ☆☆ | ☆☆
Provide Router Node Data | - | Connection Technologies Table | ☆☆ | ☆☆
NTP Service | Stratum 2 Server | Stratum 2 Server | ☆ | ☆☆☆
Provide one way latency information for each link
Network performance of each link can be checked
Vital when trouble-shooting delay-related problems
Requirements
• NTP on endpoints
• Hardware: FreeBSD or Linux Server with OWAMP (One-way Active Measurement
Protocol)
• OWAMP: http://e2epi.internet2.edu/owamp/download/
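The NTP requirement follows from how a one-way delay is computed: the send and receive timestamps come from two different hosts, so the clock error of both ends limits the accuracy. The sketch below illustrates only that arithmetic; it is not the OWAMP implementation, and the function name and sample values are assumptions.

```python
def one_way_delay_ms(send_ts, recv_ts, sender_clock_err_ms, receiver_clock_err_ms):
    """Return (delay, uncertainty) in milliseconds for one probe packet.

    send_ts and recv_ts are timestamps in seconds taken on two different hosts,
    so the result is only as good as the clock synchronization of both ends.
    """
    delay_ms = (recv_ts - send_ts) * 1000.0
    uncertainty_ms = sender_clock_err_ms + receiver_clock_err_ms
    return delay_ms, uncertainty_ms


# Illustrative values: a probe sent at t=100.000 s and received at t=100.125 s,
# with each endpoint's clock assumed accurate to about 1 ms.
delay, err = one_way_delay_ms(100.000, 100.125, 1.0, 1.0)
print(f"one-way delay: {delay:.1f} ms +/- {err:.1f} ms")
```

When the clock error is comparable to the delay being measured, the per-link figures stop being meaningful, which is why synchronization appears first in the requirements.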
Provide Throughput Information for each link
Draws throughput statistics and rankings from the data reported by Iperf.
Throughput (Mbps) / Jitter (%) / Packet Loss (%), Date & Time
Requirements
• Hardware: WWW Server (Ready)
• Software: Iperf http://dast.nlanr.net/Projects/Iperf/
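A ranking such as a "worst ten" table can be derived from stored test results with a simple sort. This is a minimal sketch under assumptions: the record layout and link names are hypothetical, and parsing of Iperf output is not shown.

```python
def worst_links(results, n=10):
    """Return the n measurements with the lowest throughput, worst first."""
    return sorted(results, key=lambda r: r["mbps"])[:n]


# Illustrative measurements; link names and values are made up.
results = [
    {"link": "A-B", "mbps": 940.0, "loss_pct": 0.0},
    {"link": "B-C", "mbps": 310.5, "loss_pct": 0.4},
    {"link": "C-D", "mbps": 620.0, "loss_pct": 0.1},
]
for r in worst_links(results, n=2):
    print(r["link"], r["mbps"])
```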
Discussion: Global Observatory for
Advanced Researches over HPRENs
We would like to work on a program that supports the collection
and dissemination of network data over HPRENs.
With observatory servers such as the Abilene Observatory, the NOC
gives network engineers a view of the operational data
associated with a global-scale network, and gives research
communities data on the fundamental properties of basic network
protocols.
With rack space and basic operational services provided by the
NOC, advanced research projects can collect data to be opened to
other researchers. The NOC will provide accounts on
measurement devices and ways to export data.
The NOC manages resources for operations, advanced research
and conventional services.
Issues towards global operations and/or
sharing measurement data among HPRENs
Non-uniform networks are connected.
High load performance test should be considered.
Long Distance
Long latency.
Time synchronization accuracy tends to be NTP level.
Time difference
The burden of troubleshooting depends on how much management information is
opened to the community.
The troubleshooting escalation level for nighttime operators should be specified at every
site.
Scalability
Automated operation of the tools will become more important, because the number of
measurement and monitoring tools will increase.
Software update methods and software version management will also become more
troublesome.
Requirements for Global
Observatory
Basic data collection.
The comparison between APAN-JP NOC and Abilene NOC/Observatory is
described in the preceding slides.
Advanced features to be developed for advanced researchers.
TCP performance test scheduler
High resolution flow based traffic grapher for TCP/UDP performance test
Preparations to accept applications for co-located projects
Observatory Servers
• Cooperate with Abilene Observatory project.
Preparation for Co-Located project
• Available rack space & power and account in related routers and switches.
• Application: form, organizing evaluation team.
Drafting Acceptable Co-location Policy (Example)
• Accounts should be provided to NOC operators and engineers.
• Permission to export data to specified researchers.
TCP Performance Test Scheduler
High-performance TCP transmission experiments from multiple
groups and organizations are increasing.
SLAC, CRL, NASA, Indiana University, Demonstrations
Experiments are getting congested.
Some experiments require an original TCP stack.
A dedicated machine is required for this kind of experiment.
Traffic generated by each experiment can easily fill up the
link between the end hosts.
Multiple experiments running at the same time degrade the
performance.
Architecture of TCP Performance
Scheduler
[Figure: Host A and Host B, performance machines with an original TCP stack (iperf, netperf, etc.), are connected through routers (R) along the iperf test path, and a Scheduler exchanges messages with the hosts. The numbers 1 to 6 in the figure correspond to the steps below.]
1. Collect data to recognize the topology tree at the Scheduler. At the beginning, the Scheduler collects topology data by SNMP and the routing information base from routers by OSPF LSAs, and recognizes the topology tree.
2. The host runs the iperf wrapper program, which sends a ticket to the Scheduler to request a performance test.
3. The Scheduler checks the queued tickets and responds with the time at which the test can start. At this stage the ticket's test time is registered as a reservation.
4. The host responds whether it accepts the start time and will run, or quits.
5. Register the reservation ticket for a future test, or discard it. The Scheduler updates the reservation ticket based on the wrapper program's response. If the wrapper program requests to continue, ticket registration is completed.
6. Run the performance test. The wrapper program runs the iperf test.
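The ticket exchange above can be sketched as a small reservation routine. This is an illustration under assumptions, not the presenters' implementation: the names Ticket, Scheduler, propose_start and confirm, and the link names, are hypothetical. Topology collection via SNMP and OSPF is not shown, so the set of links on each test path is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    host: str
    links: frozenset   # links on the test path, assumed known from the topology tree
    duration: int      # seconds
    start: int = 0     # filled in when the reservation is confirmed


class Scheduler:
    def __init__(self):
        self.reservations = []  # confirmed tickets

    def propose_start(self, ticket, earliest):
        """Earliest start at which the ticket overlaps no reservation sharing a link."""
        start = earliest
        for r in sorted(self.reservations, key=lambda t: t.start):
            shares_link = bool(r.links & ticket.links)
            overlaps = start < r.start + r.duration and r.start < start + ticket.duration
            if shares_link and overlaps:
                start = r.start + r.duration  # wait until that test finishes
        return start

    def confirm(self, ticket, start, accepted):
        """Register the reservation if the wrapper accepted the start time, else discard."""
        if accepted:
            ticket.start = start
            self.reservations.append(ticket)
        return accepted


sched = Scheduler()
a = Ticket("hostA", frozenset({"tokyo-seattle"}), duration=600)
sched.confirm(a, sched.propose_start(a, earliest=0), accepted=True)

b = Ticket("hostB", frozenset({"tokyo-seattle", "seattle-chicago"}), duration=300)
start_b = sched.propose_start(b, earliest=100)
print(start_b)  # 600: hostB waits for hostA's test on the shared link
```

Tests whose paths share no link are left free to overlap, which keeps the schedule from serializing experiments that cannot interfere with each other.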
High Resolution Flow Base
Traffic Grapher
Why Flow measurement?
Performance tests need traffic viewers that show the traffic variation per
flow at intermediate routers.
Data collection by Netflow/Cflowd shows only traffic tendencies, because
high-speed routers analyze sampled packets (1 in 100 to 1 in 1000) instead
of checking every packet. It's not enough!
Why high resolution?
It is easy for users to get the average speed of a TCP performance test at the end
stations, but hard to understand the variation of speed over short periods, such as
at millisecond granularity.
Control of variable burst traffic might help to troubleshoot TCP
performance degradation.
AIST and the University of Tokyo were awarded in the SC2003 Bandwidth
Challenge.
Multiple flows should be measured at the same time over multiple
links.
Measurement performance should be investigated.
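The gap between an average rate and millisecond-scale behaviour can be seen by binning the same packet trace at two resolutions. The sketch below assumes a hypothetical trace format of (timestamp in microseconds, size in bytes) pairs for a single flow and uses synthetic data; it is not tied to any particular capture device.

```python
from collections import defaultdict


def rate_per_bin(packets, bin_ms=1):
    """Sum bytes per time bin and return {bin_index: Mbit/s} for non-empty bins.

    packets is an iterable of (timestamp_us, size_bytes) for one flow.
    """
    bytes_per_bin = defaultdict(int)
    for ts_us, size in packets:
        bytes_per_bin[ts_us // (bin_ms * 1000)] += size
    # bits per bin divided by (bin length in ms * 1000) gives Mbit/s
    return {b: n * 8 / (bin_ms * 1000) for b, n in sorted(bytes_per_bin.items())}


# Synthetic burst: 100 packets of 1500 bytes within the first millisecond,
# followed by silence for the rest of a 10 ms window.
packets = [(i * 10, 1500) for i in range(100)]

print(rate_per_bin(packets, bin_ms=10))  # one 10 ms bin: the average rate
print(rate_per_bin(packets, bin_ms=1))   # 1 ms bins: the burst rate
```

With these numbers the 10 ms view reports 120 Mbit/s while the 1 ms view reports 1200 Mbit/s in the first bin, so a burst that saturates a gigabit link is invisible in the coarse average.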
Requirements of High Resolution
Flow Base Traffic Grapher
Devices will be installed on multiple links
Users and operators want to compare the performance over multiple
paths
Time synchronization between the devices.
Topology recognition is important.
Operation of the multiple devices is the key.
On demand measurement
To respond to requests from multiple users without a heavy
burden, measurements should cover a short period (e.g. 10-second or
1-minute graphs)
Gigabit ethernet will be ready soon, but SONET/SDH…
Development with OC48/OC192MON equipped with DAG cards sold
by Endace.
Architecture of High Resolution
Flow Base Traffic Grapher
[Figure: Host A and Host B, performance machines with an original TCP stack (iperf, netperf, etc.), are connected through routers (R) along the iperf test path, with measurement devices on the path and a Manager serving the user's browser. The numbers 1 to 5 in the figure correspond to the steps below.]
1. Collect data to recognize the topology tree. At the beginning, the Manager program, which controls the measurement devices, collects topology data by SNMP and the routing information base from routers by OSPF LSAs. The Manager recognizes the topology tree.
2. Request traffic graphs via HTTP. The user requests a measurement from the Manager via the web interface by giving the start and end of the path, port numbers, start time and period.
3. The Manager analyzes the topology from the path information and requests the measurement devices to start measuring flows at the specified time.
4. The measurement devices return traffic flow graphs to the Manager.
5. Show graphs in the browser. The user sees the traffic graph page constructed by the Manager via the web browser.
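Step 3, in which the Manager works out which measurement devices sit on the requested path, can be sketched as a graph search. This is a simplified illustration: the topology, node names and device names are hypothetical, and a breadth-first search by hop count stands in for the path that OSPF costs would actually select.

```python
from collections import deque


def find_path(topology, src, dst):
    """Breadth-first search over an adjacency dict; returns the node list or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def devices_on_path(path, devices):
    """Return measurement devices attached to links on the path, in path order."""
    links = (frozenset(pair) for pair in zip(path, path[1:]))
    return [devices[link] for link in links if link in devices]


# Hypothetical topology: hostA - R1 - R2 - R3 - hostB
topology = {
    "hostA": ["R1"],
    "R1": ["hostA", "R2"],
    "R2": ["R1", "R3"],
    "R3": ["R2", "hostB"],
    "hostB": ["R3"],
}
# Hypothetical measurement devices, keyed by the link they monitor.
devices = {frozenset({"R1", "R2"}): "mon1", frozenset({"R2", "R3"}): "mon2"}

path = find_path(topology, "hostA", "hostB")
print(path)
print(devices_on_path(path, devices))
```

The Manager would then send the start time, period and port numbers from the user's request to each device in the returned list.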