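Transcript: Chapter 4 Network Measurement and Monitoring
Computer Network Management
Lecturer: Wang Jilong
Information Network Engineering Research Center, Tsinghua University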
[email protected]
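Chapter 4
Network Measurement and Monitoring
Section 1
Overview of Network Measurement Techniques
Section 2
Special Topics in Network Measurement Techniques
Section 3 Examples of Network Measurement Systems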
Blueprint of the Hi-Tech Olympics Information Network
Section 3
Examples of Network Measurement Systems
Organizations
Measurement Classification
Measurement Projects Overview
Comparison
I. Organizations
IETF
- IP Performance Metrics (IPPM)
- Realtime Traffic Flow Measurement (RTFM)
NLANR
- Measurement & Operations Analysis Team
- National Center for Network Engineering
CAIDA
IETF IPPM Working Group
Develop metrics for wide-area network measurement
IPPM: Metrics
Framework: RFC2330
Connectivity
One-way delay and loss
Round-trip delay and loss
Flow capacity/Bulk transfer capacity
Delay variation (jitter)
http://www.advanced.org/IPPM/
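The delay and loss metrics listed above can be illustrated with a minimal sketch. It assumes sender and receiver clocks are already synchronized (one-way delay is meaningless otherwise) and uses a hypothetical record format that maps probe sequence numbers to timestamps. Delay variation is computed between consecutive delivered probes, which is one of several packet selections RFC 3393 permits.

```python
# Sketch of IPPM-style metrics: one-way delay, loss and delay variation.
# Assumption: sender and receiver clocks are synchronized (e.g. via GPS);
# the dict-based record format is hypothetical.


def ippm_summary(sent, received):
    """Summarize one measurement run.

    sent:     {seq: send_time_s} for every probe transmitted
    received: {seq: recv_time_s} for every probe that arrived
    Returns (one-way delays in ms ordered by seq, loss ratio, delay variation in ms).
    """
    if not sent:
        raise ValueError("no probes were sent")
    # One-way delay exists only for probes that arrived; the rest count as lost.
    delays = {
        seq: (received[seq] - t_send) * 1000.0
        for seq, t_send in sent.items()
        if seq in received
    }
    loss_ratio = 1.0 - len(delays) / len(sent)
    ordered = [delays[seq] for seq in sorted(delays)]
    # Delay variation between consecutive delivered probes.
    variation = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered, loss_ratio, variation


if __name__ == "__main__":
    sent = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
    received = {1: 0.020, 2: 1.025, 4: 3.022}  # probe 3 was lost
    delays, loss, ipdv = ippm_summary(sent, received)
    print([round(d, 3) for d in delays], loss, [round(v, 3) for v in ipdv])
    # prints [20.0, 25.0, 22.0] 0.25 [5.0, -3.0]
```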
RTFM
Measure network flows through passive measurements
Standards for
- architecture; metrics for traffic flows
- language to describe flows
- MIB to control meter & gather measurements
Origin in accounting
http://www.auckland.ac.nz/net/Internet/rtfm
NLANR
NLANR(1995-1997):
- concerted effort across NSF funded supercomputing centers, with
focus on the vBNS and the HPC community
NLANR (1998):
- continued collaborative and expanded effort across three parties:
• Distributed Applications Support Team (DAST at UIUC/NCSA)
• Measurement and Operations Analysis Team (MOAT at
UCSD/SDSC)
• National Center for Network Engineering (NCNE at
CMU/PSC)
Web server: http://www.nlanr.net
Offspring focused on commercial market: CAIDA (http://www.caida.org)
10/14/2002
CAIDA (www.caida.org)
Cooperative Association for Internet Data Analysis
Based at Univ of California (SDSC)
NSF and Commercial Funding (Cisco, ANS)
Clearinghouse, coordinating body for network statistics
Tool Development
- Network Visualization
- Multicast statistics
- Packet traces
II. Measurement Classification
Utilization [usually SNMP]
Active measurements
- mainly performance (one-way delay, loss, throughput)
Passive measurements
- mainly traffic characterization
Routing [usually BGP/traceroute]
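Measurement Targets
Static configuration information: SNMP
Network topology and routing
Fault detection and localization
Performance (effectiveness): delay, packet loss, utilization
Security: vulnerabilities, hot spots, defense, traceback, forensics, content
Usage: time, bandwidth, bytes, applications, services
Behavior: temporal patterns, spatial patterns, content patterns, "conduct" patterns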
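Measurement Methods
Active measurement methods
- Measure using the network's response to "probe packets": ping, traceroute
- Measure how the network services "probe packets": e.g., treno measures throughput
- May cause a "Heisenberg" effect: the probe packets affect network performance and thereby compromise the objectivity of the measurement results
Passive measurement methods: imperceptible to the network (non-intrusive)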
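Measurement Methods
Router-Based
- Port traffic (end-to-end traffic)
Router-Aided
- Topology measurement
Stand alone
- Delay (requires clock synchronization), topology (delay-based and loss-based topology inference), performance inference (given a known topology), bandwidth measurement (link bandwidth, available bandwidth, bottleneck bandwidth), network distance, router parameter inference (scheduler type and parameters, bottleneck node buffer size and policy)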
Tools: Routing
Traceroute servers
IPMA tools (based on BGP peering)
Skitter: tomography (CAIDA)
Tools: Active performance
Treno
mping (windowed ping): invasive
pathchar: estimate router Qs
bottleneck bandwidth
traditional: ping, ftp, ttcp, netperf
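The idea behind pathchar can be sketched under simplifying assumptions: probes of several sizes are sent toward a hop, the returned ICMP message has a fixed size, and the minimum RTT observed for each size contains no queuing delay. The slope of minimum RTT versus probe size is then the per-byte transmission time up to that hop, and the slope difference between consecutive hops gives one link's bandwidth. Function names and sample data are hypothetical.

```python
# Sketch of the pathchar idea: minimum RTT grows linearly with probe size,
# and the slope is the per-byte serialization time of the links traversed.
# Function names and sample data are hypothetical.


def min_rtt_slope(samples):
    """Least-squares slope (seconds per byte) of minimum RTT versus probe size.

    samples: iterable of (size_bytes, rtt_s); only the minimum RTT per size
    is kept, to filter out queuing delay.
    """
    best = {}
    for size, rtt in samples:
        best[size] = min(rtt, best.get(size, float("inf")))
    if len(best) < 2:
        raise ValueError("need at least two distinct probe sizes")
    xs = sorted(best)
    ys = [best[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def link_bandwidth_bps(slope_prev_hop, slope_this_hop):
    """Bandwidth of the link between hop n-1 and hop n, from the slope difference."""
    per_byte = slope_this_hop - slope_prev_hop
    if per_byte <= 0:
        raise ValueError("slope did not increase; estimate is unreliable")
    return 8.0 / per_byte  # seconds/byte -> bits/second


if __name__ == "__main__":
    sizes = [100, 500, 1000, 1500]
    # Hop 1 sits behind a 10 Mbit/s link; hop 2 adds a 1 Mbit/s link.
    # The extra, delayed samples for hop 1 mimic queuing and are filtered out.
    hop1 = [(s, 0.001 + s * 8 / 10e6) for s in sizes] + [(s, 0.009 + s * 8 / 10e6) for s in sizes]
    hop2 = [(s, 0.002 + s * 8 / 10e6 + s * 8 / 1e6) for s in sizes]
    print(round(link_bandwidth_bps(min_rtt_slope(hop1), min_rtt_slope(hop2))))  # about 1000000
```

For the first hop, pass 0.0 as the previous slope.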
Active Measurement efforts
AMP by NLANR MOAT
- Available to HPC/I2 sites
- See http://amp.nlanr.net/ for more information.
- Measures connectivity, loss and roundtrip time.
- Performance matrix between sites.
Surveyor by Advanced.org
- Available to HPC/I2 sites
- See http://www.advanced.org/csg-ippm/
- Measures one-way delay and packet loss.
- Performance matrix between sites.
- Requires GPS antenna.
Tools: Passive
NeTraMet: Nevil Brownlee’s RTFM
IBM Research’s RTFM
tcpdump
OCxMON: passive optical
Coral
Cisco’s NetFlow, and
cflowd (CAIDA/ANS)
Tools: Utilization
Mainly commercial network management tools (e.g., OpenView)
MRTG
NetScarf
III. Measurement Projects Overview
IPMA
Surveyor
pingER
NIMI
NLANR/MOAT NAI
CAIDA
CERNET
WINDMILL
Commercial
1. IPMA
University of Michigan and
Merit Network, Inc.
http://www.merit.edu/ipma
Major support from the National Science Foundation,
Intel, Hewlett-Packard and the Merit RSNG Project
Missions
Probe machines at IXPs and inside backbones
- Focus routing
- Some latency/loss
- Visualization and data-mining
Real-time inter-domain problem diagnostics
Tool Development
http://www.merit.edu/ipma
Internet Measurement
[Figure: probes and data dissemination servers in AS1 and AS2, with results made available to the public]
Probes
Probe machines at major US Internet Exchange Points
Data Dissemination
[Figure: probe machines send data to a data dissemination server; outputs include a Java client, email reports, web pages, and a database with a report generator]
IPMA Tools
IPN -- provider outages and maintenance
ASExplorer -- routing instability and topology
NetNow -- network packet loss and latency
NetGrapher -- graphs of various network statistics
RouteTracker -- collects routing stability
TPD -- Monitors topology via traceroutes
SNMPTrackers -- monitors SNMP MIBs
IPMA: IPN
IPMA: FlapGraph
ASExplorer
BGP Routing Analysis
Most BGP traffic is pathological
- Up to 50 million BGP updates/day at Mae-East!
- Duplicate withdraws
- Duplicate Announcements
Vendor implementation problems
Strong correlation to network usage
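To make "duplicate announcements" and "duplicate withdraws" concrete, here is a simplified classifier over a stream of updates. It tracks, per peer and prefix, the AS path currently announced. The tuple format is a hypothetical stand-in for updates parsed from a BGP session or a route-collector dump, and real studies of BGP instability use finer categories than these four counters.

```python
# Simplified classifier for a stream of BGP updates.
# The (peer, prefix, kind, as_path) tuple format is hypothetical.
from collections import Counter


def classify_updates(updates):
    """updates: iterable of (peer, prefix, kind, as_path), kind 'A' (announce) or 'W' (withdraw)."""
    rib = {}  # (peer, prefix) -> AS path currently announced by that peer
    stats = Counter()
    for peer, prefix, kind, as_path in updates:
        key = (peer, prefix)
        if kind == "A":
            # Re-announcing exactly the same path changes nothing: a duplicate.
            if key in rib and rib[key] == as_path:
                stats["duplicate_announce"] += 1
            else:
                stats["announce"] += 1
            rib[key] = as_path
        elif kind == "W":
            # Withdrawing a prefix the peer is not announcing is a duplicate withdraw.
            if key in rib:
                del rib[key]
                stats["withdraw"] += 1
            else:
                stats["duplicate_withdraw"] += 1
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
    return stats


if __name__ == "__main__":
    stream = [
        ("peer1", "10.0.0.0/8", "A", (701, 1)),
        ("peer1", "10.0.0.0/8", "A", (701, 1)),        # same path again
        ("peer1", "10.0.0.0/8", "W", None),
        ("peer1", "10.0.0.0/8", "W", None),            # already withdrawn
        ("peer1", "10.0.0.0/8", "A", (701, 3561, 1)),  # new path
    ]
    print(dict(classify_updates(stream)))
```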
AADS BGP Routing Updates (3/17/96 - 3/17/98)
[Figure: number of updates (0 to 12,000,000), split into withdrawals and announcements, versus date from 3/17/96 to 3/17/98]
OSPF Observations
Significantly more OSPF LSA changes than
anticipated (order of magnitude?)
Identified several hardware and software
pathologies
RSLA Interface Changes 4/98-6/98
[Figure: number of interface changes (0 to 500) by date, 4/1/98 to 5/31/98, for Michnet routers; the legend lists router interfaces with site names such as LTUPOP, JACKSON, OAKLAND, MSU, WMU, FLINT, SAGINAW, NMU and MTU]
Frequency of Michnet OSPF Changes 4/98
[Figure: number of LSA changes with a given frequency (0 to 3000) versus interval in seconds]
Michnet OSPF Oscillation on 3/13/98
[Figure: number of changes (0 to 250) per 5-minute bucket]
2. Surveyor
Dedicated PC running Unix at key sites
GPS for clock synchronization
One way delay & loss measurements
Also routing via traceroute
Community is Internet 2 clients
Surveyor on Abilene
Continuous measurement
One-way delay and loss
1/sec on Poisson Schedule
12 Byte UDP packets
Traceroutes
72 Machines
– http://hartman.advanced.org/IPPMApplet/report/Report.html
-- Java, close to real-time
– http://ippm-db.advanced.org/plots/ -- static
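Surveyor's "1/sec on Poisson Schedule" can be sketched as exponentially distributed gaps between probes; RFC 2330 discusses Poisson sampling because it avoids synchronizing with periodic network behavior. The function below is a minimal illustration using Python's standard `random` module, not Surveyor's code.

```python
# Sketch of a Poisson probe schedule: inter-probe gaps are exponentially
# distributed with mean 1/rate. Function name is hypothetical.
import random


def poisson_schedule(rate_per_s, duration_s, seed=None):
    """Return probe send times (seconds from start) for a Poisson process."""
    if rate_per_s <= 0 or duration_s <= 0:
        raise ValueError("rate and duration must be positive")
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


if __name__ == "__main__":
    probes = poisson_schedule(1.0, 60.0, seed=42)  # about 60 probes in one minute
    print(len(probes), [round(t, 2) for t in probes[:5]])
```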
End-to-End Performance Initiative
Goal: 7 Mbps bi-directional UDP
[Figure: test path with Surveyor nodes; labels include Duke, NCREN, Abilene, WASH, NYCM, Broadway, Hudson, Teleglobe, Dante, ATM, DFN and Frankfurt]
3. PingER
SLAC & Stanford University
ESNET / physics community
High-energy physics focus
http://www-iepm.slac.stanford.edu/pinger/
Long-term ping measurements
Set of cooperating measurement
sites -- not just random hosts
- http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
- http://www.slac.stanford.edu/xorg/icfa/ntf/tool.html
Ping
Treats Internet as black box
round trip response time, loss, reachability, jitter
Low cost/lightweight tool
- ping “universally available”, easy to understand
• no software for clients to install
• no special privileges needed for monitor sites
- resources: 100bps/link, ~600kBytes/month/link
Ping mature, well understood, widely available
Scale of Measurements
18 monitoring sites: 7 in US (5 ESnet, 2 vBNS), 2 in Canada, 7 in Europe (ch, de, dk, hu, it, uk(2)), 2 in Asia (jp, tw)
1261 monitoring-remote-site pairs
379 unique hosts, 272 sites
Measures response, jitter, loss, reachability
Data goes back > 4 years
1 Million probes of Internet / day
[Figure: PingER pair distribution by global area (Europe, Canada, Japan, China, Asia, Australasia, South America, Russian Fed) and by domain (Edu, Com, Gov, Mil, Org); 50 beacon sites, 27 countries]
Web Interface
http://www.slac.stanford.edu//xorg/iepm/pinger/table.html
Choose metric: loss, RTT, variability, reachability
Choice of time ticks: hour, daily, monthly; can also select day, month etc.
Choice of group
Export data to Excel
Value colored by quality
Drill down to plots
Effect of STAR-TAP on KEK.jp <=> SLAC
[Figure: ping RTT in msec and % packet loss, September 1 to December 31 1998, with the ITU G.114 300 msec RTT limit for voice]
Improvement in RTT
[Figure: ping round-trip delay in msec, response time from SLAC to selected groups of sites, Jan-95 to Jan-99, with exponential trend lines (improvements between 0.22%/mo and 1.7%/mo), the STAR-TAP change and the ITU G.114 300 msec RTT limit for voice; groups: ESnet (14 pairs), Japan (6 pairs), Europe (27 pairs), cern.ch (1 pair), Canada (6 pairs), Edu/US (50 pairs)]
Bandwidth improvement from ESnet sites
TCP bandwidth < (1470 / RTT) × (1 / √loss)
[Figure: bandwidth in kbytes/sec (log scale, 10 to 10000), Jun-94 to Dec-99, for Canada (18 pairs), Edu/US (138 pairs), ESnet (31 pairs), Japan (12 pairs) and Europe (95 pairs), with exponential trend lines; 100% improvement / year]
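The bound on the slide can be evaluated directly. This sketch assumes 1470 is the segment size in bytes, RTT is in milliseconds and loss is a fraction (0.01 = 1%); with those units, bytes per millisecond equals kbytes per second, matching the plot's axis. Zero loss yields no bound, so the function rejects it. The function name is hypothetical.

```python
# Evaluate the slide's bound: TCP bandwidth < (1470 / RTT) * (1 / sqrt(loss)).
# Assumed units: segment size in bytes, RTT in ms, loss as a fraction.
# bytes per ms == kbytes per s (1 kbyte = 1000 bytes).
import math


def tcp_bandwidth_bound_kBps(rtt_ms, loss, segment_bytes=1470):
    """Upper bound on single-stream TCP throughput, in kbytes/s."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be in (0, 1]; zero loss gives no bound")
    return (segment_bytes / rtt_ms) / math.sqrt(loss)


if __name__ == "__main__":
    # Illustrative paths, not measured values.
    for rtt_ms, loss in [(20, 0.001), (100, 0.01), (300, 0.05)]:
        bound = tcp_bandwidth_bound_kBps(rtt_ms, loss)
        print(f"RTT {rtt_ms} ms, loss {loss:.1%}: < {bound:.0f} kbytes/s")
```

For example, a 100 ms path with 1% loss is bounded at about 147 kbytes/s; quadrupling the loss halves the bound.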
What about loss?
BCR (Feb ‘98 & Jan ‘99) shows that even with 10% random loss one can get almost toll quality
Our experience in other areas is that problems start between 2.5 and 5% packet loss
ITU/TIPHON defines a loss of < 3% as allowing “good” Internet telephony
Improvement in packet loss
[Figure: percent round-trip packet loss between ESnet and selected sites (log scale, 0.01 to 100), Jan-95 to Jan-00, with a 2.5% reference line; monthly improvement: W Europe (81) 3.0%/mo, Canada (15) 6.8%/mo, US/Edu (104) 7.6%/mo, Japan (10) 0.4%/mo, ESnet (81) 8.7%/mo; note: SLAC<=>vBNS ~ 2 * SLAC <=> ESnet]
Short term effect of routing change
[Figure: response time in msec to 3 sites seen from an XIWT monitoring site for 3 days in November 1998, around November 21]
Median Packet Loss Seen From Asia: Effect of direct connection
[Figure: median % packet loss during month, May-98 to Jan-99, from Asia (KEK) to West Europe (15 pairs), East Europe (5 pairs), North America (27 pairs since May 98) and South Pacific (2 pairs in January 99), with exponential trend lines; annotation: the NACSIS Europe line was directly connected to Ten-34 at London on July 1st 1998]
Poor backup route
[Figure: response time (msec) and % loss, Monday March 8 - Wednesday March 10, 1999 (GMT); the normal link goes via DESY-MSU satellite (6 hops); when the satellite link is down, traffic goes via DFN-UUnet/Washington/NY/Stockholm Relcom - MSU (20 hops)]
Median monthly ping packet loss: UK seen from ESnet
[Figure: median monthly ping packet loss, 1/1/95 to 1/1/99, annotated with link upgrades 2=>4Mbps, 4=>12Mbps, 12=>45Mbps and 45=>90Mbps, a common congestion point, and a reference level labeled "Countries Expected to be Good"]
Calibration of ping
Sanity checks:
- host pings itself, host pings host at same site
- high statistics between a few sites & inside site:
• see www.slac.stanford.edu/comp/net/wan-mon/ping-histat.html
• look at subtle behaviors, e.g. RTT distribution tails
- check “wire time” (sniffer) vs. ping reported times, at
client & server
• see www.slac.stanford.edu/comp/net/wanmon/error.html
Correlate with Surveyor one-way measures
Natural enemies of ping
Poor choice of remote host (clustered, variable load, ...) or monitoring host
Ping program problems and pathologies
- Some implementations have bugs, or are incomplete
- Spurious packets confuse ping programs (<0.2% effect)
• e.g. program sends 5 packets and sees 10.
- Out of order packets (< 0.02% effect)
- Some sites/hosts block pings
- Other sites limit pings to a certain size
- Rate limiting, e.g. some sites filter out ICMP traffic
during high usage or all the time
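Several of these pathologies (duplicate or spurious replies, reordering) can be separated from real loss when replies are matched by sequence number. A minimal sketch, assuming probes carry sequence numbers 0..n-1 and the sequence numbers of received replies are available in arrival order; the function name and result fields are hypothetical.

```python
# Sketch: separate ping pathologies from real loss by matching replies
# to probe sequence numbers. Names and result fields are hypothetical.


def analyze_ping(n_sent, reply_seqs):
    """n_sent probes used sequence numbers 0..n_sent-1.

    reply_seqs lists the sequence numbers of received replies in arrival order.
    """
    if n_sent <= 0:
        raise ValueError("n_sent must be positive")
    seen = set()
    duplicates = reordered = spurious = 0
    highest = -1
    for seq in reply_seqs:
        if not 0 <= seq < n_sent:
            spurious += 1        # reply to a probe we never sent
            continue
        if seq in seen:
            duplicates += 1      # same probe answered twice
            continue
        seen.add(seq)
        if seq < highest:
            reordered += 1       # arrived after a later-numbered reply
        else:
            highest = seq
    return {
        "loss": 1.0 - len(seen) / n_sent,
        "duplicates": duplicates,
        "reordered": reordered,
        "spurious": spurious,
    }


if __name__ == "__main__":
    # 5 probes: reply 2 arrives after reply 3, reply 3 is duplicated,
    # and a stray reply with sequence number 7 also shows up.
    print(analyze_ping(5, [0, 1, 3, 2, 3, 7]))
```

Counting duplicates separately keeps a case like "sends 5 packets, sees 10" from being reported as negative loss.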
Ping limiting/blocking
First noticed in 1996
- protect against Ping of Death (OS) & smurf attacks (directed broadcasts)
Host requirement to implement ping
- but not to execute, and probably blocked at firewall
First step for cracker scanning a site
Identified at 2% of hosts (i.e. currently a small effect)
http://www.slac.stanford.edu/comp/net/wanmon/pathology.html
Avoiding
careful choice of host => beacon sites
working with remote sites & ISPs
using TCP echo or UDP echo (security), but
crackers will find them and often already blocked
new protocol designed for measurement (IPMP)
special purpose measurement machines &
protocols
Monitoring Conclusions
Performance is improving
ESnet & vBNS/Internet 2 are well configured and provide good service within & between their nets
Performance within A&R networks is generally good
Minimize ISPs crossed; peering is critical
Intercontinental performance is poor to bad
Today need headroom, or managed bandwidth; QoS in future
End users need monitoring to know what to expect, write SLAs, set baselines, ID problems, plan
More Information & extra info follows
WAN Monitoring at SLAC has lots of links
- http://www.slac.stanford.edu/comp/net/wan-mon.html
Tutorial on WAN Monitoring (including methods, RTT, jitter, loss & QoS thresholds etc.)
- http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
PingER History tables
- http://www.slac.stanford.edu//xorg/iepm/pinger/table.html
Internet Monitoring in the HEP Community, SLAC-PUB7961, presented at CHEP98, Chicago, Aug-98
- http://www.slac.stanford.edu/pubs/slacpubs/7000/slac-pub7961.html
4. NIMI
National Internet Measurement Infrastructure
LBL, PSC Collaboration (Paxson and Mathis)
Architecture for deploying probe machines
throughout the Internet
End-to-end and hop-by-hop measurement
Common Solutions Group
Deploy probe machines at major US universities
Managed by Advanced Network and Services
GPS receivers for clock synchronization
Software running on PC (x86) platforms
Common Solutions Group
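5. NLANR Measurement and Network Analysis
NLANR/MNA
(UCSD/SDSC)
http://moat.nlanr.net/
Funded by the National Science Foundation/CISE/ANIR
Goals
Build a measurement and analysis infrastructure
Research related to network measurement and analysis
Develop analysis and visualization tools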
Network workload parameter space
[Figure: workload parameter space spanning bandwidth (16 kbps, 2Mbps, OCnn) and service qualities (continuous, burst, deferred, shared, prioritized, guaranteed); examples: telecom voice, "normal" data (anywhere)]
NAI system
[Figure: NLANR network analysis infrastructure; components: Coral monitors (red), active measurements (green), vBNS SNMP data (green), a BGP routing data source (green), storage and computation (red), storage, computation, and external presentations (green), compute engines (varies), a data archival storage backend (12GB DDS3 tape, WORM, CD-RW, DVD-RAM; or green if encrypted), and external network access (web, ftp, email, ...)]
Central machines
• nai.nlanr.net
• server for initial data collection
• 160 GB, 256MB memory, dual 450MHz PII
• moat.nlanr.net
• external web server
• 160 GB, 256MB memory, dual 450MHz PII
• four analysis computation engines
• each: 18GB, 256MB memory, 450MHz PII
• amp.nlanr.net and volt.nlanr.net
• each: 92GB, 128MB memory, 400MHz PII
[Figure: file and compute servers nai.nlanr.net and moat.nlanr.net, and analysis computing engines]
Passive Monitors
OCXmon
- Jointly developed by MCI, NLANR and CAIDA
- Captures packet headers; many companion data analysis tools are available
• http://moat.nlanr.net/OC3mon-monitors/
- Supported network interfaces
• ATM (DS-3, OC-3, OC-12)
• POS (OC-3 and OC-12)
Coral/OCXmon
(passive traffic collection and analysis at optical carrier speeds)
• completely noninvasive, no impact on forwarding paths
• aggregated traffic signature at a measurement point
• detailed characteristics of individual transactions
• OC3(deploying) --> OC12(prototype) --> OC48(future)
• reference implementations
• Unix
• FreeBSD version in http://moat.nlanr.net/Coral
• DOS
• http://www.vbns.net/~apisdorf/coral
Coral components
[Figure: optical interconnection through optical splitters between two OC3 connection end-points; the OC3mon intelligent subsystem on the host system bus; system memory; host collection and analysis process]
OC3mon machine
Optical splitters
TAAD
Traffic Analysis and Auto Diagnosis
http://www.ncne.nlanr.net/TCP
Analyzes data collected by OCXmon
TCP performance diagnosis
- Congestion direction: upstream or downstream
- Unoptimized TCP implementations
- TCP flows that could be optimized
Passive measurement deployment status
[Map, 28 May 1999: FDDImon, OC3mon and OC12mon deployments, plus sites in collaboration discussions]
Sites shown: U. of Washington; STARTAP/APAN; NCAR; U. Colorado, Boulder; Argonne Nat. Lab; U. of Michigan; Michigan State U.; Ohio State U.; NCSA; U. of Pennsylvania; FIX-West; Old Dominion U.; AIX/MAE-West; NASA-Ames; CSU, San Bernardino; UCLA; SDSC, U. California, San Diego; Vanderbilt U.; Rice U.; Baylor College of Medicine; U. of Houston; Texas A&M U.; MCNC; North Carolina State U.; U. of North Carolina; Duke U.; U. of Florida; Miami U.; Florida State U.
Some analysis available
• http://moat.nlanr.net/OC3analysis - analysis of an aggregation point data (similar to what will
be available for the HPC aggregation points or gigaPoPs)
• http://moat.nlanr.net/PBHA - analysis of packets, bit volume, and host activity on a link.
• http://moat.nlanr.net/SF - analysis of TCP flags (useful for both Internet researchers and
vendors)
• http://moat.nlanr.net/DNS - analysis of traffic by protocol -- with respect to UDP, specifically
DNS traffic.
• http://moat.nlanr.net/PLRL - analysis of the behavior of sequences of packets (packet run lengths), which is important to the design and development of next generation internetworking hardware and software
• http://moat.nlanr.net/BGPAddr and http://moat.nlanr.net/ASPL - analyses of the
interconnectivity of Autonomous Systems
• http://moat.nlanr.net/IPaddrocc - analysis of the 32 bit (IP v4) Internet address space
Active measurement/analysis
Led by Tony McGregor ([email protected])
Attempt to deploy FreeBSD-based AMP machines at
all HPC sites
- almost 70 machines currently deployed and operational
Currently RTT, topology, and loss; user/event driven
throughput
Current work provides base for:
- validation of active measurement techniques
- comparison of HPC service provider internal measurements with site-to-site measurements
Active measurement environment
[Figure: departments/users inside campus sites connected by the HPC backbone network; nested boundaries: user/application perimeter, site perimeter, inner perimeter]
Active measurement deployment status
[Map, 27 August 1999: AMP machine locations; sites include UAlaska, UWashington, Stanford, SLAC, UCSC, UofUtah, UIUC/NCSA, CMU/PSC, NCAR, FNAL, MIT, Harvard, Princeton, Duke, GATech, SDSC/UCSD, Rice, UFlorida, UWaikato and many others]
Current measurements
We currently assess:
- round trip time (RTT)
- packet losses
- topology (traceroute)
- throughput
across all sites.
NLANR provides the measurement machine and administration
The local site physically deploys the machine
AMP architecture
• typically full mesh across AMP machines
• some “destination-only” exceptions
[Figure: active monitors and other targets, analysis machines, and access via a web browser and Cichlid visualization]
• central data repository and visualization machines
• data available via:
• web interface to results
• an NLANR developed 3D visualization tool (Cichlid)
• raw data
AMP web info (1)
AMP web info (2)
AMP web info (3)
AMP web info (4)
AMP web info (5)
AMP web info (6)
AMP Routes
3D visualization, bar graphs
• shows RTT as a moving time series
• uses OpenGL tool that allows viewing from different perspectives, zoom, fly-through, etc. (developed by Jeff Brown)
3D visualization, terrain map
• shows RTT as a rendered surface
• rugged (noise reduced) terrain implies high and variable RTT; a rough network
Real-time performance queries (access controlled to avoid misuse)
vBNS usage sample based on SNMP data (VRML object)
Routing visualization example based on information from the globally visible routing system
• BGP derived paths
• utilizing AS (autonomous system) numbers
• global reachability
• how to visualize “the system”
Cichlid server/client model
[Figure: several servers (non-local data generators) feeding a client/user OpenGL-based visualization engine]
cichlid visualization
IP use and plen matrices
Cichlid 2
Cichlid for Windows
CAIDA Projects
NGI
CoralReef - OC48 monitor
Internet tomography using skitter
DNS Root Server initiative
CoralReef - security enhancements
Database/Analysis
Visualization of Massive Data Sets
CAIDA Projects
BW-EST: Bandwidth Estimation (sponsored by DOE)
DNS Analysis: Analysis of the DNS root and gTLD
nameserver system
IEC: Internet Engineering Curriculum Repository
Internet Atlas Project
IPNmoo: Inter-Provider Network MOO
NCS: Routing Analysis and Peering Policy for Enhancing
Internet Performance and Security
NMS: Network Modeling and Simulation
SD-NAP: San Diego Network Access Point
Trends: Correlating Heterogeneous Measurement Data to
Achieve System-Level Analysis of Internet Traffic Trends
Dag 4.0: OC48 MON
Sample Visualization from skitter Data
DNS performance plots
7. Tsinghua University
T ≈ 1.5·√(2/3)·B / (R·√p), where B is the packet size, R the round-trip time and p the packet loss rate; i.e. T ~ 1/√p
[Figure: Test Two, link bandwidth (%) of Flows 1-6 versus time (sec, 0 to 120), without regulation and with regulation]
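Ongoing Projects
Techniques for measuring and analyzing dynamic network structure
Measurement-based correlation analysis of network behavior
A classification system for security attack behavior
Determination of multi-clue security attack behavior
NMI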
NMI
Open Architecture
Dedicated measurement unit (Metrics Server)
Measurement network (MetricsNet)
Standard Service Interface
A Demo Environment
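Four Key Problems
Architecture model of NMI, the next-generation large-scale computer network management system
Design of a next-generation computer network management unit based on a streamlined functional model and independent of interconnection devices
Automatic discovery of network management units
Management techniques for IPv6 networks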
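Selected Work: Introduction and Demonstration
1) ARF-based network management system framework
2) Security monitoring unit
3) Accounting management unit
4) Flow-based network measurement unit
5) Network configuration information management system
6) Traffic monitoring unit
7) Fault monitoring unit
8) Topology discovery unit
9) Research on key NMI technologies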
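1) ARF-based network management system framework
Dynamic loading/unloading of functional modules
Personalized customization
Dynamically scalable tree-structured management model
Support for distributed network management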
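2) Security monitoring unit
Security of the network management system itself
- User authentication mechanism
- Tiered access mechanism
Security of managed network objects
- Detection of security vulnerabilities in host systems
- Security event alarms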
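Technical Features
Scheduled or on-demand detection
Security monitoring within a specified vulnerability scope
Quantified, intuitive statistical analysis of results
Support for comparing results across multiple runs
Multiple alarm methods
User authentication mechanism for the network management system
User-group-based access authorization mechanism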
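3) Accounting management unit
10M, 100M, 1000M, 2.5G (ongoing)
Dynamic ACL
Unified user management: IP networking, email sending/receiving, dial-up
User self-service
Stable and reliable
https://usereg.tsinghua.edu.cn/
https://userman.tsinghua.edu.cn/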
4) Flow-based network measurement unit
Dedicated traffic capture protocol stack
Flow-based traffic aggregation
http://centaurus.serv.edu.cn:8080/id.php
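Figure 2 Basic structure of the Linuxflow traffic capture system
[Figure: in user space, the Linuxflow packet-to-flow daemon reads from an AF_CAPPKT socket (recvmsg), performs packet-to-flow aggregation and sends LFEP UDP packets; in kernel space, the AF_CAPTURE module, Cap_type module and Low_capture module sit above the network device driver; other labels include cap_type register, initialization, buffer, packet handler, cap_add_pack, copy_flow, tasklet, softnet_data and netif_rx]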
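The packet-to-flow aggregation step performed by the daemon can be sketched as follows. This is not Linuxflow's code: it assumes NetFlow-style keys (source/destination address, ports, protocol), packets in timestamp order, and an idle timeout of 15 s chosen for illustration. Here an idle flow is closed only when its key reappears or at the end of input; a real exporter would also sweep its table periodically and ship finished records (in Linuxflow's case, as LFEP over UDP). `FlowRecord` and `packets_to_flows` are hypothetical names.

```python
# Sketch of packet-to-flow aggregation (not Linuxflow's implementation).
# Assumptions: NetFlow-style 5-tuple keys, packets in timestamp order,
# and an illustrative 15 s idle timeout.
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def packets_to_flows(packets, idle_timeout_s=15.0):
    """packets: iterable of (ts, src, dst, sport, dport, proto, length).

    Returns a list of (key, FlowRecord). A flow is closed when its key
    reappears after more than idle_timeout_s of silence, or at end of input.
    """
    active = {}
    finished = []
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.last_seen > idle_timeout_s:
            finished.append((key, flow))  # idle too long: start a new flow
            flow = None
        if flow is None:
            flow = FlowRecord(first_seen=ts, last_seen=ts)
            active[key] = flow
        flow.last_seen = ts
        flow.packets += 1
        flow.octets += length
    finished.extend(active.items())  # flush flows still open
    return finished


if __name__ == "__main__":
    trace = [
        (0.0, "10.0.0.1", "10.0.0.2", 1025, 80, 6, 100),
        (1.0, "10.0.0.1", "10.0.0.2", 1025, 80, 6, 200),
        (2.0, "10.0.0.3", "10.0.0.4", 5353, 53, 17, 60),
        (30.0, "10.0.0.1", "10.0.0.2", 1025, 80, 6, 50),  # same key after 29 s idle
    ]
    for key, flow in packets_to_flows(trace):
        print(key, flow.packets, flow.octets)
```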
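Figure 4 Linuxflow traffic capture and analysis environment
[Figure: Layer 2 mirroring feeds Linuxflow servers, which send flow data over UDP to flow collection and storage; uses include network traffic accounting, network analysis and planning, network monitoring, and a flow data warehouse with data mining]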
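5) Network configuration information management system
Supports management of device, port, link, host and network information
Scalable directory-tree model
Cross-platform
Automatic information retrieval via SNMP
Automatic generation of basic network information from interfaces
Topology visualization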
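6) Traffic monitoring unit
Real-time monitoring
Historical data
Cluster monitoring and visualization
Bit traffic, packet traffic and packet length statistics
Supports fault, performance and security monitoring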
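Bit rate, packet rate and average packet length can be derived from two polls of interface counters, as MRTG-style tools do over SNMP. The slide does not say how the unit computes them; this sketch assumes 32-bit counters such as MIB-II `ifInOctets` and `ifInUcastPkts` (the latter counts unicast packets only) and at most one counter wrap between polls. Function names are hypothetical.

```python
# Sketch: rates from two polls of wrapping SNMP counters.
# Assumes 32-bit counters (e.g. ifInOctets, ifInUcastPkts) and at most
# one wrap between polls; function names are hypothetical.


def counter_rate(prev, curr, interval_s, bits=32):
    """Per-second increase of a counter that wraps at 2**bits."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr - prev) % (1 << bits)  # modulo handles a single wrap
    return delta / interval_s


def traffic_stats(prev_octets, curr_octets, prev_pkts, curr_pkts, interval_s):
    """Bit rate, packet rate and average packet length over one interval."""
    octet_rate = counter_rate(prev_octets, curr_octets, interval_s)
    pkt_rate = counter_rate(prev_pkts, curr_pkts, interval_s)
    return {
        "bps": octet_rate * 8,
        "pps": pkt_rate,
        "avg_pkt_len": octet_rate / pkt_rate if pkt_rate else 0.0,
    }


if __name__ == "__main__":
    # The octet counter wrapped between two polls taken 300 s apart.
    print(traffic_stats(4_294_000_000, 1_000_000, 10_000, 12_000, 300))
```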
7) Fault monitoring unit
Real-time monitoring
Automatic alarms
Fault cards
RTT/loss information
Diagnostic tools
8) Topology discovery unit
Parallel speed-up
Consistency checking
Targeted discovery and masking
8. Windmill Architecture
Three Components:
Protocol Multiplexing Filter (PMF)
Abstract Protocol Modules
Extensible Experiment Engine
Windmill Architecture
Experiment can call any module with a packet to extract:
Data payload
Errors and events
State information
Interlayer events can be easily correlated
[Figure: an experiment calling the HTTP, TCP and IP protocol modules]
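The idea of an experiment calling protocol modules on a packet can be sketched with two minimal parsers built on Python's `struct` module. These are hypothetical stand-ins for Windmill's abstract protocol modules, not its code: they extract header fields and payload only, signal truncated headers with `ValueError`, and skip checksum verification, option parsing and fragment reassembly.

```python
# Sketch of Windmill-style protocol modules: each module parses one layer
# and hands its payload to the next. Hypothetical stand-ins, not Windmill code.
import struct


def parse_ipv4(pkt):
    """Minimal IPv4 module: header fields plus payload (no checksum check)."""
    if len(pkt) < 20:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", pkt[:20])
    ihl = (ver_ihl & 0x0F) * 4
    return {
        "version": ver_ihl >> 4,
        "ttl": ttl,
        "proto": proto,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
        "payload": pkt[ihl:total_len],
    }


def parse_tcp(seg):
    """Minimal TCP module: ports, sequence numbers, flags and payload."""
    if len(seg) < 20:
        raise ValueError("truncated TCP header")
    sport, dport, seq, ack, off_flags, window, _csum, _urg = struct.unpack(
        "!HHIIHHHH", seg[:20])
    data_offset = (off_flags >> 12) * 4
    return {
        "sport": sport,
        "dport": dport,
        "seq": seq,
        "ack": ack,
        "flags": off_flags & 0x1FF,
        "window": window,
        "payload": seg[data_offset:],
    }


TRANSPORT_MODULES = {6: parse_tcp}  # IP protocol number -> module


def experiment(pkt):
    """Hypothetical experiment: run the IP module, then the transport module it selects."""
    ip = parse_ipv4(pkt)
    module = TRANSPORT_MODULES.get(ip["proto"])
    transport = module(ip["payload"]) if module else None
    return ip, transport


if __name__ == "__main__":
    tcp = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0) + b"GET /"
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 1, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])) + tcp
    hdr, seg = experiment(ip)
    print(hdr["src"], "->", hdr["dst"], seg["dport"], seg["payload"])
```

Because the experiment receives every layer's result, it can relate events across layers, e.g. a TCP SYN to the IP addresses that carried it.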
9. Commercial End-User Measurement
Inverse (http://www.inverse.net)
- Dialup monitoring of access, and web servers
Keynote (http://www.keynote.com)
- Dialup monitoring of web servers
NetMedic (http://www.vitalsigns.com)
- End-user measurement of ISP performance
Commercial Network Statistics
NetMedic
Vital Signs Report
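IV. Analysis and Comparison
Comparison of existing network performance monitoring systems
| System | Surveyor (US) | RIPE (Western Europe) | PingER (US) | AMP (US) | Skitter (US) |
|---|---|---|---|---|---|
| Probe method | OWDP | OWDP | ping | ping | traceroute |
| Host system | dedicated | dedicated | selectable | dedicated | dedicated |
| Clock synchronization | GPS | GPS | NTP | NTP | NTP |
| Scheduling | Poisson | Poisson | bursty | linear random | ~30 min |
| Packet size | 40 bytes | 100 bytes | 100 & 1000 bytes | 64 bytes | 52 bytes |
| Coverage | US, CA, CH, NL, NZ | EU, IL, US | 32 countries | US, NZ, NO | Asia, CA, UK, US |
| Monitoring sites | 51 | 32 | 18 | 70 | 20 |
| Host pairs | 1000 | 1024 | 1200 | 4600 | 35000 |
| Start | 1997 | 1998 | 1995 | 1999 | 1998 |
| Funding | CSG/Advanced | RIPE/European R&E | DOE/ESnet/HENP/XIWT | NSF/NLANR/Internet 2 | DARPA/NSF/CAIDA |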
Characteristics
Active Measurement Platforms
AMP and Surveyor provide different information and complement each other
Currently
- ~ 85 AMP sites, ~ 55 Surveyor sites
Comparing PingER & Surveyor
| | Surveyor | PingER |
|---|---|---|
| Method | 1 way delay | 2 way ping |
| Hosts | dedicated | "selected" |
| Freq | ~2*2/s (~2kbps) | ~0.01/s (~0.1kbps) |
| Timing | Poisson <2/s> | bursty (30 min) |
| Sizes | ~40B | 100B & 1000B |
| Locations | US, Ca & Nz | 10 (22) countries |
| Monitors | ~30 | 17 |
| Remotes | ~30 (~full mesh) | ~300 (hierarchical) |
| Pairs | ~900 | ~1200 |
| Storage | ~38MB/pair/mo | 0.6MB/pair/mo |
| Data avail. | On request | Public web access |
| Start | 1997 | 1995 |
| Sponsors | CSG/Advanced | ESnet/HENP |
PingER - Surveyor Complementarity
Agree well
Surveyor has one way measurements, PingER only roundtrip
Surveyor has dedicated platforms & strong central management
- experience with PingER shows this has benefits.
PingER more parsimonious/lightweight (bandwidth, disk space, cpu)
- better for poor connectivity sites - e.g. Russia, China
- but necessarily less accurate, especially at small (hourly) time resolution on low loss links.
PingER good for looking at long term trends & grouping where statistics are less of a problem.
V. URLs
Important URLs
Research and Standards
http://www.ietf.org/html.charters/ippm-charter.html
http://www.caida.org
http://www.merit.edu/ipma
Commercial Vendors
http://www.inverse.net
http://www.keynote.com
http://www.vitalsigns.com