Chapter 4: Network Measurement and Monitoring


Computer Network Management
Lecturer: Wang Jilong
Information Network Engineering Research Center, Tsinghua University
[email protected]
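Chapter 4
Network Measurement and Monitoring

Section 1
Overview of Network Measurement Techniques

Section 2
Special Topics in Network Measurement Techniques

Section 3: Examples of Network Measurement Systems
Information Network Blueprint for the High-Tech Olympics
Section 3
Examples of Network Measurement Systems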

Organizations

Measurement Classification

Measurement Projects Overview

Comparison
I. Organizations

IETF
- IP Performance Metrics (IPPM)
- Realtime Traffic Flow Measurement (RTFM)

NLANR
- Measurement & Operations Analysis Team
- National Center for Network Engineering

CAIDA
IETF IPPM Working Group

Develop metrics for wide-area network
measurement

IPPM: Metrics
- Framework: RFC 2330
- Connectivity
- One-way delay and loss
- Round-trip delay and loss
- Flow capacity/Bulk transfer capacity
- Delay variation (jitter)
http://www.advanced.org/IPPM/
RTFM


Measure network flows through passive
measurements
Standards for
- architecture; metrics for traffic flows
- language to describe flows
- MIB to control meter & gather measurements

Origin in accounting

http://www.auckland.ac.nz/net/Internet/rtfm
NLANR

NLANR(1995-1997):
- concerted effort across NSF funded supercomputing centers, with
focus on the vBNS and the HPC community

NLANR (1998):
- continued collaborative and expanded effort across three parties:
• Distributed Applications Support Team (DAST at UIUC/NCSA)
• Measurement and Operations Analysis Team (MOAT at
UCSD/SDSC)
• National Center for Network Engineering (NCNE at
CMU/PSC)


Web server:http://www.nlanr.net
Offspring focused on commercial market:CAIDA
(http://www.caida.org)
10/14/2002
CAIDA (www.caida.org)





Cooperative Association for Internet Data
Analysis
Based at Univ of California (SDSC)
NSF and Commercial Funding (Cisco, ANS)
Clearinghouse, coordinating body for network
statistics
Tool Development
- Network Visualization
- Multicast statistics
- Packet traces
II. Measurement Classification

Utilization [usually SNMP]

Active measurements
- ~Performance (one-way delay, loss, throughput)

Passive measurements
- ~Traffic characterization

Routing [usually BGP/traceroute]
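Measurement Items

Static configuration information: SNMP

Network topology and routing

Fault detection and localization

Performance (efficiency): delay, packet loss, utilization

Security: vulnerabilities, hot spots, defense, tracing, forensics, content

Usage: time, bandwidth, bytes, applications, services

Behavior: temporal patterns, spatial patterns, content patterns, "conduct" patterns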
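Measurement Methods

Active measurement
- Measures using the network's response to probe packets: ping, traceroute
- Measures using the service the network gives to probe packets: treno measures throughput
- May produce a "Heisenberg" effect: the probe packets perturb network performance and so compromise the objectivity of the results

Passive measurement: imperceptible to the network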
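Measurement Methods

Router-Based
- Port traffic (end-to-end traffic)

Router-Aided
- Topology measurement

Stand alone
- Delay (clock synchronization), topology (inferring topology from delay or from packet loss), performance inference (with known topology), bandwidth measurement (link bandwidth, available bandwidth, bottleneck bandwidth), network distance, inference of router parameters (scheduler type and parameters, buffer size and policy at the bottleneck node)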
Tools: Routing

Traceroute servers

IPMA tools (based on BGP peering)

Skitter: tomography (CAIDA)
Tools: Active performance

Treno

mping (windowed ping): invasive

pathchar: estimate router Qs
bottleneck bandwidth

traditional: ping, ftp, ttcp, netperf
Active Measurement efforts

AMP by NLANR MOAT
- Available to HPC/I2 sites
- See http://amp.nlanr.net/ for more information.
- Measures connectivity, loss and roundtrip time.
- Performance matrix between sites.

Surveyor by Advanced.org
- Available to HPC/I2 sites
- See http://www.advanced.org/csg-ippm/
- Measures one-way delay and packet loss.
- Performance matrix between sites.
- Requires GPS antenna.
Tools: Passive

NeTraMet: Nevil Brownlee’s RTFM

IBM Research’s RTFM

tcpdump

OCxMON: passive optical

Coral

~Cisco’s NetFlow, and

cflowd (CAIDA/ANS)
Tools: Utilization

Mainly commercial? Network management tools
(e.g., OpenView)

MRTG

NetScarf?
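
Utilization tools of this kind typically poll SNMP interface octet counters and turn the difference between two polls into a percentage of link capacity. A minimal sketch of that arithmetic, assuming 32-bit counters that wrap at 2^32 and at most one wrap between polls; the function and parameter names are hypothetical and no real SNMP library is used:

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit SNMP octet counter


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Return link utilization (%) between two octet-counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # Modular subtraction absorbs a single counter wrap between polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


if __name__ == "__main__":
    # Two polls taken 300 s apart on a 10 Mbit/s link.
    print(utilization_percent(1_000_000, 38_500_000, 300, 10_000_000))
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case this calculation undercounts; shorter intervals or wider counters avoid that.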
III. Measurement Projects
Overview

IPMA

Surveyor

pingER

NIMI

NLANR/MOAT NAI

CAIDA

CERNET

WINDMILL

Commercial
1. IPMA
University of Michigan and
Merit Network, Inc.
http://www.merit.edu/ipma
Major support from the National Science Foundation,
Intel, Hewlett-Packard and the Merit RSNG Project
Missions

Probe machines at IXPs and inside backbones
- Focus routing
- Some latency/loss
- Visualization and data-mining

Real-time inter-domain problem diagnostics

Tool Development

http://www.merit.edu/ipma
Internet Measurement
[Figure labels: Probe (x5), Data Dissemination (x3), AS1, AS2, Public]
Probes

Probe machines at major US Internet Exchange
Points
Data Dissemination
[Figure labels: Probe Machine (x2), Data Dissemination Server, Database, Report Generator, Java Client, Email Report, Web Pages]
IPMA Tools

IPN -- provider outages and maintenance

ASExplorer -- routing instability and topology

NetNow -- network packet loss and latency

NetGrapher -- graphs of various network
statistics

RouteTracker -- collects routing stability data

TPD -- Monitors topology via traceroutes

SNMPTrackers -- monitors SNMP MIBs
IPMA: IPN
IPMA: FlapGraph
ASExplorer
BGP Routing Analysis

Most BGP traffic is pathological
- Up to 50 million BGP updates/day at Mae-East!
- Duplicate withdrawals
- Duplicate Announcements

Vendor implementation problems

Strong correlation to network usage
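
One way to make "duplicate" concrete is to replay the update stream against a per-peer routing table and flag updates that change nothing. The sketch below is illustrative only: the tuple layout, the function name and the exact duplicate definitions are assumptions, not the IPMA tools' implementation.

```python
from collections import Counter


def classify_updates(updates):
    """Count useful and duplicate BGP updates.

    updates: iterable of (peer, kind, prefix, path), where kind is
    "A" (announcement) or "W" (withdrawal). Hypothetical record layout.
    """
    table = {}  # (peer, prefix) -> path currently announced by that peer
    counts = Counter()
    for peer, kind, prefix, path in updates:
        key = (peer, prefix)
        if kind == "W":
            if key in table:
                del table[key]
                counts["withdrawal"] += 1
            else:
                # Withdrawing a route that is not announced changes nothing.
                counts["duplicate_withdrawal"] += 1
        elif kind == "A":
            if key in table and table[key] == path:
                # Re-announcing an identical route changes nothing.
                counts["duplicate_announcement"] += 1
            else:
                table[key] = path
                counts["announcement"] += 1
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
    return counts


if __name__ == "__main__":
    stream = [
        ("peer1", "A", "10.0.0.0/8", (64500, 64501)),
        ("peer1", "A", "10.0.0.0/8", (64500, 64501)),
        ("peer1", "W", "10.0.0.0/8", None),
        ("peer1", "W", "10.0.0.0/8", None),
    ]
    print(dict(classify_updates(stream)))
```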
AADS BGP Routing Updates (3/17/96 - 3/17/98)
[Figure: number of updates (series "with" and "ann", y-axis 0 to 12,000,000) by date, 3/17/96 to 3/17/98]
OSPF Observations


Significantly more OSPF LSA changes than
anticipated (order of magnitude?)
Identified several hardware and software
pathologies
RSLA Interface Changes 4/98-6/98
[Figure: interface changes (y-axis 0 to 500) by date, 4/1/98 to 5/31/98, one series per router; the legend lists the routers by IP address and name, e.g. 198.108.91.5 MICHNET5, 198.109.133.5 MSU, 198.108.131.5 WMU]
Frequency of Michnet OSPF Changes 4/98
[Figure: number of LSA changes with this frequency (y-axis 0 to 3000) versus seconds (x-axis 1 to 97)]
Michnet OSPF Oscillation on 3/13/98
[Figure: number of changes (y-axis 0 to 250) per 5-minute bucket (x-axis 1 to 277)]
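
A plot of changes per 5-minute bucket can be produced by binning event timestamps. A minimal sketch, assuming the LSA change times are available as seconds since some start time; the function name and input format are hypothetical:

```python
from collections import Counter

BUCKET_SECONDS = 300  # 5-minute buckets, as on the oscillation plot


def changes_per_bucket(timestamps, start, bucket_seconds=BUCKET_SECONDS):
    """Return a list where entry i is the number of events in bucket i."""
    counts = Counter()
    for ts in timestamps:
        if ts < start:
            continue  # ignore events before the observation window
        counts[int((ts - start) // bucket_seconds)] += 1
    if not counts:
        return []
    # Fill empty buckets with zero so the series is contiguous.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


if __name__ == "__main__":
    print(changes_per_bucket([0, 10, 299, 300, 901], start=0))
```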
2. Surveyor

Dedicated PC running Unix at key sites

GPS for clock synchronization

One way delay & loss measurements

Also routing via traceroute

Community is Internet 2 clients
Surveyor on Abilene

Continuous measurement

One-way delay and loss

1/sec on Poisson Schedule

12 Byte UDP packets

Traceroutes

72 Machines
- http://hartman.advanced.org/IPPMApplet/report/Report.html -- Java, close to real-time
- http://ippm-db.advanced.org/plots/ -- static
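
Sending probes "on a Poisson schedule" means the gaps between probes are drawn from an exponential distribution, so probe times do not synchronize with periodic network behavior. A minimal sketch of such a schedule at an average of one probe per second; the function name and parameters are assumptions, and no packets are actually sent:

```python
import random


def poisson_schedule(rate_per_s, duration_s, seed=None):
    """Return probe send times (seconds) for a Poisson process."""
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        # Exponential inter-arrival times with mean 1 / rate_per_s.
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


if __name__ == "__main__":
    schedule = poisson_schedule(rate_per_s=1.0, duration_s=10.0, seed=42)
    print(len(schedule), [round(t, 2) for t in schedule])
```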
End-to-End Performance Initiative
Goal: 7 Mbps bi-directional UDP
[Figure: network path diagram with Surveyor nodes marked; labels: Duke, NCREN, Abilene, WASH, NYCM, Broadway, Hudson, Teleglobe, ATM, Dante, DFN, Frankfurt]
3. PingER

SLAC & Stanford University

ESNET / physics community

High-energy physics focus

http://www-iepm.slac.stanford.edu/pinger/

Long-term ping measurements

Set of cooperating measurement
sites -- not just random hosts
- http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
- http://www.slac.stanford.edu/xorg/icfa/ntf/tool.html
Ping

Treats Internet as black box

round trip response time, loss, reachability, jitter

Low cost/lightweight tool
- ping “universally available”, easy to understand
• no software for clients to install
• no special privileges needed for monitor sites
- resources: 100bps/link, ~600kBytes/month/link

Ping mature, well understood, widely available
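
The metrics listed above can all be derived from a series of ping results. A minimal sketch, assuming each probe yields either an RTT in milliseconds or `None` for a lost packet; jitter is taken here as the mean absolute difference between consecutive RTTs, which is one common choice and not necessarily PingER's definition:

```python
import statistics


def summarize_pings(rtts_ms):
    """Summarize ping results; rtts_ms holds RTTs in ms, None if lost."""
    if not rtts_ms:
        raise ValueError("no probes to summarize")
    received = [r for r in rtts_ms if r is not None]
    summary = {
        "sent": len(rtts_ms),
        "received": len(received),
        "loss_pct": 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms),
        # Reachable if at least one probe was answered.
        "reachable": bool(received),
    }
    if received:
        summary["rtt_min_ms"] = min(received)
        summary["rtt_median_ms"] = statistics.median(received)
        summary["rtt_max_ms"] = max(received)
        diffs = [abs(b - a) for a, b in zip(received, received[1:])]
        summary["jitter_ms"] = statistics.mean(diffs) if diffs else 0.0
    return summary


if __name__ == "__main__":
    print(summarize_pings([10.0, None, 14.0, 12.0]))
```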
Scale of Measurements

18 Monitoring sites - 7 in US (5 ESnet, 2 vBNS), 2 in Canada, 7 in Europe (ch, de, dk, hu, it, uk(2)), 2 in Asia (jp, tw)
1261 monitoring-remote-site pairs
379 unique hosts, 272 sites
50 beacon sites, 27 countries
response, jitter, loss, reachability
Data goes back > 4 years
1 Million probes of Internet / day
[Figure: PingER pair distribution by global area (pie chart; slices: Edu, Com, Gov, Mil, Org, Europe, Canada, Japan, China, Russian Fed, South America, Asia, Australasia)]
Web Interface
http://www.slac.stanford.edu//xorg/iepm/pinger/table.html
Choose metric: loss, RTT, variability, reachability
Choice of time ticks: hour, daily, monthly; can also select day, month etc.
Choice of group
Export data to Excel
Value colored by quality
Drill down to plots
Effect of STAR-TAP on KEK.jp <=> SLAC
[Figure: ping RTT in msec. and % packet loss, September 1 to December 31 1998, with the ITU G.114 300 msec RTT limit for voice marked]
Improvement in RTT
Response time to selected groups of sites from SLAC
[Figure: ping round trip delay in msec. (y-axis 0 to 450) by month, Jan-95 to Jan-99, for ESnet (14 pairs), Japan (6 pairs), Europe (27 pairs), Canada (6 pairs), Edu/US (50 pairs) and cern.ch (1 pair), with exponential fits; annotated improvement rates range from 0.22%/mo to 1.7%/mo; STAR-TAP and the ITU G.114 300 msec RTT limit for voice are marked]
Bandwidth improvement from ESnet sites
TCP bandwidth < (1470/RTT) * (1/sqrt(loss))
[Figure: bandwidth in kbytes/sec (log scale, 10 to 10000) by date, Jun-94 to Dec-99, for Canada (18 pairs), Edu/US (138 pairs), ESnet (31 pairs), Japan (12 pairs) and Europe (95 pairs), with exponential fits; annotation: 100% improvement / year]
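
The bound TCP bandwidth < (1470/RTT) * (1/sqrt(loss)) can be evaluated directly from measured RTT and loss. A minimal sketch, assuming 1470 is a segment size in bytes, RTT is in seconds and loss is a fraction between 0 and 1, so the result is in bytes per second; the slide does not state the units, and the function name is hypothetical:

```python
import math


def tcp_bandwidth_bound(rtt_s, loss_fraction, segment_bytes=1470):
    """Upper bound on TCP throughput in bytes/s (units are assumed)."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss_fraction must be in (0, 1]")
    return (segment_bytes / rtt_s) * (1.0 / math.sqrt(loss_fraction))


if __name__ == "__main__":
    # 100 ms RTT with 1% loss.
    bound = tcp_bandwidth_bound(rtt_s=0.100, loss_fraction=0.01)
    print(f"{bound / 1000:.0f} kbytes/sec")
```

Because loss enters under a square root, halving the RTT doubles the bound, while halving the loss raises it only by a factor of about 1.4.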
What about loss?

BCR (Feb ‘98 & Jan ‘99) shows that even with 10%
random loss one can get almost toll quality

Our experience in other areas is to say problems
start between 2.5 and 5% packet loss

ITU/TIPHON defines a loss of < 3% as allowing
“good” Internet telephony
Improvement in packet loss
Packet loss between ESnet & selected sites
[Figure: percent round trip packet loss (log scale, 0.01 to 100) by month, Jan-95 to Jan-00; improvement rates: W Europe (81) 3.0%/mo, Canada (15) 6.8%/mo, US/Edu (104) 7.6%/mo, Japan (10) 0.4%/mo, ESnet (81) 8.7%/mo; a 2.5% level is marked; annotation: SLAC<=>vBNS ~ 2 * SLAC <=> ESnet]
Short term effect of routing change
[Figure: response time in msec. (y-axis 0 to 400) for 3 sites seen from an XIWT monitoring site for 3 days in November 1998; November 21 is marked]
Median Packet Loss Seen From Asia
Effect of direct connection
[Figure: median % packet loss during month (y-axis 0 to 30), May-98 to Jan-99, for Asia(KEK) to WestEurope (15 pairs), EastEurope (5 pairs), NorthAmerica (27 pairs since May 98) and SouthPacific (2 pairs in January 99), with exponential fits; annotation: NACSIS Europe Line was directly connected to Ten-34 at London on July 1st 1998.]
Poor backup route
[Figure: response time in msec. and % loss, Monday March 8 - Wednesday March 10, 1999 (GMT); normal link: via DESY-MSU satellite (6 hops); satellite link down: goes via DFN-UUnet/Washington/NY/Stockholm Relcom - MSU (20 hops)]
Common Congestion Point
[Figure: median monthly ping packet loss (y-axis 0 to 45) for the UK seen from ESnet, 1/1/95 to 1/1/99, annotated with link upgrades 2=>4Mbps, 4=>12Mbps, 12=>45Mbps and 45=>90Mbps; label: Countries Expected to be Good]
Calibration of ping

Sanity checks:
- host pings itself, host pings host at same site
- high statistics between a few sites & inside site:
• see www.slac.stanford.edu/comp/net/wan-mon/ping-histat.html
• look at subtle behaviors, e.g. RTT distribution tails
- check “wire time” (sniffer) vs. ping reported times, at
client & server
• see www.slac.stanford.edu/comp/net/wanmon/error.html

Correlate with Surveyor one-way measures
Natural enemies of ping


Poor choice of remote host (clustered, variable
load..) or monitoring host
Ping program problems and pathologies
- Some implementations have bugs, or are incomplete
- Spurious packets confuse ping programs (<0.2% effect)
• e.g. program sends 5 packets sees 10.
- Out of order packets (< 0.02% effect)
- Some sites/hosts block pings
- Other sites limit pings to a certain size
- Rate limiting, e.g. some sites filter out ICMP traffic
during high usage or all the time
Ping limiting/blocking

First noticed in 1996
- protect against ping o’death (OS) & smurf attacks
(directed broadcasts)

Host requirement to implement ping
- but not to execute, and probably blocked at firewall



First step for cracker scanning a site
Identified at 2% hosts (i.e. currently a small
effect)
http://www.slac.stanford.edu/comp/net/wanmon/pathology.html
Avoiding





careful choice of host => beacon sites
working with remote sites & ISPs
using TCP echo or UDP echo (security), but
crackers will find them and often already blocked
new protocol designed for measurement (IPMP)
special purpose measurement machines &
protocols
Monitoring Conclusions







Performance is improving
ESnet & vBNS/Internet 2 well configured provide
good service within & between their nets
Performance within A&R networks is generally
good
Minimize ISPs crossed, peering critical
Intercontinental performance is poor to bad
Today need headroom, or managed bandwidth,
QoS in future
End users need monitoring to know what to
expect, write SLAs, set baselines, ID problems,
plan
More Information & extra info
follows

WAN Monitoring at SLAC has lots of links
- http://www.slac.stanford.edu/comp/net/wan-mon.html

Tutorial on WAN Monitoring (including methods, RTT, jitter,
loss & QoS thresholds etc.)
- http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

PingER History tables
- http://www.slac.stanford.edu//xorg/iepm/pinger/table.html

Internet Monitoring in the HEP Community, SLAC-PUB7961, presented at CHEP98, Chicago, Aug-98
- http://www.slac.stanford.edu/pubs/slacpubs/7000/slac-pub7961.html
4. NIMI

National Internet Measurement Infrastructure

LBL, PSC Collaboration (Paxson and Mathis)

Architecture for deploying probe machines
throughout the Internet

End-to-end and hop-by-hop measurement
Common Solutions Group

Deploy probe machines at major US universities

Managed by Advanced Network and Services

GPS receivers for clock synchronization

Software running on PC (x86) platforms
5. NLANR Measurement and Network
Analysis
NLANR/MNA
(UCSD/SDSC)
http://moat.nlanr.net/
Funded by the National Science Foundation/CISE/ANIR
Goals

Build a measurement and analysis infrastructure

Conduct research related to network measurement and analysis

Develop analysis and visualization tools
Network workload parameter space
[Figure labels: bandwidth (16 kbps, 2Mbps, OCnn data (anywhere)); Telecom voice, “normal”; continuous, burst, deferred; service qualities: shared, prioritized, guaranteed]
NAI system
NLANR network analysis infrastructure
[Figure labels: Coral monitor (red) (x3); Active measurements (green) (x2); vBNS SNMP data (green); Routing data source (BGP) (green); storage and computation (red); storage, computation, and external presentations (green); compute engine (varies) (x4); Data archival storage backend: 12GB DDS3 tape, WORM, CD-RW, DVD-RAM (or green if encrypted); External network access (web, ftp, email, ...)]
Central machines
• nai.nlanr.net
• server for initial data collection
• 160 GB, 256MB memory, dual 450MHz PII
• moat.nlanr.net
• external web server
• 160 GB, 256MB memory, dual 450MHz PII
• four analysis computation engines
• each: 18GB, 256MB memory, 450MHz PII
• amp.nlanr.net and volt.nlanr.net
• each: 92GB, 128MB memory, 400MHz PII
[Figure labels: File and compute servers (nai.nlanr.net, moat.nlanr.net); Analysis computing engines]
Passive Monitors

OCXmon
- Jointly developed by MCI, NLANR, and CAIDA
- Captures packet headers; many companion data analysis tools are available
• http://moat.nlanr.net/OC3mon-monitors/
- Supported network interfaces
• ATM (DS-3, OC-3, OC-12)
• POS (OC-3 and OC-12)
Coral/OCXmon
(passive traffic collection and analysis at optical carrier speeds)
• completely noninvasive, no impact on forwarding paths
• aggregated traffic signature at a measurement point
• detailed characteristics of individual transactions
• OC3(deploying) --> OC12(prototype) --> OC48(future)
• reference implementations
• Unix
• FreeBSD version in http://moat.nlanr.net/Coral
• DOS
• http://www.vbns.net/~apisdorf/coral
Coral components
[Figure labels: optical interconnection, optical splitters, OC3 connection end-point (x2), OC3mon intelligent subsystem, host system bus, system memory, host collection and analysis process]
OC3mon machine
Optical splitters
TAAD

Traffic Analysis and Auto Diagnosis

http://www.ncne.nlanr.net/TCP

Analyzes the data captured by OCxmon
TCP performance diagnosis
- Direction of congestion: upstream or downstream
- Unoptimized TCP implementations
- TCP flows that could be optimized
Passive measurement deployment status (28 May 1999)
[Map legend: FDDImon, OC3mon, OC12mon, collaboration discussions]
Sites shown: U. of Washington; STARTAP/APAN; NCAR; U. Colorado, Boulder; Argonne Nat. Lab; U. of Michigan; Michigan State U.; Ohio State U.; NCSA; U. of Pennsylvania; FIX-West; Old Dominion U.; AIX/MAE-West; NASA-Ames; CSU, San Bernardino; UCLA; SDSC, U. California, San Diego; Vanderbilt U.; Rice U.; Baylor College of Medicine; U. of Houston; Texas A&M U.; MCNC; North Carolina State U.; U. of North Carolina; Duke U.; U. of Florida; Miami U.; Florida State U.
Some analysis available
• http://moat.nlanr.net/OC3analysis - analysis of an aggregation point data (similar to what will
be available for the HPC aggregation points or gigaPoPs)
• http://moat.nlanr.net/PBHA - analysis of packets, bit volume, and host activity on a link.
• http://moat.nlanr.net/SF - analysis of TCP flags (useful for both Internet researchers and
vendors)
• http://moat.nlanr.net/DNS - analysis of traffic by protocol -- with respect to UDP, specifically
DNS traffic.
• http://moat.nlanr.net/PLRL - analysis of the behavior of sequences of packets or packet run
lengths is important to the design and development of next generation internetworking hardware
and software
• http://moat.nlanr.net/BGPAddr and http://moat.nlanr.net/ASPL - analyses of the
interconnectivity of Autonomous Systems
• http://moat.nlanr.net/IPaddrocc - analysis of the 32 bit (IP v4) Internet address space
Active measurement/analysis

Led by Tony McGregor ([email protected])

Attempt to deploy FreeBSD-based AMP machines at
all HPC sites
- almost 70 machines currently deployed and operational

Currently RTT, topology, and loss; user/event driven
throughput

Current work provides base for:
- validation of active measurement techniques
- comparison of HP service provider internal measurements
with site-to-site measurements
Active measurement environment
[Figure labels: Dept/user (x6), Site (campus) (x2), HPC backbone network, inner perimeter, site perimeter, user/application perimeter]
Active measurement deployment status (27 August 1999)
[Map of AMP sites. Site labels: UAlaska, UWashington, Washington State U., Montana State, UOregon, UCB, Stanford, UCSC, SLAC, UofUtah, UVermont, UWiscMilwaukee, UWisc, Michigan State, SDSMT, URochester, Dartmouth, Iowa State, UIowa, NWU, MIT, NTNU, UWyoming, UMichigan, UMass, UIC, FNAL, PSU, BU, Harvard, Columbia, Yale, ColoState, UIUC/NCSA, CMU/PSC, UDel, UPenn, NCAR, UMBC, Princeton, IU, WVU, UMd, UCBoulder, Kansas State, JHU, UC, NSF, UMissouri, GMU, Georgetown, UKansas, UVirginia, ODU, Oklahoma State, WUSTL, NCREN/NCSC, NDSU, UofOklahoma, UNC-CH, Duke, UCLA, CSU-SB, NCSU, U New Mexico, UAH, GATech, UC-Irvine, CSUPomona, Mississippi State, SDSC, UCSD, UArizona, Emory U., SMU, UAB, SDSU, UA, Rice, FSU, UFlorida, UCF, USF, UWaikato, UMiami]
Current measurements

We currently assess:
- round trip time (RTT)
- packet losses
- topology (traceroute)
- throughput
Across all sites.
NLANR provides the measurement machine and
administration
the local site physically deploys the machine
AMP architecture
• typically full mesh across AMP machines
• some “destination-only” exceptions
• central data repository and visualization machines
• data available via:
• web interface to results
• an NLANR developed 3D visualization tool (Cichlid)
• raw data
[Figure labels: Web browser, Cichlid vis, Analysis machine (x2), Active monitor (x3), Other target]
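
In a full mesh every monitor probes every other monitor, so N machines give N*(N-1) ordered source/destination pairs (4830 for 70 machines), and each "destination-only" target adds N more. A minimal sketch of building that pair list; the function name and the host names in the example are hypothetical:

```python
from itertools import permutations


def full_mesh_pairs(monitors, destination_only=()):
    """Return (source, destination) pairs for a full measurement mesh."""
    monitors = list(monitors)
    # Every monitor measures to every other monitor, in both directions.
    pairs = list(permutations(monitors, 2))
    # Destination-only hosts are probed but never originate probes.
    pairs += [(src, dst) for src in monitors for dst in destination_only]
    return pairs


if __name__ == "__main__":
    mesh = full_mesh_pairs(["amp-a", "amp-b", "amp-c"], ["target-x"])
    for src, dst in mesh:
        print(src, "->", dst)
```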
AMP web info (1)
AMP web info (2)
AMP web info (3)
AMP web info (4)
AMP web info (5)
AMP web info (6)
AMP Routes
3D visualization, bar graphs
• shows RTT as a
moving time series
• uses OpenGL tool
that allows viewing
from different
perspectives, zoom,
fly-through, etc.
(developed by Jeff
Brown)
3D visualization, terrain map
• shows RTT as a
rendered surface
• rugged (noise
reduced) terrain
implies high and
variable RTT; a
rough network
Real-time
performance
queries
(access controlled
to avoid misuse)
vBNS usage sample based on
SNMP data (VRML object)
Routing visualization example based on information from the globally
visible routing system
• BGP derived paths
• utilizing AS
(autonomous system)
numbers
• global reachability
• how to visualize
“the system”
Cichlid server/client model
[Figure labels: Server (non-local data generator) (x3); Client/user OpenGL based visualization engine]
cichlid visualization
IP use and plen matrices
Cichlid 2
Cichlid for Windows
CAIDA Projects

NGI
- CoralReef - OC48 monitor
- Internet tomography using skitter
- DNS Root Server initiative
- CoralReef - security enhancements
- Database/Analysis
- Visualization of Massive Data Sets
CAIDA Projects









BW-EST: Bandwidth Estimation (sponsored by DOE)
DNS Analysis: Analysis of the DNS root and gTLD
nameserver system
IEC: Internet Engineering Curriculum Repository
Internet Atlas Project
IPNmoo: Inter-Provider Network MOO
NCS: Routing Analysis and Peering Policy for Enhancing
Internet Performance and Security
NMS: Network Modeling and Simulation
SD-NAP: San Diego Network Access Point
Trends: Correlating Heterogeneous Measurement Data to
Achieve System-Level Analysis of Internet Traffic Trends
Dag 4.0 -- OC48 MON
Sample Visualization from skitter Data
DNS performance plots
7. Tsinghua
[Equations garbled in extraction: an expression for B involving the constants 1.5 and 2/3 and the variables R and p, and a relation between T and p]
Test Two (no regulation)
[Figure: link bandwidth (%) (y-axis 0 to 100) versus time (sec, 0 to 120) for Flows 1 to 6]
Test Two (regulated)
[Figure: link bandwidth (%) (y-axis 0 to 100) versus time (sec, 0 to 120) for Flows 1 to 6]
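Ongoing Research Projects

Measurement and analysis of dynamic network structure
Measurement-based correlation analysis of network behavior
A classification scheme for security attack behavior
Identification of security attack behavior from multiple clues
NMI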
NMI

Open Architecture

Dedicated measurement unit (Metrics Server)

Measurement network (MetricsNet)

Standard Service Interface

A Demo Environment
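Four Key Problems

Architecture model of NMI, the next-generation management system for large-scale computer networks

Design of a next-generation network management unit that is based on a reduced functional model and independent of interconnection devices

Automatic discovery of network management units

Management techniques for IPv6 networks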
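Introduction and Demonstration of Selected Work
1) ARF-based network management system framework
2) Security monitoring unit
3) Accounting management unit
4) Flow-based network measurement unit
5) Network configuration information management system
6) Traffic monitoring unit
7) Fault monitoring unit
8) Topology discovery unit
9) Research on key NMI technologies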
1) ARF-based network management system framework

Dynamic loading/unloading of functional modules
Personalized customization
Dynamically scalable tree-structured management model
Distributed network management support
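2) Security monitoring unit

Security of the network management system itself
- User authentication mechanism
- Tiered access mechanism

Security of managed network objects
- Detection of security vulnerabilities in host systems
- Security event alarms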
Technical Features

Scheduled or on-demand detection
Security monitoring restricted to a specified range of defects
Quantified, intuitive statistical analysis of results
Support for comparing the results of multiple runs
Multiple alarm methods
User authentication for the network management system
User-group-based access authorization
3) Accounting management unit

10M, 100M, 1000M, 2.5G (ongoing)
Dynamic ACL
Unified user management: IP network access, email sending and receiving, dial-up
User self-service
Stable and reliable
https://usereg.tsinghua.edu.cn/
https://userman.tsinghua.edu.cn/
4) Flow-based network measurement unit

Dedicated protocol stack for traffic capture
Flow-based traffic aggregation
http://centaurus.serv.edu.cn:8080/id.php
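Figure 2: Basic structure of the Linuxflow traffic capture system
[Figure labels. User space: Linuxflow packet-to-flow daemon, AF_CAPPKT socket, packet-to-flow aggregation, LFEP UDP data sending (sends LFEP UDP packets). Kernel space: AF_CAPTURE module (recvmsg, cap_type register, initialization), Cap_type module (buffer, packet handler, initialization), cap_add_pack, copy_flow, tasklet, softnet_data, Low_capture module, netif_rx, network device driver]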
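
Packet-to-flow aggregation groups captured packets that share the same addresses, ports and protocol into one flow record with packet and byte counts. The sketch below illustrates the idea only; it is not the Linuxflow daemon, and the 5-tuple key, the idle timeout and the record layout are assumptions (the LFEP export format and the AF_CAPPKT socket are not modeled).

```python
from dataclasses import dataclass


@dataclass
class Flow:
    key: tuple          # (src_ip, dst_ip, src_port, dst_port, proto)
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def aggregate_flows(packets, idle_timeout_s=60.0):
    """Merge packets into flows.

    packets: iterable of (timestamp, src_ip, dst_ip, src_port, dst_port,
    proto, length) tuples sorted by timestamp (hypothetical layout).
    """
    active = {}    # flow key -> Flow still being updated
    finished = []  # flows closed by the idle timeout
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.last_seen > idle_timeout_s:
            # Idle too long: close the old flow and start a new one.
            finished.append(flow)
            flow = None
        if flow is None:
            flow = Flow(key, ts, ts)
            active[key] = flow
        flow.last_seen = ts
        flow.packets += 1
        flow.octets += length
    finished.extend(active.values())
    return finished


if __name__ == "__main__":
    sample = [
        (0.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 100),
        (1.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 200),
        (2.0, "10.0.0.3", "10.0.0.2", 5555, 53, 17, 60),
    ]
    for flow in aggregate_flows(sample):
        print(flow)
```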
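Figure 4: Linuxflow traffic capture and analysis environment
[Figure labels: layer-2 mirroring (x2), Linuxflow servers, UDP, flow collection and storage, flow data warehouse and data mining, network traffic accounting, network analysis and planning, network monitoring]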
5) Network configuration information management system

Manages information on devices, ports, links, hosts, networks, etc.
Scalable directory-tree model
Cross-platform
Automatic information acquisition via SNMP
Automatic generation of basic network information from interfaces
Topology visualization
6) Traffic monitoring unit

Real-time monitoring
Historical data
Cluster monitoring and visualization
Bit traffic, packet traffic and packet-length statistics
Supports fault, performance and security monitoring
7) Fault monitoring unit

Real-time monitoring
Automatic alarms
Fault cards
RTT/Loss information
Diagnostic tools
8) Topology discovery unit

Parallel speed-up
Consistency checking
Targeted discovery and masking
8. Windmill Architecture
Three Components:
- Protocol Multiplexing Filter (PMF)
- Abstract Protocol Modules
- Extensible Experiment Engine
Windmill Architecture
Experiment can call any module with packet to extract:
- Data payload
- Errors and events
- State information
Interlayer events can be easily correlated
[Figure labels: HTTP, TCP, IP, Experiment]
9. Commercial End-User Measurement

Inverse (http://www.inverse.net)
- Dialup monitoring of access, and web servers

Keynote (http://www.keynote.com)
- Dialup monitoring of web servers

NetMedic (http://www.vitalsigns.com)
- End-user measurement of ISP performance
Commercial Network Statistics
NetMedic
Vital Signs Report
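IV. Analysis and Comparison
Comparison of existing network performance monitoring systems
Feature          | Surveyor (US)      | RIPE (W. Europe)  | PingER (US)         | AMP (US)              | Skitter (US)
Probe method     | OWDP               | OWDP              | ping                | ping                  | traceroute
Host system      | dedicated          | dedicated         | optional            | dedicated             | dedicated
Clock sync       | GPS                | GPS               | NTP                 | NTP                   | NTP
Scheduling       | Poisson            | Poisson           | bursty              | linear random         | ~30 min
Packet size      | 40 bytes           | 100 bytes         | 100 & 1000 bytes    | 64 bytes              | 52 bytes
Coverage         | US, CA, CH, NL, NZ | EU, IL, US        | 32 countries        | US, NZ, NO            | Asia, CA, UK, US
Monitoring sites | 51                 | 32                | 18                  | 70                    | 20
Host pairs       | 1000               | 1024              | 1200                | 4600                  | 35000
Start year       | 1997               | 1998              | 1995                | 1999                  | 1998
Sponsors         | CSG/Advanced       | RIPE/European R&E | DOE/ESnet/HENP/XIWT | NSF/NLANR/Internet 2  | DARPA/NSF/CAIDA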
Active Measurement Platforms

AMP and Surveyor provide different information;

the two complement each other;

Currently
- ~ 85 AMP sites, ~ 55 Surveyor sites
Comparing PingER & Surveyor
            | Surveyor          | PingER
Method      | 1 way delay       | 2 way ping
Hosts       | dedicated         | "selected"
Freq        | ~2*2/s (~2kbps)   | ~0.01/s (~0.1kbps)
Timing      | Poisson <2/s>     | bursty (30 min)
Sizes       | ~40B              | 100B & 1000B
Locations   | US, Ca & Nz       | 10 (22) countries
Monitors    | ~30               | 17
Remotes     | ~30 (~full mesh)  | ~300 (hierarchical)
Pairs       | ~900              | ~1200
Storage     | ~38MB/pair/mo     | 0.6MB/pair/mo
Data avail. | On request        | Public web access
Start       | 1997              | 1995
Sponsors    | CSG/Advanced      | Esnet/HENP
PingER - Surveyor
Complementarity



Agree well
Surveyor has one-way measurements, PingER only round-trip
Surveyor dedicated platforms & strong central
management
- experience with PingER shows this has benefits.

PingER more parsimonious/lightweight (bandwidth, disk
space, cpu)
- better for poor connectivity sites - e.g. Russia, China
- but necessarily less accurate especially at small (hourly) time
resolution on low loss links.

PingER good for looking at long term trends & grouping
where statistics are less a problem.
V. URLs
Important URLs
Research and Standards
http://www.ietf.org/html.charters/ippm-charter.html
http://www.caida.org
http://www.merit.edu/ipma
Commercial Vendors
http://www.inverse.net
http://www.keynote.com
http://www.vitalsigns.com