Passive Network Measurement

Update on GGF Measurement Activities
Bruce Lowekamp
The College of William and Mary
–1–
My Research
William and Mary Computer Science Department
14 Faculty
45 PhD students
60 Masters students
lots of good undergraduates
Research
Remos: SNMP-based topology and utilization for distributed apps
Wren: leveraging topology and passive measurements for scalable
grid network performance measurement
Optimistic grid computing: fine-grained apps on a grid
GGF Network Measurements Working Group
–2–
Passive Network Measurement
When an application is running, use passive
measurements. When not, use active probes.
Controlled by the monitoring system, which knows when
measurements are needed.
Conversion between measurements is important.
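The selection rule on this slide can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Mode` and `choose_mode` and the `max_age_s` freshness threshold are hypothetical and are not taken from Wren or from any GGF document.

```python
from enum import Enum


class Mode(Enum):
    PASSIVE = "passive"   # observe traffic the application already sends
    ACTIVE = "active"     # inject probe traffic
    NONE = "none"         # no measurement needed right now


def choose_mode(app_is_sending: bool, measurement_age_s: float,
                max_age_s: float = 60.0) -> Mode:
    """Pick a measurement mode for one path.

    app_is_sending: True if application traffic is currently on the path.
    measurement_age_s: seconds since the last usable measurement.
    max_age_s: hypothetical freshness threshold chosen for this sketch.
    """
    if app_is_sending:
        # Application traffic is already there to observe, so prefer it.
        return Mode.PASSIVE
    if measurement_age_s > max_age_s:
        # Nothing to observe and the last value is stale: send probes.
        return Mode.ACTIVE
    return Mode.NONE
```

The freshness check is one possible reading of "knows when measurements are needed"; a real monitoring system could use a different trigger.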
–3–
Importance of Topology in Grids
No one rule governs performance
Real users (and system owners) make bad choices
Grid applications must optimize performance for these environments.
Can we exploit topology knowledge for better
• measurements?
• application performance?
–4–
Outline
• GGF’s perspective on network measurement
• GMA: Grid Monitoring Architecture
• DAMED: Top-N Events
• NMWG: Network Characteristics Hierarchy
–5–
GGF Perspective
Users of measurements
Application designers
Runtime system designers
Many users, many environments
Grid applications must be flexible, portable
[Figure: dispersed users and a network connecting nodes labeled R, grouped into VO-A and VO-B, with "?" marks at several points]
–6–
Information Portability
Information must be portable
Each AS/VO may pick its own measurement system
Parts of network aren’t measured
Different parameters
Goal: Application runs, unaware of environment
Information from multiple measurement systems
Should not have to support 10 different performance models
–7–
GGF Projects
GMA: Grid Monitoring Architecture
What components do we need?
DAMED: Discovery And Monitoring Event Description
What are the Top-N events we need to support RIGHT NOW?
NMWG: Network Measurements
What does “bandwidth” mean anyway?
Components of this global information service
Boils down to schemas and protocols
–8–
Existing Pieces
Many of these components already exist or are in progress:
- instrumentation tools
  • Pablo (UIUC), NetLogger (LBNL), log4j (Apache), web100, SNMP, etc.
- host and network sensors
  • too many to list
- sensor management tools
  • JAMM (LBNL)
- event publication service
  • MDS (Globus), NWS (UCSB), R-GMA (RAL), CODE (NASA Ames), Remos (CMU)
- event archive service
  • netarchd (LBNL), NWS (UCSB)
- event analysis and visualization tools
  • lots, but most only work for specific types of events:
    o NetLogger nlv (LBNL), Probe (Stazi), Autopilot (UIUC), etc.
BUT, all use different event formats and protocols!
- no interoperability
–9–
Event Publication
Handling potentially huge amounts of event data
requires an event publication and subscription
service that is:
flexible
highly scalable
able to provide near real-time access to monitoring data
The Global Grid Forum (GGF) (www.gridforum.org)
has defined the “Grid Monitoring Architecture”
(GMA), for this purpose.
Several GMA implementations have started to appear
A great deal of work remains to define standard
event schemas and event dictionaries for the GMA.
– 10 –
GMA Terminology and Architecture
(Performance) Event:
- Typed collection of data with a specific structure
Producer Interface:
- makes performance data (events) available
Consumer Interface:
- receives performance data (events)
Directory Service:
- supports information publication and discovery
- must be distributed and/or replicated
[Figure: GMA architecture. A Consumer and a Producer each exchange event publication information with the Directory Service, and event data passes from Producer to Consumer. A second diagram shows a component with both a Consumer Interface and a Producer Interface (analysis, filtering, etc.) sitting between producers and a consumer.]
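The three GMA roles can be sketched in a few lines. This is a toy, in-process illustration and not a GMA implementation: the class names, method names, and the `host.cpuLoad` event type are all assumptions made for the example. The point it shows is that the directory holds only publication information, while event data goes straight from producer to consumer.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]            # a typed collection of data, simplified to a dict
Callback = Callable[[Event], None]


class Producer:
    """Makes events of one type available to subscribed consumers."""

    def __init__(self, event_type: str) -> None:
        self.event_type = event_type
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        # Event data flows directly to consumers, not through the directory.
        for callback in self._subscribers:
            callback(event)


class Directory:
    """Holds publication information: which producers offer which event type."""

    def __init__(self) -> None:
        self._producers: Dict[str, List[Producer]] = defaultdict(list)

    def register(self, producer: Producer) -> None:
        self._producers[producer.event_type].append(producer)

    def lookup(self, event_type: str) -> List[Producer]:
        return list(self._producers.get(event_type, []))


directory = Directory()
cpu = Producer("host.cpuLoad")       # hypothetical event type name
directory.register(cpu)

# The consumer discovers producers through the directory, then subscribes.
received: List[Event] = []
for producer in directory.lookup("host.cpuLoad"):
    producer.subscribe(received.append)

cpu.publish({"host": "192.0.2.1", "value": 0.42})
```

A component that does analysis or filtering would simply implement both sides: it subscribes to producers as a consumer and republishes derived events as a producer.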
– 11 –
DAMED WG
Discovery And Monitoring Event Description
Working Group
Chairs
• Jennifer Schopf, ANL
• James Magowan, IBM
Top-N Metrics
– 12 –
DAMED Charter
Define a basic set of monitoring event descriptions
- information (attributes) associated with a particular data element
- conventions for the representation of the value associated with it
Develop standard representations of the most widely used
measurement values (the "top N").
Emergence of a set of conventions and recommendations that
will ease the task of defining richer, domain-specific schemas
Damed if we do
- Not everyone will be happy
Damed if we don’t
- Never reach our goal of seamless interoperability of grids (one big grid, e.g. the Internet)
– 13 –
DAMED Terminology
Events
Event Target, e.g. network.link
Event Type, e.g. delay.TCP
Event Name = Target.Type, e.g. network.link.delay.TCP
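The naming rule above (Event Name = Target.Type) is a plain dotted join. A minimal sketch follows; the function names are hypothetical, and the `target_depth` argument is an assumption of this example, needed because the dotted name alone does not say where the target ends.

```python
def event_name(target: str, event_type: str) -> str:
    """Join an event target and an event type into a dotted event name."""
    return f"{target}.{event_type}"


def split_event_name(name: str, target_depth: int) -> tuple:
    """Split a dotted name back into (target, type).

    target_depth is the number of leading components that belong to the
    target; the caller has to supply it.
    """
    parts = name.split(".")
    return ".".join(parts[:target_depth]), ".".join(parts[target_depth:])
```

For example, `event_name("network.link", "delay.TCP")` gives the name used on the slide, `network.link.delay.TCP`.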
– 14 –
Target Types
Targets used in Top-N Events
Host: IP
Process: IP, PID
Disk Partition: /home
Network Link: IP {port},IP {port}
Software: String
Scheduler: IP, String
Not necessarily hierarchical
– 15 –
Event Types
Top-N
CPU Load
System uptime
Disk size
Disk used
TCP available bandwidth
Ping RTT
Traceroute number of hops
Running software status
Packet Loss
Available Memory
Host Architecture
Host OS
Physical Memory
– 16 –
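Columns: program; new event name; aggregation type (AGG_TYPE); aggregation interval (AGG_IVAL); units (UNITS); description; target parameters; additional parameters

Iperf
• bandwidth.achievable.tcp.singleStream
  AGG_TYPE: mean; AGG_IVAL: (sec); UNITS: Mbits/s
  Description: mean throughput for a run, with 1 stream
  Target parameters: SRC, DST
  Additional parameters: REPORT=RUN, BUFFER_SIZE
• bandwidth.achievable.tcp.multiStream
  AGG_TYPE: mean; AGG_IVAL: (sec); UNITS: Mbits/s
  Description: mean throughput for a run, with parallel streams
  Target parameters: SRC, DST
  Additional parameters: REPORT=RUN, NUM_STREAMS, BUFFER_SIZE
• bandwidth.achievable.tcp.singleStream
  AGG_TYPE: mean; AGG_IVAL: (sec)
  Description: mean throughput for one reporting interval, with 1 stream
  Target parameters: SRC, DST
  Additional parameters: REPORT=INTERVAL, BUFFER_SIZE
• bandwidth.achievable.tcp.multiStream
  AGG_TYPE: mean; AGG_IVAL: (sec)
  Description: mean throughput for one reporting interval, with parallel streams
  Target parameters: SRC, DST
  Additional parameters: REPORT=INTERVAL, NUM_STREAMS, BUFFER_SIZE

GridFTP
• bandwidth.achievable.tcp.singleStream
  AGG_TYPE: mean; AGG_IVAL: (sec)
  Description: mean throughput for one run, single-stream
  Target parameters: SRC, DST
  Additional parameters: BUFFER_SIZE
• bandwidth.achievable.tcp.multiStream
  AGG_TYPE: mean; AGG_IVAL: (sec)
  Description: mean throughput for one run, parallel streams
  Target parameters: SRC, DST
  Additional parameters: NUM_STREAMS, BUFFER_SIZE

Ping
• delay.roundTrip
  AGG_TYPE: mean; AGG_IVAL: (sec); UNITS: ms
  Description: Average of ping tests
  Target parameters: SRC, DST
  Additional parameters: METHOD=ICMP

netest
• delay.roundTrip
  AGG_TYPE: avg; AGG_IVAL: (sec); UNITS: ms
  Description: Round trip delay
  Target parameters: SRC, DST
  Additional parameters: METHOD=(ICMP|UDP|TCP)
• bandwidth.achievable.udp.singleStream
  AGG_TYPE: max; AGG_IVAL: (sec); UNITS: Mbits/s
  Description: Maximum UDP bandwidth from this host, with 1 stream
  Target parameters: SRC, DST
• bandwidth.achievable.udp.multiStream
  AGG_TYPE: max; AGG_IVAL: (sec); UNITS: Mbits/s
  Description: Maximum UDP bandwidth from this host, with parallel streams
  Target parameters: SRC, DST
  Additional parameters: NUM_STREAMS
• tcp.recommendedBufferSize
  AGG_TYPE: N/A; AGG_IVAL: N/A; UNITS: bytes
  Description: recommended max buffer size for TCP
  Target parameters: SRC, DST
• bandwidth.achievable.tcp.singleStream
  UNITS: Mbits/s
  Description: Predicted achievable throughput for a single stream
  Target parameters: SRC, DST
• bandwidth.achievable.tcp.recommendedStreams
  AGG_TYPE: N/A; AGG_IVAL: N/A; UNITS: int
  Description: Is the bandwidth from TCP improved by using parallel streams?
  Target parameters: SRC, DST
• application.bandwidth.used.tcp
  AGG_TYPE: N/A; AGG_IVAL: N/A
  Description: TCP bandwidth used by the netest
  Target parameters: SRC, DST

pipechar
• hop.bandwidth.capacity
  AGG_TYPE: N/A; AGG_IVAL: N/A; UNITS: Mbits/s
  Description: Estimated capacity for a single hop on a path.
  Target parameters: SRC, DST, HOP_SRC, HOP_DST
  Additional parameters: HOP_NUM=<N>
• hop.bandwidth.utilized
  AGG_TYPE: N/A; AGG_IVAL: N/A; UNITS: Mbits/s
  Description: Estimated utilized capacity for a single hop.
  Target parameters: SRC, DST, HOP_SRC, HOP_DST
  Additional parameters: HOP_NUM=<N>
• path.bandwidth.available.bottleneckHop
  AGG_TYPE: N/A; AGG_IVAL: N/A; UNITS: Mbits/s
  Description: Available bandwidth of the bottleneck hop in the path
  Target parameters: SRC, DST
  Additional parameters: HOP_NUM=<N>

[Cells whose row is not recoverable from this transcript: "BUFFER_SIZE attribute of bandwidth.achievable"; additional parameters "MIN, MAX, AVG"]
– 17 –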
NMWG
Network Measurements Working Group
Chairs:
Brian Tierney (LBNL)
Bruce Lowekamp (W&M)
Richard Hughes-Jones (Manchester)
Goal:
Portability of network measurements
Steps:
Define hierarchy of measurements
Establish mapping of tools<->measurements
Conversion between measurements of same type
– 18 –
Characteristics Hierarchy
Ultimate Goal: Portability of Measurements
Many APIs
Many tools
Natural Grid Development Process
More measurement systems
More measurement tools
More cooperation
More shared deployed infrastructure
Middleware must be able to determine what a given piece of network
performance information is measuring.
How do we share measurement information without
discouraging development of new APIs and tools?
– 19 –
How the Nomenclature Helps
Need to classify measurements
What does it measure? Sometimes more important than how.
Not necessarily a new schema
Should be a good schema for network measurements
Not all systems are/should be organized this way
Can be used as annotation in any schema.
Goal is an agreed-upon classification of
measurements taken, to allow both current and
future measurement methodologies to classify their
observations to maximize their portability.
– 20 –
Representing a Measurement
A measurement is represented by two elements:
Characteristic
What is being measured: bandwidth, latency, etc.
Network Entity
The part of the network described by the measurement: link, path, host, etc.
[Figure: a Characteristic describes a Network Entity; a Measurement Methodology measures a Characteristic; an Observation (Singleton, Sample, or Statistical) is the result of a Measurement Methodology.]
– 21 –
Terminology
Network Characteristics
Intrinsic properties of a portion of the network that are related to its
performance and reliability
Measurement Methodologies
Means and methods of measuring those characteristics
Observation
An instance of information obtained by applying a measurement
methodology.
Note on IETF IPPM RFC2330
Compatible where possible, but "metric" means many things.
Guiding principle: clear meanings, follow standards
where defined.
– 22 –
Network Characteristics
“Intrinsic Property”
Property itself, not an observation
Unrelated to how measurement is made
Not a particular number
Packet Loss
Fraction of traffic
Loss patterns
Traffic profile
– 23 –
Measurement Methodology
Technique for recording or estimating a characteristic
Two approaches:
Raw: measuring actual characteristic
Derived: aggregate or estimate from other characteristics
Round trip delay
ping
TCP transmit/ACK pair
two one-way delay measurements
link propagation and queue length data
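The raw versus derived distinction can be illustrated with two of the round trip delay examples listed above. This is a sketch under stated assumptions: the function names are hypothetical, and the derived version assumes the two one-way delays were measured on the forward and reverse directions of the same path at about the same time.

```python
def rtt_raw(send_time_s: float, reply_time_s: float) -> float:
    """Raw: round trip delay timed directly on one clock, as ping does."""
    return reply_time_s - send_time_s


def rtt_derived(one_way_forward_s: float, one_way_reverse_s: float) -> float:
    """Derived: round trip delay estimated as the sum of two one-way delays."""
    return one_way_forward_s + one_way_reverse_s
```

Both functions report the same characteristic (round trip delay), which is why a consumer should be able to ask for the characteristic without caring which methodology produced the value.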
– 24 –
Observations
Singleton
Smallest possible observation
Sample
Several singletons together
Statistical
Derived from a sample by calculating a statistic
Timestamps, and ranges, are issues with each
observation
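The three kinds of observation can be written down as simple data types. The field and function names below are assumptions made for illustration, not part of any NMWG schema; the sketch carries a timestamp on each singleton and a time range on each statistic, since the slide flags those as issues for every observation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Singleton:
    """Smallest possible observation: one value at one time."""
    timestamp_s: float
    value: float


@dataclass(frozen=True)
class Sample:
    """Several singletons together."""
    singletons: List[Singleton]

    def time_range(self) -> tuple:
        times = [s.timestamp_s for s in self.singletons]
        return min(times), max(times)


@dataclass(frozen=True)
class Statistical:
    """A statistic calculated from a sample, with the time range it covers."""
    statistic: str
    value: float
    start_s: float
    end_s: float


def mean_of(sample: Sample) -> Statistical:
    """Derive a statistical observation (the mean) from a sample."""
    start, end = sample.time_range()
    return Statistical("mean", mean(s.value for s in sample.singletons), start, end)
```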
– 25 –
Network Entities
[Figure: Network Entity class diagram with the labels Network Entity, Network Path, Host-to-host path, Hop, Node, Autonomous System, Virtual Node, Internal Node, Router, Host, Switch, Proxy; a Network Path has 2 {ordered} endpoint Nodes.]
Attributes must be included.
Nodes and paths can be physical or functional.
– 26 –
Describing Topology
[Figure: the Network Entity class diagram from the Network Entities slide, repeated]
Two different types of topology
Physical: Actual links and nodes
Functional: Derived closeness
Attributes define the Path or Node
Multiple Topologies are Superimposed over physical network
– 27 –
Describing Topology
[Figure: the Network Entity class diagram from the Network Entities slide, repeated]
Paths: Path data follows from source to destination
Unidirectional in most cases
Paths (including hops) may be made of components
Nodes: Hosts and Internal nodes
Physical and Functional graphs not disjoint at edges
– 28 –
Characteristics Overview
[Figure: characteristics hierarchy]
Characteristic
- Bandwidth: Capacity, Utilized, Available, Achievable
- Delay: One-way, Round-trip, Jitter
- Loss: One-way, Round-trip, Loss Pattern
- Availability: MTBF, Avail. Pattern
- Queue: Discipline, Capacity, Length
- Forwarding: Forwarding Policy, Forwarding Table, Forwarding Weight
- Hoplist
- Closeness
- Others
– 29 –
Relationship Between Measurements
Can we develop systems that use whatever
information is available?
iperf
pathload
QoS support
Need to be able to request measurement of
particular characteristic, without regard to what
sub-characteristic or tool is used to return the
result.
Convert loss pattern to loss rate.
Traffic profile to utilization fraction.
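The two conversions named above can be sketched as follows. The data representations are assumptions chosen for this example (a loss pattern as a sequence of booleans in send order, a traffic profile as per-interval durations and bit counts); the slides do not define them.

```python
from typing import List, Sequence, Tuple


def loss_rate(loss_pattern: Sequence[bool]) -> float:
    """Convert a loss pattern (True = packet lost) to a loss fraction."""
    if not loss_pattern:
        raise ValueError("loss pattern is empty")
    return sum(loss_pattern) / len(loss_pattern)


def utilization_fraction(traffic_profile: List[Tuple[float, float]],
                         capacity_bits_per_s: float) -> float:
    """Convert a traffic profile to a utilization fraction.

    traffic_profile: one (duration_s, bits_sent) pair per interval,
    a hypothetical representation used only in this sketch.
    """
    total_time = sum(duration for duration, _ in traffic_profile)
    total_bits = sum(bits for _, bits in traffic_profile)
    return total_bits / (total_time * capacity_bits_per_s)
```

Both conversions go from the more detailed sub-characteristic to the less detailed one; the reverse direction is not possible, because the detail has been discarded.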
– 30 –
Characterization of Tools
Goal of the hierarchy is to make measurements portable.
First step is to agree on which characteristics tools
measure.
Some tools measure multiple characteristics,
depending on parameters.
Many lists of tools exist, including E2EPI's; our goal is to
annotate these lists and produce a hierarchy with
multiple views.
– 31 –
NMWG Upcoming Work
Taxonomy is nice, but exchanging real data requires
a schema, with values for attributes and
parameters.
Two steps:
Map tools to taxonomy
Produce schema
Schema step is needed to reach goal of portability.
Participants include DAMED members.
– 32 –
Summary of GGF Activities
Focus on two aspects:
• System interoperability
• Measurement portability
• GMA completed
• DAMED finishing up Top-N documents
• NMWG characteristics hierarchy near release
• Need schema to put components together
Portions contributed by: Jennifer Schopf (ANL),
James Magowan (IBM), Brian Tierney (LBNL), and
Dan Gunter (LBNL)
– 33 –