Transcript Yee-Ting Li

Networking for the Grid
Yee-Ting Li
eScience Summer School @
Edinburgh
What the GRID is
• Worldwide Distributed System
• Interconnected with ‘networks’
• Balancing processors, storage and network
utilization
• Networking is important to make GRID
work
Networking Important!
• Only way two grid nodes can
communicate with each other
• Need ways of determining how ‘efficiently’
they talk
• Focus on:
– The characterising how they talk
– The language they use to talk
Part 1
• Networking
• Networking Monitoring
– Networks are also transient
– Network performance also varies as you’re sharing
with n million other users
• Sometimes you can notice periodic patterns –
sometimes you can’t
– Difficult to analyse and create trends/predictions
– Show steps towards…
Networking 101
Cloud
Workstation
Workstation
• Networking straight forward
• Just connect to the network and it works!
• HA!
Networking
Router
Cloud
Switch
Workstation
Switch
Router
Workstation
Router
• Complex? Get’s more complex!
• Each node has it’s own scheduling priorities
• Routers must serve trillions of data units per
second!
Networking
• Complex stack from
Application
User space
•
Kernel space
Driver
NIC
•
which data has to flow to
get onto network
Each node on the
network also has their
own stacks
Routers have IPR on
stacks – no one knows
what Cisco stuff looks
like!
Example Metrics
• Connectivity
• Delay
– One-way delay
– Two-way delay
•
•
•
•
Throughput / goodput
Network path
Loss
Jitter
Metrics Example
• Video Conferencing
–
–
–
–
Needs predictable bit rate
Doesn’t usually matter if bit rate changes too much
Needs constant jitter
Low one-way delay preferable
• FTP
– Needs reliable transport
– Throughput depends on urgency of data
– Jitter and delay don’t matter
Network Monitoring Uses
• Monitoring is measuring over long periods of
•
•
•
•
time
Gives an indication of network performance over
time – a baseline
Allows comparison of different tools for analysis
Allows analysis of how different protocols
behave in different conditions – in real life
Allows ‘tuning’ of existing protocols to make
most out of network
Possible Users of a NM Web Service
• Network Managers
– See how much bandwidth is being used
• Network Analysts
– Make things faster and better!
• Resource Brokers
• Broker to determine where to send jobs – Network Cost
• Bandwidth Brokers
– Allocate bandwidth depending on current network state
• Replication Managers
– Distribute data only when network is not busy
• QoS Brokers (aka Managed bandwidth Services)
– Universal language for intercommunication..?
• Next Generation FTP
– First look up historical throughputs before sending to determine best path
GridNM
•
•
•
•
•
Architecture for monitoring the network
Backend – collects data for presentation
Logs metrics in ASCII log files on a single host
Allows mesh measurements – all nodes performs
measurements to al other nodes
Uses standard UNIX infrastructure – ssh
– Should be easily adaptable to using Globus
certifications once interactive processing is introduced
in EDG.
GridNM (cont…)
• Uses existing (and future tools) to collect metrics
• Modular - uses XML to describe available
resources
– Hosts
– Tools
• Locks hosts if under measurement – prevents
•
other tests affecting metrics
Currently monitoring 6 sites around Europe
using 5 tools
GridNM ‘plot’
Web Service Network Monitoring
• GridNM just one Network Monitoring
Program
• Many different programs out there!
• Unify data exchange between different
monitoring infrastructures
piPEs
• Internet2 e2ePI Architecture for network monitoring
• Defines information flow to diagnose networks and hosts
•
•
•
•
performance – white paper
Incorporates a ‘finger pointing’ mechanism to identify
poor performers
Ideal starting point!
BUT… found out about it too late…
Currently investigating implementation with SLAC
software + web service as possible implementation of
piPEs software
GGF NMWG
• Defines characteristics that are just the values
•
•
•
•
•
that we are interested in
Defines classes of metrics, e.g. bandwidth, delay
etc. that these characteristics report
Defines singleton and derived characteristics
Defines samples of data and their inherent
sampling patterns
Timestamps
Still in draft form…
GGF NMWG cont. / Schema Design
• As it’s all in XML, designing a XML schema to
•
•
•
describe ‘objects’ to be passed around
XML Schema Document (XSD)
Focusing actually implementing what the NMWG
document says… and doesn’t say…
Note: We are also tackling this from a pure OO
design too – however, due to technical
differences between objects in C++, Java and
SOAP/XML then there may be issues to
overcome…
Part 2
• Network Communication Languages
• Known as transport protocols -
determines how applications put traffic
into the network
• Sits on top of IP – common language of
the internet
Transport Level Protocols
• TCP (HTTP, FTP, GridFTP) used for file transfer
–
–
–
–
Gives guarantee on delivery
All data is copied precisely
Performance can be poor
Respects other internet users
–
–
–
–
Gives no guarantees on delivery
Data may be incomplete
Performance good
Doesn’t respect other internet users
• UDP (Real, H323) used for video conferencing
UDP vs TCP
• Udp: min=274, max=565, ave=493, stdev=43
• Tcp: min=37, max=292, ave=195, stdev=40
• Summary: tcp is rubbish! – why?
Memory and Disk transfers
Fast Ethernet
OC3 Over 60Mbits/s iperf >> file copy
Disk
limited
Les Cottrell, SLAC
Iperf TCP Mbits/s
What does TCP do?
Socket buffer size
• TCP retransmits lost data
• Even retransmits data it ‘thinks’ has been lost!
• Needs and uses a ‘windowing’ system
– Uses ACKnowledgements from reciever
– Grows a Congestion Window ‘cwnd’ to
determine the size of window
TCP Protocol
Network
• Model:
– Tap is independent of Tank size
– Tank filled by application
– Valve opening (data rate) determined by
feedback from network
– Small tanks mean small data rate
– Large tanks mean larger data rate
TCP socket buffer sizes
• Iperf observations: 490
• Standard socket buffer graph
– Shows linear(ish) region followed by plateau
• Optimal socket buffer size just over 2mB
Retransmitted Data
• Graph shows the
amount of
retransmitted data
against the
throughput
• Retransmitted data is
due to loss on the
network
• General case ACK’s
have to timeout
before resending
• We get more
retransmitted data
for low throughputs
with large windows
Measuring Performance of
Transport Level Protocols
• Need to identify what we want to measure – the
•
•
metrics.
Dependant on the use of the transport protocol.
Need to analyse application level usage
For Grid:
– Movement of ‘transient’ data
• File Transfer and Replication
• process jobs or ‘sandboxes’
– Movement of Real-Time Data
• Video Conferencing – Access Grid
• Real-Time applications
Web 100 & TCP
• OSI states that we should not
•
•
•
•
•
•
know anything about the
separate layers
How do we know something is
going wrong? – your
throughput decreases!
Prevents congestion collapse!
Need Web100! Allows in
depth tcp stack analysis per
flow
Kernel patch – 2.4.16,
alpha1.2
New version – 2.4.19
alpha2.0pre1
Using program to grab
web100 results - logvars
Reliability of Web100 results…
• Still alpha… but
•
•
reliable
Graph against iperf
throughputs correlate
very well
At least as reliable as
the result offered by
iperf!
Congestion Window
• Looking at the
max_cwnd achieved for
each measurement…
• Appears to be two
regions
– with high correlation
of throughput and
max cwnd
– A linear region where
we get the a range of
throughputs for same
max_cwnd
• Cwnd never grows
beyond 1500kbytes!
Bandwidth Delay Product
• Window = bandwidth * delay
• We want
– Bandwidth = 1,000,000,000 bit/sec
• We have
– Delay = 19ms
• Window needs to be an average of…
– =1e+9 * 19e-3 / 8 bytes
– =2.25mbytes!
• We only achieve ~1.5mbytes max!
• Need to implement some monitoring of the degree of
the average and variation of cwnd for each tcp
connection…
TCP Optimisation
• It’s actually TCP that is limiting our
transfer rates!
– All applications use it!
• Understandable as TCP hasn’t changed
much for the last 15-20 years!
– When standard link was about 56kbit/sec!
• Solution: Need new TCP implementations!
What is High Speed TCP?
• Changes the way TCP behaves at high speed (ie large
•
cwnd)
Standard TCP has two modes
– Slow start (not very slow…)
– Congestion Avoidance
• Focuses on Congestion Avoidance Region – ie when TCP
•
•
knows (thinks it knows…) how well the network
behaves…
BUT only when we are at high speeds, else do what
normal Standard TCP does…
Readily deployable 1st step towards Equation Based
Congestion Control
What does it do?
• Standard TCP uses two parameters
– Increase parameter, a
– Decrease parameter, b
• i.e. AIMD( a,b )
• Standard TCP uses
– a=1
– b=0.5
• High Speed TCP introduces
– a->a(cwnd)
– b->b(cwnd)
• i.e. The value of a and b depends on the current congestion window size
• If we increase a more with larger cwnd we can get back up to our ‘optimal’
•
cwnd size for the network path
If we decrease b less we don’t lose as much bandwidth due to a small
congestion window
What exactly does it do?
• Based on the TCP response function
– Relates loss and throughput
• Uses the TCP response function to investigate certain
parameters
– High_Window, High_Loss; largest cwnd needed for x throughput
and the required loss for that throughput
– Low_Window, Low_Loss; smallest cwnd when we actually switch
from Standard TCP and the required loss rate for that cwnd size
– High_B; the smallest decrease in b when we are at a large cwnd
• Equations to transform this information into a table for
a(cwnd) and b(cwnd)
Transport Protocols ‘NG’
Name
Transport
UDP Blast
UDP
Tsunami
UDP/TCP
Uses TCP as ‘control’
channel
High Speed TCP
TCP
For 10Gb/sec links
PGM / CC
Modified UDP
Multicast UDP – new
transport protocol
IBP
Notes
Application ‘logistical
networking’