Presentation to ESnet - University of Virginia


Hybrid network traffic
engineering system (HNTES)
Zhenzhen Yan, M. Veeraraghavan, Chris Tracy
University of Virginia
ESnet
June 23, 2011
Please send feedback/comments to:
[email protected], [email protected], [email protected]
This work was carried out as part of a sponsored
research project from the US DOE ASCR program
office on grant DE-SC002350
1
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network
Project web site:
http://www.ece.virginia.edu/mv/research/DOE09/index.html
2
Problem statement
• A hybrid network is one that supports both
IP-routed and circuit services on:
– Separate networks as in ESnet4, or
– An integrated network
• A hybrid network traffic engineering
system (HNTES) is one that moves data
flows between these two services as
needed
– engineers the traffic to use the service type
appropriate to the traffic type
3
Two reasons for using circuits
1. Offer scientists rate-guaranteed connectivity
– necessary for low-latency/low-jitter applications such as
remote instrument control
– provides low-variance throughput for file transfers
2. Isolate science flows from general-purpose flows
Reason                          Circuit scope
Rate-guaranteed connections     End-to-end (inter-domain)
Science flow isolation          Per provider (intra-domain)
4
Role of HNTES
• HNTES is a network management system and if proven, it would be deployed in networks that offer IP-routed and circuit services
5
Outline
• Problem statement
► Solution approach
– Tasks executed by HNTES
– HNTES architecture
– HNTES 1.0 vs. HNTES 2.0
– HNTES 2.0 details
• ESnet-UVA collaborative work
• Future work: HNTES 3.0 and integrated network
6
Three tasks executed by HNTES
1. Heavy-hitter flow identification (online: upon flow arrival)
2. Circuit provisioning
3. Flow redirection
7
HNTES architecture
HNTES 1.0
1. Offline flow analysis populates the MFDB
2. RCIM reads MFDB and programs routers to port mirror packets from MFDB flows
3. Router mirrors packets to FMM
4. FMM asks IDCIM to initiate circuit setup as soon as it receives packets from the router corresponding to one of the MFDB flows
5. IDCIM communicates with IDC, which sets up circuit and PBR for flow redirection to newly established circuit
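The control flow of these five steps can be sketched as follows (the module interfaces and method names are hypothetical, for illustration only; this is not the actual HNTES 1.0 code):

def hntes_v1_loop(mfdb, rcim, fmm, idcim):
    # Step 2: program routers to port-mirror packets of every monitored flow
    for flow in mfdb.monitored_flows():
        rcim.configure_port_mirror(flow)

    # Steps 3-5: when the router mirrors a packet of a monitored flow to the
    # FMM, ask the IDC Interface Module to set up a circuit; the IDC also
    # installs the policy-based route (PBR) that redirects the flow.
    while True:
        pkt = fmm.receive_mirrored_packet()          # blocks until a packet arrives
        flow = mfdb.lookup(pkt.src_ip, pkt.dst_ip)
        if flow is not None and not flow.circuit_active:
            circuit = idcim.request_circuit(flow)    # IDC sets up circuit + PBR
            flow.circuit_active = circuit is not None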
8
Heavy-hitter flows
• Dimensions
– size (bytes): elephant and mice
– rate: cheetah and snail
– duration: tortoise and dragonfly
– burstiness: porcupine and stingray
Kun-chan Lan and John Heidemann, "A measurement study of correlations of Internet flow characteristics," ACM Comput. Netw. 50, 1 (January 2006), 46-62.
9
HNTES 1.0 vs. HNTES 2.0

                                       HNTES 1.0                HNTES 2.0 (tested on ANI testbed)
Dimension of heavy-hitter flow         Duration                 Size
Circuit granularity                    Circuit for each flow    Circuit carries multiple flows
Heavy-hitter flow identification       Online                   Offline
Circuit provisioning                   Online                   Offline
Flow redirection (PBR configuration)   Online                   Offline

HNTES 1.0 logic: focus on DYNAMIC (or online) circuit setup → IDC circuit setup delay is about 1 minute → can use circuits only for long-DURATION flows
10
Rationale for HNTES 2.0
• Why the change in focus?
– Size is the dominant dimension of heavy-hitter
flows in ESnet
– Large-sized (elephant) flows have a negative impact on mice flows and jitter-sensitive real-time audio/video flows
– Do not need to assign individual circuits for elephant flows
– Flow monitoring module is impractical if all data packets from heavy-hitter flows are mirrored to HNTES
11
HNTES 2.0 solution
• Task 1: offline algorithm for elephant flow
identification - add/delete flows from MFDB
• Nightly analysis of MFDB for new flows (also offline)
– Task 2: IDCIM initiates provisioning of rate-unlimited static
MPLS LSPs for new flows if needed
– Task 3: RCIM configures PBR in routers for new flows
• HNTES 2.0 does not use FMM
MFDB: Monitored Flow Data Base
IDCIM: IDC Interface Module
RCIM: Router Control Interface Module
FMM: Flow Monitoring Module
12
HNTES 2.0:
use rate-unlimited static MPLS LSPs
[Figure: PNNL-located ESnet PE router connected by a 10 GigE interface to the PNWG-cr1 ESnet core router, with LSP 1 through LSP 50 to site PE routers]
• With rate-limited LSPs: If the PNNL router needs to send elephant flows to 50 other ESnet routers, the 10 GigE interface has to be shared among 50 LSPs
• A low per-LSP rate will decrease elephant flow file transfer throughput (e.g., equal sharing of 10 Gb/s among 50 LSPs leaves only 200 Mb/s per LSP)
• With rate-unlimited LSPs, science flows enjoy full interface bandwidth
• Given the low rate of arrival of science flows, the probability of two elephant flows simultaneously sharing link resources, though non-zero, is small. Even when this happens, theoretically, they should each receive a fair share
• No micromanagement of circuits per elephant flow
• Rate-unlimited virtual circuits are feasible with MPLS technology
• Removes need to estimate circuit rate and duration
13
HNTES 2.0 Monitored flow
database (MFDBv2)
Flow analysis table (columns):
Row number | Source IP address | Destination IP address | Is the source a data door? (0 or 1) | Is the destination a data door? (0 or 1) | Day 1 | Day 2 | ... | Day 30
(each Day column holds the total transfer size; if on one day the total transfer size between this node pair is < 1 GB, list 0)

Identified elephant flows table (columns):
Row number | Source IP address | Destination IP address | Ingress Router ID | Egress Router ID | Circuit number

Existing circuits table (columns):
Row number | Ingress Router ID | Egress Router ID
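One possible in-memory representation of these three tables (field names are assumptions made for illustration, not the actual MFDBv2 schema):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FlowAnalysisRow:              # one row of the flow analysis table
    src_ip: str
    dst_ip: str
    src_is_data_door: bool          # 0 or 1 in the slide
    dst_is_data_door: bool
    daily_size_bytes: List[Optional[int]] = field(default_factory=list)
    # one entry per day in the 30-day window: 0 if the total was < 1 GB that
    # day, None ("NA") for days before the flow first appeared

@dataclass
class ElephantFlowRow:              # one row of the identified elephant flows table
    src_ip: str
    dst_ip: str
    ingress_router_id: str
    egress_router_id: str
    circuit_number: Optional[int] = None

@dataclass
class CircuitRow:                   # one row of the existing circuits table
    ingress_router_id: str
    egress_router_id: str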
14
HNTES 2.0 Task 1
Flow analysis table
• Definition of “flow”: source/destination IP address pair (ports not used)
• Sum the sizes of all flow records for a flow over one day (say)
• Add flows with total size > threshold (e.g., 1 GB) to the flow analysis table
• Enter 0 if a flow’s size on any day after it first appears is < threshold
• Enter NA for all days before it first appears as a >-threshold-sized flow
• Sliding window: keep a fixed number of days (e.g., 30); see the sketch below
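A minimal sketch of this daily aggregation step, assuming Netflow records have already been reduced to (source IP, destination IP, bytes) tuples; only the 1 GB threshold and 30-day window come from the slides, the rest is illustrative:

from collections import defaultdict

SIZE_THRESHOLD = 1_000_000_000        # 1 GB per day, per the slides
WINDOW_DAYS = 30                      # sliding window length

def update_flow_analysis_table(table, todays_records, day_index):
    # Sum bytes per (src_ip, dst_ip) pair for one day; ports are ignored.
    daily = defaultdict(int)
    for src_ip, dst_ip, nbytes in todays_records:
        daily[(src_ip, dst_ip)] += nbytes

    # Flows already in the table: record today's size, or 0 if below threshold.
    for key, row in table.items():
        size = daily.pop(key, 0)
        row.append(size if size >= SIZE_THRESHOLD else 0)

    # New flows that crossed the threshold today: earlier days are "NA" (None).
    for key, size in daily.items():
        if size >= SIZE_THRESHOLD:
            table[key] = [None] * day_index + [size]

    # Sliding window: keep only the most recent WINDOW_DAYS entries per flow.
    for row in table.values():
        del row[:-WINDOW_DAYS]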
15
HNTES 2.0 Task 1
Identified elephant flows table
• Sort flows in flow analysis table by a metric
• Metric: weighted sum of
– persistency measure
– size measure
• Persistency measure: Percentage of days in which size is
non-zero out of the days for which data is available
• Size measure: Average per-day size measure (for days in
which data is available) divided by max value (among all
flows)
• Set threshold for weighted sum metric and drop flows
whose metric is smaller than threshold
• This limits the number of rows in the identified elephant flows table (see the sketch below)
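A sketch of the ranking step described above; the 0.5/0.5 weights and the 0.2 threshold are placeholder values, and None entries stand for the "NA" days:

def flow_metric(daily_sizes, max_avg_size, w_persistency=0.5, w_size=0.5):
    # daily_sizes: per-day totals for one flow; None marks "NA" days (no data)
    days_with_data = [s for s in daily_sizes if s is not None]
    if not days_with_data:
        return 0.0
    persistency = sum(1 for s in days_with_data if s > 0) / len(days_with_data)
    size_measure = (sum(days_with_data) / len(days_with_data)) / max_avg_size
    return w_persistency * persistency + w_size * size_measure

def identify_elephant_flows(table, metric_threshold=0.2):
    # table: {(src_ip, dst_ip): list of daily sizes, None for NA}
    def avg_size(v):
        days = [s for s in v if s is not None]
        return sum(days) / len(days) if days else 0.0
    max_avg = max((avg_size(v) for v in table.values()), default=1.0) or 1.0
    scored = {k: flow_metric(v, max_avg) for k, v in table.items()}
    return sorted((k for k, m in scored.items() if m >= metric_threshold),
                  key=scored.get, reverse=True)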
16
Sensitivity analysis
• Size threshold, e.g., 1GB
• Period for summation of sizes, e.g., 1
day
• Sliding window, e.g., 30 days
• Value for weighted sum metric
17
Is HNTES 2.0 sufficient?
• Will depend on persistency measure
– if many new elephant flows appear each day,
need a complementary online solution
• Online → Flow Monitoring Module (FMM)
18
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
► ESnet-UVA collaborative work
– Netflow data analysis
– Validation of Netflow based size estimation
– Effect of elephant flows
• SNMP measurements
• OWAMP data analysis
– GridFTP transfer log data analysis
• Future work: HNTES 3.0 and integrated network
19
Netflow data analysis
• Zhenzhen Yan coded OFAT (Offline flow analysis
tool) and R program for IP address anonymization
• Chris Tracy is executing OFAT on ESnet Netflow
data and running the anonymization R program
• Chris will provide UVA Flow Analysis table with
anonymized IP addresses
• UVA will analyze flow analysis table with R
programs, and create identified elephant flows
table
• If high persistency measure, then offline solution
is suitable; if not, need HNTES 3.0 and FMM!
20
Findings: NERSC-mr2, April 2011
(one month data)
Persistency measure = ratio of (number of days in which flow size > 1GB)
to (number of days from when the flow first appears)
Total number of flows = 2281
Number of flows that had > 1 GB transfers every day = 83
21
Data doors
• Number of flows from NERSC data doors = 84
(3.7% of flows)
• Mean persistency ratio of data door flows = 0.237
• Mean persistency ratio of non-data door flows =
0.197
• New flows graph right skewed → offline is good enough? (just one month – need more months’ data analysis)
• Persistency measure is also right skewed → online may be needed
22
Validation of size estimation
from Netflow data
• Hypothesis
– Flow size from concatenated Netflow
records for one flow can be multiplied by
1000 (since the ESnet Netflow sampling
rate is 1 in 1000 packets) to estimate
actual flow size
23
Experimental setup
• GridFTP transfers of 100 MB, 1GB, 10 GB files
• sunn-cr1 and chic-cr1 Netflow data used
Chris Tracy set up this experiment
24
Flow size
estimation experiments
• Workflow inner loop (executed 30 times):
– obtain initial value of firewall counters at sunn-cr1
and chic-cr1 routers
– start GridFTP transfer of a file of known size
– from GridFTP logs, determine data connection TCP
port numbers
– read firewall counters at the end of the transfer
– wait 300 seconds for Netflow data to be exported
• Repeat experiment 400 times for 100MB, 1 GB
and 10 GB file sizes
Chris Tracy ran the experiments
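A sketch of how this inner loop could be scripted; the counter and log helpers are hypothetical stubs, and this is not the actual experiment code:

import subprocess, time

def read_firewall_counters():
    # Hypothetical placeholder: in the real experiment, byte/packet counters
    # were read from firewall filters on the sunn-cr1 and chic-cr1 routers.
    return {}

def parse_gridftp_log_for_data_ports():
    # Hypothetical placeholder: the real scripts read the data-connection TCP
    # port numbers from the GridFTP transfer log.
    return []

def run_transfer_experiment(src_url, dst_url, repetitions=30):
    results = []
    for _ in range(repetitions):
        before = read_firewall_counters()
        # GridFTP transfer of a file of known size (100 MB, 1 GB, or 10 GB)
        subprocess.run(["globus-url-copy", src_url, dst_url], check=True)
        ports = parse_gridftp_log_for_data_ports()
        after = read_firewall_counters()
        time.sleep(300)   # wait for Netflow records to be exported
        results.append((ports, before, after))
    return results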
25
Create log files
• Filter out GridFTP flows from Netflow data
• For each transfer, find packet counts and byte counts from all the flow records and add them
• Multiply by 1000 (1-in-1000 sampling rate)
• Output the byte and packet counts from the
firewall counters
• Size-accuracy ratio = Size computed from
Netflow data divided by size computed from
firewall counters
Chris Tracy wrote scripts to create these log
files and gave UVA these files for analysis
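A sketch of the ratio computation these scripts perform, assuming each Netflow record has been parsed into a dict with a "bytes" field:

SAMPLING_RATE = 1000     # ESnet Netflow samples 1 in 1000 packets

def size_accuracy_ratio(netflow_records, firewall_byte_count):
    # Sum bytes over all Netflow records of one GridFTP transfer, then scale
    # up by the sampling rate to estimate the actual transfer size.
    estimated_bytes = SAMPLING_RATE * sum(rec["bytes"] for rec in netflow_records)
    return estimated_bytes / firewall_byte_count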
26
Size-accuracy ratio

           Netflow records obtained from      Netflow records obtained from
           Chicago ESnet router               Sunnyvale ESnet router
           Mean     Standard deviation        Mean     Standard deviation
100 MB     0.949    0.2780                    1.0812   0.3073
1 GB       0.996    0.1708                    1.032    0.1653
10 GB      0.990    0.0368                    0.999    0.0252

• Sample mean shows a size-accuracy ratio close to 1
• Standard deviation is smaller for larger files
• Dependence on traffic load
• Sample size = 50
Zhenzhen Yan analyzed log files
27
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
► ESnet-UVA collaborative work
– Netflow data analysis
– Validation of Netflow based size estimation
► Effect of elephant flows
• SNMP measurements
• OWAMP data analysis
– GridFTP log analysis
• Future work: HNTES 3.0 and integrated network
28
Effect of elephant flows
on link loads
[Figure: SNMP load plots for the SUNN-cr1 and CHIC-cr1 interfaces; markings: 10 Gb/s, 2.5 Gb/s, 1 minute]
• SNMP link load averaging over 30 sec
• Five 10GB GridFTP transfers
• Dashed lines: rest of the traffic load
Chris Tracy
29
OWAMP (one-way ping)
• One-Way Active Measurement Protocol
(OWAMP)
– 9 OWAMP servers across Internet2 (72 pairs)
– System clocks are synchronized
– The “latency hosts” (nms-rlat) are dedicated only to OWAMP
– 20 packets per second on average (10 for IPv4, 10 for IPv6) for each OWAMP server pair
– Raw data for 2 weeks obtained for all pairs
30
Study of “surges”
(consecutive higher OWAMP delays on 1-minute basis)
• Steps:
• Find the 10th percentile delay b across the 2-week data set
• Find the 10th percentile delay i for each
minute
• If i > n × b, i is considered a surge point
(n = 1.1, 1.2, 1.5)
• Consecutive surge points are combined
as a single surge
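A sketch of this surge-detection procedure, assuming the OWAMP delays are already grouped into per-minute lists; the 10th percentile and the n factor follow the slide, the rest is illustrative:

import numpy as np

def find_surges(delays_by_minute, n=1.2):
    # delays_by_minute: per-minute lists of one-way delays (ms), in time order
    all_delays = np.concatenate([np.asarray(m) for m in delays_by_minute])
    b = np.percentile(all_delays, 10)                 # 10th percentile over the 2-week set
    per_minute = [np.percentile(m, 10) for m in delays_by_minute]

    surge_minutes = [t for t, i in enumerate(per_minute) if i > n * b]

    # Combine consecutive surge minutes into single surges: (start, end) minute indices.
    surges = []
    for t in surge_minutes:
        if surges and t == surges[-1][1] + 1:
            surges[-1] = (surges[-1][0], t)
        else:
            surges.append((t, t))
    return b, surges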
31
Study of surges cont.
• Sample absolute values of 10th percentile delays

Path        10th percentile   >1.1×(10th percentile)   >1.2×(10th percentile)   >1.5×(10th percentile)
CHIC-LOSA   29 ms             31 ms                    34 ms                    NA
CHIC-KANS   5 ms              5.9 ms                   6.3 ms                   NA
KANS-HOUS   6.7 ms            7.3 ms                   8 ms                     NA
HOUS-LOSA   16.1 ms           17.5 ms                  19 ms                    23.9 ms
LOSA-SALT   7.3 ms            8.5 ms                   9.5 ms                   11.6 ms
32
PDF of surge duration
• One surge lasted for 200 minutes
• The median surge duration is 34 minutes
33
95th percentile per minute

Path        10th percentile of 2 weeks   >1.2×(10th percentile)   >1.5×(10th percentile)   >2×(10th percentile)   >3×(10th percentile)   Max of 95th percentile
CHIC-LOSA   29 ms                        33 ms                    50 ms                    58 ms                  84 ms                  119.8 ms
CHIC-KANS   5 ms                         6.4 ms                   8.1 ms                   11 ms                  17 ms                  50.5 ms
KANS-HOUS   6.7 ms                       8 ms                     18.8 ms                  18.8 ms                NA                     NA
HOUS-LOSA   16.1 ms                      18.7 ms                  23.9 ms                  40.7 ms                53.8 ms                86.7 ms
LOSA-SALT   7.3 ms                       9.3 ms                   11.5 ms                  NA                     NA                     NA

• The 95th percentile delay per minute was as high as 4.13 (CHIC-LOSA), 10.1 (CHIC-KANS), and 5.4 (HOUS-LOSA) times the one-way propagation delay
34
Future work
Determine cause(s) of surges
• Host (OWAMP server) issues?
– In addition to OWAMP pings, OWAMP server pushes
measurements to Measurement Archive at IU
• Interference from BWCTL at HP LAN switch
within PoP?
– Correlate BWCTL logs with OWAMP delay surges
• Router buffer buildups due to elephant flows
– Correlate Netflow data with OWAMP delay surges
• If none of the above, then surges are due to router buffer buildups resulting from multiple simultaneous mice flows
35
GridFTP data analysis findings
• All GridFTP transfers from NERSC GridFTP servers of size > 100 MB: one month (Sept. 2010)
• Total number of transfers: 124236
• Data from GridFTP logs
            Size (bytes)            Duration (sec)   Throughput
Minimum     100003680               0.25             1.2 Mbps
Median      104857600               2.5              348 Mbps
Maximum     96790814720 (= 90 GB)   9952             4.3 Gbps
36
Throughput of GridFTP
transfers
• Total number of
transfers: 124236
• Most transfers get
about 50 MB/sec
or 400 Mb/s
37
Variability in throughput for
files of the same size
Throughput in bits/s
Minimum        7.579e+08
1st quartile   1.251e+09
Median         1.499e+09
Mean           1.625e+09
3rd quartile   1.947e+09
Maximum        3.644e+09
• There were 145 file transfers of size 34359738368 (bytes)
– 34 GB approx.
• IQR (Inter-quartile range) measure of variance is 695 Mbps
• Need to determine other end and consider time
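A sketch of how these summary statistics and the IQR could be computed from the per-transfer throughputs (numpy's percentile interpolation may differ slightly from the R defaults used for the table):

import numpy as np

def throughput_summary(throughputs_bps):
    # throughputs_bps: throughputs (bits/s) of all transfers of one file size
    q1, median, q3 = np.percentile(throughputs_bps, [25, 50, 75])
    iqr = q3 - q1                       # inter-quartile range, a robust spread measure
    return {"min": min(throughputs_bps), "q1": q1, "median": median,
            "mean": float(np.mean(throughputs_bps)), "q3": q3,
            "max": max(throughputs_bps), "iqr": iqr}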
38
Outline
• Problem statement
• Solution approach
– HNTES 1.0 and HNTES 2.0 (ongoing)
• ESnet-UVA collaborative work
► Future work: HNTES 3.0 and integrated network
39
HNTES 3.0
• Online flow detection
► Packet header based schemes
– Payload based scheme
– Machine learning schemes
• For ESnet
– Data door IP address based 0-length (SYN) segment
mirroring to trigger PBR entries (if full mesh of LSPs),
and LSP setup (if not a full mesh)
– PBR can be configured only after finding out the other
end’s IP address (data door is one end)
– “real-time” analysis of Netflow data
• Need validation by examining patterns within each day
40
HNTES in an
integrated network
• Set up two queues on each ESnet physical link; each rate-limited
• Two approaches
• Approach I: use different DSCP tags
– General purpose: rate limited at 20% of capacity
– Science network: rate limited at 80% of capacity
• Approach II: IP network + MPLS network
– General purpose: same as Approach I
– Science network: full mesh of MPLS LSPs mapped to the 80% queue
41
Ack: Inder Monga
Comparison
• In the first solution, there is no easy way to achieve load balancing of science flows
• Second solution:
– MPLS LSPs are rate-unlimited
– Use SNMP measurements to measure the load on each of these LSPs
– Obtain traffic matrix
– Run optimization to load balance science flows by rerouting LSPs to use the whole topology
– Science flows will enjoy higher throughput than in the first solution because the TE system can periodically readjust the routing of LSPs
42
Discuss integration with IDC
• IDC-established LSPs have rate policing at the ingress router
• Not suitable for HNTES-redirected science flows
• Add a third queue for this category
Discussion with Chin Guok
43
Summary
• HNTES 2.0 focus
– Elephant (large-sized) flows
– Offline detection
– Rate-unlimited static MPLS LSPs
– Offline setting of policy-based routes for flow redirection
• HNTES 3.0
– Online PBR configuration
– Requires flow monitoring module to receive port-mirrored packets from routers and execute online flow redirection after identifying the other end
• HNTES operation in an integrated network
44