Transcript sigcomm14

DREAM:
Dynamic Resource Allocation for
Software-defined Measurement
(SIGCOMM’14)
1
Masoud Moshref, Minlan Yu,
Ramesh Govindan, Amin Vahdat
Measurement is Crucial for Network Management
Tenant:
Netflix
Expedia
Reddit
Anomaly
Failure
Detection
Detection
Traffic
Traffic
Engineering
Engineering
Management:
Accounting
Accounting
Measurement:
Heavy
Hitter
detection
Heavy
Hitter
Heavy Hitterdetection
detection
Change
detection
Change
Changedetection
detection
Network:
Motivation
Motivation
System
Algorithm
Evaluation
2
High Level Contribution: Flexible Measurement
Management:
Users dynamically instantiate complex measurements
on network state
Measurement:
DREAM supports the largest number of measurement
tasks while maintaining measurement accuracy,
by dynamically leveraging tradeoffs between switch
resource consumption and measurement accuracy
Network:
We leverage unmodified hardware
and existing switch interfaces
Motivation
Motivation
System
Algorithm
Evaluation
3
Prior Work: Software Defined Measurement (SDM)
Controller
Heavy Hitter detection
Change detection
3 Update
1 Install
rules
rules
2 Fetch counters
#Bytes=1M
#Bytes=5M
Source IP: 10.0.1.128/30
10.0.1.130/31
Source IP: 55.3.4.34/31
55.3.4.32/30
Motivation
Motivation
System
Algorithm
Evaluation
4
Our Focus: Measurement Using TCAMs
Existing OpenFlow switches use TCAMs which permit
counting traffic for a prefix
Focus on TCAMs enables immediate deployability
Prior work has explored other primitives
such as hash-based counters
Motivation
Motivation
System
Algorithm
Evaluation
5
Challenge: Limited TCAM Memory
31
26
Find source IPs sending > 10Mbps
5
13
13 2
00
01 10
Controller
3
11
Heavy Hitter detection
1 Install rules
2 Fetch counters
00
13MB
01
13MB
10
3MB
11
2MB
Problem: Requires too many TCAMs
64K IPs to monitor a /16 prefix >> ~4K TCAMs at switches
Motivation
Motivation
System
Algorithm
Evaluation
6
Reducing TCAM Usage
Monitor internal nodes to reduce TCAM usage
31
26
Monitoring 1* is enough
because a node with size 5
cannot have leaves >10
5
13
13 2
3
00
01 10
11
31
26
5
13
13 2
3
00
01 10
11
Motivation
Motivation
System
Algorithm
Evaluation
7
Challenge: Loss of Accuracy
Fixed configuration misses heavy hitters as traffic changes
31
26
13
5
13 2
3
Missed heavy hitters
39
9
4
Motivation
Motivation
30
5 15
System
15
Algorithm
Evaluation
8
Dynamic Configuration to Avoid Loss of Accuracy
39
9
4
Find leaves >10Mbps using 3 TCAMs
30
5
15
Merge
15
Divide
Monitor children to detect HHs
but using 2 TCAMs
39
9
30
Monitor parent to save a TCAM
4
5
Motivation
Motivation
15
15
System
Algorithm
Evaluation
9
Reducing TCAM Usage: Temporal Multiplexing
Required TCAM changes over time
# TCAMs Required
Task 1
Task 2
Time
Motivation
Motivation
System
Algorithm
Evaluation
10
Reducing TCAM Usage: Spatial Multiplexing
Required TCAMs varies across switches
# TCAMs Required
Switch A
Switch B
Time
Only needs more TCAMs at switch A
Motivation
Motivation
System
Algorithm
Evaluation
11
Reducing TCAM Usage: Diminishing Returns
1
7%
Accuracy
0.8
Accuracy Bound
12%
0.6
0.4
0.2
0
256 512
1024
TCAMs
2048
Can accept an accuracy bound <100% to save TCAMs
Motivation
Motivation
System
Algorithm
Evaluation
12
Key Insight
Leverage spatial and temporal multiplexing and
diminishing returns
to dynamically adapt the configuration and allocation
of TCAM entries per task
to achieve sufficient accuracy
Motivation
Motivation
System
Algorithm
Evaluation
13
DREAM Contributions
System
Supports concurrent instances of three task types:
Heavy Hitter, Hierarchical HH and Change Detection
Algorithm
Dynamically adapts tasks TCAM allocations and
configuration over time and across switches,
while maintaining sufficient accuracy
Evaluation
Significantly outperforms fixed allocation and
scales well to larger networks
Motivation
Motivation
System
Algorithm
Evaluation
14
Anomaly detection
Traffic engineering
Network provisioning
Accounting
Network visualization
DDoS detection
Heavy Hitter detection
Hierarchical HH detection
DREAM
Change detection
Network
Measurement
Management
DREAM Tasks
Motivation
Architecture
System
Algorithm
Evaluation
15
DREAM Workflow
• Task type
• Task parameters
• Task filter
• Accuracy bound
Instantiate task
Report
Task Instance 1
TCAM Allocation
and Configuration
DREAM
SDN Controller
Motivation
Task Instance n
Configure counters
Architecture
System
Algorithm
Fetch counters
Evaluation
16
Algorithmic Challenges
Dynamically adapts tasks TCAM allocations and
configuration
configuration
configuration over time and across switches,
switches
while maintaining sufficient accuracy
accuracy
How to allocate TCAMs for
sufficient accuracy?
Diminishing Return
Which switches to allocate?
Temporal Multiplexing
How to adapt TCAM configuration
on multiple switches?
Spatial Multiplexing
Motivation
System
Algorithm
Evaluation
17
Dynamic TCAM Allocation
Allocate TCAM
Why iterative approach?
Estimate accuracy
Measure
Enough
 for
High
accuracy
Satisfied
We cannot
know TCAMs
the curve
every
trafficand
task instance
Thus we cannot formulate a one-shot optimization
Not enough TCAMs  Low accuracy  Unsatisfied
1
Why estimating accuracy?
Accuracy
0.8
We don’t have ground-truth
Thus we must estimate accuracy
0.6
0.4
0.2
0
256 512
Motivation
System
1024
TCAMs
2048
Algorithm
Evaluation
18
Estimate Accuracy: Heavy Hitter Detection
Precision =
True detected HH
Detected HHs
Is 1 because any detected HH is a true HH
Recall =
True detected HH
True detected + Missed HHs
Estimate missed HHs
Motivation
System
Algorithm
Evaluation
19
Estimate Recall for Heavy Hitter Detection
True detected HH
True detected + Missed HHs
Recall =
Find an upper bound of missed HHs
using size and level of internal nodes
Threshold=10Mbps
With size 26:
missed <=2 HHs
At level 2:
missed <=2 HH
76
26
50
12
5
Motivation
14
7
12
System
15
2
0
Algorithm
35
15
20
15
Evaluation
20
Allocate TCAM
Goal: maintain high task satisfaction
Fraction of task’s lifetime with sufficient accuracy
How many TCAMs to exchange?
Large  Oscillations
Accuracy
Accuracy
Small  Slow convergence
Time
Motivation
System
Algorithm
Time
Evaluation
21
Avoid Overloading
Not enough TCAMs to satisfy all tasks
Solutions
Reject new tasks
Drop existing tasks
Motivation
System
Algorithm
Evaluation
22
Algorithmic Challenges
Dynamically adapts tasks TCAM allocations and
configuration over time and across switches,
while maintaining sufficient accuracy
How to allocate TCAMs for
sufficient accuracy?
Diminishing Returns
Which switches to allocate?
Temporal Multiplexing
How to adapt TCAM configuration
on multiple switches?
Spatial Multiplexing
Motivation
System
Algorithm
Evaluation
23
Allocate TCAM: Multiple Switches
A task can have traffic from multiple switches
Controller
Heavy Hitter detection
20 HHs
A
B
30 HHs
10 HHs
Local accuracy is important
Global accuracy is important
If a task is globally
unsatisfied,
increasing
B’s
TCAMs is expensive
Use
both
local
and
global
accuracy
If a task is globally satisfied, no need to increase A’s TCAMs
(diminishing returns)
Motivation
System
Algorithm
Evaluation
24
DREAM Modularity
Task Independent
TCAM Configuration:
Divide & Merge
Task Dependent
Accuracy Estimation
TCAM Allocation
DREAM
Motivation
System
Algorithm
Evaluation
25
Evaluation: Accuracy and Overhead
Accuracy
Satisfaction of a task: Fraction of task’s lifetime
with sufficient accuracy
% of rejected/dropped tasks
Overhead
How fast is the DREAM control loop?
Motivation
System
Algorithm
Evaluation
26
Evaluation: Alternatives
Equal: divide TCAMs equally at each switch, no reject
Fixed: fixed fraction of TCAMs, reject extra tasks
Motivation
System
Algorithm
Evaluation
27
Evaluation Setting
Prototype on 8 Open vSwitches
• 256 tasks (HH, HHH, CD, combination)
• 5 min tasks arriving in 20 mins
• Accuracy bound=80%
• 5 hours CAIDA trace
• Validate simulator using prototype
Large scale simulation (4096 tasks on 32 switches)
• accuracy bounds
• task loads (arrival rate, duration, switch size)
• tasks (task types, task parameters e.g., threshold)
• # switches per tasks
Motivation
System
Algorithm
Evaluation
28
Prototype Results: Average Satisfaction
DREAM: High satisfaction of tasks
at the expense of more rejection for small switches
100
0.8
0.6
0.4
Dream
Equal
Fixed
0.2
0
DREAM-reject
Fixed-reject
DREAM-drop
80
512
1024 2048 4096
Switchincapacity
# TCAMs
Switch
% of tasks
Satisfaction
Satisfaction
Average
1
60
40
20
0
512
1024
2048
Switch in
capacity
# TCAMs
Switch
4096
Fixed: High rejection as over-provisions for small tasks
Motivation
System
Algorithm
Evaluation
29
Prototype Results: 95th Percentile Satisfaction
Satisfaction
Satisfaction
95th Percentile
DREAM: High 95th percentile satisfaction
1
0.8
0.6
Dream
Equal
Fixed
0.4
0.2
0
512
1024 2048 4096
#Switch
TCAMscapacity
in Switch
Equal and Fixed only keep small tasks satisfied
Motivation
System
Algorithm
Evaluation
30
Conclusion
Measurement is crucial for SDN management
in a resource-constrained environment
Dynamic TCAM allocation across measurement tasks
• Diminishing returns in accuracy
• Spatial and temporal multiplexing
Future work
• More TCAM-based measurement tasks (quintiles for
load balancing, entropy detection)
• Hash-based measurements
DREAM is available at
github.com/USC-NSL/DREAM
31