1. - CSIE -NCKU

Download Report

Transcript 1. - CSIE -NCKU

Memory-Efficient and Scalable
Virtual Routers Using FPGA
Author:
Hoang Le, Thilan Ganegedara and Viktor K. Prasanna
Publisher:
ACM/SIGDA FPGA '11
Presenter:
Zi-Yang Ou
Date:
2011/10/12
1
About FPGA’11
2
Introduction

Network virtualization is a technique to consolidate
multiple networking devices onto a single hardware
platform.

To make efficient use of the networking resources.

An abstraction of the network functionality away from the
underlying physical network.

Separated and Merged
3
Contributions




A simple merging algorithm results in the total amount of
required memory to be less sensitive to the number of
routing tables, but to the total number of virtual prefixes.
A tree-based architecture for IP lookup in virtual router
that achieves high throughput and supports quick update.
Use of external SRAMs to support large virtual routing
table of up to 16M virtual prefixes.
A scalable design with linear storage complexity and
resource requirements.
4
Set-Bounded Leaf-Pushing
Algorithm






In order to use tree search algorithms, the given set of
prefixes needs to be processed to eliminate the overlap
between prefixes. This elimination process results in a set
of disjoint prefixes.
1. Build a trie from the given routing table.
2. Grows the trie to a full tree.
3. Pushes all the prefixes to the leaf nodes.
The prefix tables expand about 1.6 times after leaf pushing.
About 90% of the prefixes are leaf prefixes.
5
Set-Bounded Leaf-Pushing
Algorithm
There are 3 steps involved in the algorithm:
1. Move the leaves of the trie into Set 1
2. Trim the leaf-removed trie
3. Leaf-push the resulting trie and move the leaves into Set 2



NS1 = lN
NS2 = kN(1 − l)
N‘ = NS1 + NS2 = kN + lN(1 − k) = N(k + l − kl)
l = 0.9 k = 1.6 N‘ = 1.06N
6
2-3 Tree
7
Merging Algorithm
8
Merging Algorithm
9
Merging Algorithm
10
Merging Algorithm
11
Merging Algorithm
12
Merging Algorithm
13
IP Lookup Algorithm for Virtual
Router
14
IP Lookup Algorithm for Virtual
Router
15
Memory Requirement
16
Memory Requirement

M1 = |S1|(L + LP + LV ID + LNHI + 2LPtr1)
M2 = |S2|(L + LP + LV ID + LNHI + 2LPtr2)

M = M1 +M2 = (|S1| + |S2|)(L + LP + LV ID + LNHI)
+ 2|S1|LPtr1 + 2|S2|LPtr2
M = (|S1| + |S2|)(L + LP + logm + LNHI)
+ |S1|LPtr1 + |S2|LPtr2


|S1| + |S2| = 1.06N , O(N × logm).
17
Overall Architecture
18
Overall Architecture
19
Overall Architecture
20
Memory Management
21
Virtual Routing Table Update
22
Scalability

The per-level required memory size grows exponentially as
we go from one level to the next of the tree. Therefore, we
can move the last few stages onto external SRAMs.
23
IMPLEMENTATION





M = (|S1| + |S2|)(L + LP + LVID + LNHI)
+ |S1|LPtr1 + |S2|LPtr2
M = NS(L + LP + LVID + LNHI + LPtr)
MIPv4 = NS(32 + 5 + 5 + 6 + 20) = 68NS
MIPv6 = NS(128 + 7 + 5 + 6 + 20) = 164NS
A state-of-the-art FPGA device with 36 Mb of on-chip
memory (e.g. Xilinx Virtex6) can support up to 530K
prefixes (for IPv4), or up to 220K prefixes (for IPv6),
without using external SRAM.
24
IMPLEMENTATION

In our design, external SRAMs can be used to handle even
larger routing tables, by moving the last stages of the
pipelines onto external SRAMs.

Thus, the architecture can support up to 16M prefixes,
or 880K prefixes for IPv4 and IPv6, respectively.
25
Experimental Setup
26
Experimental Setup
27
Performance Comparison




Candidates : trie-overlapping (A1), trie braiding (A2)
Time complexity:
Our algorithm:O(N) A1 : O(NlogN) A2 : O(N^2)
Memory efficiency:
Our algorithm: 15 MB for 1.3M prefixes
A1 : 9MB for 290K prefixes
A2 : 4.5MB for 290K prefixes
Quick-update capability:
A1 and A2 : reconstruct the entire lookup data structure
28