Tree-based IP lookup

Download Report

Transcript Tree-based IP lookup

Advanced topics
in
Computer Networks
Lecture 9: Tree-based lookup
University of Tehran
Dept. of EE and Computer Engineering
By:
Dr. Nasser Yazdani
Univ. of Tehran
Adv. topics in Computer Network
1
Outline




Issues
Multiway and Multicolumn search
DMP-Tree
Some implementation issues
Univ. of Tehran
Adv. topics in Computer Network
2
Issues

How to sort prefixes


Prefixes as ranges
Comparing prefixes




Based on length
Add extra bits at the end.
New definition (DMP-tree)
How to apply tree structures like binary tree
or m_way tree to prefixes
Univ. of Tehran
Adv. topics in Computer Network
3
Multiway tree lookup.



Proposed by G. Varghese and his students.
Consider prefixes as range.
First try: Pad 0’s to prefixes in order to apply
binary search tree.
consider {1*, 101* and 10101* }prefixes
Should match here
100000
101000
101010
Binary search fail for
101011
Binary search ends
here.
Univ. of Tehran
101110
111110
all of them!.
Adv. topics in Computer Network
4
Multiway tree lookup(cont)

Two problem in the previous example



Being Far away from matching prefix
Multiple addresses matching different prefixes end up in
the same region.
Solution: Prefixes as ranges, Put the end of range
in the table.
L
100000
L
101000
We have the explicit
ranges.
L
101010
101011
Search maps to one range
H 101011
only.
H 101110
101111
111110
H
111111
Univ. of Tehran
Adv. topics in Computer Network
5
Multiway tree lookup(cont)
100000
101000
101010
101011
101111
111111
L
L
L
H 101011
H 101110
111110
H
For 101011, we try to find first L which is not followed by H.
For the rest, we can have a stack operation to find the first
L.
Problem: Linear search to find L
Univ. of Tehran
Adv. topics in Computer Network
6
Multiway tree lookup(cont)
Solution: Precompute prefixes corresponding to
ranges.
>
=
100000
101000
101010
101011
101111
111111
Univ. of Tehran
L
L
P1)100000
P2)101000
P3)101010
101011
101111
111111
P1
P2
P3
P2
P1
-
P1
P2
P3
P3
P2
P1
L
H 101011
H 101110
111110
H
1* matching prefix.
Adv. topics in Computer Network
7
DMP-Tree




Comparing prefixes.
Sorting prefixes
Binary prefix Tree.
M_way prefix tree.
Univ. of Tehran
Adv. topics in Computer Network
8
Trie structure

Trie or radix tree
Univ. of Tehran
Adv. topics in Computer Network
9
Sorting prefixes



Question? Why well-known tree structures cannot
be applied to the longest prefix matching problem?
Answer- No a well-known method for sorting.
Definition: Assume Aa1a2…an and B=b1b2…bm
to be prefixes of  and there a character 


1. If n=m, the numerical values of A and B are
compared.
2. If n  m (assume n<m), the two substrings a1a2…an
and b1b2…bn are compared. If a1a2…an and b1b2…bn are
equal, then, the (n+1)th character of string B is checked.
It is considered B>A if bn+1 is before  and B  A
otherwise.
Univ. of Tehran
Adv. topics in Computer Network
10
Sorting prefixes (cont)


Example- Assume M is  Then, BOAT is smaller
than GOAT and SAD is bigger than BALLOON. CAT is
considered bigger than CATEGORY since the fourth
character in CATEGORY, E, is smaller than M.
Sorting is a function to determine the position of each
prefix.
Prefixes of table is sorted as:
00010*,0001*,001100*,01001100*,0100110*,01011,00
1*,01011*,01*,10*,10110001*,1011001*,10110011*,
1011010*,1011*,110*

Univ. of Tehran
Adv. topics in Computer Network
11
Binary prefix tree
•Unfortunately, it fails for 101100001000 Why?
•Prefixes are ranges and not just a data point in the search space.
Univ. of Tehran
Adv. topics in Computer Network
12
Binary prefix tree (cont)
Definition: prefixes A and B are disjoint if

none of them is a prefix of other.
Definition : prefix A is called enclosure if
there exists at least one element set such that
A is a prefix of that element.
We modify the sort structure;


1.
2.
3.
4.
Each enclosure has a bag to put its data element on it.
Sort remaining elements.
Distribute the bag elements to the right and left according
the sort definition.
Apply algorithm recursively.
Univ. of Tehran
Adv. topics in Computer Network
13
Binary prefix tree (cont)

Example- Prefixes in table 1. First step.
The second step,
Note- enclosures are in the higher level than the
contained elements. (important!)
Univ. of Tehran
Adv. topics in Computer Network
14
Binary prefix tree (cont)

The final tree structure
Univ. of Tehran
Adv. topics in Computer Network
15
Sorting prefixes (cont)

Sorting algorithms


Based on bubble sort
Based on Radix sort.
Tmp= MinLength(list)
for all i in list except tmp do
compare i with tmp
if i matches tmp then;
put i in tmp’s bag
if i<tmp then
put i in leftList:
if i>tmp then
put i in rightist:
endfor
list = Sort(leftList)  Sort(rightList)
Univ. of Tehran
Adv. topics in Computer Network
16
M_way prefix tree
Problems with the binary prefix tree.

Two way branching.
The structure is not dynamic and insertion may
cause problems!.


1.
Divide by m after sorting the strings

2.
Static m_way tree.
Build a dynamic data structure like B-tree.


How to guarantee enclosure to be in the higher
level than its contained elements.
Define node splitting and insertion.
Univ. of Tehran
Adv. topics in Computer Network
17
M_way prefix tree (Cont)

Node splitting: Finding the split point.
1.
2.
3.

Take the median if the data elements are
disjoint.
If there is an enclosure containing other
elements, take it as split point.
Otherwise, take an element which gives the best
splitting result.
Note, this does not guarantee the final tree
will be balanced.
Univ. of Tehran
Adv. topics in Computer Network
18
M_way prefix tree (Cont)

Insertion:
1.
2.
3.

If the new element is not an enclosure of
others, find its place and insert in the
corresponding leaf, like B-tree.
Otherwise, replace the closet element with
element and reinsert the replace elements.
Resort the resulted subtree, (space division)
if necessary.
Building tree is similar to building B-tree.
Univ. of Tehran
Adv. topics in Computer Network
19
M-way prefix tree (cont)

Example
Univ. of Tehran
Prefix
Abbrv.
10
01
110
1011
0001
01011
00010
001100
1011001
1011010
0100110
01001100
10110011
10110001
01011001
001011
00111010
A
B
C
D
E
F
G
H
I
J
Prefix
1101110010
10001101
11101101
01010110
00100101
100110100
101011011
11101110
10110111
011010
011011
011101
0110010
101101000
101101110
00011101
011110110
Adv. topics in Computer Network
Abbrv.
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
II
20
M-way prefix tree (cont)
We insert prefixes randomly.
 The tree uses 5 branching factor (at most 4
prefixes in each node)
 Insert
01011, 1011010, 10110001 and
0100110. Then, adding 110 cause overflow.
Split node
 10110001 
(0100110,01011)
(1011010, 110)
(all element are disjoint)

Univ. of Tehran
Adv. topics in Computer Network
21
M-way prefix tree (cont)

Insert 10110011, 1101110010,
Adding 1011001 causes overflow.
00010.
 10110001

1011010 
(00010,0100110,01011) (1011001,10110011) (110,1101110010)
(case 3 of splitting)
 Latter adding 1011 cause problem. It is the
case of adding an enclosure. We will have
space division.
Univ. of Tehran
Adv. topics in Computer Network
22
M-way prefix tree (cont)

The final tree
• The tree supersede B-tree or B-tree is a special
case of this tree. Then, when data element are
relatively disjoint, the height of tree is logMN.
Univ. of Tehran
Adv. topics in Computer Network
23
DMP-Tree
No. of Data
Max. height
50
60
70
80
90
BF
=4
BF
=5
BF
=6
BF
=7
BF
=8
BF
=9
BF
=1
0
BF
=3
25
20
15
10
5
0
100
• BF is Branching factor in the internal nodes.
• No. of Data is in1000s.
Univ. of Tehran
Adv. topics in Computer Network
24
DMP-Tree
Min. memory utilization
No. of Data
50K
0.68
0.66
0.64
0.62
60K
70K
80K
3
4
5
6
7
8
9
branching
10
90K
100K
• Number of prefixes in the right.
Univ. of Tehran
Adv. topics in Computer Network
25
DMP-Tree
• Height of tree for 100K data prefixes.
Height
25
20
Min
15
Ave.
10
Max
5
0
3
Univ. of Tehran
4
5
6
7
8
9
10 Branching
Adv. topics in Computer Network
26
DMP-Tree
Analyzing of results.




With increasing BF, Branching Factor, the
height decreases.
The result are for the worst case, Max
height, and the ave. case is much less.
After BF=9, increasing Branching Factor
does not decrease the max. height.
The results are for the set of prefixes of
50,000-100,000 with lengths from 8 t0 31.
The size of actual prefixes in use is around
50,000 and the length is 8-31.
Univ. of Tehran
Adv. topics in Computer Network
27
DMP-Tree
Memory utilization:,




Mem. Utilization is 0.64%-0.67% without
considering the tree branching overhead.
Mem. Utilization is 0.53%-0.62% with tree
branching overhead (pointers).
Without considering branching pointers, the
mem. Utilization decreases with increasing
the branching factor.
Total mem. Utilization increases with
increasing the branching factor.
Univ. of Tehran
Adv. topics in Computer Network
28
DMP-Tree
Therefore,
 The longest matching prefix of a
network can be determined in 5 steps
with 9 or more branching factor.
 In the worst case, we need at most 2
times of total prefix data size of memory
to implement the scheme. For instance,
for 50,000 prefixes of 32bit, we need at
most 3.2 Mbit of memory.
Univ. of Tehran
Adv. topics in Computer Network
29
Overall Design





All operations need search first in the Tree
structure.
Two search procedures, one for the
longest matching prefix and another for
update.
The prefix tree data structure is on the
chip.
The Policy table is on the off chip memory.
There is a port to data link layer mapping
module.
Univ.
of Tehran
Adv. topics in Computer Network
30
Tree Nodes

Internal nodes
Internal nodes.
Addr
1
[?]
Prefix 1
[33]
Port
[?]
Branching factor
Leaf nodes
Addr
2
[?]
Prefix 2
[33]
port
[?]
…
Addr
[?]
Each prefix has a left and right pointer which
are pointing to left and right subtrees
respectively.

We can have N prefixes in each internal node.
Then, N+1 is the branching factor.

The bigger N, the faster search time, but the
more logic is needed.
Univ.
Tehran is the address
Adv. topics in Computer
 of Port
of theNetwork
port in the switch31to

Tree Nodes

Leaf nodes.
Prefix 1
[33]



port
[?]
Prefix 2
[33]
port
[?]
…
There is no left and right subtree pointers.
The number of prefixes in the leaf node is M.
The leaf nodes are stored in a off chip memory
to make the scheme scalable to the large
number of prefixes.
Univ. of Tehran
Adv. topics in Computer Network
32
Branching Factor

What is the best number for N? (Branching
factor)




The bigger N, the faster search process. (Fact 1)
The bigger N, the more memory pins are and
usually the more mem. Bandwidth is needed
(Fact 2).
The bigger N, the more logic we need to process
the node (Fact 3).
Simulation result shows
The bigger N, the better memory utilization in
the memory.
 For
N  8, theAdv.
max.
of the tree does not
Univ.
of Tehran
topics inheight
Computer Network
33

Simulation
result
Total memory: assuming one memory block and

OC-192.
# of
Prefixes
required
Mem.
Branching
Factor
Mem. Pins
Mem BW
(G/s)
max
Max mem
Access
Mem. Size
(on
chip)mm2
Max
heights
64K
5.4 Mbits
15
897
89.7
4
81
5
64K
5.5 Mbits
11
655
65.5
4
82
5
64K
5.4Mbit
9
527
52.7
4
81
5
64K
6 Mbit
6
335
46.9
6
96
7
64K
6.6 Mbit
5
275
44
7
110
8
64K
6.5 Mbit
4
207
62.1
14
112
15
100K
8.3 Mbit
15
897
89.7
4
122
5
100K
8.5 Mbit
11
655
65.5
4
125
5
100K
8.3
9
527
63.24
5
122
6
100K
9 Mbit
6
335
53.6
7
135
8
100K
9.1Mbit
5
275
49.5
8
140
9
100K
9.5
4
207
62.1
14
150
15
40
5
30KUniv. of Tehran
2.6
9
527topics in Computer
52.7
4 expected
Adv.
Network
34
Branching Factor



It seems any number between 8-16 is
reasonable. But, N=9 gives a better search
time, memory size.
Assuming 9 branching factors in the internal
node, %50 node utilization and 128K
prefixes, we need max. 128K/4.5= 28.5K
address. Then, 15 bit address for left and
right pointers are more than enough. But,
we need more for off chip addressing
The number of switch port are usually
limited, around 64, We can assume 256,
Univ. of Tehran
Adv. topics in Computer
Network
35
then
8 bit is enough
to address
them.
Branching Factor



In order to make the internal node
branching and leaf node branching even,
M=10.
If we want to read a node at once, we will
need 41x10=410 pins which is difficult to
support in one chip.
We can divide a node in two and read/write
in two clock cycles. This reduce the memory
pins to 205 which is affordable.
Univ. of Tehran
Adv. topics in Computer Network
36
Memory requirement

Prefix tree: Assuming 128K prefixes.
N = 9 (BF) and M=10 (BF in leaves), the majority of
prefixes, %80 will be in leaves, assume %65 node
utilization,
# of ave prefixes in a leaf node node = 10*0.6 5= 6.5
# of leaf nodes  128Kx%80x2/6.5 = 31.5K and %10
overhead  35 K
Total off chip memory = 35K x 205(Mem BW) = 7.2 Mbits
Then, we need 16 bits for addressing. 1 bit for
internal/external.
# of internal nodes= 128Kx%20/5.8=4.41K and %10
Link Addr
overhead 4.9 K
Total on chip memory=4.9Kx529K  2.6Mbits 48 bit addr.

Port to link address mapping table.
Univ. of Tehran
Adv. topics in Computer Network
37
Memory requirement
In summary:

# of
on chip Branchin
Prefixes memor g Factor
y Mbits
On
chip
Mem.
Pins
Off chip mem
Mem. Size
memor Access( (on chip)
2
y
search) mm
Mbits
Off
128K
529
7.2
250
Note:




2.6
10
5
40
chip
mem
pins
Branching factor is the # of branching in internal nodes.
The size of the memory scales with the size of data or
# of prefixes.
Power dissip. depends on the r/w freq, current & core voltage
Considering Faraday Mem. Modules
A 10Kx32 bits single port mem size is 36x1.45 mm2.
Univ. of Tehran
Adv. topics in Computer Network
38
Overall Design
Mem.
Ctrl
Memory
root addr
To/From
NP
Search
root content
Update Search
Insertion update delete
CPU
Inter
face
Port2Addr
Ctrl
Univ. of Tehran
MEM
To/From
CPU
Output mem Ctrl
To/From out Mem
Adv. topics in Computer Network
39
Search Path
Addr[32]
Piping
Len[Nx6]
Compare
First[1]
Found[1]
PackAdd[29]
SOA[1]
Prty[1]
GetLen
CResult[1]
Addr[32]
MemAddr[14]
Match[1]
Next
Addr[32]
OutMemAddr
Input
Node
Node
Addr[32]
SOA[1]
InClk[1]
Node
Data[32]
RdAdd[19]
To/From
Off mem
Root
Mem Ctrl
IpAddr[32]
LinkAddr[48]
Cashing
Dispatch
Addr[32]
DataOut[32]
Addr[32]
Port[8]
To Scheduler
There are data assertion signals between blocks which has not been
shown every where because of space limitation.
Univ. of Tehran
Adv. topics in Computer Network
40

Input Module:



Get the packet destination addresses from the
parser.
Do parity checking.
It has the following input signals





Search Path
Input data which 32 bits.
Start of Address, 1 bit, (SOA)
Parity, 1 bit, (prty)
Input clock, (InClk)
It gets Data in two clock cycles, first the IP
address and then, the packet address in the
memory or packet id (cid)
Univ. of Tehran
Adv. topics in Computer Network
41
Search Path

Input Module:
29 bits is used for the packet address and the
last 3 bit for the policy, Then, 512 Mbytes can
be supported to store the packet before
sending them out.
31
2

The 2nd clock cycle
data format:
0
Packet address

InClk
Policy
The timing
SOA
Data
Univ. of Tehran
IpAddr
PackAddr
Or cid
Adv. topics in Computer Network
42
Search Path

Piping Module:

Pipelines the search process.
For new elements from input block does.
For each new IP address do
If found in the hash table
send the packet memory address to dispatcher
Else
Enter IP address and the policy into the pipe FIFO
End do,
Univ. of Tehran
Adv. topics in Computer Network
43
Search Path

Piping Module:
For elements in the FIFO
For the first IP address in FIFO do
If IP address is new then,
assert first signal and send IP address and policy out.
Else if next addr is on chip send the next node address to Mem.
Ctrl.
Else send the next node address to OffMemCtrl.
send to the pipe the IP address and policy.
For the recirculated address
If the node was leaf then,
Send the longest matching address to OutMemCtrl.
Send Policy to Extract port and the packet address to dispatch.
Else
Put the IP address into the FIFO
Replace the longest matching prefix address if a new one found.
Univ. of Tehran
Adv. topics in Computer Network
44
Search Path

Piping Module:

FIFO . Keep the current information of IPs.
IP Addr
[32]
Port [8]
Next Node
[19]
LMPA[]
New [1]
LMPA = Longest Matching Prefix Address
New = 1 new , 0 old
If the packet is new the next address will be zero
and we can read root cash content instead of
reading from memory.
The address is off chip if the first, most
significant bit is 1, otherwise it on chip.
Univ. of Tehran
Adv. topics in Computer Network
45
Search Path

GetLen Module: This module get the length
of prefixes. We add 1 to the end of a prefix
and then padded with ‘0’s to make it 33
bits.
Ex. 11011010  110110101000…0 (33 bits).
Then, we should start from right and the first ‘1’
we meet, the rest is the prefix length.
GetLen can be implemented as a multiplexer with
case statement (32 case statement) and it can
be done in one clock cycle.
Univ. of Tehran
Adv. topics in Computer Network
46
Search Path

Compare Module: compare two prefix A
and B with lengths L1 and L2.
Assume L1>L2 and A[1:L2] is the first L2 bit from
A, Then,
1.
2.
If A[1:L2] = B  A and B match. If A[L2+1] = 0
 A B. Otherwise, A> B.
If A[1:L2] > B  A >B, otherwise A<B.
One of the prefixes here is IP address with length
32.
We assume there are no two identical elements
in the tree.
Univ. of Tehran
Adv. topics in Computer Network
47
Search Path

Next Module: Get the next node address to
read and also the matching prefix and its
corresponding port number.
It gets two signals for each prefix, Match and
ComResult (compare),
Match =1  the prefix match,
ComResult = 1  Prefix is bigger.
It gets the left address of the first prefix, from the
left, such that its ComResult signal is 1.
It compares the matching prefix lengths and the get
the one with the largest length.
Univ. of Tehran
Adv. topics in Computer Network
48
Search Path

Dispatch Module: forms the Routing Group
Address, RGA, from the port number and
send it with packet stored memory address
(PSMA) or CID.
RGA is a 64 bit size bit map. The bit correspond to
port number is set to 1.
PSMA is dispatched first and Port and DLL address
follows.

Cashing Module: keep a cash of IP address
and corresponding port.
IP address
Port[8]
[32]
Univ. of Tehran
Adv. topics in Computer Network
49
Search Path

Cashing Module:
The cash is kept as a FIFO and its depth depends
on the technology.
Check IP address in FIFO.
If the address found, then,
assert found signal.
write IP address on top of FIFO if it is not there
already.
Else
write IP address on top of FIFO
Cashing system always removes the last
reference IP address from the cash.
Univ. of Tehran
Adv. topics in Computer Network
50
Search Process
operations: This is for large prefixes (50K &up)
cashing
Piping
Piping
Piping
Piping
Piping
Dispatch
Off Mem
root
0
2
On chip memory nodes
5
8
11
Leaf node
14
16
19
Time
• This operation is for an IP address lookup
• Piping is the bottleneck in the system and in ave. take 5 cycles.
• Assuming 100 MHZ operation
# of packets = 109/50 = 20 Million
Line speed = 512x20 = 10.24 G for 64 byte packets.
= 256x8x20= 41 G for 256 byte packets.
• It is possible to support higher speeds with duplicating pipe.
Univ. of Tehran
Adv. topics in Computer Network
51
Output pins
Pin Name
Type
Number
DataIn (IP Addr)
Input
DataOut(Port&cid)
Output
32
Port and cid, to schedular
DataBus(cpu)
In/out
32
CPU data bus
CtrlBus(cpu)
In/out
12
CPU control bus
MemData2(Tree)
In/out
205
If off chip is used
MemAddr2(Tree)
In/out
18
If off chip is used
MemCtrl1(Tree)
In/out
8?
If off chip is used
340
This value can change
around 10 percent
Total
Univ. of Tehran
32
Comment
IP address, from parser
Adv. topics in Computer Network
52