Deep Packet Inspection - Colorado State University
Deep Packet Inspection
Which Implementation Platform?
Sarang Dharmapurikar
Cisco
Implementation Platform
• Several choices, each with some pros and cons
– ASICs
– FPGA
– Network Processors
– Graphics Processors (nVidia)
– Multi-core, multi-threaded commodity processors
• Needs evaluation with respect to
– Cost
– Speed
– Overall system performance (DPI is just a small piece of the puzzle)
– Ease of use and upgrading
• A hardware-software co-design approach
– Profile a DPI system and push a component into hardware only if the
overall speedup justifies it (Amdahl’s law)
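To make the Amdahl's law argument concrete, here is a minimal sketch of how a profile could be turned into an offload ranking. The component names, runtime fractions, and the 10x hardware speedup are hypothetical placeholders, not measurements from any real DPI system; only the formula itself is standard.

```python
def amdahl_speedup(fraction_accelerated: float, component_speedup: float) -> float:
    """Overall speedup when a fraction of total runtime is sped up by a given factor.

    Amdahl's law: S = 1 / ((1 - f) + f / s)
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / component_speedup)


# Hypothetical profile: share of total runtime per component (sums to 1.0).
# These numbers are assumptions for illustration only.
PROFILE = {
    "tcp_reassembly": 0.25,
    "normalization": 0.10,
    "pattern_matching": 0.40,
    "other": 0.25,
}

# Assumed speedup from moving one component into hardware.
HW_SPEEDUP = 10.0


def rank_offload_candidates(profile, hw_speedup):
    """Return (component, overall_speedup) pairs, best candidate first."""
    results = [
        (name, amdahl_speedup(fraction, hw_speedup))
        for name, fraction in profile.items()
    ]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, speedup in rank_offload_candidates(PROFILE, HW_SPEEDUP):
        print(f"{name:18s} overall speedup {speedup:.2f}x")
```

Under these assumed numbers, accelerating a component that takes 40% of the runtime by 10x yields only about a 1.56x overall speedup, and even an infinitely fast accelerator could not exceed 1 / 0.6, roughly 1.67x. That is why the profile, not the raw speed of the accelerator, should drive the decision.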
ASIC
• Examples: ClassiPi, NetLogic, Tarari, some Cisco ASICs
• Requires too much investment
– NRE (non-recurring engineering) cost close to a million dollars!
• A long design cycle
– Most of the time is consumed in verification
• Hard to upgrade
– Algorithms evolve
– It is hard to build a flexible enough ASIC
• Applications get locked to a platform
– To migrate to a new platform requires a lot of software rewriting
FPGA
• Very flexible but expensive and power-consuming
– Virtex-5 offers 330,000 lookup table units
– 4MB of SRAM
• The latest Xilinx FPGAs contain multiple PowerPC cores
• Possible to design hybrid hw/sw systems
– The components that assist DPI, such as TCP reassembly, normalization, and
flow classification, are done in hardware
• Several FPGA platforms for networking acceleration available today
– NetFPGA
– FPX
• Need to be careful in the DPI approach
– Raw signature-matching techniques that use FPGA logic resources for each
signature won’t scale
Network Processors
• Intel IXP2850
– 16 micro-engines, each with
• 2KB D$ and 8KB I$ and 16 entry CAM
– An integrated XScale processor for control path
• 32KB I$ and 32KB D$
– 2 Crypto units
– 16KB shared scratch pad SRAM
• Cisco QuantumFlow processor
– 40 packet processing engines (PPE) each @ 1.2 GHz
– 4 threads per PPE
– Dedicated hardware for queuing, buffering, IP lookup and
classification
Commodity processors
• Really powerful server-class processors are coming up
– Intel’s Nehalem
• 8 cores
• 2 threads per core
• 32KB L1, 256 KB L2, 10+MB of shared L3 cache
– Sun’s Niagara2
• 8 cores
• 8 threads per core!
• 16KB I$ and 8KB D$ per core, 4MB shared L2 cache
• Integrated cryptographic coprocessor units
• Need to think multi-core, multi-threaded
– Think in terms of a complete system, not just pattern matching
– Which core should do what?
• Need to design cache-friendly data structures
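One common answer to "which core should do what?" is to pin each flow to a single worker, so that per-flow state (TCP reassembly buffers, matcher state) stays in one core's cache and needs no locking. The sketch below is an illustration under assumptions, not a method from these slides: the `FiveTuple` type, the key layout, and the use of CRC32 as the hash are all hypothetical choices.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical packet header summary used to identify a flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int


def flow_key(pkt: FiveTuple) -> bytes:
    """Build a direction-independent key for the packet's flow.

    The two endpoints are sorted so that client-to-server and
    server-to-client packets of one connection produce the same key.
    """
    endpoint_a = (pkt.src_ip, pkt.src_port)
    endpoint_b = (pkt.dst_ip, pkt.dst_port)
    low, high = sorted((endpoint_a, endpoint_b))
    return f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{pkt.proto}".encode()


def pick_worker(pkt: FiveTuple, num_workers: int) -> int:
    """Map a packet to a worker index in [0, num_workers)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return zlib.crc32(flow_key(pkt)) % num_workers


if __name__ == "__main__":
    request = FiveTuple("10.0.0.1", 40000, "192.168.1.5", 80, 6)
    reply = FiveTuple("192.168.1.5", 80, "10.0.0.1", 40000, 6)
    # Both directions of the connection land on the same worker.
    print(pick_worker(request, 8), pick_worker(reply, 8))
```

The trade-off is load balance: a single heavy flow cannot be spread across cores with this scheme, so a real system would need to weigh flow affinity against skew in its own traffic.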
Conclusion
• While hardware can assist DPI systems, building
proprietary hardware is not a good idea
• Let’s understand the “actual” performance needs
– Let’s not be misguided by “marketing” needs
• Need to think of hardware-software co-design
– Requires careful profiling of DPI systems to identify the
components that can be pushed to hardware
• Need to design algorithms for multi-core multi-threaded
processors