Full Packet Monitoring Sensors:
Hardware and Software Challenges
Vladimír Smotlacha
CESNET
High-speed network monitoring
Scalability limited by:
• throughput of local bus
- flow at 10 Gb/s exceeds throughput of PCI-X 64/133
• CPU performance
• data handling in RAM
• disk systems
- amount of stored data
- sustained write speed
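The bus limit can be checked with back-of-the-envelope arithmetic. The sketch below uses only theoretical peak figures (no protocol overhead), and the worst-case packet rate assumes minimum-size Ethernet frames, which is an assumption added here for illustration.

```python
# Theoretical peak of PCI-X 64/133 versus a 10 Gb/s link (no overhead modelled).
PCIX_WIDTH_BITS = 64       # bus width in bits
PCIX_CLOCK_HZ = 133e6      # nominal bus clock
LINK_RATE_BPS = 10e9       # monitored link rate

bus_peak_bps = PCIX_WIDTH_BITS * PCIX_CLOCK_HZ
print(f"PCI-X 64/133 peak: {bus_peak_bps / 1e9:.2f} Gb/s")
print(f"link / bus ratio:  {LINK_RATE_BPS / bus_peak_bps:.2f}")

# Worst case for the CPU: 64-byte frames plus 20 bytes of preamble and
# inter-frame gap on the wire (assumed Ethernet framing).
wire_bits_per_min_frame = (64 + 20) * 8
packets_per_second = LINK_RATE_BPS / wire_bits_per_min_frame
print(f"max packet rate:   {packets_per_second / 1e6:.2f} Mpps")
```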
Flow based monitoring
Motivation: describe dynamics of link traffic
• Elementary flow specified by
- source and destination IP address
- transport protocol
- source and destination port (if applicable)
- start and end time (timeouts!)
• Flow data aggregation
- end point - host, network, AS
- time granularity
• Example: NetFlow
- implemented in routers
- database of open flows
- statistics of each flow
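The "database of open flows" idea can be sketched as a small table keyed by the elementary flow fields, with an idle timeout deciding when a flow ends. This is a minimal illustration, not NetFlow itself; the class names, the 30-second timeout and the export list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowTable:
    """Open flows keyed by (src_ip, dst_ip, proto, src_port, dst_port)."""

    def __init__(self, idle_timeout=30.0):   # timeout value is an assumption
        self.idle_timeout = idle_timeout
        self.open_flows = {}
        self.exported = []                   # finished flows: (key, record)

    def expire(self, now):
        # A flow ends when no packet has been seen for idle_timeout seconds.
        stale = [k for k, r in self.open_flows.items()
                 if now - r.last_seen > self.idle_timeout]
        for key in stale:
            self.exported.append((key, self.open_flows.pop(key)))

    def observe(self, key, timestamp, length):
        self.expire(timestamp)
        record = self.open_flows.get(key)
        if record is None:
            record = FlowRecord(first_seen=timestamp, last_seen=timestamp)
            self.open_flows[key] = record
        record.last_seen = timestamp
        record.packets += 1
        record.octets += length
```

A packet that arrives after the timeout opens a new flow with the same key, which is why the choice of timeout directly shapes the reported start and end times.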
Packet based monitoring
Motivation: describe dynamics of selected connections
• Flow specification
- all packets that match arbitrary criteria (e.g., “all UDP and TCP
packets sent to port 456”)
- flow is treated as a generalized socket
- filter is expressed in a special language (e.g., BPF, FPL, C library)
• Example: pcap
- based on BPF
- used in tcpdump, snort, ntop, ngrep, ethereal, ...
- intuitive way of writing filters
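In pcap filter syntax, the example criterion above can be written as "(tcp or udp) and dst port 456". The sketch below spells out what such a filter has to check on a raw IPv4 packet. It is a hand-written illustration, not how BPF compiles filters, and the function names and the `make_packet` helper are hypothetical.

```python
import struct

IPPROTO_TCP, IPPROTO_UDP = 6, 17


def sent_to_port(ip_packet: bytes, port: int = 456) -> bool:
    """True for TCP or UDP packets whose destination port equals `port`."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return False                                   # not IPv4
    header_len = (ip_packet[0] & 0x0F) * 4             # IHL in bytes
    if ip_packet[9] not in (IPPROTO_TCP, IPPROTO_UDP):
        return False
    fragment_offset = struct.unpack_from("!H", ip_packet, 6)[0] & 0x1FFF
    if fragment_offset != 0:
        return False                                   # no transport header here
    if len(ip_packet) < header_len + 4:
        return False
    dst_port = struct.unpack_from("!H", ip_packet, header_len + 2)[0]
    return dst_port == port


def make_packet(proto: int, dst_port: int) -> bytes:
    """Build a minimal IPv4 packet for testing (checksum left at zero)."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, proto, 0,
                         bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return header + struct.pack("!HH", 40000, dst_port) + b"\x00" * 4
```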
Software optimization
• Performance
- efficient filters - CPU instructions per packet
- efficient packet handling - memory mapping
- parallelism in packet processing
examples:
• FFPF
- new extensible language
- intensive computation pushed into kernel
- support of network processors
• nCap
- handles a full 1 Gb/s data flow
Monitoring API
Basic abstraction: network flow
- create & terminate the flow
- read packets from the flow
- apply functions to the flow
- read results of functions
MAPI functions
- filtering (BPF filters)
- logging
- accounting
- sampling
- cooking (IP defragmentation & TCP reassembly)
- string search
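The flow abstraction can be illustrated with a toy model: create a flow, apply functions to it, read packets, read function results. The classes below are hypothetical stand-ins written for this illustration. They do not reproduce the real MAPI interface or its function names.

```python
class Filter:
    """Drops packets for which the predicate is false."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.result = None

    def process(self, pkt):
        return pkt if self.predicate(pkt) else None


class PacketCounter:
    def __init__(self):
        self.result = 0

    def process(self, pkt):
        self.result += 1
        return pkt


class ByteCounter:
    def __init__(self):
        self.result = 0

    def process(self, pkt):
        self.result += pkt["len"]
        return pkt


class Flow:
    """Toy network flow: a packet source plus an ordered list of functions."""
    def __init__(self, source):
        self.source = iter(source)
        self.functions = []
        self.closed = False

    def apply(self, function):
        self.functions.append(function)
        return len(self.functions) - 1        # handle used to read results

    def read_packets(self):
        for pkt in self.source:
            if self.closed:
                return
            for function in self.functions:
                pkt = function.process(pkt)
                if pkt is None:               # dropped by a filter
                    break
            if pkt is not None:
                yield pkt

    def read_result(self, handle):
        return self.functions[handle].result

    def close(self):
        self.closed = True
```

Functions are applied in order, so counters placed after a filter only see the packets that the filter lets through.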
Hardware-software codesign
Pushing functionality down into hardware
• FFPF
- support of network processors
• MAPI
- utilizes available functionality
- DAG cards
- SCAMPI cards
Intelligent hardware adapters
Goals
- reduce the amount of data passing over the local bus
- reduce CPU load and memory requests
- do complex classification of packets
- move computationally intensive algorithms to the adapter
- introduce new parallel algorithms
- accurate timestamps
Adapter functionality
• Timestamping
- a unique, accurate timestamp for each packet
- clock synchronization required
• Header based filtering
- rules specify which packets pass through
or
• Header based classification
- one rule per each class
- disjoint rules - each packet belongs to at most one class
- non-disjoint rules - a packet can belong to several classes
Adapter functionality (cont.)
• Packet shrinking
- cut unnecessary payload to reduce data
• Sampling
- reduces the number of packets
- deterministic vs. probabilistic
• Calculation of statistics
- based on packet length and time interval between packets
• String searching
- packets containing the string pass the unit
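Packet shrinking and the two sampling styles are easy to state in software, even though the adapter performs them in hardware. The following is a minimal sketch with hypothetical function names. The 1-in-n rule and the fixed seed are illustrative choices.

```python
import random


def shrink(packet: bytes, snap_len: int) -> bytes:
    """Keep only the first snap_len bytes (headers), cut the rest of the payload."""
    return packet[:snap_len]


def deterministic_sample(packets, n):
    """Keep every n-th packet, starting with the first one."""
    return [p for i, p in enumerate(packets) if i % n == 0]


def probabilistic_sample(packets, probability, seed=None):
    """Keep each packet independently with the given probability."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < probability]
```

Deterministic sampling gives a predictable reduction ratio, while probabilistic sampling avoids locking onto periodic traffic patterns.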
SCAMPI adapter
Packet classification
CAM - matching a (sub)field with a constant value
(e.g., IP address, network address, protocol)
Processing unit - arithmetic comparison with a constant
value (e.g., port, interval of port values)
Whenever possible, comparison is done in CAM
Pair (C,P)
• C - CAM row (with “don’t care” bits)
• P - sequence of comparison (conditional jump) instructions
Semantics
• matching row C of CAM points to an instruction sequence P
• instruction result:
• assign packet to a class & stop (packet classified)
• stop without assigning (not classified)
• continue with next instruction
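The (C, P) semantics can be modelled in a few lines: a ternary CAM row selects an instruction sequence, and each instruction either classifies the packet, stops, or falls through to the next one. This is a software sketch under assumptions. The key layout (protocol plus destination address), the field names and the class label are hypothetical and do not describe the real SCAMPI firmware.

```python
import ipaddress
import operator
from dataclasses import dataclass
from typing import Callable

CLASSIFY, STOP, NEXT = "classify", "stop", "next"


@dataclass
class CamRow:
    value: int
    mask: int          # 1 bits are compared, 0 bits are "don't care"

    def matches(self, key: int) -> bool:
        return (key & self.mask) == (self.value & self.mask)


@dataclass
class Instruction:
    field: str
    compare: Callable[[int, int], bool]
    constant: int
    if_true: str       # CLASSIFY, STOP or NEXT
    if_false: str


def make_key(proto: int, dst_ip: str) -> int:
    """Assumed CAM key layout: 8-bit protocol followed by 32-bit address."""
    return (proto << 32) | int(ipaddress.IPv4Address(dst_ip))


def classify(key, fields, pairs):
    """pairs: list of (CamRow, [Instruction], class_id); first matching row wins."""
    for row, program, class_id in pairs:
        if not row.matches(key):
            continue
        for ins in program:
            hit = ins.compare(fields[ins.field], ins.constant)
            outcome = ins.if_true if hit else ins.if_false
            if outcome == CLASSIFY:
                return class_id
            if outcome == STOP:
                return None
        return None    # program ended without assigning a class
    return None


# Example pair: TCP to 10.0.0.0/8 (done in CAM), port in 1024..2047 (done in P).
tcp_to_net10 = CamRow(value=make_key(6, "10.0.0.0"),
                      mask=make_key(0xFF, "255.0.0.0"))
port_range = [
    Instruction("dst_port", operator.ge, 1024, NEXT, STOP),
    Instruction("dst_port", operator.le, 2047, CLASSIFY, STOP),
]
PAIRS = [(tcp_to_net10, port_range, "class-1")]
```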
Filter language - FL
• Primitive operation: comparison of an arbitrary header field
with a constant
• Filter specification: expression consisting of primitive
operations, ‘and’, ‘or’, ‘not’ and brackets
• Implementation
- expression is transformed to DNF
example: “A and (C or D) and (E or F) or G and H”
is equal to “ACE or ACF or ADE or ADF or GH”
- each primitive operation or a conjunction of them is translated to
at most one pair (C, P)
- FL expression in DNF is translated to a number of pairs (C, P)
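The DNF step can be reproduced with a short recursive routine. The sketch below is an illustration, not the FL compiler. The nested-tuple expression format is an assumption made here, and negation is pushed down to the primitives with De Morgan's laws.

```python
from itertools import product


def to_dnf(expr, negate=False):
    """Return a list of conjunctions, each a list of (possibly negated) primitives.

    expr is either a primitive name (str) or a tuple
    ("and", e1, e2, ...), ("or", e1, e2, ...) or ("not", e).
    """
    if isinstance(expr, str):
        return [[("not " + expr) if negate else expr]]
    op, *args = expr
    if op == "not":
        return to_dnf(args[0], not negate)
    if negate:                                   # De Morgan: swap and/or
        op = "or" if op == "and" else "and"
    parts = [to_dnf(arg, negate) for arg in args]
    if op == "or":
        return [conj for part in parts for conj in part]
    # "and": distribute over the alternatives of every operand
    return [sum(combo, []) for combo in product(*parts)]


# The example from the slide: A and (C or D) and (E or F) or G and H
EXAMPLE = ("or",
           ("and", "A", ("or", "C", "D"), ("or", "E", "F")),
           ("and", "G", "H"))
print(" or ".join("".join(conj) for conj in to_dnf(EXAMPLE)))
```

Each conjunction in the result is a candidate for one pair (C, P), so the number of pairs grows with the number of terms produced by the distribution.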
String searching
• CAM with 272-bit-wide rows
• Algorithm implemented in hardware:
- a 16-byte string is stored in 16 CAM rows, shifted by 0, 1, 2, ..., 15
bytes
- comparison with 32 bytes of payload in one CAM
- in the next cycle, the payload is shifted by 16 bytes
• Implementation in SCAMPI
- searches for more than 100 strings simultaneously
- designed throughput 3 Gb/s
• Issues
- finds only the first occurrence of any string
- for longer strings, many false positives -> additional software
verification
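The shifted-row scheme can be simulated in software to see why a 16-byte step still catches every alignment. This is a minimal sketch, not the hardware design. Don't-care positions are modelled as None, and the final check of strings longer than 16 bytes is an assumed reading of "additional software verification".

```python
WINDOW = 32        # payload bytes compared per cycle
STEP = 16          # payload advance per cycle
PATTERN_LEN = 16   # bytes of the string held in one CAM row


def build_rows(pattern: bytes):
    """16 ternary rows: row i holds the pattern at byte offset i, rest don't care."""
    assert len(pattern) == PATTERN_LEN
    rows = []
    for shift in range(STEP):
        row = [None] * WINDOW
        row[shift:shift + PATTERN_LEN] = pattern
        rows.append(row)
    return rows


def row_matches(row, window):
    return all(r is None or r == w for r, w in zip(row, window))


def cam_search(payload: bytes, rows) -> int:
    """Return the offset of the first hit, or -1 if there is none."""
    for pos in range(0, len(payload), STEP):
        window = list(payload[pos:pos + WINDOW])
        window += [None] * (WINDOW - len(window))   # padding never matches a byte
        for shift, row in enumerate(rows):
            if row_matches(row, window):
                return pos + shift
    return -1


def search_and_verify(payload: bytes, string: bytes):
    """CAM lookup on the first 16 bytes, then a software check of the full string."""
    hit = cam_search(payload, build_rows(string[:PATTERN_LEN]))
    verified = hit >= 0 and payload.startswith(string, hit)
    return hit, verified
```

An occurrence starting at offset 0 to 15 inside the window always ends within the 32 compared bytes, so stepping by 16 bytes misses no alignment inside a single packet.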
Open problems
• Searched string spans the boundary between two packets
- solution: flow cooking in the adapter
• Dealing with non-disjoint classes
- solution: evaluation of all intersections -> possibly exponential
number of new pairs (C, P)
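The exponential growth is easy to see by enumerating the intersections. The sketch below assumes one way of making overlapping classes disjoint: one region per non-empty subset of classes, containing the packets that belong to exactly those classes. The slides do not spell out the exact construction, so this is only an illustration.

```python
from itertools import combinations


def disjoint_regions(class_names):
    """List (member_classes, excluded_classes) for every non-empty subset."""
    regions = []
    for size in range(1, len(class_names) + 1):
        for members in combinations(class_names, size):
            excluded = tuple(c for c in class_names if c not in members)
            regions.append((members, excluded))
    return regions


for n in (2, 3, 10):
    names = [f"class{i}" for i in range(n)]
    print(n, "overlapping classes ->", len(disjoint_regions(names)), "regions")
```

With n overlapping classes there are up to 2^n - 1 such regions, each of which may need its own pairs (C, P).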