
Traceback
Pat Burke
Yanos Saravanos
Agenda

Introduction
Problem Definition
Traceback Methods
  Packet Marking
  Hash-based
Conclusion
References
Why Use Traceback?

SPAM
DoS
Insider attacks
Worms / Viruses
  Code Red (2001) spreading at 8 hosts/sec
  Slammer Worm (2003) spreading at 125 hosts/sec
Why Use Traceback?

Currently very difficult to find spammers and virus authors
  Easy to spoof IPs
  No inherent tracing mechanism in IP
Blaster virus author left clues in code, was eventually caught
What if we could trace packets back to the point of origin?
Packet Tracing Benchmarks

Effect on throughput
  Amount of overhead added to the packets
False positive rate
Computational intensity
  Time required to trace an attack
  Amount of data required to trace an attack
  CPU/memory usage on router
Collisions
  For hash-based traceback methods
FDPM
Flexible Deterministic Packet Marking
Packet Marking

Add information to the packets so that paths can be retraced to the original source
Methods for marking packets
  Probabilistic
    Node Marking
    Edge Marking
  Deterministic
Probabilistic Packet Marking (PPM)

With some probability, a router marks each packet it forwards (see the sketch below)
  With the router's IP address (node marking)
  With an edge of the path (edge marking)
95% accuracy requires ~300,000 packets
PPM needs a large number of packets and creates a very high computational load
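
A minimal sketch of the node-marking decision, assuming a hypothetical "mark" field and an illustrative probability p (real schemes, e.g. Savage et al.'s, squeeze the mark into the 16-bit IP Identification field):

    import random

    def ppm_mark(packet, router_ip, p=0.04):
        # Node marking: with probability p, overwrite the packet's mark
        # with this router's address, so a packet usually ends up carrying
        # the mark of the last router that chose to mark it.
        if random.random() < p:
            packet["mark"] = router_ip   # "mark" is a hypothetical field
        return packet

Because any downstream router may overwrite the mark, the victim must collect many packets before every router on a d-hop path has been observed; Savage et al. bound the expected number by ln(d) / (p(1 - p)^(d-1)), which is why packet counts like the ~300,000 above are so large.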
PPM Nodes - Cons

Large number of false positives
  A DDoS with 25 hosts requires several days to trace and yields thousands of false positives
Slow convergence rate
  For 95% success, ~300,000 packets are needed
  This is only for a single attacker
Attacker can still inject modified packets into the PPM network (mark spoofing)
Deterministic Packet Marking (DPM)

Every packet is marked
Spoofed marks are overwritten with correct marks
DPM

Incoming packets are marked
Outgoing packets are unaltered
Requires more overhead than PPM
Less computation required
Probability of generating the ingress IP address: (1 - p)^(d-1)
DPM Mark


16-bit mark field
1-bit flag field
DPM Mark Encoding

Two 16-bit fields and a 1-bit flag
  The 32-bit ingress IP address is split into two 16-bit halves; one half populates the 16-bit mark field with probability 0.5 (see the sketch below)
  Set the flag to 1 when carrying the higher-order bits
Can be made more secure by using non-uniform probability distributions
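
A sketch of this split, assuming the 16-bit mark occupies the IP Identification field and the flag the reserved bit (the placement used by Belenky and Ansari); function names are illustrative:

    import random

    def dpm_mark(ingress_ip):
        # Encode one half of the 32-bit ingress address as (mark, flag).
        # Each packet carries one half, chosen with probability 0.5.
        if random.random() < 0.5:
            return (ingress_ip >> 16) & 0xFFFF, 1   # high-order half, flag = 1
        return ingress_ip & 0xFFFF, 0               # low-order half, flag = 0

    def dpm_recover(marks):
        # Victim side: combine one high half and one low half, once both
        # have been observed from the same ingress point.
        high = next((m for m, f in marks if f == 1), None)
        low = next((m for m, f in marks if f == 0), None)
        if high is None or low is None:
            return None        # need more packets
        return (high << 16) | low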
DPM

Claimed to have 0 false positives
Claimed to converge very quickly
  99% probability of success with 7 packets
  99.9% probability of success with only 10 packets
Has not been tested on large networks
Cannot deal with NAT
Flexible DPM

DPM uses 17 bits in the IP header to store marking information
FDPM adds a variable-length TOS field to the mark
  TOS contributes 0-8 bits, so the mark is 17-25 bits
Split the ingress IP into k segments, sent in separate packets (see the sketch below)
  Segment numbers keep the address order consistent
  The reconstruction process recognizes packets from the same source
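
A sketch of the segmentation step, assuming k equal segments of a 32-bit address; k = 4 and the pair layout are illustrative choices, not FDPM's exact format:

    def fdpm_segments(ingress_ip, k=4):
        # Split a 32-bit ingress address into k equal segments; each
        # outgoing packet carries one (segment_number, bits) pair, so the
        # segment numbers keep the address order consistent at the victim.
        seg_bits = 32 // k
        mask = (1 << seg_bits) - 1
        return [(i, (ingress_ip >> (32 - (i + 1) * seg_bits)) & mask)
                for i in range(k)]

A marker would cycle through these pairs across successive packets from the same ingress point.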
FDPM Reconstruction

Mark Recognition
  Store reconstruction packets in a cache
  Split the IP header into fields to find the mark length
Address Recovery
  Analyze and store the mark in a recovery table
  Different source IPs may have the same digest (hash value), so collisions may occur
    More than one entry is created (see the sketch below)
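
A sketch of address recovery, assuming a digest computed from source-identifying bits; because colliding sources share a digest, each table slot holds a list of candidate entries rather than a single one (all names are illustrative):

    from collections import defaultdict

    recovery_table = defaultdict(list)   # digest -> list of candidate entries

    def fdpm_recover(digest, seg_no, seg_value, k=4):
        # Place a recognized mark into the recovery table; return any fully
        # reconstructed addresses. Sources whose digests collide share a
        # slot, so more than one entry may accumulate under one digest.
        for entry in recovery_table[digest]:
            if seg_no not in entry:
                entry[seg_no] = seg_value      # fill a gap in an existing entry
                break
        else:
            recovery_table[digest].append({seg_no: seg_value})  # new entry

        seg_bits = 32 // k
        complete = []
        for entry in recovery_table[digest]:
            if len(entry) == k:                # all k segments present
                addr = 0
                for i in range(k):
                    addr = (addr << seg_bits) | entry[i]
                complete.append(addr)
        return complete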
Flow-Based Marking

Mark packets selectively according to flow properties when the router has heavy traffic
  Reduces load on the router while still marking
Packets are classified according to destination IP address
Uses flow thresholds (see the sketch below)
  Lmax is the threshold at which the router's load is exceeded (called the overload problem)
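
A sketch of the threshold logic; Lmax = 0.8 and the top-10 cut-off are arbitrary illustrative values, not FDPM's actual parameters:

    def should_mark(router_load, flow_rates, dst_ip, l_max=0.8, top_n=10):
        # Below the threshold, mark every packet; once the load exceeds
        # Lmax, mark only the heaviest flows (classified by destination
        # IP), since those are the likeliest attack targets.
        if router_load < l_max:
            return True
        top_flows = sorted(flow_rates, key=flow_rates.get, reverse=True)[:top_n]
        return dst_ip in top_flows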
Traceable Sources
Overload Problem

Under heavy traffic, the router can randomly mark packets
Both attack and normal packets may be marked
Flow-based marking, as mentioned before, is much more efficient
FDPM Performance

                         PPM     DPM     FDPM
Packets for Traceback    10^3    10^2    10^2
Traceable Sources        10^2    10^3    10^5
Computational Load       High    Medium  Low
Overload Prevention      None    None    Good

FDPM can trace many more sources with less computational load since it uses variable mark lengths
Incorporates overload prevention to keep the router from failing
Requires more overhead (up to 25 bits instead of only 17)
HASH-BASED TRACEBACK
Source Path Isolation Engine (SPIE)
SPIE - Overview

Each router along a packet's transmission path computes a set of hash codes (digests) for each packet
The time-tagged digests are stored in router memory for some time period
  Limited by available router resources
Traceback is initiated only by "authenticated agent requests" to the SPIE Traceback Manager (STM)
  Executed by means of a broadcast message
  Results in the construction of a complete attack graph within the STM
SPIE - Assumptions

Packets may be addressed to multiple destinations
Attackers are aware they are being traced
Routers may be subverted, but not often
Routing within the network may be unstable
  Traceback must deal with divergent paths
Packet size should not grow as a result of traceback
  1-byte increase in size = 1% increase in resource use
  Very controversial ... a self-enabling assumption
End hosts may be resource-constrained
Traceback is an infrequent operation
  Broadcast messages can have a significant impact on Internet performance
Traceback should return the entire path, not just the source
SPIE - Architecture

DGA (Data Generation Agent): resident in SPIE-enhanced routers to produce digests and store them in time-stamped digest tables. Implemented as software agents, interface cards, or dedicated auxiliary boxes.

STM (SPIE Traceback Manager): controls the SPIE system. Verifies the authenticity of a traceback request, dispatches the request to the appropriate SCARs, gathers regional attack graphs, and assembles the complete attack graph.

SCAR (SPIE Collection and Reduction Agent): data concentration point for a regional area. When traceback is requested, SCARs initiate a broadcast request for traceback and produce regional attack graphs based upon data from their constituent DGAs.
SPIE - Hashing

Multiple hash codes (different groupings of fields) are calculated for each packet, based on 24 bytes of relatively invariant fields in the header plus the first 8 bytes of the payload
  Masked (gray) areas of the header are NOT used in hash-code calculation
For each packet received, SPIE computes k independent n-bit digests and sets the corresponding bits in the 2^n-bit digest table [Bloom filter] (nominally 256 Mb per filter) (see the sketch below)
Each filter contains the digests of multiple packets (approximately 50M packets per filter) ... as large as possible while avoiding collisions
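
A scaled-down sketch of such a digest table; a salted SHA-256 stands in for the paper's k independent hash functions, and n = 20 is far smaller than a production table. The invariant_bytes argument represents the ~24 invariant header bytes plus the first 8 payload bytes:

    import hashlib

    class DigestTable:
        # 2^n-bit Bloom filter holding packet digests (sketch).
        def __init__(self, n=20, k=3, seed=b"epoch-0"):
            self.n, self.k, self.seed = n, k, seed
            self.bits = bytearray(2 ** n // 8)

        def _positions(self, invariant_bytes):
            # k independent n-bit digests of the invariant packet bytes.
            for i in range(self.k):
                h = hashlib.sha256(self.seed + bytes([i]) + invariant_bytes)
                yield int.from_bytes(h.digest()[:4], "big") % (2 ** self.n)

        def add(self, invariant_bytes):
            for pos in self._positions(invariant_bytes):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def contains(self, invariant_bytes):
            # True => packet *probably* seen (false positives possible if
            # the filter is saturated); False => definitely not seen.
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(invariant_bytes))

add() records a forwarded packet; contains() is the membership test used later during traceback: all k bits set means the packet was probably seen, any unset bit means it definitely was not.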
SPIE – Hashing Collisions

The fraction of packets that collide as a function of prefix length (shown as a figure in the original slides):
  WAN trace: 985,150 packets between 6,031 host pairs, collected at the University of Florida OC-3 gateway; collision rate 0.00092%
  LAN trace: one million packets between 2,879 pairs at the MIT Lab for Computer Science; collision rate 0.139%
SPIE – Hashing (Cont)

When one Bloom filter is "full", the next one is initialized and time-tagged to record the receipt of the next packet; this can be implemented as a circular buffer of filters (see the sketch below)
For security purposes, each SPIE agent generates a new set of k input vectors to the Bloom filters with each filter change
  Based on a pseudo-random number generator independently seeded at each router
  These vectors are stored with the associated filter
SPIE never needs to record any packet payload information
  The first 8 bytes of the payload can be regenerated from the hashing, given the stored input vectors
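
A sketch of the rotation as a time-tagged circular buffer, reusing the DigestTable sketch above; the slot count and the nanosecond-clock seed are illustrative stand-ins for the per-router PRNG:

    import time
    from collections import deque

    class FilterRing:
        # Circular buffer of digest tables; each rotation gets a fresh
        # hash seed, and old (table, start_time) pairs are kept until
        # overwritten, which bounds the traceback window.
        def __init__(self, slots=8):
            self.ring = deque(maxlen=slots)
            self.rotate()

        def rotate(self):
            seed = time.time_ns().to_bytes(8, "big")  # stand-in for a PRNG seed
            self.ring.append((DigestTable(seed=seed), time.time()))

        def current(self):
            return self.ring[-1][0]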
SPIE – Traceback Processing

The SPIE Traceback Manager (STM) controls the process
  Cryptographically verifies the authenticity and integrity of the traceback request message
    Authorized requester
    Packet ID
    Victim
    Approximate time of attack
  Dispatches the request to the appropriate SPIE Collection and Reduction Agents (SCARs)
    SCARs poll their assigned Data Generation Agents (DGAs)
    DGAs poll their assigned routers
  If the response from the targeted SCARs indicates that other regional SCARs are involved in the trace, the STM sends another direct request
    This loop continues until all branches terminate
  Gathers the resulting attack graphs from the SCARs
  Assembles them into a complete attack graph
SPIE – Traceback Processing (Cont)

SPIE-enhanced routers hash the data received in the traceback request to determine whether or not the target packet passed through the router
  Compute k digests using the appropriate input vectors
  Check for a "1" in each of the corresponding k locations of the digest table "near" the target time
If ALL associated bits are set, it is highly likely that the packet was stored
  It is possible that the filter is saturated with an overabundance of packets, creating a false positive
  This is controlled by limiting the number of digests in each filter, depending upon digest-table size and the mean volume of packet traffic
SPIE – Traceback Processing (Cont)

Reverse path flooding, starting at the victim's router (V) and proceeding backwards toward the attacker (A) (sketched below)
  Solid arrows represent the attack path
  Dashed arrows are SPIE queries
  Queries are dropped by routers that did not forward the packet in question

ATTACK PATH: A → R2 → R5 → R7 → R9 → V
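
A sketch of the graph construction, where upstream_neighbors is a hypothetical topology map and saw(router, packet) stands for the digest-table membership test described on the previous slide:

    def build_attack_graph(victim_router, packet, upstream_neighbors, saw):
        # Reverse path flooding: starting at the victim's router, query
        # each upstream neighbor; routers whose digest tables do not
        # contain the packet drop the query, terminating that branch.
        edges, frontier, visited = [], [victim_router], {victim_router}
        while frontier:
            node = frontier.pop()
            for neighbor in upstream_neighbors(node):
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                if saw(neighbor, packet):        # digest-table membership test
                    edges.append((neighbor, node))
                    frontier.append(neighbor)    # keep flooding upstream
        return edges                             # regional attack graph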
SPIE – Metrics

Network Throughput: 0.5% of link bandwidth; "several orders of magnitude" less than other methods
False Positive Rate: directly scalable to the size of the digest table
Computational Intensity: hashing functions are simple, non-cryptographic
Network Flooding: trace initiation is regulated and assumed to be rare
Susceptibility to Spoofing: routers may be subverted and provide incorrect data, but even subverted routers will appear in traceback graphs
Collisions (Hashing): scalable to the size of the digest table and the bit-size of the hashing output
SPIE – Implementation Issues

PRO
  Single-packet tracing is feasible
  Automated processing by SPIE-enhanced routers makes spoofing difficult, at best
  Relatively low storage required
    Only digests and timestamps are stored
  Does not aid in eavesdropping of payload data
    Payload is not stored

CON
  Requires specially configured (SPIE-enhanced) routers
    Probability of detection is directly related to the number of available SPIE-enhanced routers in the network in question
  Storage in routers is a limiting factor in the window of time in which a packet may be successfully traced
    May consider some sort of filtering of packets to be digested
  May have the appearance of a loss of anonymity across the Internet
Conclusions

DoS, worms, and viruses are continuously becoming more dangerous
Attacks must be shut down quickly and be traceable
Integrating traceback into the next-generation Internet is critical
Conclusions

Flexible Deterministic Packet Marking
  As fast as regular DPM, faster than PPM
  Requires more overhead than DPM, but traces more sources with less computational load
Hash-based Traceback
  No packet overhead
  Requires new, more capable routers
Conclusions

Cooperation is required
  Routers must be built to handle new tracing protocols
  ISPs must comply with those protocols
  The Internet is no longer anonymous
Some issues must still be solved
  NATs
  Collisions
References

Belenky, A., and Ansari, N. "IP Traceback with Deterministic Packet Marking". IEEE Communications Letters, April 2003.
Savage, S., et al. "Practical Network Support for IP Traceback". Department of Computer Science, University of Washington.
Snoeren, A., Partridge, C., et al. "Single-Packet IP Traceback". IEEE/ACM Transactions on Networking, December 2002.
Xiang, Y., and Zhou, W. "A Defense System Against DDoS Attacks by Large-Scale IP Traceback". IEEE, 2005.