
Automated Worm Fingerprinting
Authors: Sumeet Singh, Cristian Estan,
George Varghese and Stefan Savage
Published: OSDI '04.
Presenter: YanYan Wang
Introduction



Recent large-scale Internet worms
pose a profound threat.
Traditional detection methods are
usually expensive and slow.
This paper investigates “Early Bird”,
a method that automatically detects
and contains new worms on the
network using precise signatures.
Existing Detection Techniques

Scan detection




Example: Code Red.
Network telescope: passive network
monitors that observe large ranges of
unused, yet routable, address space.
Assumption: worms select target
victims at random
Limitations: not suited to worms that
do not spread randomly
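To illustrate the network-telescope idea, the minimal sketch below flags sources that contact several distinct addresses inside a monitored unused prefix. The prefix (the 192.0.2.0/24 documentation range) and the threshold are hypothetical choices for illustration. A worm that picks victims at random hits such space often, while a worm that picks targets non-randomly may never touch it, which is exactly the limitation above.

```python
import ipaddress
from collections import defaultdict

# Hypothetical unused-but-routable prefix watched by the telescope.
DARK_PREFIX = ipaddress.ip_network("192.0.2.0/24")
# Hypothetical: distinct dark addresses a source must touch before it is flagged.
SCAN_THRESHOLD = 3

def flag_scanners(flows):
    """flows: iterable of (src, dst) address strings.
    Returns the sources that contacted at least SCAN_THRESHOLD distinct dark addresses."""
    hits = defaultdict(set)
    for src, dst in flows:
        if ipaddress.ip_address(dst) in DARK_PREFIX:
            hits[src].add(dst)
    return {src for src, dsts in hits.items() if len(dsts) >= SCAN_THRESHOLD}
```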

Honeypots


Monitor idle hosts with unpatched
vulnerabilities
Limitations: require a significant amount
of slow manual analysis; depend on the
honeypot being infected quickly

Behavioral techniques at end hosts


Dynamically analyze the patterns of
system calls for anomalous activity.
Limitations: expensive; only detect
attacks against a single host.
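One simple way to model "patterns of system calls" is a sliding-window n-gram profile; this is an illustrative assumption, not a description of any specific system. The sketch learns the set of length-n system-call windows seen in normal runs and scores a new trace by the fraction of its windows never seen before. Because the profile describes only one host's own behavior, it also shows why such detectors see attacks against a single host only.

```python
def ngrams(trace, n=3):
    """Set of all length-n windows of system-call names in one trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def anomaly_score(normal_traces, trace, n=3):
    """Fraction of the trace's n-gram windows that never appeared during normal runs."""
    profile = set()
    for t in normal_traces:
        profile |= ngrams(t, n)
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    if not windows:
        return 0.0  # trace shorter than one window
    unseen = sum(1 for w in windows if w not in profile)
    return unseen / len(windows)

normal = [["open", "read", "write", "close"]]
print(anomaly_score(normal, ["open", "read", "write", "close"]))   # 0.0
print(anomaly_score(normal, ["open", "read", "execve", "socket"]))  # 1.0
```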
Characterization



A priori vulnerability signatures:
match known exploitable
vulnerabilities in deployed software.
Automated signature extraction:
lets the virus infect decoy
programs in a controlled
environment and identifies invariant
code strings.
Autograph: closely related to Early Bird
Containment

To slow or stop the spread of an active
worm



Host quarantine: prevents an infected host
from communicating with other hosts
String matching: matches network traffic
against particular strings, or signatures
Connection throttling: limits the rate of all outgoing
connections made by a machine; slows the worm
but does not stop it
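Connection throttling can be sketched with a token bucket, which is an assumed mechanism here since the slide only says the rate is limited. Each new outgoing connection spends one token, and tokens refill at `rate` per second up to a small `burst`; both parameters are hypothetical. The sketch also shows why throttling slows a worm without stopping it: an infected host still gets `rate` new connections per second.

```python
class ConnectionThrottle:
    """Token bucket: at most `rate` new outgoing connections per second, plus a small burst."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if a new connection may start at time `now` (seconds)."""
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should delay or drop this connection attempt
```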
Worm Behavior

Content invariance: the worm's code is identical
on every host it infects, though some worms have
limited polymorphism
Content prevalence: the invariant content appears
frequently on the network; content that is not
prevalent is not useful for constructing signatures
Address dispersion: the number of infected
hosts grows over time
Finding Worm Signatures: Content Sifting

For each network packet:
Extract its content and compute its substrings
Index each substring into a prevalence
table
Each table entry also records the source and
destination IP addresses
Sort the table
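The naive content-sifting loop above can be sketched as follows. It assumes fixed-length substrings of `beta` bytes (4 here only to keep the example small) and hypothetical prevalence and dispersion thresholds. Keeping an exact entry, with full address sets, for every substring of every packet is what makes this version's memory consumption prohibitive.

```python
from collections import defaultdict

def content_sift(packets, beta=4, prevalence_threshold=3, dispersion_threshold=2):
    """packets: iterable of (src, dst, payload_bytes).
    Returns substrings, most prevalent first, that exceed all thresholds."""
    # substring -> [occurrence count, distinct sources, distinct destinations]
    table = defaultdict(lambda: [0, set(), set()])
    for src, dst, payload in packets:
        for i in range(len(payload) - beta + 1):
            entry = table[payload[i:i + beta]]
            entry[0] += 1
            entry[1].add(src)
            entry[2].add(dst)
    ranked = sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)
    return [sub for sub, (count, srcs, dsts) in ranked
            if count >= prevalence_threshold
            and len(srcs) >= dispersion_threshold
            and len(dsts) >= dispersion_threshold]
```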

Huge memory consumption: use multistage filters
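A multistage filter replaces one exact counter per substring with a few small counter arrays. Each stage hashes the key to one counter, and a key is reported as prevalent only when its counters in every stage reach the threshold. Collisions can only inflate counters, so a truly prevalent key is never missed, while a false positive needs collisions in all stages at once. This minimal sketch assumes SHA-256 salted with the stage index as the independent hash functions; the sizes and threshold are hypothetical.

```python
import hashlib

class MultistageFilter:
    def __init__(self, stages=4, counters_per_stage=1024, threshold=3):
        self.stages = stages
        self.size = counters_per_stage
        self.threshold = threshold
        self.counters = [[0] * counters_per_stage for _ in range(stages)]

    def _index(self, stage, key):
        # Prefixing the stage number gives each stage its own hash function.
        digest = hashlib.sha256(bytes([stage]) + key).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """Count one occurrence of `key` (bytes); True once every stage reaches the threshold."""
        prevalent = True
        for s in range(self.stages):
            i = self._index(s, key)
            self.counters[s][i] += 1
            if self.counters[s][i] < self.threshold:
                prevalent = False
        return prevalent
```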

Address dispersion: trade precision
for dramatic reductions in memory
requirements

Example: to count up to
64 sources using 32 bits, one might
hash sources into a space from 0 to 63
yet only set bits for values that hash
between 0 and 31, thus ignoring half
of the sources.
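The 32-bit example can be sketched as a single sampled bitmap. Sources hash into 0 to 63, only hashes below 32 set a bit, and the count is estimated from the number of zero bits and then doubled to undo the 1/2 sampling. The SHA-1-based hash and the linear-counting estimate `bits * ln(bits / zeros)` are assumptions chosen for illustration, and this is a single fixed bitmap rather than a bitmap that rescales itself as the count grows.

```python
import hashlib
import math

class SampledBitmap:
    """Approximate distinct-source counter: 32 bits, half of a 0..63 hash space sampled."""

    def __init__(self, bits=32, hash_space=64):
        self.bits = bits
        self.hash_space = hash_space
        self.bitmap = 0

    def _hash(self, source):
        digest = hashlib.sha1(source.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.hash_space

    def add(self, source):
        h = self._hash(source)
        if h < self.bits:  # values 32..63 are ignored: half the sources are dropped
            self.bitmap |= 1 << h

    def estimate(self):
        zeros = self.bits - bin(self.bitmap).count("1")
        if zeros == 0:
            return float("inf")  # bitmap saturated; a larger or rescaled bitmap is needed
        sampled = self.bits * math.log(self.bits / zeros)  # linear-counting estimate
        return sampled * self.hash_space / self.bits  # scale up for the sampling rate
```

Repeated packets from the same source hash to the same bit, so they never inflate the estimate; only distinct sources move it.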

Payload strings require significant
processing: value sampling


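Select only those substrings whose
fingerprint matches a certain pattern.
Example: if f is the fraction of tracked
substrings (e.g. f = 1/64 if we track the
substrings whose Rabin fingerprint ends in six
0 bits) and β is the substring length, then a
signature of length x contains x - β + 1
substrings, each tracked with probability f,
so the probability of detecting a worm
with a signature of length x is
$p_x = 1 - e^{-f(x - \beta + 1)}$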
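Value sampling can be sketched as follows, with one substitution stated plainly: instead of true Rabin fingerprints (polynomial arithmetic over GF(2)), this uses a Karp-Rabin style rolling hash modulo the prime 2^61 - 1. It slides a 40-byte window over the payload, updates the fingerprint in constant time per byte, and keeps only windows whose fingerprint ends in six 0 bits, so f is roughly 1/64. The base and modulus are hypothetical choices. Because the decision depends only on the window's bytes, the same substring is kept or skipped consistently in every packet, which lets sampled substrings still accumulate prevalence.

```python
BETA = 40                # substring (window) length in bytes
BASE = 257               # hypothetical polynomial base
MOD = (1 << 61) - 1      # Mersenne prime modulus
MASK = (1 << 6) - 1      # keep fingerprints ending in six 0 bits: f is about 1/64

def rolling_fingerprints(data, beta=BETA):
    """Yield (offset, fingerprint) for every beta-byte window of `data`."""
    if len(data) < beta:
        return
    h = 0
    for byte in data[:beta]:
        h = (h * BASE + byte) % MOD
    yield 0, h
    top = pow(BASE, beta - 1, MOD)  # weight of the byte leaving the window
    for i in range(beta, len(data)):
        # Drop data[i - beta], shift the window, append data[i].
        h = ((h - data[i - beta] * top) * BASE + data[i]) % MOD
        yield i - beta + 1, h

def sampled_substrings(data, beta=BETA):
    """Value sampling: only windows whose fingerprint matches the bit pattern are tracked."""
    return [(off, h) for off, h in rolling_fingerprints(data, beta) if h & MASK == 0]
```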

If f = 1/64 and the substring length β = 40,
the probability of tracking a worm with a signature
of 100 bytes is 55%, but for a worm
with a signature of 200 bytes it
increases to 92%, and for 400 bytes
to 99.64%.
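The tracking probability can be evaluated directly from $p_x = 1 - e^{-f(x - \beta + 1)}$, the formula for value sampling with sampling fraction f and substring length β. Under this formula, 200 bytes gives about 91.9% and 400 bytes about 99.64%, matching the figures above, while 100 bytes evaluates to about 61%. The guard for signatures shorter than one substring is an added assumption.

```python
import math

def p_track(x, f=1/64, beta=40):
    """Probability that at least one of the x - beta + 1 beta-byte substrings
    of an x-byte signature is sampled, each independently with probability f."""
    windows = x - beta + 1
    if windows <= 0:
        return 0.0  # signature shorter than one substring: nothing can be tracked
    return 1.0 - math.exp(-f * windows)

for length in (100, 200, 400):
    print(length, round(p_track(length), 4))
```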
Practical Content Sifting: Early Bird (packet granularity)

As each packet arrives, its content
(or substrings of its content) is
hashed and appended with the
protocol identifier and destination
port to produce a content hash
code.


32-bit cyclic redundancy check (CRC) for content hashes
40-byte Rabin fingerprints for substring
hashes
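Building the content hash code can be sketched with `zlib.crc32` from the Python standard library. The bit layout below (CRC in the high 32 bits, then an 8-bit protocol number, then the 16-bit destination port) is a hypothetical packing chosen for illustration; the slide only says the hash is combined with the protocol identifier and destination port.

```python
import zlib

def content_hash_code(payload: bytes, protocol: int, dst_port: int) -> int:
    """Hypothetical layout: [CRC32 (32 bits) | protocol (8 bits) | dst port (16 bits)]."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF  # keep the CRC unsigned
    return (crc << 24) | ((protocol & 0xFF) << 16) | (dst_port & 0xFFFF)

code = content_hash_code(b"GET /default.ida", 6, 80)  # TCP (protocol 6) to port 80
print(hex(code))
```

With this packing, identical payloads sent over different protocols or to different ports produce different keys, so they are counted separately.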

If the content hash is not found in
the dispersion table, it is indexed
into the content prevalence table.

Four independent hash functions create
indexes into four counter arrays.
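The per-packet flow above might look like the following minimal sketch. The dispersion table is checked first, and only content not already there is counted in a four-stage prevalence filter; once all four counters reach the prevalence threshold, the content gets a dispersion entry that tracks distinct sources and destinations. Thresholds and sizes are hypothetical, the SHA-256-per-stage hashing is an assumption, and plain Python sets stand in for more memory-efficient bitmap counters.

```python
import hashlib

class EarlyBirdSketch:
    def __init__(self, stages=4, counters=4096, prevalence_threshold=3,
                 src_threshold=2, dst_threshold=2):
        self.stages = stages
        self.size = counters
        self.prevalence_threshold = prevalence_threshold
        self.src_threshold = src_threshold
        self.dst_threshold = dst_threshold
        self.counters = [[0] * counters for _ in range(stages)]
        self.dispersion = {}  # content hash -> (distinct sources, distinct destinations)

    def _index(self, stage, key):
        digest = hashlib.sha256(bytes([stage]) + key).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def process(self, key, src, dst):
        """Handle one content hash `key` (bytes); True when it looks like a worm signature."""
        if key not in self.dispersion:
            prevalent = True
            for s in range(self.stages):
                i = self._index(s, key)
                self.counters[s][i] += 1
                if self.counters[s][i] < self.prevalence_threshold:
                    prevalent = False
            if not prevalent:
                return False
            self.dispersion[key] = (set(), set())  # promote to the dispersion table
        sources, dests = self.dispersion[key]
        sources.add(src)
        dests.add(dst)
        return len(sources) >= self.src_threshold and len(dests) >= self.dst_threshold
```

Checking the dispersion table first means content that is already being tracked skips the filter entirely, so only new content pays for the four counter updates.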
Prototype System: Early Bird



Sensor: sifts through traffic on
configurable address-space “zones” of
responsibility and reports anomalous
signatures.
Aggregator: coordinates real-time
updates from the sensors, coalesces
related signatures, activates any network-level
or host-level blocking services, and is
responsible for administrative reporting
and control.
Single-threaded, executes at user level, and
captures packets using the libpcap library.
What’s the paper’s contribution?


A combination of existing and novel
algorithms for content sifting
Low memory and CPU requirements
What’s the paper’s weakness?

Depends on invariant content




Attackers can design variant content for worms
Attackers can evade detection by creating
metamorphic worms or by using traditional IDS
evasion techniques
Assumes a maximum worm growth time
Attackers can abuse automated containment
by deliberately triggering the worm defense.
How to improve the paper?




Hybrid pattern matching: separate
non-code strings from potential
exploits
Investigate traffic normalization
Maintain triggering data across
multiple time scales
Develop efficient mechanisms for
comparing signatures against an existing
traffic corpus