Transcript Slides

Use of Measurements in
Anomaly Detection
CS 8803: Network Measurements Seminar
Instructor: Constantinos Dovrolis
Fall 2003
Presenter: Buğra Gedik
Outline

We’ll be discussing 3 papers

Topic Detail: Inferring DoS Activity

Paper: D. Moore, G. M. Voelker, and S. Savage. Inferring
Internet Denial-of-Service Activity. In Proceedings of the
USENIX Annual Technical Conference (USENIX 2001).

Topic Detail: Code-Red Worm

Paper: D. Moore, C. Shannon, and J. Brown. Code-Red: A
Case Study on the Spread and Victims of an Internet
Worm. In Proceedings of the ACM Internet Measurement
Workshop (IMW 2002).

Topic Detail: DoS Attacks and Flash Crowds

Paper: J. Jung, B. Krishnamurthy, and M. Rabinovich. Flash
Crowds and Denial of Service Attacks: Characterization and
Implications for CDNs and Web Sites. In Proceedings of the
International World Wide Web Conference (WWW 2002).
Inferring Internet Denial-of-Service Activity
David Moore
Geoffrey M. Voelker
Stefan Savage
In Proceedings of the USENIX Annual
Technical Conference (USENIX 2001).
Problem Statement & Solution Overview

Problem:



How prevalent are denial-of-service attacks in
the Internet today?
This paper considers only flooding attacks
Technique:

Use backscatter analysis for estimating the
worldwide prevalence of DoS attacks
Backscatter Analysis
Some Limiting Assumptions

Address uniformity: Attackers spoof
source addresses at random.

Reliable delivery: Attack traffic is delivered
reliably to the victim and backscatter is
delivered reliably to the monitor.

Backscatter hypothesis: Unsolicited
packets observed by the monitor
represent backscatter.
Address uniformity

May not hold because:



Some ISPs employ ingress filtering;
as a result, the attacker may be
forced to restrict its address space
Reflector attacks: a different kind of
flooding attack that is not captured
by backscatter analysis, e.g. Smurf or
Fraggle attacks
The main motivation of the
assumption:


Many direct DoS attack “tools” use
random address spoofing, e.g.
Shaft, TFN, TFN2k, trinoo,
Stacheldraht, mstream, Trinity
It is possible to use tests like the A2
(Anderson-Darling) test to check uniformity
[Figure: an ingress filter at the provider AS blocking spoofed packets from an attacker in a customer AS; and a reflector attack in which the attacker sends packets spoofed with the victim's IP to a multicast group, whose responses reach the victim]
Reliable delivery

May not hold because:




During the attack, packets
may be dropped due to
congestion
An IDS may filter the packets
Some types of attacks may not
produce backscatter
Many attacks generate
backscatter

Most types of flooding attacks
do generate a response
Backscatter hypothesis

May not hold because


Any host on the internet can send unsolicited
packets to the monitored network
Motivation of the assumption


Packets that are consistently targeted to a
specific address in the monitored network can
be filtered easily
Although a concerted effort by a third party
can bias the results, this is quite unlikely
Extrapolating Backscatter Analysis Results
Let n be the number of monitored IP
addresses
And consider an attack with m packets

Then the expected number of backscatter
packets observed from the attack, E(X), is:
E(X) = (n * m) / 2^32

Similarly, if the observed rate of an attack is
R’, then a lower bound on the real rate R is:
R >= R’ * 2^32 / n
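The extrapolation above can be tried out numerically. The following is a minimal sketch, assuming a monitored /8 (n = 2^24 addresses) and uniformly random spoofing over the 32-bit IPv4 address space; the function names are illustrative and do not come from the paper.

```python
ADDRESS_SPACE = 2 ** 32  # size of the IPv4 address space


def expected_backscatter(n_monitored: int, m_attack_packets: int) -> float:
    """E(X) = n * m / 2^32: expected backscatter packets seen by the monitor."""
    return n_monitored * m_attack_packets / ADDRESS_SPACE


def extrapolated_rate(observed_rate: float, n_monitored: int) -> float:
    """Bound on the real attack rate R, given the rate R' seen at the monitor."""
    return observed_rate * ADDRESS_SPACE / n_monitored


if __name__ == "__main__":
    n = 2 ** 24  # a /8 network covers 1/256 of the address space
    print(expected_backscatter(n, 1_000_000))  # 3906.25
    print(extrapolated_rate(2.0, n))           # 512.0
```

With a /8 monitor the scaling factor is simply 256, so 2 backscatter packets per second at the monitor correspond to at least 512 packets per second at the victim.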
Attack Classification

Two types of classification are done:

Flow-based classification


Used to classify individual attacks
Answering the questions:
 how many
 how long
 what kind

Event-based classification

Analyze the severity of attacks on short time scales
Flow-based classification

A flow is defined as a series of consecutive
packets sharing the same target (victim’s
address) and same IP protocol

If no more packets are observed from a flow
for 5 minutes, the flow is assumed to end

Flows that do not have more than 100 packets,
or that last less than 60 seconds, are discarded

Flows that are only backscattered to a single IP
address in the monitored range are discarded
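The flow rules above can be sketched in a few lines. This is a minimal illustration, assuming each backscatter packet has already been reduced to a tuple of (timestamp in seconds, victim address, IP protocol, monitored destination address); the record layout and function names are assumptions, and the packet threshold follows the slide's wording ("more than 100").

```python
FLOW_TIMEOUT = 300   # seconds of silence that end a flow (5 minutes)
MIN_PACKETS = 100    # a flow must have more than this many packets
MIN_DURATION = 60    # a flow must last at least this many seconds


def build_flows(packets):
    """Group packets into flows keyed by (victim, protocol), split on idle gaps."""
    open_flows = {}
    finished = []
    for ts, victim, proto, dst in sorted(packets, key=lambda p: p[0]):
        key = (victim, proto)
        flow = open_flows.get(key)
        if flow is not None and ts - flow["end"] > FLOW_TIMEOUT:
            finished.append(flow)  # idle for too long: close the old flow
            flow = None
        if flow is None:
            flow = {"victim": victim, "proto": proto, "start": ts,
                    "end": ts, "packets": 0, "dsts": set()}
            open_flows[key] = flow
        flow["end"] = ts
        flow["packets"] += 1
        flow["dsts"].add(dst)
    finished.extend(open_flows.values())
    return finished


def keep_flow(flow):
    """Apply the three discard rules: size, duration, and single-destination."""
    return (flow["packets"] > MIN_PACKETS
            and flow["end"] - flow["start"] >= MIN_DURATION
            and len(flow["dsts"]) > 1)
```

A caller would run `build_flows` over the trace and keep only the flows for which `keep_flow` returns true.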
Examining the Flows

Determine the type of attack by examining

TCP flag settings
ICMP packets

Look at the distributions of

IP addresses, using the A2 (Anderson-Darling) uniformity test
to validate the assumption at a significance level of 0.05
port numbers

Classify the victim by examining

DNS information of the victim
AS-level information of the victim from BGP
tables
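The A2 uniformity check on source IP addresses can be illustrated as follows. This is a minimal sketch of the Anderson-Darling statistic against a fully specified uniform distribution over the 32-bit address space; the critical value 2.492 is the commonly tabulated 5% value for that case, and the function names are illustrative rather than taken from the paper.

```python
import math

CRITICAL_A2_5PCT = 2.492  # tabulated 5% critical value, fully specified null


def anderson_darling_uniform(addresses):
    """A2 statistic for 32-bit integer addresses against Uniform(0, 2^32)."""
    n = len(addresses)
    # Map each address into the open interval (0, 1) so the logs are finite.
    u = sorted((a + 0.5) / 2 ** 32 for a in addresses)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def looks_uniform(addresses):
    """True if uniformity is not rejected at the 0.05 significance level."""
    return anderson_darling_uniform(addresses) < CRITICAL_A2_5PCT
```

Addresses spread evenly over the whole space give a statistic near zero, while addresses confined to a small part of the space give a very large one and are rejected.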
Event-based Classification

An attack event is defined by a victim
emitting at least 10 backscatter packets
during a one minute period

Attacks are not classified based on type,
only criterion is the victim’s IP address

For each minute, the victims that are
under attack and the intensity of each
attack is determined and recorded
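The event definition above amounts to a per-minute count per victim. A minimal sketch, assuming each backscatter packet is reduced to a (timestamp in seconds, victim address) pair; the names are illustrative.

```python
from collections import Counter

EVENT_THRESHOLD = 10  # backscatter packets per one-minute period


def attack_events(packets):
    """Return {minute_index: {victim: packet_count}} for victims under attack."""
    counts = Counter((int(ts // 60), victim) for ts, victim in packets)
    events = {}
    for (minute, victim), count in counts.items():
        if count >= EVENT_THRESHOLD:
            # The count serves as the attack intensity for that minute.
            events.setdefault(minute, {})[victim] = count
    return events
```

The result lists, for each minute, the victims under attack together with the number of packets observed from them.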
Experimental Setup

A /8 network is monitored, which
represents 1/256 of the total
Internet address space

From February 1st to
February 25th, Ethernet traffic was
captured using a shared hub with
the ingress router
Summary of Observed Attacks

5000 distinct victim IP addresses in more than
2000 distinct DNS domains
Attack/Response Protocols




~ 50% of the attacks generate TCP (RST ACK), suggesting they are TCP
flood attacks destined to closed ports
~ 15% of the attacks generate ICMP host unreachable containing a TCP
header that includes the victim’s IP, again suggesting a TCP flood
~ 12% of the attacks generate ICMP (TTL Exceeded). Strange! These were
caused by attacks with very high rates, and they correspond to around 50%
of all backscatter packets observed
~ 8% of the attacks generate TCP (SYN ACK), suggesting SYN floods
Attack Rate

Uniform Random Attacks are
the ones whose source IP
addresses satisfy the A2 test

500 SYN packets per second
are enough to overwhelm a
server (~40% of attacks
satisfy this)

14,000 SYN packets per
second are enough to
overwhelm a server with
specialized firewalls (~2.5%
of attacks satisfy this)
Attack Duration



50% of the attacks are less than 10 minutes
80% of the attacks are less than 30 minutes
90% of the attacks are less than 60 minutes
Victim Classification



A significant fraction of attacks targeted home
machines, either dial-up or broadband
Among home users, cable-modem users
experienced some intense attacks, with rates of up to
1,000 packets per second.
A significant number of attacks targeted IRC servers
Victim Classification


No single AS or small set
of ASs is a major target
65% of the victims were
attacked once and 18%
twice
Validation

98% of the packets attributed to
backscatter do not themselves provoke a
response, so they cannot be packets used
to probe the monitored network

98% of the victim IP addresses are also
encountered in other traces extracted
from different datasets collected during the
same period
Code-Red: A Case Study on the Spread and
Victims of an Internet Worm
David Moore
Colleen Shannon
Jeffery Brown
In Proceedings of the ACM Internet
Measurement Workshop (IMW 2002)
Analysis of the Code-Red Worm

Worms: Self-replicating viruses

Code-Red worm classification




Code-RedI-v1: memory-resident, static seed, infect/spread/attack
Code-RedI-v2: memory-resident, random seed, infect/spread/attack
Code-RedII: disk-resident, intelligent, infect/backdoor/spread
Data Sets:

Packet header trace of hosts sending unsolicited TCP SYN packets
to a /8 (class A) network and two /16 networks, July 4 / August 21




July 12, 2001 - Code-RedI-v1 set loose
July 19, 2001 - Code-RedI-v2 set loose
August 4, 2001 - Code-RedII set loose
Hosts that have sent at least two unsolicited TCP SYN packets (on
port 80) to the /8 network are suspected to be infected
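The infection heuristic above can be sketched as a simple filter. This is a minimal illustration, assuming each unsolicited SYN is reduced to a (source, destination, destination port) tuple; the prefix 10.0.0.0/8 is only a placeholder for the monitored network, and the function name is hypothetical.

```python
import ipaddress
from collections import Counter


def suspected_infected(syn_records, monitored="10.0.0.0/8", min_syns=2):
    """Sources that sent at least min_syns SYNs on port 80 into the monitored net."""
    net = ipaddress.ip_network(monitored)  # placeholder prefix, not the real one
    counts = Counter(
        src for src, dst, dport in syn_records
        if dport == 80 and ipaddress.ip_address(dst) in net
    )
    return {src for src, count in counts.items() if count >= min_syns}
```

Requiring two packets rather than one filters out sources that hit the monitored range only once.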
Code-RedI Worms
Infection Phase (from the beginning of the month to the end of the 19th)
An infected host sends a bogus HTTP request containing the worm
to randomly generated IPs
The request leverages a buffer overflow vulnerability in MS IIS HTTP server
A probed address may have no such host, a host with no MS IIS running,
or a host running MS IIS HTTP server, which becomes infected
Attack Phase (from the beginning of the 20th to the end of the month)
Infected hosts send bogus HTTP requests to www.whitehouse.gov (DoS attack)
Unsolicited SYN probes, Code-Redv1




The trace includes a large
number of probes to 23 IP
addresses within the
monitored /8 network
Using the same static seed, the
first 1 million IP addresses
are generated by reverse
engineering the worm code
Those 23 addresses indeed
appear in the generated
sequence
3 source addresses in the
trace do not belong to the
generated IP addresses; they
must be the initial hosts,
infected manually:

Atlanta, USA
Cambridge, USA
GuangDong, China
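The reasoning behind the static-seed analysis can be sketched as a set difference. The worm's actual address generator is not reproduced here; Python's `random.Random` stands in for it purely to show that a fixed seed yields a reproducible sequence, and all names are hypothetical.

```python
import random


def generated_addresses(seed, count):
    """First `count` 32-bit addresses from a seeded PRNG.

    random.Random is a stand-in, NOT the worm's real generator.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]


def candidate_initial_hosts(source_addrs, seed, count):
    """Probing sources that the static-seed sequence could never have reached."""
    reachable = set(generated_addresses(seed, count))
    return {src for src in source_addrs if src not in reachable}
```

Because every instance with the same seed probes the same addresses in the same order, any infected source outside the generated sequence cannot have been infected by the worm itself.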
Host Infection Rate, Code-Redv2

More than 359,000
unique IP addresses
are infected with the
Code-RedI worm
within a day between
midnight of July 19
and July 20.
Deactivation rate for Code-Redv1




A clear time-of-day
effect is seen in the
figure
Many machines are shut
down during the night
This indicates that
many home and office
users are affected by
the worm
The worm is
programmed to switch
to its attack phase on
July 20; thus we have a
sudden increase in
deactivation rate at
midnight
Host Classification



Reverse DNS lookups are used to characterize the hosts
It is clear that a surprisingly large number of hosts are dial-up and
broadband users
Diurnal variations are observed, which suggests that a majority of the
infected hosts are not production web servers
Investigating time of day effect


Find the location of hosts using the IxMapping
(http://www.ipmapper.com) service
Convert UTC time to local time for each host and
plot active hosts as a function of time
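The time conversion step can be sketched as follows. This is a minimal illustration, assuming the geolocation step has already produced a UTC offset in hours for each host (the `utc_offsets` mapping is hypothetical) and that observations are (UNIX timestamp, host) pairs.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def active_hosts_by_local_hour(observations, utc_offsets):
    """Count distinct active hosts per local hour of day (0 to 23)."""
    hosts_per_hour = defaultdict(set)
    for ts, host in observations:
        offset = utc_offsets.get(host)
        if offset is None:
            continue  # host could not be located; skip it
        local_tz = timezone(timedelta(hours=offset))
        local_time = datetime.fromtimestamp(ts, tz=local_tz)
        hosts_per_hour[local_time.hour].add(host)
    return {hour: len(hosts) for hour, hosts in sorted(hosts_per_hour.items())}
```

Plotting the resulting counts against the hour of day is what makes a diurnal pattern visible across hosts in different time zones.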
The Effect of DHCP





Between August 2
and August 16, 2
million infected
addresses are
observed
However only
143,000 hosts were
active in the most
active 10 minute
period
This can be
attributed to DHCP
DHCP inflates the number of infected hosts
However, NAT usage may deflate the number
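The gap between the two numbers above can be reproduced on toy data. A minimal sketch, assuming observations are (timestamp in seconds, source address) pairs; the function name is illustrative.

```python
from collections import defaultdict

WINDOW = 600  # ten minutes, in seconds


def unique_vs_peak(observations):
    """Return (addresses seen overall, most addresses active in one window)."""
    all_addrs = set()
    per_window = defaultdict(set)
    for ts, addr in observations:
        all_addrs.add(addr)
        per_window[int(ts // WINDOW)].add(addr)
    peak = max((len(addrs) for addrs in per_window.values()), default=0)
    return len(all_addrs), peak
```

A single host that receives a new DHCP lease in every window adds to the first number each time but never raises the second above one, which is why the per-window peak is the more conservative estimate of the infected population.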
Flash Crowds and Denial of Service Attacks:
Characterization and Implications for
CDNs and Web Sites
J. Jung
B. Krishnamurthy
M. Rabinovich
In Proceedings of the International World
Wide Web Conference (WWW 2002)
Definitions & Problem Statement

Definitions:



Flash Event (FE): A FE is a large surge in traffic to a
particular Web site causing dramatic increase in server
load and putting severe strain on the network links.
Denial of Service Attack (DoS) : A DoS is an explicit
attempt by attackers to prevent legitimate users of a
service from using that service.
Problem:


How can DoS attacks be differentiated from Flash Events?
How can CDN performance be improved for handling FEs?
Some Example DoS Attacks
TCP SYN Attack: spoofed SYN packets
UDP Attacks: connect chargen-echo
Ping of Death: oversized ICMP packets cause crash
Smurf Attack: ping various hosts with victim's address
Fraggle and Snork Attacks: echo and WinNT RPC
Flooding Attack: flood network with useless packets


DDoS Attacks !!!
Example Flash Events

Popular Events, like



Elections
Olympics
Catastrophic events, like

Sept. 11

Popular Webcasts

Play-along Web Sites (for TV shows)
Dimensions of the Comparison

The comparison between DoS and FE is
done along the following dimensions:

Traffic Patterns

Client Characteristics

File Reference Characteristics
Flash Events

Datasets Studied

Play-along
Play-along web site for a popular TV show

Chile
The Chile Web site that hosted continuously
updated election results of 1999 election
Traffic Volume
• Request rate grows dramatically during the FE
• But the duration of the FE is relatively short
Traffic Volume
• Request rates increase rapidly during the initial period of the FE
• But the increase is far from instantaneous, leaving enough room for adaptation
Characterizing Clients
• Number of clients in a FE is commensurate with the request rate
Characterizing Clients
• There is no clear increase in per-client request rates
Old and New clusters



Old clusters: clusters
that have been seen
before the FE
New clusters: clusters
that have been seen
during the FE but not
before
The percentage of old
clusters during the FE is
42.7% for Play-along
and 82.9% for Chile
• A significant proportion of the clusters seen during the FE consists of
old clusters
• Request distribution over clusters is highly skewed
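The old-cluster percentage can be computed once clients are grouped into clusters. The sketch below uses a fixed /24 prefix as a simple stand-in for the clustering step, which is an assumption made only for illustration; the function names are hypothetical.

```python
import ipaddress


def cluster_of(ip, prefix_len=24):
    """Map a client IP to its cluster; a fixed-length prefix is a stand-in."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def old_cluster_percentage(before_ips, during_ips, prefix_len=24):
    """Percentage of clusters seen during the event that were also seen before."""
    before = {cluster_of(ip, prefix_len) for ip in before_ips}
    during = {cluster_of(ip, prefix_len) for ip in during_ips}
    if not during:
        return 0.0
    return 100.0 * len(during & before) / len(during)
```

Running the same computation over a flash event and over an attack trace gives the two percentages that the comparison relies on.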
File Reference Characteristics



Over 60% of documents
are accessed only during
flash events
Less than 10% of
documents account for
more than 90% of the
requests
File reference distribution
is highly Zipf-like
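The concentration of requests on a few documents can be measured directly from a request log. A minimal sketch, assuming the log has been reduced to a list of requested document names; the function name is illustrative.

```python
from collections import Counter


def share_of_docs_for_requests(requested_docs, target_share=0.9):
    """Fraction of distinct documents needed to cover target_share of requests."""
    counts = sorted(Counter(requested_docs).values(), reverse=True)
    if not counts:
        return 0.0
    total = sum(counts)
    covered = 0
    for rank, count in enumerate(counts, start=1):
        covered += count  # add the next most popular document
        if covered >= target_share * total:
            return rank / len(counts)
    return 1.0
```

A result below 0.1 for a target share of 0.9 matches the statement that less than 10% of documents account for more than 90% of requests.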
DoS Attacks

Datasets studied:


esg and ol
Log files that recorded more than 1 million
requests within 60 days. A password cracking
attack was performed during this period.
bit.nl, creighton, fullnote, rellim, sptcccxus
Collection of 5 traces that recorded requests to
Web servers from machines infected by the Code-Red worm.
Traffic Volume & Client Characteristics
(Code-Red)
• The surge occurred because of new clusters joining the attack
• For traces that contain both infected and non-infected client
requests, less than 14.3% of the clusters during the attack were
old clusters (even smaller for password cracking)
Client Characteristics (Code-Red)
• Request rates per client do not change during the attack
• Requests are spread more evenly across
a number of clusters
Comparison of FE and DoS
?
Implications to CDNs





How can we handle FEs more effectively
using CDNs?
We have seen that most requests during a
FE are to documents that are not accessed
before the FE
This causes a lot of cache misses, which
overload the origin server
One solution is to use cooperative caches,
but this introduces high delays
The authors propose an alternative approach
which does not incur a high delay yet
decreases the load on the origin server
Illustration of the Problem
[Figure: a client, directed by the CDN DNS server, requests an object from several CDN servers; each CDN server has a cache miss, requests the document from the origin server, and receives it]
Adaptive CDN