Transcript ppt

Characteristics of Internet
Background Radiation
Authors: Ruoming Pang, Vinod Yegneswaran, Paul
Barford, Vern Paxson, Larry Peterson
Appeared in IMC 2004, Taormina, Sicily, Italy, October
2004
Presenter: Charles Ahern
Introduction
Older (mid-1990s) internet traffic studies
make no mention of an appreciable
amount of on-going nonproductive traffic
Today, this traffic, whether malicious or
benign (e.g., misconfigurations), is prevalent
The goal of this paper is to categorize this
traffic, determine where it comes from and
what it is doing
Outline
 The magnitude of the problem
 How to decide what traffic is “nonproductive”
 Determining the nature of the traffic
Filtering
Responding (to gain further insight)
 Brief Experiment Details
 Quantifying & Qualifying
 Weaknesses & Contributions
Magnitude
The magnitude of nonproductive traffic on
the internet is not minor
Example:
Traffic logs from Lawrence Berkeley Laboratory
(LBL) for an arbitrary day show:
138 different remote hosts each scanned 25,000 or
more LBL addresses for a total of over 8 million
connection attempts
This is more than DOUBLE the site’s entire
successfully-established incoming connections,
originated by 47,000 distinct remote hosts
Given the traffic’s pervasive nature, they
have termed it internet “background radiation”
Determining What is Unwanted
Counting all unsuccessful connection
attempts would be an inaccurate statistic
Many are merely transient failures
Instead, measure traffic sent to hosts that
don’t exist
Likely to eliminate most transient failures and
yield unwanted activity
You can safely respond to this traffic
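The “traffic sent to hosts that don’t exist” test reduces to a simple membership check. A minimal sketch follows (not the paper’s monitoring code); the UNUSED_BLOCKS subnets are invented examples of dark address space a site might watch.

```python
from ipaddress import ip_address, ip_network

# Hypothetical dark (unused) address blocks monitored by the site
UNUSED_BLOCKS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

def is_background_radiation(dst_ip: str) -> bool:
    """True if the destination lies in unused address space, i.e. no real host."""
    dst = ip_address(dst_ip)
    return any(dst in block for block in UNUSED_BLOCKS)

# Traffic aimed at non-existent hosts is unwanted by definition, so it can be
# logged -- or even responded to -- without disturbing legitimate users.
print(is_background_radiation("192.0.2.77"))  # True
print(is_background_radiation("8.8.8.8"))     # False
```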
Taming the large Traffic Volume
Listening to traffic on thousands to millions
of IP addresses… this volume MUST be handled efficiently
Nearly 30,000 packets per second of
background radiation on the Class A
network they are monitoring
Filtering schemes must be sound and
effective
Filtering
Source-Connection Filtering
Keep first N connections initiated by the source
Disadvantages:
Inconsistent view of the network
N value is attack- and service-dependent
Source-Port Filtering
Keep first N connections for each
source/destination port pair
Allows wider variety of activities
Still has the same downsides, though
Filtering
Source-Payload Filtering
One instance of each type of activity per source
Good idea, but sometimes hard to implement
Hard to tell whether two activities are the same
until several packets have been responded to
Source-Destination Filtering (their choice)
Assumes one source will try the same activities
on every IP it tries to connect to (see sketch below)
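A rough sketch of the source-destination idea (not the authors’ implementation): keep only the connections a source makes to its first N distinct destinations, since the rest are assumed to repeat the same activity. N, the in-memory dictionary, and the sample addresses are illustrative choices.

```python
from collections import defaultdict

N = 2                          # destinations kept per source (illustrative value)
kept_dsts = defaultdict(set)   # source IP -> destinations we are still following

def keep_packet(src: str, dst: str) -> bool:
    """Keep traffic only for the first N distinct destinations each source contacts."""
    dsts = kept_dsts[src]
    if dst in dsts:
        return True            # already following this source-destination pair
    if len(dsts) < N:
        dsts.add(dst)          # start following a new destination for this source
        return True
    return False               # source has used its budget; drop the packet

# A scanner sweeping many addresses contributes only its first N destinations,
# which keeps responder load manageable without losing the activity itself.
for dst in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(dst, keep_packet("203.0.113.9", dst))   # True, True, False
```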
Filter Effectiveness
Responders
 Highly efficient responder network
Found that most radiation is TCP SYN packets,
which means they must respond to see what follows
Approach to building responders was “data
driven”: they determined which responders to
build based on traffic volumes
Pick the most common form, build a responder
Once the traffic could be differentiated into specific types
of activity, repeat with the next largest type of traffic
Responders Created
HTTP (port 80)
NetBIOS (port 137/139)
CIFS/SMB (port 139/445)
DCE/RPC (port 135/1025)
Dameware (port 6129)
MyDoom (port 3127)
Beagle (port 2745)
Responders
 Responders need to stick to the protocol (“how”
to say it)
 They also need to know “what” to say to keep
communication going
Differences in connections can be difficult to
determine at the network or transport level,
so an application-level understanding is
required
 Responses are developed manually, and many
are intricate and take research to determine their
format
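To make the “how” vs. “what” distinction concrete, here is a toy responder sketch; it is not the iSink or Honeyd code, and the port and buffer size are arbitrary. It speaks just enough HTTP that a scanning client keeps talking and reveals its request, which is the application-level detail the responders are built to capture.

```python
import socket

def serve_once(port: int = 8080) -> None:
    """Accept one connection, read the request, and send a bare HTTP reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            request = conn.recv(4096)                  # the "what": the client's payload
            print(f"{addr[0]} sent: {request[:60]!r}")
            conn.sendall(b"HTTP/1.1 200 OK\r\n"        # the "how": valid protocol framing
                         b"Content-Length: 0\r\n"
                         b"Connection: close\r\n\r\n")

if __name__ == "__main__":
    serve_once()
```

Responders for richer protocols such as CIFS/SMB or DCE/RPC need far more protocol state than this, which is why the slide notes they were developed manually and took research to get right.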
Brief Experiment Details
Two separate network sites with two
different systems iSink and LBL Sink.
Each system performed the same
responses but used different underlying
mechanisms
iSink
Class A network (2^24 addresses)
Plus two /19 subnets (16K addresses) on two
adjacent UW campus class B networks
One filter for each network
Filtered requests passed to the iSink
Did both passive (no responders) and
active measurements
iSink Setup
LBL Sink
Two sets of 10 contiguous /24 subnets
First is passive and unfiltered
Active analysis is divided into two sets of 5
subnets and filtered
All traffic then tunneled to a Honeyd
responder
LBL Setup
Summary of Data Collection
Quantifying
Traffic rate breakdown by protocol
(rate is number of packets per
destination IP per day)
Traffic breakdown by # of sources
Qualifying
Activities are ranked by number of source
IPs, not by byte or packet volume
Their filtering keeps only a sample of the traffic from
a source that reaches many destinations, so packet
and byte counts would be skewed
The number of source IPs reflects the
popularity of the activity across the internet
Single-source activities might be eccentric,
while multi-source activity is more likely to be
intentional
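A short sketch of the ranking metric: count distinct source IPs per activity and sort by that count rather than by packets or bytes. The activity labels and addresses below are invented purely to show the mechanics, not the paper’s data.

```python
from collections import defaultdict

# (source IP, activity label) pairs as they might come out of the classifier
records = [
    ("198.51.100.1", "HTTP exploit"),
    ("198.51.100.2", "HTTP exploit"),
    ("198.51.100.1", "HTTP exploit"),   # repeat source is counted only once
    ("203.0.113.5",  "MyDoom backdoor"),
]

sources_per_activity = defaultdict(set)
for src, activity in records:
    sources_per_activity[activity].add(src)

# Rank by distinct sources, not by packet or byte volume
for activity, srcs in sorted(sources_per_activity.items(),
                             key=lambda kv: len(kv[1]), reverse=True):
    print(activity, len(srcs))   # HTTP exploit 2, MyDoom backdoor 1
```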
Qualifying
To qualify activities, all connections
between a source-destination pair on a
given port are looked at
Only common ports are considered
What about uncommon ports???
Ports
Background radiation traffic is highly
concentrated on popular ports.
For example, on Mar 29 they saw 32,072
distinct source IPs at LBL, and only 0.5%
of those source hosts contacted a port not
among the “popular” ports they monitored
Thus, by looking only at popular ports,
most background radiation is captured
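A figure like that 0.5% can be computed from a connection log with a couple of set operations. This is only a sketch: the port set mirrors the responder list earlier in the talk, and the log records are made up.

```python
# Ports from the responder list above; anything else counts as "unpopular"
POPULAR_PORTS = {80, 135, 137, 139, 445, 1025, 2745, 3127, 6129}

# (source IP, destination port) pairs from a hypothetical day of traffic
log = [("198.51.100.1", 80), ("198.51.100.1", 445),
       ("203.0.113.7", 31337), ("192.0.2.9", 139)]

all_sources = {src for src, _ in log}
odd_sources = {src for src, port in log if port not in POPULAR_PORTS}

print(f"{len(odd_sources) / len(all_sources):.1%} of sources contacted an unpopular port")
```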
Qualifying
Weaknesses
IP addresses were heavily used in filtering and
statistical analysis. Because DHCP servers can
assign a host different IP addresses over time,
this can skew the data
 Many attacks must be known beforehand so that
they can build responders
A new worm might propagate heavily during
the short measurement period, which would
skew the typically observed numbers
Heavier weight is put on “more popular” attacks
due to the source-IP ranking; however, “less popular”
attacks may generate much more traffic
Contributions
Were able to quantify how much typical
internet traffic is nonproductive
Were able to qualify this nonproductive
traffic into categories and show that much of
it is malicious