Security Architectures for the Distribution of Multimedia

Download Report

Transcript Security Architectures for the Distribution of Multimedia

Modeling the Spread of Worms
Wade Trappe
Overview

Quick discussion of how the Internet is organized.

Random Constant Spread (RCS) Model and Code-Red I
– The Differential Equation
– Solving it!
– Observations

Improvements in worm design
– Scanning Strategies
Internet Overview, pg.1



The Internet started as a research project connecting 4 computers in 1969, and has
grown to connect over 100 million machines.
The Internet is:
– A loose collection of networks organized into a hierarchy through
interconnection technologies.
– At the local level machines are connected to each other (local area network), and
to a router.
– A router is a special-purpose device that transfers data to and from the next layer
of the hierarchy.
Loose collection of networks organized into a multilevel hierarchy
– 10-100 machines connected to a hub or a router


–
–
–
–
–
service providers also provide direct dialup access
or over a wireless link
10s of routers on a department backbone
10s of department backbones connected to campus backbone
10s of campus backbones connected to regional service providers
100s of regional service providers connected by national backbone
10s of national backbones connected by international trunks
Internet Overview, Conceptual Picture
Internet Overview, pg. 2

Question: So, I want to send an email, how does it happen?

Answer: We use Addresses, and Route between Addresses using
the Internet Protocol (IP).

Your data is sent via packets, and the Internet employs a storeand-forward strategy when delivering them between nodes.

Packets consist of: Meta-data (header) and the data (payload)

Metadata allows us to forward packets when we want

E.g. letters at a post office headed for main post office
– address labels allow us to forward them in batches
Internet Overview, pg. 3








Internet addresses are called IP addresses
Refer to a host interface (device connecting the computer to the network):
need one IP address per interface
Addresses are structured as a two-part hierarchy
– network number
135.105.53
100
– host number
Question: How many bits to assign to host number and how many to
network number?
If many networks, each with a few hosts, then more bits to network number
And vice versa
In the end, IP addresses consist of three sets of partitions of bits
– class A: 8 bits network, 24 bits host
– class B: 16 bits each
– class C: 24 bits network, 8 bits host
Routing uses these addresses to deliver from a source to a destination.
Internet Overview, pg. 4

An example of a message route;
# traceroute henna.iitd.ernet.in
traceroute to henna.iitd.ernet.in (202.141.64.30), 30 hops max, 40 byte packets
1
UPSON2-NP.CIT.CORNELL.EDU (128.84.154.1)
1 ms
1 ms
2
HOL1-MSS.CIT.CORNELL.EDU (132.236.230.189)
3
CORE1-MSS.CIT.CORNELL.EDU (128.253.222.1)
4
CORNELLNET1.CIT.CORNELL.EDU (132.236.100.10)
4 ms
3 ms
4 ms
5
ny-ith-1-H1/0-T3.nysernet.net (169.130.61.9)
5 ms
5 ms
4 ms
6
ny-ith-2-F0/0.nysernet.net (169.130.60.2)
7
ny-pen-1-H3/0-T3.nysernet.net (169.130.1.121)
8
sl-pen-21-F6/0/0.sprintlink.net (144.228.60.21)
9
core4-hssi5-0.WestOrange.mci.net (206.157.77.105)
2 ms
2 ms
3 ms
4 ms
21 ms
21 ms
border7-fddi-0.WestOrange.mci.net (204.70.64.51)
12
vsnl-poone-512k.WestOrange.mci.net (204.70.71.90)
13
202.54.13.170 (202.54.13.170)
14
144.16.60.2 (144.16.60.2)
15
henna.iitd.ernet.in (202.141.64.30)
1349 ms
1380 ms
3 ms
19 ms
16 ms
11
1375 ms
2 ms
4 ms
core2.WestOrange.mci.net (204.70.4.185)
629 ms
2 ms
2 ms
10
628 ms
1 ms
16 ms
40 ms
20 ms
34 ms
20 ms
24 ms
26 ms
21 ms
21 ms
623 ms
21 ms
639 ms
628 ms
1343 ms
1405 ms
36 ms
1368 ms
621 ms
Now Back to Worms…

Someone who controls many nodes on the Internet can cause
serious damage to the Internet.
– It is reasonable to gain control of millions of Internet hosts
through worms.
– Worms differ from viruses in that worms do not require human
intervention to propagate. Viruses require user action (aka.
Clicking that email attachment).

Pandurang gave the overview of Worms, along with its history
in the previous lecture.

We will start with Code Red
Code Red

The Code Red Worm was initially released in July 2001.

The worm spread by compromising Microsoft web servers using
a vulnerability that had been discovered just a few weeks earlier.

Once a host was infected, Code Red would spread itself by
launching 99 threads, that each generated a random IP address
and tried to infect that address using the same vulnerability.

Initial version of Code Red, CRv1, had a bug in the random
number generator.

Second version of Code Red, CRv2, the bug was fixed. CRv2
contained a piece of code to perform a distributed denial of
service attack on www.whitehouse.gov.
Random Constant Spread, pg. 1

Code Red spread very rapidly at first, until almost all vulnerable machines were
compromised, then it seemed to slow down its spread.

The Random Constant Spread (RCS) is one model to describe this phenomenon.
Let N= total # of vulnerable servers which can be corrupted/infected (assume its
constant with time)
Let K= initial compromise rate
– i.e. the number of vulnerable machines that an infected host can find and
compromise at the start (when few other hosts have been compromised).
– K is some universal constant for a particular worm.





Assume that a compromised machine picks other machines at random, and that once a
machine is infected it cannot be compromised again.
Let T be point when half the machines are infected.
Variables:
– a: the proportion of vulnerable machines that have been infected (e.g. a=1 means
all N have been infected). The variable a will change with time t.
– t: time in hours
Random Constant Spread, pg. 2

RCS is based upon the idea of logistic growth:
– The actual growth rate at a time t depends on the population
– Suppose a(t) is the proportion of the N machines infected at time t,
then there are a total of Na(t) machines that have been infected.
– If we go from time t to time (t+dt), then a(t) will become
a(t+dt)=a(t) + da.
– da represents the change in the proportion a, and is an infinitesimal
quantity (i.e. everything is in the limit).
– So Nda represents the total number of additional machines that
will be infected in dt more time.
– That’s one way to calculate the number of additional machines that
can be infected in dt time, we need one more way.
Random Constant Spread, pg. 3

Key Idea: Suppose I have 100 machines and I can infect K of those machines
in one hour. Now, instead, suppose I have 80 machines, then how many can I
infect in one hour?
– Answer: 0.8 K

Now, suppose Na machines have been infected, then that leaves (1-a)N
machines left.
– Question: When I had N infectible-machines I could infect K machines.
So, now I have (1-a)N infectible machines, how many can one machine
infect?
– Answer: (1-a)K

Next Issue: I can infect (1-a)K machines in 1 hour, but what about in dt time?
Answer: (1-a)Kdt.
Final Issue: At time t I have a(t)N machines that can do the infecting, so how
many will be infected in time dt?
– Simple, but not completely accurate answer: (Na)K(1-a)dt

Random Constant Spread, pg. 4

Lets put the two sides together:
Nda= (Na)K(1-a) dt

So, how do we solve this? Answer: Its an easy first order diffeq.

One way:
da
 Kdt
a (1  a )
1 
1
 a 
 
da  Kdt  ln
  Kt  C
a
1

a
1

a




 a 
Kt  C

 a 1  e Kt  C  e Kt  C
e
1 a 

e Kt  C
 a (t) 
1  e Kt  C
1
e K ( t T )
a (T )   a ( t ) 
2
1  e K ( t T )

Random Constant Spread, pg. 5

Observations:
– For small t (before the first
infection) there is no growth, but
once the infection happens, growth
happens exponentially.
– However, once significantly past
T, growth slows again because we
are running out of machines to
infect.
– See plot for an example.

These observations were confirmed in
the real worm data.
– Several hours before Code Red
was due to terminate itself, it had
slowed down due to the fact it had
found the majority of infectible
machines.
Random Constant Spread, pg. 6

What was wrong with the RCS model?
– Basically, problem lies to the simplification of the probability
involved.
– The assumption that if aN machines are infected then (aN)(1-a)K
machines will be infected in next hour is wrong.



Randomly choosing an address might mean that you actually
try to reinfect an already infected machine.
Or, by randomly choosing an address, two infected machines
might try to infect the same machine.
Overall, the value of RCS is not its rigor, but the fact it reveals
underlying principles and dynamics.
Better Worm Strategies

Localized Scanning:
– It takes more time to infect a node further away than one nearby.
– Localized scanning seeks to balance the amount of attempts a worm takes
in infecting a nearby machine versus choosing a random machine on the
Internet.
– Strategy employed in Code Red II.

Hit List Scanning:
– We saw that worms take a while to get started, but once started they grow
exponentially. How do we speed up the start?
– Idea:




Give the initial worm a list of “high-potential targets”.
Once it infects a machine on the hit-list, it splits the hit-list in half and
gives half to child worm to use.
Child worms continue replicating and splitting hit list.
Advantages: hit-list shrinks quickly, initial spread is very quick.
Better Worm Strategies, pg. 2

Permutation Scanning:
– One limitation of random scanning is that different nodes may try to
infect the same machine, or infect an already infected machine.
– Idea:





Each worm gets a starting point of permutation space to work with.
Permutation Space is mapped to IP Address Space via a 32 bit cipher
(with fixed key).
The worm goes along attempting to infect each machine in its region
of permutation space. If it ever encounters a machine that has been
infected, it knows that its permutation space will start overlapping
another worm’s permutation space, so it chooses a new, random
place to start from in permutation space.
Result: Worms end up trying to work on separate sections of
permutation space.
Improvements: Enforced partitions of permutation space.