Slides - University of Virginia, Department of Computer Science


Network-level Malware Detection
Mike McNett, Matthew Spear, Richard Barnes
CS-851 – Malware
23 October 2004
1
Outline


• Introduction: Design of a System for Real-Time Worm Detection
• Example 1: Detecting Early Worm Propagation through Packet Matching (DEWP)
• Example 2: Fast Detection of Scanning Worm Infections
• Example Application: Therminator
• Conclusions
2
Introduction
Questions Being Considered:
1. Why network level detection?
2. What are the alternatives?
3. Are there reasonable solutions?
4. What are the limitations, advantages,
disadvantages compared to the alternatives?
3
Introduction
1. Malware Detection Options?
a) Prevention vs. Treatment
b) Signature vs. Anomaly
c) Host-based containment
d) Network containment
e) Packet Header vs. Packet Payload
2. What are the advantages, disadvantages,
and limitations of the above?
4
Network-level Detection
5
Design of a System for Real-Time Worm Detection
• Hash
• Count Vector
• Character Filter
• SRAM Analyzer
• Alert Generator
• Periodic Subtraction of Time Averages
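A minimal sketch of how the hash, count vector, alert generator, and periodic subtraction stages listed above might fit together in software. The class name, the CRC32 hash, the window length, the table size, and the alert threshold are all assumptions for illustration. The character filter and SRAM analyzer stages are not modeled, and the system described relies on specialized hardware rather than code like this.

```python
import zlib


class CountVector:
    """Toy count vector: hash each fixed-length content window into a counter."""

    def __init__(self, size=1024, window=8, threshold=5):
        self.size = size            # number of counters (hypothetical)
        self.window = window        # bytes per hashed window (hypothetical)
        self.threshold = threshold  # count at which an alert is raised (hypothetical)
        self.counts = [0] * size

    def observe(self, payload: bytes):
        """Hash every window of the payload; return windows that just hit the threshold."""
        alerts = []
        for i in range(len(payload) - self.window + 1):
            chunk = payload[i:i + self.window]
            slot = zlib.crc32(chunk) % self.size
            self.counts[slot] += 1
            if self.counts[slot] == self.threshold:
                alerts.append(chunk)
        return alerts

    def subtract_average(self, average=1):
        """Periodic subtraction: lower every counter, never below zero."""
        self.counts = [max(0, c - average) for c in self.counts]
```

Content that repeats faster than the periodic subtraction drains its counter eventually raises an alert; content seen only occasionally decays back to zero.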

6
Design of a System for Real-Time Worm Detection
• Scalable to high throughput
• Solution depends on specialized hardware
• Low false positive rate

• What are the problems?
• What are the advantages?
• Are there other, simpler signatures?
• Can similar attacks be detected at the host level?

7
Detecting Early Worm Propagation
through Packet Matching
Xuan Chen and John Heidemann
ISI-TR-2004-585
February 2004
8
DEWP
Router-based system that:
• automatically detects and quarantines Internet worm propagation
• matches destination port numbers between incoming and outgoing connections (automated signature creation)
• detects and suppresses worms based on their unusual traffic patterns
• detects worm propagation within about 4 seconds
• protects more than 99% of hosts from random-scanning worms
9
DEWP Thesis
• Matches destination port numbers between incoming and outgoing connections. Two observations on worm traffic:
  • Worms usually exploit vulnerabilities tied to specific network port numbers.
  • Infected hosts probe other vulnerable hosts, exploiting the same vulnerability.
• So: high levels of bi-directional probing traffic with the same destination port number → new worm.
• Scalable: matching destination port numbers requires little computation.
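The port-matching idea can be sketched in a few lines. This is an illustration only: the function name, the tuple layout, and the `min_count` threshold are assumptions, not part of DEWP as published.

```python
from collections import Counter


def suspicious_ports(incoming, outgoing, min_count=1):
    """Return destination ports that appear in both traffic directions.

    incoming / outgoing: iterables of (src_addr, dst_addr, dst_port) tuples.
    min_count is a hypothetical per-direction threshold.
    """
    in_ports = Counter(port for _, _, port in incoming)
    out_ports = Counter(port for _, _, port in outgoing)
    return sorted(
        port for port in in_ports
        if in_ports[port] >= min_count and out_ports[port] >= min_count
    )
```

Only two counters per port are kept, which is why the matching step is cheap enough to run on a router.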

10
DEWP
• Two components: worm detector and packet filter
• Two-step detection: destination port matching and destination address counting
• Uses packet filtering to suppress worm spreading
• Can be deployed at different levels of the network
11
Worm Containment
• DEWP uses traffic filtering: routers drop packets addressed to the automatically discovered destination port.
• Worm containment: protect internal hosts from internal and external threats; notify other networks about attacks.
12
Design
• Maintains one port-list for each direction (incoming and outgoing), recording the number of connections to each destination port.
• Timer for each entry in the port-lists: if a port has not been accessed for a certain time interval, the corresponding list entry is reset.
• Monitors the outgoing destination addresses for ports with non-zero entries in both port-lists.
• Every T seconds, checks the number of unique addresses observed within the last time interval. Worm traffic is detected by comparing N, the number of unique addresses observed, against its long-term average, using a parameter that sets the system's sensitivity to changes.
13
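A sketch of the address-counting step of DEWP described above. The slide's exact detection condition and averaging formula are not reproduced here, so the rule used below (alarm when N exceeds the long-term average by a factor of 1 + sensitivity, with an exponentially weighted average) is an assumption, as are the class name, `sensitivity`, and `weight`.

```python
class AddressCounter:
    """Counts unique destination addresses per interval and flags sudden jumps."""

    def __init__(self, sensitivity=1.0, weight=0.5):
        self.sensitivity = sensitivity  # hypothetical stand-in for the sensitivity parameter
        self.weight = weight            # hypothetical smoothing weight for the average
        self.average = None             # long-term average of N
        self.seen = set()               # unique addresses in the current interval

    def record(self, dst_addr):
        self.seen.add(dst_addr)

    def end_interval(self):
        """Call every T seconds; returns True if the interval looks like worm traffic."""
        n = len(self.seen)
        self.seen = set()
        if self.average is None:
            self.average = float(n)     # first interval only seeds the average
            return False
        alarm = n > (1 + self.sensitivity) * self.average
        if not alarm:
            # Assumed choice: only quiet intervals update the long-term average.
            self.average = self.weight * self.average + (1 - self.weight) * n
        return alarm
```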
Effectiveness of Worm Detection and Quarantine
• Random-scanning worm: detects worm traffic in 4.8 seconds when fully deployed with a 1-second detection interval.
• Always detects worm probing traffic in 4-5 seconds, whichever layers it is deployed to.
• The number of infected hosts in the protected network is primarily determined by the number of probing packets received from outside.
• Can protect almost all hosts from infection even when deployed only on the access router.
14
Local Scanning
• Local scanning: can detect worm probing traffic in 3.87 seconds, but almost all vulnerable hosts in the protected network are compromised.
• Deployment has little impact on either detection delay or infection percentage.
• The infection percentage increases as the number of layers running DEWP is reduced: when DEWP runs only on the access router → all vulnerable hosts are compromised within 10 seconds.
• More frequent detection reduces vulnerability to local-scanning worms.
• DEWP quickly detects worm attacks regardless of probing technique.
• With full deployment, about 9% of vulnerable hosts in the protected network are compromised.
• Because local-scanning worms are hard to quarantine effectively → a very small detection interval and wide deployment are critical to protect vulnerable hosts.
15
Effect of Detection Intervals
• Address counting runs with an interval of T seconds.
• Different detection intervals affect detection delay and infection percentage.
• Random-scanning worms: detection delay and the number of infected hosts increase with the detection interval.
• Local-scanning worms: 1) no significant difference in detection delay; 2) infection percentage increases dramatically at larger intervals.
• So an automatic system needs to react to worm traffic within small time intervals.
16
False Detections
• No false positives
• Discovered ~10 suspicious destination ports, including 21 (FTP), 53 (DNS), and 80 (Web)
• Depends on address counting to reduce false positives
• Worm scan rate C affects false negatives: when a worm scans at a low rate, its probing traffic has less effect on overall traffic, and DEWP routers have more difficulty distinguishing it from normal traffic.
  • With C = 500 → worm traffic stands out compared to regular traffic.
  • DEWP is not able to detect worms with a scanning rate lower than C = 25.
17
Conclusions
• Detects and quarantines propagation of Internet worms
• Uses port matching and address counting as the signature
• Detects a worm attack within 4-5 seconds
• By automatically blocking worm traffic, it protects most vulnerable hosts from random-scanning worms.
• The authors believe that an automatic worm detection and containment system should be widely deployed and have very small detection intervals.
• It is not realistic to deploy DEWP on all routers; for random-scanning worms, it is sufficient to put it on the access router.
18
Worm Detection
Fast Detection of Scanning Worm Infection
19
Detection Techniques
1. Reverse Sequential Hypothesis Testing (TH)
• Detects worms based on the number of failed connection attempts
• Uses probability to determine whether a local host is scanning
• Designed to be tied into a containment system
2. Signature-Based Analysis (Early Bird System (EBS))
• Detects worms based on Rabin signatures of content/port
• Used in conjunction with a containment system
20
Definitions
l      Local host
d      Destination address
FCC    First-contact connection
Yi     Indicator variable of the ith FCC
H0     Hypothesis that l is not scanning
H1     Hypothesis that l is scanning
θk     Pr(Yi = 0 | Hk), k ∈ {0,1}
η0     Lower bound to accept H0
η1     Upper bound to accept H1
21
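Definitions
Likelihood ratio after n observations:
$$\Lambda(Y_n) = \prod_{i=1}^{n} \phi(Y_i) = \Lambda(Y_{n-1})\,\phi(Y_n)$$
Reverse form, which never drops below 1:
$$\bar{\Lambda}(Y_n) = \max\bigl(1,\ \bar{\Lambda}(Y_{n-1})\,\phi(Y_n)\bigr)$$
PD     Probability of detecting an infected host
PF     Probability of flagging a host as infected when it is not
α      Upper bound on PF (α ≥ PF)
β      Lower bound on PD (β ≤ PD)
Cl     Credits for l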
22
Basic Algorithm
• Maintain separate state for each monitored host l: the current likelihood ratio Λ̄(Yn), the set of previously contacted hosts (PCH), and an FCC queue (FCCQ) of first-contact attempts that have been issued but not yet recorded as observations.
1. When an outgoing packet from l is observed, check whether d is in the PCH of l; if not, add d to the PCH and add the attempt to the FCCQ as PENDING.
2. When an incoming packet is sent to l and its source address exists in the FCCQ, update that record to SUCCESS, unless the packet is a TCP RST.
3. When the head entry of the FCCQ has status PENDING and has been in the queue longer than a predefined time limit, set its status to FAILURE.
4. If the entry at the head of the FCCQ has a status other than PENDING, update Λ̄(Yn) and compare it to η1.
23
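The likelihood-ratio update in the last step can be sketched as follows. Several things here are assumptions rather than statements from the slides: that Yi = 0 means the first-contact connection succeeded, that the per-observation ratio is Pr(Yi | H1) / Pr(Yi | H0), and that the upper threshold is set by Wald's approximation η1 = β/α. The function names are hypothetical, and the default parameter values are the ones listed for the experiments (θ0 = .7, θ1 = .1, α = 5E-5, β = .99).

```python
def phi(failed, theta0=0.7, theta1=0.1):
    """Likelihood ratio Pr(Y | H1) / Pr(Y | H0) for one first-contact outcome.

    theta_k = Pr(Y = 0 | H_k); Y = 0 is taken to mean "connection succeeded".
    """
    if failed:
        return (1 - theta1) / (1 - theta0)
    return theta1 / theta0


def is_scanning(outcomes, alpha=5e-5, beta=0.99):
    """Run the reverse test over a sequence of booleans (True = FCC failed)."""
    eta1 = beta / alpha  # upper threshold, assumed from Wald's approximation
    ratio = 1.0
    for failed in outcomes:
        # The max(1, ...) reset keeps a long benign history from masking
        # a later burst of failures.
        ratio = max(1.0, ratio * phi(failed))
        if ratio >= eta1:
            return True
    return False
```

With these values each failure multiplies the ratio by about 3, so under the assumptions above an uninterrupted run of ten failed first contacts is enough to cross the threshold.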
Basic Algorithm
Credit Based Connection Rate Limiting (CBCRL)
• Simple scheme to limit the number of connections l can make in a given slot of time by allotting each l a set number of credits (Cl), which is modified in response to events.

Event              Change to Cl
Initial            Cl ← 10
FCC issued by l    Cl ← Cl - 1
FCC succeeds       Cl ← Cl + 2
Every second       Cl ← max(10, (2/3) Cl) if Cl > 10
Allowance          Cl ← 1 if Cl = 0 for 4 seconds

• Used in conjunction with TH to limit the number of connections a host can make, allowing TH time to determine whether the host is infected.
24
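The credit rules can be written as a small state machine. This is a sketch under assumptions: the class and method names are hypothetical, credits are kept as integers (so the two-thirds decay rounds down), and a host with no credits is simply refused rather than queued.

```python
class CreditLimiter:
    """Per-host credit counter following the CBCRL event rules."""

    def __init__(self):
        self.credits = 10           # Initial: Cl <- 10
        self.seconds_at_zero = 0

    def try_fcc(self):
        """Return True if the host may issue a first-contact connection now."""
        if self.credits <= 0:
            return False
        self.credits -= 1           # FCC issued by l: Cl <- Cl - 1
        return True

    def fcc_succeeded(self):
        self.credits += 2           # FCC succeeds: Cl <- Cl + 2

    def tick(self):
        """Call once per second."""
        if self.credits > 10:
            # Every second: Cl <- max(10, 2/3 Cl); integer rounding is an assumption.
            self.credits = max(10, (2 * self.credits) // 3)
        if self.credits == 0:
            self.seconds_at_zero += 1
            if self.seconds_at_zero >= 4:
                self.credits = 1    # Allowance: Cl <- 1 after 4 seconds at zero
                self.seconds_at_zero = 0
        else:
            self.seconds_at_zero = 0
```

A host whose first contacts mostly succeed gains a credit on every round trip, while a scanner whose contacts mostly fail runs out and is throttled to roughly one attempt every four seconds.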
Experiment
• Conducted two experiments, one in 2003 (isp-03) and one in 2004 (isp-04).
• Worms were identified by comparing traffic to known worm descriptions.

                                     isp-03        isp-04
Date                                 Apr 10 2003   Jan 28 2004
Duration                             627 min       66 min
Total outbound connection attempts   1402178       178518
Total active local hosts             404           451
θ0                                   .7            .7
θ1                                   .1            .1
α                                    5E-5          5E-5
β                                    .99           .99
25
Results
                          isp-03   isp-04
Worms Detected (Total)    5        6
  CodeRed II              2        0
  Blaster                 0        1
  MyDoom                  0        3
  Minmail.j               0        1
  HTTP (other)            3        1
False Alarms (Total)      0        6
  HTTP                    0        3
  SMTP                    0        3
P2P Detected (Total)      6        11
Total Identified          11       23

                 TH
Alarms           34
Detection        11
Efficiency       .324
Effectiveness    .917
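A quick arithmetic check of the summary figures. It assumes that the alarm count combines worms, false alarms, and P2P hosts across both traces, and that efficiency means detections divided by alarms; neither definition is spelled out on the slide, and the per-trace totals used are a reading of the results table.

```python
# Per-trace totals as read from the results table (a reading, not stated outright).
worms = {"isp-03": 5, "isp-04": 6}
false_alarms = {"isp-03": 0, "isp-04": 6}
p2p = {"isp-03": 6, "isp-04": 11}

# Assumed definitions: every identified host counts as an alarm,
# and only worm hosts count as detections.
alarms = sum(worms[t] + false_alarms[t] + p2p[t] for t in worms)
detections = sum(worms.values())
efficiency = detections / alarms

print(alarms, detections, round(efficiency, 3))
```

Under these assumptions the totals reproduce the 34 alarms, 11 detections, and .324 efficiency reported for TH.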
26
Limitations, Future Work?
Are there any serious flaws in this algorithm?
• Warhol-type scanning
• Network outages can cause TH to decide that a host is a worm
• Worms could conceivably collaborate to defy detection
• A worm could remember hosts that it can contact and defy detection through them
• Spoofing attack to get an uninfected host blocked
• Interleave scanning with benign activities (e.g. for every scan, visit a website that is known to be running)
Future work?
• Can trivially modify to work with the containment strategies discussed earlier
27
THERMINATOR!!!
Science comes to the aid of network-level anomaly detection
28
Network behavior is complicated
• How do we use “microscopic” packet-level data to make “macro” network-level decisions?
  • Too broad, e.g. keeping track of global traffic patterns.
  • Too refined, e.g. looking at individual packets.
• Hmm… who else tries to make sense of the overall behavior of millions of single objects?
  • Physicists and chemists!
29
Idea
• Given a computer network with >1000 nodes,
• Want to detect anomalous traffic, without any foreknowledge.
• Idea of THERMINATOR
  • Take advantage of lots of packet-level data.
  • Use physical techniques to distill information into relevant statistics: temperature, entropy, etc.
30
Data Reduction
1. Take the set of hosts and group them into “buckets” or “conversation groups”.
2. Observe communication among buckets.
3. Calculate physical statistics based on these higher-level communications.
• By virtue of the mathematics, these are guaranteed to be the same as if we’d just looked at hosts.
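The reduction steps can be illustrated with a toy example. Everything here is an assumption: the function names, the host-to-bucket mapping, and in particular the use of Shannon entropy as the statistic. It stands in for THERMINATOR's thermodynamics-derived quantities, is not claimed to match them, and does not carry the host-level equivalence guarantee mentioned above.

```python
import math
from collections import Counter


def reduce_to_buckets(flows, bucket_of):
    """Collapse host-to-host flows into bucket-to-bucket counts.

    flows: iterable of (src_host, dst_host) pairs.
    bucket_of: dict mapping each host to its bucket name (hypothetical grouping).
    """
    return Counter((bucket_of[src], bucket_of[dst]) for src, dst in flows)


def shannon_entropy(counts):
    """Shannon entropy in bits of a count distribution (a stand-in statistic)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)
```

Tracking such a statistic per time slice gives a single number whose sudden change can flag a network event without inspecting individual packets.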
31
Physical Network Visualization
• Based on reduced data, we know pseudo-physical statistics:
  • Bucket size
  • Temperature
  • Entropy
  • Heating rate
  • Work rate
• Visualizing these data shows network events.
Image courtesy of DISA
32
Network Event Detection
33
THERMINATOR Implementation
• Jointly developed by DISA, NSA, and Lancope Inc.
• Uses Lancope’s data-collection hardware to provide data to THERMINATOR.
• THERMINATOR reduces data, computes stats, and provides visualization.
• “Research tests validated that THERMINATOR detected anomalies that the intrusion detection systems did not capture.” -- NSA
34
Conclusion
• Combined approaches (host-based, network-based, visualization)?
• Can signatures be automatically generated?
• Can attacks be visualized?
• Potential impacts of false positives (is the medicine worse than the sickness?) and automated containment?
• Need different solutions for local-scanning vs. non-local-scanning worms?
• Are there other scientific areas that malware research can leverage?
35