ANOMALY DETECTION AND CHARACTERIZATION: LEARNING


ANOMALY DETECTION AND CHARACTERIZATION:
LEARNING AND EXPERIENCE
YAN CHEN – MATT MODAFF – AARON BEACH
NETWORK TRAFFIC:
WHAT DOES IT LOOK LIKE?
Where are the anomalies?
Overview
• Anomaly Detection using Prediction Algorithm
– Holt-Winters
• Basic:
– one dimensional detection (value prediction)
• Intermediate:
– multi-dimensional detection (vector prediction)
• Advanced:
– Characterization by correlating many
multi-dimensional detections in parallel
(2nd power vector prediction)
• Automatic characterization updates
using maliciousness rating system
Holt-Winters
• Prediction algorithm
– Exponential Smoothing
• Sum of three components
– Baseline (intercept)
$a_t = \alpha (y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$
– Linear Trend (slope)
$b_t = \beta (a_t - a_{t-1}) + (1 - \beta) b_{t-1}$
– Seasonal Trend
$c_t = \gamma (y_t - a_t) + (1 - \gamma) c_{t-m}$
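The three update equations above can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: the class and function names are hypothetical, the seasonal values are assumed to be kept in a list of length m (oldest first), and the forecast is assumed to be the sum of the previous baseline, the previous trend, and the seasonal value from one period earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HoltWintersState:
    """Additive Holt-Winters state (hypothetical names, not from the slides)."""
    baseline: float        # a_{t-1}
    trend: float           # b_{t-1}
    seasonal: List[float]  # last m seasonal values, oldest first: seasonal[0] is c_{t-m}


def hw_predict(state: HoltWintersState) -> float:
    """Forecast y_t as the sum of the three components (assumed form)."""
    return state.baseline + state.trend + state.seasonal[0]


def hw_update(state: HoltWintersState, y: float,
              alpha: float = 0.1, beta: float = 0.1,
              gamma: float = 0.1) -> HoltWintersState:
    """Apply one step of the baseline, trend, and seasonal updates."""
    c_old = state.seasonal[0]  # c_{t-m}
    a = alpha * (y - c_old) + (1 - alpha) * (state.baseline + state.trend)
    b = beta * (a - state.baseline) + (1 - beta) * state.trend
    c = gamma * (y - a) + (1 - gamma) * c_old
    # Drop c_{t-m} and append c_t so the list always holds the last m values.
    return HoltWintersState(a, b, state.seasonal[1:] + [c])
```

Calling `hw_predict` before `hw_update` at each time step gives the value that the observation is later compared against.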
Holt-Winters continued
• Constants alpha, beta, and gamma are
predetermined (between 0 and 1)
– Used 0.1 for all of them based on how much new
values should be weighted against old values
• Choose a seasonal size
– Choose 1 minute since we only had 1 day
– Or two hours for ICMP detection
• Measuring within a threshold of deviation (delta)
$d_t = \gamma |y_t - \hat{y}_t| + (1 - \gamma) d_{t-m}$
$(\hat{y}_t - \delta d_{t-m},\ \hat{y}_t + \delta d_{t-m})$
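A small sketch of the deviation update and the threshold band, under stated assumptions: the function names are hypothetical, and the default `delta=2.0` is a placeholder because the slides do not give the value that was used.

```python
from typing import Tuple


def update_deviation(y: float, y_hat: float, d_prev_season: float,
                     gamma: float = 0.1) -> float:
    """d_t = gamma * |y_t - y_hat_t| + (1 - gamma) * d_{t-m}."""
    return gamma * abs(y - y_hat) + (1 - gamma) * d_prev_season


def confidence_band(y_hat: float, d_prev_season: float,
                    delta: float = 2.0) -> Tuple[float, float]:
    """Open interval (y_hat - delta * d_{t-m}, y_hat + delta * d_{t-m})."""
    return (y_hat - delta * d_prev_season, y_hat + delta * d_prev_season)


def is_aberration(y: float, y_hat: float, d_prev_season: float,
                  delta: float = 2.0) -> bool:
    """True when the observed value falls outside the band."""
    low, high = confidence_band(y_hat, d_prev_season, delta)
    return not (low < y < high)
```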
Detecting Aberrations / Alarms
• Set a window size and the number of aberrations
considered alarming
• If there are more aberrations than the limit within
the time window, then alarm
• We used 10-15 aberrations per window of 30 and
1 aberration per window of 1, depending on the
time step and the characteristic nature of the
variable combination being detected
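The windowed alarm rule could be sketched as below. The class name is hypothetical, and the comparison is assumed to be "at least `limit` aberrations in the window", since a 1/1 setting could never alarm under a strictly-greater test.

```python
from collections import deque


class AberrationWindow:
    """Sliding window over the most recent aberration flags (hypothetical helper)."""

    def __init__(self, window_size: int, limit: int) -> None:
        self.flags = deque(maxlen=window_size)  # oldest flags fall off automatically
        self.limit = limit

    def observe(self, aberrant: bool) -> bool:
        """Record one time step; return True if the alarm should fire."""
        self.flags.append(bool(aberrant))
        return sum(self.flags) >= self.limit
```

For example, `AberrationWindow(30, 10)` corresponds to the 10/30 setting and `AberrationWindow(1, 1)` to the 1/1 setting.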
Network Traffic Data
• Network traffic data has many variables
• We look at:
– Source and Destination IP addresses
– Source and Destination port numbers
– Protocol type
– Bytes and packets in a traffic flow
• Unique flow defined by source and destination port/IP tuples
– Protocol flags (TCP flags)
Over time these many variables
form a dynamic vector of data
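One way to turn flow records into a per-time-step series that a predictor can consume is sketched below. The `Flow` record layout and the `bytes_per_step` helper are assumptions made for illustration; the slides do not describe the actual data format.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record holding the variables listed above."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    tcp_flags: str
    n_bytes: int
    n_packets: int


def bytes_per_step(flows: Iterable[Flow], step_seconds: float,
                   key: Callable[[Flow], Hashable]) -> Counter:
    """Sum flow bytes per (time step index, key(flow)) pair."""
    totals: Counter = Counter()
    for flow in flows:
        step = int(flow.timestamp // step_seconds)
        totals[(step, key(flow))] += flow.n_bytes
    return totals
```

Passing `key=lambda f: f.dst_port` yields one series per destination port, and `key=lambda f: f.protocol` one series per protocol.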
What is Anomaly Detection?
• We predict “normal” vector space using the
Holt-Winters Forecasting Method
• We define vector space beyond normal as “aberrant”
• If the network traffic vector travels into aberrant
space it is considered an “anomaly”
• Now let's look at a few examples of basic direct
anomaly detection and alarm triggering
Detection using port dimension
• A clear port scan on port 21 (FTP) at 12:46-47 AM
from one address outside the network
Detection using Protocol: ICMP
• ICMP spikes every 2 hours
• Without seasonal values all of these may show up
as malicious anomalies
Port activity: Malicious or normal
• While port 17300 is used by nothing except for the
Kuang2 Trojan/Virus, port 10000 is used for NDMP
server backup service and Dumaru.Y?
Detection using three variables:
Flow bytes/packets and TCP flag
• SYN attack early in the morning?
• What about the little spikes: are they SYN attacks?
Explaining detected anomalies
• Three variables are enough for detection but do not tell us
what the anomaly is; we need other variables for
characterization
• Huge scan to port 4128. Why just 4128? Is it really just a DoS?
• All computers that respond to the SYNs on 4128 receive
requests on port 137 (NetBIOS, a protocol which is used to
support file and printer sharing)
• This data matches a method used by many viruses to find
exploitable systems. This is called an NBTSTAT -A type scan, which
is used to locate systems with open shares (port 4128); the
attacker then tries to execute the infection via a connection to the
file share (port 137)
• This is an attack on port 137, yet there is no large scan on port 137,
only a scan on the relatively harmless port 4128. This indirect
scanning could have avoided detection
• Possible suspects are: Nimda, Bugbear, Msinit, Opaserv, Qaz
More Advanced Detection
• For the previous detection example we could
define a vector of malicious conditions
• The vector space would have had 10 variables
– 2 sets of (dst IP, dst port, bytes, packets, protocol)
– Each variable can have a condition or
range that is malicious
• This combination of 2 sets of 5 ranges or
conditions for different variables forms a unique
malicious vector space!
• Now let's look at an example of using three
detection vectors in parallel to distinguish
normal space from malicious space
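The 10-variable vector of malicious conditions described above (2 sets of 5 variables, each with a condition or range) could be represented as in the sketch below. All names and every concrete value are hypothetical placeholders chosen for illustration; only the port numbers 4128 and 137 come from the earlier example.

```python
from typing import Any, Callable, Mapping

Condition = Callable[[Any], bool]


def in_range(low: float, high: float) -> Condition:
    """Condition that holds when low <= value <= high."""
    return lambda value: low <= value <= high


def equals(expected: Any) -> Condition:
    """Condition that holds when the value equals the expected one."""
    return lambda value: value == expected


def in_malicious_space(conditions: Mapping[str, Condition],
                       observation: Mapping[str, Any]) -> bool:
    """True only if every variable is present and satisfies its condition."""
    return all(name in observation and check(observation[name])
               for name, check in conditions.items())


def is_internal(ip: str) -> bool:
    """Hypothetical test for an address inside the monitored network."""
    return ip.startswith("10.")


# Two sets of (dst IP, dst port, bytes, packets, protocol); values are placeholders.
SCAN_THEN_SHARE = {
    "scan.dst_ip": is_internal,
    "scan.dst_port": equals(4128),
    "scan.bytes": in_range(0, 100),
    "scan.packets": in_range(1, 3),
    "scan.protocol": equals("tcp"),
    "follow.dst_ip": is_internal,
    "follow.dst_port": equals(137),
    "follow.bytes": in_range(0, 2000),
    "follow.packets": in_range(1, 20),
    "follow.protocol": equals("udp"),
}
```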
Comparing 3 Detections in parallel
• Network seems to update
SMTP servers every few
hours; this should be taken
into account
• Spikes in DNS traffic may be
credited to seasonal updates
• Due to some older SMTP
servers' authentication
protocol, port 113 traffic will
mirror SMTP traffic on a
smaller scale. If they are
taken together, both spike at
the same relative ratio. This
can help distinguish normal
vector space from malicious
and help define the
conditions of malicious
characterization vectors
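The ratio observation above suggests a simple cross-check between two series, sketched here. The function name, the `expected_ratio` parameter, and the 25% tolerance are assumptions for illustration; the slides do not state how the ratio was evaluated.

```python
def ratio_is_normal(smtp_volume: float, port113_volume: float,
                    expected_ratio: float, tolerance: float = 0.25) -> bool:
    """Check that port 113 traffic tracks SMTP traffic at the usual ratio.

    expected_ratio is the typical port113/SMTP ratio learned from normal
    traffic; tolerance is the allowed relative deviation (placeholder value).
    """
    if smtp_volume == 0:
        # With no SMTP traffic, any port 113 traffic breaks the pattern.
        return port113_volume == 0
    ratio = port113_volume / smtp_volume
    return abs(ratio - expected_ratio) <= tolerance * expected_ratio
```

A spike in both series that keeps the ratio intact would pass this check, while a spike in only one of them would not.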
Detecting a Malicious Vector
• A degree of maliciousness at any one
moment can be calculated as the
percentage of the way the current traffic
has moved from the normal/predicted
values toward the malicious conditions.
• So any current network traffic vector
(point) has a degree of maliciousness for
each unique vector of malicious conditions
• 0% = completely normal/predicted
• >100% = completely within malicious space
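One possible reading of this rating, for a single variable, is linear interpolation between the predicted value (0%) and the boundary of the malicious range (100%). The sketch below follows that reading; the formula, the function names, and the choice to combine variables by taking the minimum are assumptions, not something the slides specify.

```python
from typing import Sequence


def degree_of_maliciousness(predicted: float, current: float,
                            malicious_boundary: float) -> float:
    """Percentage of the way from the predicted value to the malicious boundary.

    0 means the traffic is exactly as predicted; 100 or more means the value
    has reached or passed the boundary of the malicious range.
    """
    span = malicious_boundary - predicted
    if span == 0:
        raise ValueError("malicious boundary must differ from the predicted value")
    return (current - predicted) / span * 100.0


def vector_maliciousness(predicted: Sequence[float], current: Sequence[float],
                         boundaries: Sequence[float]) -> float:
    """Assumed combination rule: the least malicious variable limits the vector."""
    return min(degree_of_maliciousness(p, c, b)
               for p, c, b in zip(predicted, current, boundaries))
```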
Anomalous but not Malicious
• What if data falls outside of the threshold of deviation
(out of normal space) but does not fall into malicious
space? This is undefined space
• Any action taken in these cases is ignorant and not
based on previous knowledge, so nothing should be
done. A warning alarm should go off, and a careful
analysis and report of this data should be
stored so that it might be studied later
• If this anomaly leads into malicious space, the
malicious space may need to be expanded to include
this newly detected anomaly
Anomalous but not Malicious: continued
• Each non-malicious anomalous event should be
stored and given a manual malicious rating later
• This rating can then be incorporated into all
related malicious variable conditions
• The detection conditions would then be continually
updated by new anomalous data simply by the
administrator rating how malicious a specific event
was to their network, and in which way it was
malicious (DoS, virus, etc.). This makes updating
very easy without relying on outside sources
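A sketch of how an administrator's rating might be folded into a range condition is shown below. The update rule (moving the nearest range boundary toward the rated event in proportion to the rating) is purely an assumption for illustration; the slides do not define how ratings are incorporated.

```python
from dataclasses import dataclass


@dataclass
class RangeCondition:
    """Malicious range for one variable (hypothetical structure)."""
    low: float
    high: float


def incorporate_rating(cond: RangeCondition, value: float,
                       rating: float) -> RangeCondition:
    """Expand the range toward an anomalous value, weighted by its rating.

    rating is the administrator's judgment in [0, 1]: 0 leaves the range
    unchanged, 1 expands it to fully include the rated value.
    """
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be between 0 and 1")
    low, high = cond.low, cond.high
    if value < low:
        low -= rating * (low - value)
    elif value > high:
        high += rating * (value - high)
    return RangeCondition(low, high)
```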
Future Work / Implementation
• 3+ levels of detection
– Basic: checking maliciousness rating of one variable
– Intermediate: checking maliciousness of vectors of variables
– Advanced: checking vectors of maliciousness ratings of
multiple detection vectors in parallel
• This can continue to be scaled to whatever level of
complexity is necessary
• Each detection vector need only be checked once
every time step (seconds, minutes, etc.), depending
on how well the server can perform. Detection precision
increases with smaller time steps. Only one time step
of data and vectors need be stored in memory
Future Work / Implementation continued
• Computation per time step is equal to the average
computation for one vector multiplied by the
number of detection vectors
• Memory requirement will be equal to traffic data
for one time step plus the average vector size
multiplied by the number of vectors
• Based on processor speed, memory space, and
number of characterizations being detected, an
optimal time step could be computed
• Future work could involve testing the plausibility
of this system in high speed, large traffic volume
situations
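The cost estimates above can be written down directly, as in the sketch below. The function names, the list of candidate time steps, and the simplification that per-vector computation time does not depend on the time step are all assumptions for illustration.

```python
from typing import Iterable, Optional


def compute_seconds_per_step(avg_seconds_per_vector: float, n_vectors: int) -> float:
    """Average computation for one vector times the number of detection vectors."""
    return avg_seconds_per_vector * n_vectors


def memory_bytes_per_step(traffic_bytes_per_step: float,
                          avg_vector_bytes: float, n_vectors: int) -> float:
    """Traffic data for one time step plus average vector size times vector count."""
    return traffic_bytes_per_step + avg_vector_bytes * n_vectors


def smallest_feasible_step(candidate_steps: Iterable[float],
                           avg_seconds_per_vector: float, n_vectors: int,
                           traffic_bytes_per_second: float,
                           avg_vector_bytes: float,
                           memory_limit_bytes: float) -> Optional[float]:
    """Return the smallest time step (seconds) that fits both budgets, else None."""
    for step in sorted(candidate_steps):
        compute_ok = compute_seconds_per_step(avg_seconds_per_vector, n_vectors) <= step
        memory_ok = memory_bytes_per_step(traffic_bytes_per_second * step,
                                          avg_vector_bytes, n_vectors) <= memory_limit_bytes
        if compute_ok and memory_ok:
            return step
    return None
```

Smaller steps are tried first because detection precision increases with smaller time steps.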