Machine Learning for Network Anomaly Detection

Matt Mahoney
Network Anomaly Detection
• Network – Monitors traffic to protect
connected hosts
• Anomaly – Models normal behavior to
detect novel attacks (some false alarms)
• Detection – Was there an attack?
Host Based Methods
• Virus Scanners
• File System Integrity Checkers (Tripwire,
DERBI)
• Audit Logs
• System Call Monitoring – Self/Nonself
(Forrest)
Network Based Methods
• Firewalls
• Signature Detection (SNORT, Bro)
• Anomaly Detection (eBayes, NIDES,
ADAM, SPADE)
User Modeling
• Source address – unauthorized users of
authenticated services (telnet, ssh, pop3,
imap)
• Destination address – IP scans
• Destination port – port scans
Frequency Based Models
• Used by SPADE, ADAM, NIDES, eBayes,
etc.
• Anomaly score = 1/P(event)
• Event probabilities estimated by counting
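A minimal sketch of frequency-based scoring, assuming events are hashable values such as destination ports. The class name `FrequencyModel` and the add-one smoothing (which keeps the probability of an unseen event above zero) are assumptions made for illustration; they are not taken from SPADE, ADAM, NIDES, or eBayes.

```python
from collections import Counter


class FrequencyModel:
    """Estimates P(event) by counting and scores events as 1/P(event)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, event):
        self.counts[event] += 1
        self.total += 1

    def score(self, event):
        # Assumption: add-one smoothing with one extra slot for "unseen",
        # so P(event) = (count + 1) / (total + vocab) is never zero.
        vocab = len(self.counts) + 1
        # Anomaly score = 1 / P(event), written as a single division.
        return (self.total + vocab) / (self.counts[event] + 1)


model = FrequencyModel()
for port in [80, 80, 80, 25]:
    model.train(port)

for port in [80, 25, 31337]:
    print(port, model.score(port))
```

Rare events score higher than common ones, and an event never seen in training scores highest.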
Attacks on Public Services
PHF – exploits a CGI script bug on older
Apache web servers
GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
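To see what the request carries, the query string can be URL-decoded. The sketch below uses only Python's standard `urllib.parse`; the variable names are illustrative. The decoded `Qalias` value contains a newline (`%0a`) followed by a command, which is the part the vulnerable script is understood to have passed on to a shell.

```python
from urllib.parse import urlsplit, parse_qs

request_line = "GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

# Split "METHOD target" and pull the query string out of the target.
method, target = request_line.split(" ", 1)
query = urlsplit(target).query

# parse_qs percent-decodes values: %0a becomes "\n", %20 becomes " ".
alias = parse_qs(query)["Qalias"][0]

print(repr(alias))
print("embedded newline:", "\n" in alias)
```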
Buffer Overflows
• 1988 Morris Worm – fingerd
• 2003 SQL Sapphire Worm
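char buf[100];
gets(buf);  /* gets() does not check the input length */
[Figure: stack diagram showing buf (bytes 0 to 100) holding the exploit code, next to the return address on the stack]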
TCP/IP Denial of Service Attacks
• Teardrop – overlapping IP fragments
• Ping of Death – IP fragments reassemble
to > 64K
• Dosnuke – urgent data in NetBIOS packet
• Land – identical source and destination
addresses
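Three of these attacks can be described as simple predicates over IP fragments. The sketch below is illustrative only: the `Fragment` type and the function names are hypothetical, the size check ignores the IP header length, and this is not how any of the detectors named in these slides work (they learn normal behavior rather than hard-coding attack signatures).

```python
from dataclasses import dataclass

MAX_IP_DATAGRAM = 65535  # largest total length an IP datagram can declare


@dataclass
class Fragment:
    src: str
    dst: str
    offset: int  # byte offset of this fragment's data in the datagram
    length: int  # bytes of data carried by this fragment


def is_land(frag):
    """Land: identical source and destination addresses."""
    return frag.src == frag.dst


def exceeds_max_size(frag):
    """Ping of Death: the fragment would reassemble past the 64K limit."""
    return frag.offset + frag.length > MAX_IP_DATAGRAM


def has_overlap(frags):
    """Teardrop: some fragment starts before the previous one ends."""
    ordered = sorted(frags, key=lambda f: f.offset)
    return any(a.offset + a.length > b.offset
               for a, b in zip(ordered, ordered[1:]))
```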
Protocol Modeling
• Attacks exploit bugs
• Bugs are most common in the least tested
code
• Most testing occurs after delivery, through normal use
• Therefore unusual data is more likely to be
hostile
Protocol Models
• PHAD, NETAD – Packet Headers
(Ethernet, IP, TCP, UDP, ICMP)
• ALAD, LERAD – Client TCP application
payloads (HTTP, SMTP, FTP, …)
Time Based Models
• Training and test phases
• Values never seen in training are
suspicious
• Score = t/p = tn/r where
– t = time since last anomaly
– n = number of training examples
– r = number of allowed values
– p = r/n = fraction of training examples whose value was novel
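A minimal sketch of a time-based model for a single attribute, assuming each observation carries a numeric timestamp. The class name `TimeBasedModel` is hypothetical, and two details are assumptions: a novel value during training counts as an anomaly for the purpose of measuring t, and a value already seen in training returns 0.0 to stand for "no score".

```python
class TimeBasedModel:
    """Scores values never seen in training as t * n / r."""

    def __init__(self):
        self.allowed = set()      # values seen in training (r = len)
        self.n = 0                # number of training examples
        self.last_anomaly = None  # time of the most recent anomaly

    def train(self, time, value):
        self.n += 1
        if value not in self.allowed:
            # Assumption: a novel training value counts as an anomaly.
            self.allowed.add(value)
            self.last_anomaly = time

    def score(self, time, value):
        if not self.allowed:
            raise ValueError("model has not been trained")
        if value in self.allowed:
            return 0.0  # stands for "no score"
        t = time - self.last_anomaly
        self.last_anomaly = time  # the test anomaly resets the clock
        return t * self.n / len(self.allowed)


model = TimeBasedModel()
for time, value in enumerate("aab"):
    model.train(time, value)

print(model.score(3, "a"))  # seen in training
print(model.score(7, "c"))  # novel, long quiet period
print(model.score(8, "c"))  # novel again, right after an anomaly
```

Test values are not added to the allowed set, so a repeated novel value is scored again, but with a small t because an anomaly has just occurred.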
Example tn/r
• Training: 0000111000 (n/r = 10/2)
• Testing: 01223
– 0: no score
– 1: no score
– 2: tn/r = 6 × 10/2 = 30
– 2: tn/r = 1 × 10/2 = 5
– 3: tn/r = 1 × 10/2 = 5
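The arithmetic in this example can be checked directly. In the sketch below, n and r are derived from the training string, while the t values (6, 1, 1) are taken from the slide as given rather than computed.

```python
training = "0000111000"

n = len(training)       # number of training examples
r = len(set(training))  # number of distinct (allowed) values

# (test value, t) pairs for the three novel test values; t is as
# stated on the slide.
novel = [("2", 6), ("2", 1), ("3", 1)]

scores = [t * n / r for _, t in novel]
for (value, t), s in zip(novel, scores):
    print(f"{value}: t={t}, tn/r = {s}")
```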
PHAD – Fixed Rules
• 34 packet header fields
– Ethernet (address, protocol)
– IP (TOS, TTL, fragmentation, addresses)
– TCP (options, flags, port numbers)
– UDP (port numbers, checksum)
– ICMP (type, code, checksum)
• Global model
LERAD – Learns Conditional Rules
• Models inbound client TCP (addresses,
ports, flags, 8 words in payload)
• Learns conditional rules
If port = 80 then word1 = GET, POST (n/r = 10000/2)
LERAD Rule Learning
Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
• If word1 = GET then port = 80 (n/r = 2/1)
• word1 = GET, HELO (n/r = 3/2)
• If address = Marx then port = 80, 25 (n/r = 2/2)
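The n/r values for the three rules follow from counting over the three training rows: n is the number of rows that satisfy the rule's condition, and r is the number of distinct values the predicted attribute takes in those rows. A small sketch, with the helper name `n_and_r` chosen for illustration:

```python
training = [
    {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]


def n_and_r(rows, antecedent, consequent):
    """Return (n, r) for the rule 'if antecedent then consequent in {...}'."""
    matching = [row for row in rows
                if all(row[k] == v for k, v in antecedent.items())]
    values = {row[consequent] for row in matching}
    return len(matching), len(values)


print(n_and_r(training, {"word1": "GET"}, "port"))     # if word1 = GET then port = ...
print(n_and_r(training, {}, "word1"))                  # word1 = ... (no condition)
print(n_and_r(training, {"address": "Marx"}, "port"))  # if address = Marx then port = ...
```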
LERAD Rule Learning
• Randomly pick rules based on matching
attributes
• Select nonoverlapping rules with high n/r
on a sample
• Train on full training set (new n/r)
• Discard rules that discover novel values in
last 10% of training (known false alarms)
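The four steps above can be sketched as follows. This is one reading of the slide, not the LERAD implementation: all function names are hypothetical, rows are plain dicts, a rule is an `(antecedent, consequent)` pair, and "nonoverlapping" is interpreted as dropping a rule when every (row, attribute) it predicts on the sample is already predicted by a rule with higher n/r.

```python
import random


def matches(row, antecedent):
    return all(row[k] == v for k, v in antecedent)


def allowed_values(rows, antecedent, consequent):
    """Return (n, set of consequent values) over rows matching the antecedent."""
    matching = [row for row in rows if matches(row, antecedent)]
    return len(matching), {row[consequent] for row in matching}


def candidate_rules(sample, attempts, rng, max_attrs=3):
    """Step 1: propose rules from attributes that match in random row pairs."""
    rules = set()
    for _ in range(attempts):
        a, b = rng.sample(sample, 2)
        shared = [k for k in a if a[k] == b[k]]
        if not shared:
            continue
        rng.shuffle(shared)
        chosen = shared[:max_attrs]
        # First chosen attribute is predicted; the rest form the condition.
        antecedent = tuple(sorted((k, a[k]) for k in chosen[1:]))
        rules.add((antecedent, chosen[0]))
    return rules


def select_rules(sample, rules):
    """Step 2: keep high n/r rules that predict something not yet covered."""
    def ratio(rule):
        n, values = allowed_values(sample, *rule)
        return n / len(values) if values else 0.0

    covered, kept = set(), []
    for rule in sorted(rules, key=lambda r: (-ratio(r), repr(r))):
        antecedent, consequent = rule
        new = {(i, consequent) for i, row in enumerate(sample)
               if matches(row, antecedent)} - covered
        if new:
            kept.append(rule)
            covered |= new
    return kept


def train_and_validate(train_rows, rules, holdout_fraction=0.1):
    """Steps 3 and 4: train on all rows, drop rules surprised by the last 10%."""
    holdout = max(1, int(len(train_rows) * holdout_fraction))
    early, late = train_rows[:-holdout], train_rows[-holdout:]
    model = []
    for antecedent, consequent in rules:
        _, seen = allowed_values(early, antecedent, consequent)
        novel = any(matches(row, antecedent) and row[consequent] not in seen
                    for row in late)
        if not novel:
            n, values = allowed_values(train_rows, antecedent, consequent)
            model.append((antecedent, consequent, n, values))
    return model
```

A typical call sequence would be `candidate_rules` on a small random sample, `select_rules` on the same sample, then `train_and_validate` on the full training set. The surviving entries carry the n and allowed values needed to score test data with tn/r.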
DARPA/Lincoln Labs Evaluation
• 1 week of attack-free training data
• 2 weeks with 201 attacks
[Figure: evaluation network diagram with labels Internet, Router, Attacks, Sniffer, and hosts SunOS, Solaris, Linux, NT]
Attacks out of 201 Detected
at 10 False Alarms per Day
[Figure: bar chart of attacks detected (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]
Problems with Synthetic Traffic
• Attributes are too predictable: TTL, TOS,
TCP options, TCP window size, HTTP,
SMTP command formatting
• Too few sources: Client addresses, HTTP
user agents, ssh versions
• Too “clean”: no checksum errors,
fragmentation, garbage data in reserved
fields, malformed commands
Real Traffic is Less Predictable
[Figure: r (number of values) versus time, for real and synthetic traffic]
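The quantity plotted here, r as a function of time, is a running count of distinct values of an attribute. A minimal sketch follows; the function name and the two TTL sequences are made up for illustration and are not measurements from the evaluation data.

```python
def distinct_value_growth(values):
    """Return r after each observation: the number of distinct values so far."""
    seen = set()
    growth = []
    for value in values:
        seen.add(value)
        growth.append(len(seen))
    return growth


# Hypothetical TTL sequences, for illustration only.
synthetic_ttls = [64, 64, 128, 64, 128, 64, 64, 128]
real_ttls = [64, 63, 128, 57, 64, 112, 49, 250]

print(distinct_value_growth(synthetic_ttls))
print(distinct_value_growth(real_ttls))
```

When r keeps growing, novel values continue to appear after training ends, and a model that flags every novel value will raise more false alarms.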
Mixed Traffic: Fewer Detections,
but More are Legitimate
[Figure: bar chart of total and legitimate detections (axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]
Project Status
• Philip K. Chan – Project Leader
• Gaurav Tandon – Applying LERAD to
system call arguments
• Rachna Vargiya – Application payload
tokenization
• Mohammad Arshad – Network traffic
outlier analysis by clustering
Further Reading
• Learning Nonstationary Models of Normal
Network Traffic for Detecting Novel Attacks
by Matthew V. Mahoney and Philip K.
Chan, Proc. KDD.
• Network Traffic Anomaly Detection Based
on Packet Bytes by Matthew V. Mahoney,
Proc. ACM-SAC.
• http://cs.fit.edu/~mmahoney/dist/