
Ranking Attackers Through Network Traffic Analysis
Andrew Williams & Nikunj Kela
11/28/2011
CSC/ECE 774
Agenda
• Background
• Tools We've Developed
• Our Approach
• Results
• Future Work
Background: The Problem
Setting 1: Corporate Environment
• Large number of attackers
• How do you prioritize which attacks to investigate?
[Image: RSA]
Background: The Problem
Setting 2: Hacking Competitions
• How do you know who should win?
Background: Information Available
• Network Traffic Captures
• Alerts from Intrusion Detection Systems (IDS)
• Application and Operating System Logs
Background: Traffic Captures
• HUGE volumes of data
• A complete history of interactions between clients and servers*
Information available:
• Traffic Statistics
• Info on interactions across multiple servers
• How traffic varies with time
• Everything up to and including application layer info**
Background: IDS Alerts
• Messages indicating that a packet matches the signature of a known malicious one
• Still a fairly large amount of data
• Same downsides as anti-virus programs, but most IDS signatures are open source!
• If the IDS is compromised, these alerts might not be available
Information available:
• Indication that known attacks are being launched
• Alert Statistics
• How alerts vary with time
Background: Application/OS Logs
Ex: mysql logs, apache logs, Windows 7 Security logs, ...
• Detailed, application-specific error messages and warnings
• Large amount of data
• If a server is compromised, logs may not be available
Information available:
• Very detailed information with more context
• Access to errors/issues even if traffic was encrypted
Background: iCTF 2010 Contest
• 72 teams attempting to compromise 10 servers
• Vulnerabilities included SQL injection, exploitable off-by-one errors, format string exploits, and several others*
• A fairly complex set of rules
Dataset from competition:
• 27 GB of Network Traffic Captures
• 46 MB of Snort Alerts (from competition)
• 175 MB of Snort Alerts (generated with updated rulesets)
• No Application or OS Logs
More information on the contest can be found here:
http://www.cs.ucsb.edu/~gianluca/papers/ctf-acsac2011.pdf
Tools We've Developed
We wrote scripts to...
• Parse the large amount of data (a parsing sketch follows this list):
o Extract network traffic between multiple parties
o Filter out less important Snort Alerts
o Track connection state to generate statistics and stream data
• Visualize the data
o Show all of the alerts and flag submissions with respect to time
• Analyze the data
o Pull out the transaction distances and find statistics on them
• Generate Application and OS Logs
o Replay network traffic to live virtual machine images
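A minimal sketch of the stream-extraction step is below, written with Scapy (an assumption; the scripts we actually wrote may differ). It streams a capture from disk and concatenates TCP payload bytes per connection 4-tuple, so each attacker-to-server stream can be compared later. Retransmissions and reordering are ignored for brevity, and "ictf.pcap" is a hypothetical file name.

```python
# Hypothetical stream-extraction sketch; assumes Scapy is installed
# and a capture file named "ictf.pcap" exists.
from collections import defaultdict
from scapy.all import PcapReader, IP, TCP

def extract_streams(pcap_path):
    # (src, sport, dst, dport) -> concatenated TCP payload bytes.
    streams = defaultdict(bytes)
    # PcapReader iterates packets without loading the whole file,
    # which matters for multi-gigabyte captures.
    for pkt in PcapReader(pcap_path):
        if pkt.haslayer(IP) and pkt.haslayer(TCP):
            key = (pkt[IP].src, pkt[TCP].sport,
                   pkt[IP].dst, pkt[TCP].dport)
            streams[key] += bytes(pkt[TCP].payload)
    return streams

if __name__ == "__main__":
    for key, data in extract_streams("ictf.pcap").items():
        print(key, len(data), "bytes")
```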
Our Approach: Intuition
• Vulnerability Discovery Phase: identify the type of vulnerability
• Vulnerability Exploitation Phase: refine the attack string
• Intuitively, a skilled attacker will arrive at a working attack string in less time than an unskilled one
• How do we know whether the attacker has broken into the system? We only have logs to work with!
• The time taken to break into the system reflects the learning capability of an attacker: a fast learner implies a good attacker
Our Approach: Identify the attack string
• Once the attacker breaks into the system, he/she reuses essentially the same attack string each time to gather information
• We observed from the traffic logs that, in most cases, the attacker used one TCP stream to break into the system: one TCP connection for each attempt!
• We chose Levenshtein distance (edit distance) as our metric for comparing two TCP communications from attacker to server
• A run of consecutive zero distances between successive TCP payloads means the attacker has successfully broken into the system
Example: Identify the attack string
Stream1: "%27%20or%20%27%27%3D%27%0Alist%0A"
Stream2: "%27%20OR%20%27%27%3D%27%0ALIST%0A"
Stream3: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0ASADF%0A"
Stream4: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A39393%0A"
Stream5: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A"
Stream6: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A"
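Below is a minimal sketch of the distance computation over these six example streams. The pure-Python Levenshtein implementation is our own illustration; a run over 27 GB of captures would want a C-backed library instead.

```python
# Classic dynamic-programming edit distance, O(len(a) * len(b)).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

streams = [
    "%27%20or%20%27%27%3D%27%0Alist%0A",
    "%27%20OR%20%27%27%3D%27%0ALIST%0A",
    "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0ASADF%0A",
    "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A39393%0A",
    "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A",
    "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A",
]
# Distances between consecutive streams; the trailing 0 marks
# Stream5 == Stream6: a refined attack string being reused.
print([levenshtein(a, b) for a, b in zip(streams, streams[1:])])
```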
Our Approach: Feature Selection
• Time taken to successfully break into the system
• Mean and standard deviation of the distances between consecutive TCP streams
• Number of attempts before successfully breaching the service
• Length of the longest run of consecutive zeros
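A minimal sketch of turning one attacker/service interaction into this feature vector follows. The input format, a list of (timestamp, distance-to-previous-stream) pairs, is an assumption for illustration.

```python
import statistics

def extract_features(events):
    """events: list of (timestamp_seconds, distance_to_previous_stream),
    a hypothetical input format for illustration."""
    distances = [d for _, d in events]
    # Longest run of consecutive zero distances.
    longest = run = 0
    for d in distances:
        run = run + 1 if d == 0 else 0
        longest = max(longest, run)
    # First zero distance = first successful break-in.
    break_in = next(((t, i) for i, (t, d) in enumerate(events) if d == 0), None)
    return {
        "time_to_break_in": break_in[0] - events[0][0] if break_in else None,
        "attempts_before_breach": break_in[1] if break_in else len(events),
        "mean_distance": statistics.mean(distances),
        "stdev_distance": statistics.pstdev(distances),
        "longest_zero_run": longest,
    }

print(extract_features([(0, 9), (40, 62), (95, 4), (130, 3), (170, 0)]))
```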
Result: Distance-Time Plot
[Figure: edit distance between consecutive streams plotted over time]
Interesting Findings from the Contest
• Although the contest involved only attacking the vulnerable services, teams still tried to break into each other's systems
• We noticed that teams shared flag values with each other through the chat server
• The active status of each service was maintained through a complex Petri-net system, and most teams struggled to understand it
• Hints about different vulnerabilities in the services were released by the administrators from time to time throughout the contest
Future Work
• Use data mining tools (e.g., SAS Enterprise Miner) to analyze the relationships among the features
• Use data mining tools to develop a scoring system that scores each team based on the feature set
• Continue improving the replay script to handle the large number of connections (a minimal replay sketch follows this list)
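The sketch below shows the core of such a replay step using only the standard library; the host and port are hypothetical, and a faithful replayer must also pace connections and follow server responses rather than just re-sending one recorded client payload.

```python
import socket

def replay_stream(payload: bytes, host: str = "192.168.56.10",
                  port: int = 80) -> bytes:
    """Open one TCP connection per recorded attempt (hypothetical
    host/port) and send its payload to the live VM."""
    with socket.create_connection((host, port), timeout=5.0) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)   # signal end of the request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Server response; the interesting output is the application/OS
    # logs the request generates on the VM side.
    return b"".join(chunks)
```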
Thank You!
Questions?
Image Sources
• WooThemes, free for commercial use
• Icons-Land, free for non-commercial use
• Fast Icon Studio, used with permission