Using Data Mining to Develop Profiles to Anticipate


Using Data Mining to Develop
Profiles to Anticipate Attacks
Systems and Software
Technology Conference (SSTC 2008)
May 1, 2008
Dr. Michael L. Martin
Uma Marques
MITRE
MITRE Standard Disclaimer
The author's affiliation with The
MITRE Corporation is provided for
identification purposes only, and is
not intended to convey or imply
MITRE's concurrence with, or
support for, the positions, opinions
or viewpoints expressed by the
author.
Where is the threat



Most computer security money is
spent on prevention -- a bastion
mentality
Most of the loss is from insider activity
(82%)
Intrusion Detection is the art of
detecting and responding to computer
misuse
Intrusion Detection (ID)


Deterrence (we will find out what you
did and catch you)
Detection


Misuse detection based on known patterns
of attack (signatures)
Anomaly detection (profile of expected
behavior)


patterns of acceptable behavior
patterns of known misbehavior
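
The two detection styles above can be sketched side by side. This is a minimal illustration, not a real IDS: the signature strings, the per-user profile, and the function names are all hypothetical, chosen only to show the difference between matching known patterns of attack and comparing against a profile of expected behavior.

```python
# Hypothetical signatures: substrings that mark known patterns of attack.
SIGNATURES = ["/etc/passwd", "../..", "cmd.exe"]

# Hypothetical profile: commands each user is expected to run.
PROFILE = {"alice": {"ls", "vi", "make"}, "bob": {"ls", "mail"}}


def misuse_detect(request: str) -> bool:
    """Misuse detection: flag a request containing a known attack pattern."""
    return any(sig in request for sig in SIGNATURES)


def anomaly_detect(user: str, command: str) -> bool:
    """Anomaly detection: flag anything outside the user's expected behavior."""
    return command not in PROFILE.get(user, set())


if __name__ == "__main__":
    print(misuse_detect("GET /../../etc/passwd"))
    print(anomaly_detect("bob", "make"))
```

Note the asymmetry: the signature check says nothing about a new attack it has no pattern for, while the profile check flags every user or command it has never seen, whether hostile or not.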
Intrusion Detection
(continued)


Response
Damage Assessment


Attack Anticipation



need to assess in dollar terms
when (time of year/day; significant dates)
type
Prosecution Support (forensics)
Comparison of Network and Host
Based Intrusion Detection



Host Based
Patterns of File
Access
Patterns of
Application
Execution


Network Based
Analysis of Packets
and other network
activity
Sensor Placement & Firewalls


Sensors placed outside the Firewall
(sometimes called the DMZ or demilitarized
zone) are useful for detecting the source
addresses attempting to attack and for attack
anticipation
Sensors placed inside the Firewall are useful
for detecting attacks that get through the
firewall and for unauthorized traffic going out
Threats to Normal Network Traffic
A: sender
B: receiver
C: third party
Normal Transmission: A sends data to B
Spoofing: C sends message to B which B assumes is from A
Interception: C obtains copy of data intended for B only
Modification: C intercepts and changes data intended for B
Masquerade: C masquerades as B so A thinks B received the message.
Monitoring: C learns about A and B by analyzing traffic
Attacker Skills
Table 10-1: Hierarchy of Attacker Skills

Skill Level: Clueless
Ability: Virtually no skills.
Evidence: All activities are readily apparent.

Skill Level: Script Kiddie
Ability: Able to find ready-made exploit scripts on the Internet and
run them following rote instructions. This code may give
them root access, activity-hiding capabilities, and back
doors for return visits. Unable to deal with non-standard
UNIX configurations.
Evidence: May attempt to cover tracks but with
limited success. Activities can be
detected with minimal effort.

Skill Level: Guru
Ability: Equivalent to an experienced systems administrator. Able
to manipulate UNIX systems that are not configured in the
standard way. Able to program in C, Perl, and shell script.
Checks for existence of security programs and logging
performed off-system and ...
Evidence: Carefully clears out log files to remove
evidence of original compromise.
Leaves no obvious traces associated
with account used to access system.
May leave Trojan horses behind for
future access.

Skill Level: Wizard
Ability: Intimate knowledge of UNIX internals. Capable of
programming in assembly language. Can manipulate
hardware and software. Very rare!
Evidence: Leaves virtually no useful evidence on
the attacked host
Attacker Chain of Attack
Attacker: No attack code running on original workstation
Dial-In: Dial-up to stolen ISP account. Might use hacked phone
switch to confuse trail.
Source: Originate attack from stolen user account. No root access
necessary. User directory may contain executables and
especially data files related to system attacks.
Proxy: One or more intermediary hosts used as cutouts to
confuse trail. Can telnet in and telnet back out of stolen
account without root access. May use netcat or other
introduced executable to set up a convenient proxy.
Attack Target: Automated attack software normally requires root
privileges to access network. Manual attack may not need
root access.

Attack goal can include:
•Disabling host
•Gaining logon access
•Manipulating a chat room
•Special powers on game server

Hostile code can include:
•Sniffer
•IRC bot
•Game bot
•Denial-of-service zombie
Typical Attack Host Exploits
Table 10-2: Typical Attack Host Exploits

Attack Type: Sniffer
Target: Any host with logon sessions visible on the same LAN
segment. Outgoing sessions to hosts on other networks are
desirable to hackers
Characteristic Evidence:
•Unauthorized daemon running
•Unauthorized binary
•Source code for unauthorized binary
•Network adaptor in promiscuous mode
•Log file containing hostnames, usernames, and passwords

Attack Type: IRC bot
Target: Internet Relay Chat (IRC) hosts
Characteristic Evidence:
•Unauthorized daemon running
•Unauthorized binary
•Source code for unauthorized binary
•Log or config files

Attack Type: Game bot
Target: Game servers
Characteristic Evidence:
•Unauthorized daemon running
•Unauthorized binary
•Source code for unauthorized binary
•Log or config files

Attack Type: Distributed denial-of-service (DDoS)
Target: Prominent Internet Web servers
Characteristic Evidence:
•Zombie executable (unauthorized daemon)
•Source code for unauthorized binary

Attack Type: Manual
Target: Any hosts, local or remote
Characteristic Evidence:
•Source or binary code for attacks
•Unusual outgoing connections
•Lists of hostnames or IP addresses (victims)
•Password files or lists of account/password pairs
Proactive Intrusion Detection


Security violations evolve in multiple stages
Preliminary stages often not destructive



merely preparatory steps in the Attack Scenario
Goal is to Detect attack precursors, and take
immediate action
Preventing the resulting attack (Temporal
Data Mining)
Phases of Attack

Target Identification


Potential victim(s)
Experienced Crackers keep Long Lists of
Potential Victims; Sometimes willing to Share
Phases of Attack

Intelligence Gathering




Probe Systems to garner: Operating System and
Version, List of Network Services Provided
Use Password Sniffing & Guessing, and well-known
compromises for Buggy Network Services
Most Vulnerability Scans are Heavy-Handed
(immediately visible to virtually any network
intrusion detection system IDS)
Patient & Skillful Attackers can circumvent IDS
Phases of Attack

Initial Compromise


Often Very Messy
Easy to find Evidence at this time




Unusual Number of Failed Logons
Log Records of Buffer Overflows and Undocumented
System Features
Core Dumps
Daemons Restarting
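
One of the evidence items above, an unusual number of failed logons, lends itself to a simple count. The sketch below assumes logon events have already been parsed into (account, succeeded) pairs; that record shape and the threshold of 5 are illustrative assumptions, not values from the presentation.

```python
from collections import Counter


def failed_logon_counts(events):
    """Count failed logons per account from (account, succeeded) pairs."""
    return Counter(account for account, succeeded in events if not succeeded)


def unusual_accounts(events, threshold=5):
    """Return accounts whose failed-logon count meets the assumed threshold."""
    counts = failed_logon_counts(events)
    return sorted(acct for acct, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    sample = [("root", False)] * 6 + [("alice", False), ("alice", True)]
    print(unusual_accounts(sample))
```

In practice the threshold would be tuned against each site's normal rate of mistyped passwords.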
Phases of Attack

Privilege Escalation



Exploit Code to Compromise System
Exploit well-known vulnerabilities to Gain Root
Last Flurry of Incriminating Error Messages
Phases of Attack

Reconnaissance



First: Is Logging On? Second: Where are Logs
Stored?
Logs in Permanent Form are Best (printer, CD-R)
Forwarding Address of Root’s Email
Check Administrators’ home directory to see
what they have been up to
Phases of Attack

Reconnaissance




Looks for Security Programs
Looks for Open Files– what are currently running
programs doing?
Looks for File Integrity Programs (Tripwire, etc)
Systems Administrators (Name, System, etc)
Phases of Attack

Covering Tracks



Deleting Log! (Red Flag: "I've been attacked")
Editing Log! (Remove their tracks only)
Log Editors for Binary Data (necessary for editing
binary log data; equivalent to burglar tools)
utmp and wtmp are binary log files
Phases of Attack

Covering Tracks

Back Door


hidden copy of the command shell that is SUID (set
user ID) root (the file is owned by root, the SUID bit is
set, and the intruder has execute permission)
Hacked Binaries (modification or replacement of
standard system executables) also called altered
binaries, Trojan horses, hostile changelings, and
trojanizing
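
The SUID-root back door described above has a signature that can be checked mechanically: a regular file, owned by root, with the SUID bit set. The following is a minimal sketch using Python's standard `os` and `stat` modules; the function names are hypothetical.

```python
import os
import stat


def is_suid_root(mode: int, uid: int) -> bool:
    """True for a regular file owned by root (uid 0) with the SUID bit set."""
    return stat.S_ISREG(mode) and uid == 0 and bool(mode & stat.S_ISUID)


def find_suid_root(top: str):
    """Yield paths under `top` that are SUID-root regular files."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_suid_root(st.st_mode, st.st_uid):
                yield path
```

Every UNIX system ships with some legitimate SUID-root programs, so the output is only useful when compared against a known-good baseline. A new entry, especially in a user directory, is the item of interest.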
Conceptual Views of Misuse



An Unauthorized Individual Accesses
Data
An Unauthorized Individual Modifies
Data
Denial of Service
Acceptable versus
Unacceptable



If you had a Perfect Model of Acceptable
Behavior OR a Perfect Model of Unacceptable
Behavior it would be Easy
That is if you have defined all Acceptable
Behavior anything else is Unacceptable or
Misuse
Or if you have defined all Unacceptable
Behavior anything else is Acceptable
Acceptable Behavior Models




Usually based on historic data on ‘acceptable
behavior’
System is ‘trained’ on historical data
But if training data has unacceptable behavior
in it (that was missed) then unacceptable
behavior is allowed (false negative)
But if training data is missing data on
acceptable behavior then that acceptable
behavior is flagged as misuse (false positive)
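
Both failure modes of a trained acceptable-behavior model can be shown with a toy example. This is a sketch under strong simplifying assumptions: "training" is just remembering every (user, action) pair in the history, and the user names and actions are invented for illustration.

```python
def train_profile(history):
    """'Train' by recording every (user, action) pair seen in historical data."""
    return set(history)


def is_flagged(profile, user, action):
    """Anything not seen during training is treated as misuse."""
    return (user, action) not in profile


# Hypothetical history: it contains one missed misuse and omits one
# legitimate activity.
history = [
    ("alice", "read_payroll"),    # legitimate
    ("bob", "copy_customer_db"),  # misuse that went unnoticed during training
]
profile = train_profile(history)

# The missed misuse was learned as acceptable, so it is allowed.
false_negative = not is_flagged(profile, "bob", "copy_customer_db")
# A legitimate action absent from the history is flagged.
false_positive = is_flagged(profile, "alice", "run_quarterly_report")
```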
Unacceptable Behavior Models




Define ‘all’ unacceptable behavior
A Priori rules based on ‘experts’
Catches most Significant Misuse
Misses much unacceptable behavior
(hard to define all unacceptable
behavior with certainty)
Detecting Hackers (outsider
misuse)

Attempts to gain ACCESS





Reading an Object (or file)
Writing an Object (or file)
Planting a TROJAN HORSE
Altering Systems Configuration
Achieving a FULLY Interactive Login
Detecting Hackers (outsider
misuse)

Denial of Service (DOS)





Deleting an Object (necessary part of
system)
Slowing Down a Network (flooding)
Stopping a Program (necessary part of
system)
Filling Storage Space (no work/file space)
Shutting Down a Critical Server
Weapons of Choice

Network Intrusion Detection



Attack Patterns Differ SIGNIFICANTLY from
Normal Access
Attack Patterns Pronounced
Readily Identifiable


Because they EXPLOIT Known Vulnerabilities
Known Vulnerabilities HAVE Known Signatures
Weapons of Choice



Information Assurance Vulnerability Alerts
(IAVA)
Know Vulnerabilities & Harden System
Against Them
Web Sites for Information on
Vulnerabilities (see SANS Top 20):
http://www.sans.org/top20/
Misuse Examples

Anomalous Outbound Traffic



Outbound Information Not Requested
Imbalance between Requested and
Provided
a sign that someone has gotten into your
system and is:


Attacking from it! (Distributed DoS)
Stealing Information!
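
The imbalance between requested and provided traffic can be approximated as a per-host ratio of outbound to inbound bytes. The sketch below assumes byte counts per host are already available; the host names and the ratio limit of 10 are illustrative assumptions.

```python
def outbound_ratio(bytes_in: int, bytes_out: int) -> float:
    """Ratio of bytes provided (outbound) to bytes requested (inbound)."""
    return bytes_out / max(bytes_in, 1)  # avoid division by zero


def imbalanced_hosts(traffic, max_ratio=10.0):
    """Return hosts whose outbound/inbound ratio exceeds the assumed limit.

    `traffic` maps host name -> (bytes_in, bytes_out).
    """
    return sorted(
        host
        for host, (b_in, b_out) in traffic.items()
        if outbound_ratio(b_in, b_out) > max_ratio
    )


if __name__ == "__main__":
    print(imbalanced_hosts({"web01": (1000, 5000), "ws17": (200, 900000)}))
```

Servers legitimately send more than they receive, so a single global limit is crude. A per-host baseline would be the more realistic comparison.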
Misuse Examples

Site being Swept





Range of Attacks AND
Range of IP Addresses
Done to MAP your Site
Done to Probe for Vulnerabilities
Solution: Proper Patches &
Configuration
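
A sweep across a range of IP addresses shows up as one source contacting many distinct destinations. A minimal sketch, assuming connection records reduced to (source, destination) pairs and an arbitrary cutoff of 20 distinct targets:

```python
from collections import defaultdict


def sweeping_sources(connections, min_targets=20):
    """Return sources that contacted at least `min_targets` distinct addresses.

    `connections` is an iterable of (source_ip, destination_ip) pairs.
    """
    targets = defaultdict(set)
    for src, dst in connections:
        targets[src].add(dst)
    return sorted(src for src, dsts in targets.items() if len(dsts) >= min_targets)
```

Counting distinct destinations rather than total connections keeps a busy but legitimate client, which talks to one server many times, from being flagged.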
Misuse Examples


Site being Swept (continued)
Information Flood

Unusually Large Traffic Volume




From a “Single” class of Service
From a “Single” IP Address
From Many IP Addresses
Solution - Block IP Address/Class of Service

Problem- Might Block Legitimate Connections
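
The flood cases above differ only in what the traffic is grouped by: a single IP address or a single class of service. The sketch below assumes packets summarized as (source_ip, service) tuples and flags any key responsible for more than half of the traffic; both the record shape and the 50% share are assumptions.

```python
from collections import Counter


def top_talkers(packets, key_index, share=0.5):
    """Return keys responsible for more than `share` of all packets.

    `packets` holds (source_ip, service) tuples; key_index 0 groups by
    source address, key_index 1 groups by class of service.
    """
    counts = Counter(p[key_index] for p in packets)
    total = sum(counts.values())
    return sorted(k for k, n in counts.items() if total and n / total > share)
```

Grouping by service also catches a flood arriving from many IP addresses, where no single source stands out. The result is a candidate list for blocking, which runs into the problem noted above: legitimate connections share the same address or service.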
Misuse Examples

Unauthorized Access: Mission-Critical Data

Unauthorized Release (Privacy Violation)


Unauthorized Alteration



Sensitive Medical/Employee/Customer Information
Theft
Appraisals/Safety Reports/Work Reports/Customer
Records
Solution: Identify Mission-Critical Data &
Define Authorized Use
Behavioral Data Forensics In
Intrusion Detection





Data Mining to Identify Trends AND
Specific Activities that Indicate Misuse
Decision Support Capabilities of
Intrusion Detection
Find Out What Happened in a Network
of Live Computers
Error Detection and Eradication
Behavioral Data Forensics:
Benefits

Detect Insiders


Detect Outsiders (Hackers)


Identify Trends: Misuse & Suspicious
Activity
Identify Attack Trends to Harden Networks
Improve Policy


Fit Observed Versus Predicted Behavior
Identify Bad or Missing Policy
Data Mining



Means to Extract Unknown, Actionable Data
from, Among Other Things, Data Warehouses
Nontrivial Extraction of Implicit, Previously
Unknown, & Potentially Useful Information from
Data
Process of discovering new correlations,
patterns, anomalies and trends by sifting
through large amounts of data
Data Mining



Pattern recognition technologies and
statistical and mathematical techniques
Tools often based on artificial
intelligence techniques
Processing Large Quantities of Data at a
Central Location Looking for “Patterns
of Interest”
Purpose of Data Mining



Complements predefined and ad hoc
access by enabling users to discover
new relationships
Improvement over a user's "gut feeling"
Bottom-up discovery data analysis, also
known as "knowledge discovery"
Data Mining & Intrusion Detection

MADAM ID



Constructs Intrusion Detection Signatures in
systematic and automated manner
Learns classifiers that distinguish between
intrusions and normal activities
ADAM


Learns normal network behavior from attack-free
training data
Connection records of the last delta seconds
continuously mined for new association rules
Data Mining & Intrusion Detection

Clustering Unlabeled ID Data



normal elements will cluster together and
intrusive elements will cluster together
Biggest clusters are normal; smallest are
intrusive
Mining the Alarm Stream

Modeling normal and abnormal alarm
streams
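
The clustering idea above (big clusters are normal, small ones intrusive) can be sketched with a simple single-pass, fixed-width clustering. This is one possible technique chosen for brevity, not necessarily the one used in the work the slide refers to; the cluster width and the 90% coverage cutoff are assumptions.

```python
import math


def fixed_width_clusters(points, width):
    """Single-pass clustering: a point joins the first cluster whose center
    is within `width`, otherwise it starts a new cluster. Each center stays
    at the first point assigned to it."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if math.dist(center, p) <= width:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def label_clusters(clusters, normal_fraction=0.9):
    """Label the largest clusters 'normal' until they cover `normal_fraction`
    of all points; label the remaining, smaller clusters 'intrusive'."""
    total = sum(len(members) for _, members in clusters)
    labeled, covered = [], 0
    for _, members in sorted(clusters, key=lambda c: len(c[1]), reverse=True):
        label = "normal" if covered < normal_fraction * total else "intrusive"
        covered += len(members)
        labeled.append((label, members))
    return labeled
```

The labeling only holds if intrusions really are rare in the data. A flood that dominates the capture would form the biggest cluster and be labeled normal.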
Forms and Formats (Data
Types)





Raw TCP/IP Data (network event
capture)
Raw Binary Data (operating system
data)
ASCII Application Data (e.g., Syslog)
Detected Signatures (stored in RDBMS)
Behavioral Statistics (stored in RDBMS)
User-Centric versus Target-Centric

Target-centric



Database Optimized to Provide Target Data
Example: All Logins on a set of Target
Machines
User-centric


Database Optimized to provide User Data
Example: All Logins by User X on any
Target
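
The two views amount to indexing the same login records by different keys. A minimal in-memory sketch, assuming records of the form (user, target, timestamp); a real deployment would presumably do this with database indexes rather than dictionaries.

```python
from collections import defaultdict


def build_indexes(logins):
    """Index (user, target, timestamp) login records both ways."""
    by_target = defaultdict(list)  # target-centric: who logged in to a machine
    by_user = defaultdict(list)    # user-centric: where a user logged in
    for user, target, ts in logins:
        by_target[target].append((user, ts))
        by_user[user].append((target, ts))
    return by_target, by_user
```

With both indexes built, "all logins on target db1" is `by_target["db1"]` and "all logins by user X on any target" is `by_user["X"]`, each a single lookup.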
Examples of Behavioral Data
Forensics

Security




Unauthorized Changes to Data (Price Lists)
Track Consultant Activities (Trust)
Administrators Browsing Personal Folders
(Abuse of Privilege)
Unauthorized User Logging into Backup
Account (if they encrypt your backup you're
toast)
Examples of Behavioral Data
Forensics
Security Policy (monitoring for
compliance)



Policy Ignored (Locking Screensavers--time
too short)
Users Applying the Wrong Profile
Administrators Not Using Backup Accounts
for Backups (used admin account instead)
Data Mining Techniques

Data Presentation Refinement (change
view and tune parameters)


Tune Parameters until Interesting Features
Stand Out
Eliminate Common Occurrences to Zero in
on Rarer Interesting Occurrences (needle in
a haystack)
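
Eliminating common occurrences can be as simple as counting event types and keeping only those below a share of the total. The 1% cutoff here is the tunable parameter, and its value is an assumption; the event names are invented.

```python
from collections import Counter


def rare_events(events, max_share=0.01):
    """Return event types that make up at most `max_share` of all events."""
    counts = Counter(events)
    total = sum(counts.values())
    return sorted(e for e, n in counts.items() if n / total <= max_share)


if __name__ == "__main__":
    log = ["logon"] * 500 + ["file_read"] * 495 + ["su_root"] * 5
    print(rare_events(log))
```

Raising or lowering `max_share` is the "tune parameters until interesting features stand out" step.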
Data Mining Techniques

Contextual Interpretations
(visualization, clustering, pattern
match)


Have a Detection Requirement in Mind
(predetermined interesting events)
Assign Context to Observed Trends
(knowledge discovery)
Data Mining Techniques

Drill Down (get to the root cause: the underlying data causing the anomaly)

Focus on: Individual Time Frames (odd
hours, surge times), Specific Users (most
active, unusual hours, many privileges),
Specific Actions (Logons, Updates, Large
Transactions, Long Transactions), or
Targets (Data Servers, Main Servers,
Critical Mission Servers)
Data Mining Techniques

Combining Heterogeneous Data Sources


UNIX, Windows NT/2000, Mainframe
Incorporating Out-of-Band Data Sources

Interviews, Physical Logs, Coworkers
Data Mining Examples

Target Browsing


User Accesses Multiple Objects in Short Time
Frame
Critical File Browsing


Users Directory Hopping
High Activity
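
Target browsing, a user accessing multiple objects in a short time frame, can be detected with a sliding window over access records. The sketch assumes records of (timestamp in seconds, user, object name) sorted by time; the 60-second window and the cutoff of 10 distinct objects are illustrative.

```python
from collections import defaultdict


def browsing_users(accesses, window=60, min_objects=10):
    """Return users who touched at least `min_objects` distinct objects
    within any `window`-second span.

    `accesses` holds (timestamp_seconds, user, object_name) tuples and is
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(list)  # user -> [(timestamp, object), ...] in window
    flagged = set()
    for ts, user, obj in accesses:
        items = recent[user]
        items.append((ts, obj))
        # Drop accesses that have fallen out of the sliding window.
        while items and ts - items[0][0] > window:
            items.pop(0)
        if len({o for _, o in items}) >= min_objects:
            flagged.add(user)
    return sorted(flagged)
```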
Data Mining Examples

Attack Anticipation (Tip-Off)


User Accessing Critical Files at Odd Times
(teller when bank is closed)
Target Overload (e.g., Server Overload)




Load Balancing Problem Causes Crash
Damage Assessment -- find loss and
document
Surveillance -- employee makes threats
Policy Compliance -- night logout
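
The attack-anticipation tip-off above, a user accessing critical files at odd times, reduces to a time-of-day check against a list of critical files. The business hours, the weekend rule, and the file path below are hypothetical values for illustration.

```python
from datetime import datetime, time

BUSINESS_START = time(8, 0)            # assumed opening time
BUSINESS_END = time(18, 0)             # assumed closing time
CRITICAL_FILES = {"/vault/ledger.db"}  # hypothetical critical file list


def odd_time_access(when: datetime, path: str) -> bool:
    """True when a critical file is accessed outside business hours
    or on a weekend."""
    if path not in CRITICAL_FILES:
        return False
    weekend = when.weekday() >= 5  # Monday is 0, so Saturday=5 and Sunday=6
    off_hours = not (BUSINESS_START <= when.time() < BUSINESS_END)
    return weekend or off_hours
```

The bank-teller example maps directly: a ledger access at 23:30 on a weekday is flagged, while the same access at 10:00 is not.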
Summary

Behavioral Data Forensics






Studies Past Behavior in Event Records
Provides Decision Support Capabilities
Detects Hackers and Insider Misuse
Supports Damage Assessment AND
Attack Anticipation
Behavioral Data Forensics Facilitates

Business Process Reengineering
Contact Information






Dr Martin may be reached at:
Voice: 703-983-1093
Email: [email protected]
Uma Marques may be reached at:
Voice: 703-983-3783
Email: [email protected]
Sources




Kruse II, W.G., & Heiser, J.G. (2002). Computer
Forensics: Incident Response Essentials. New York:
Addison-Wesley.
Proctor, P. E. (2001). The Practical Intrusion
Detection Handbook. Upper Saddle River, NJ:
Prentice Hall.
McClure, S., Scambray, J., & Kurtz, G. (2001).
Hacking Exposed: Network Security Secrets &
Solutions (3rd ed.). New York: Osborne/McGraw-Hill.
Barbara, D., & Jajodia, S. (2002). Applications of Data
Mining in Computer Security. Boston: Kluwer
Academic Publishers.