Lecture24 - The University of Texas at Dallas
Digital Forensics
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Application Forensics
November 5, 2008
Outline
Email Forensics
UTD work on Email worm detection - revisited
Mobile System Forensics
Note: Other application/system-related forensics
- Database forensics, Network forensics (already discussed)
Papers to discuss November 10, 2008 and November 17, 2008
Reference: Chapters 12 and 13 of text book
Optional paper to read:
- http://www.mindswap.org/papers/Trust.pdf
Email Forensics
Email Investigations
Client/Server roles
Email crimes and violations
Email servers
Email forensics tools
Email Investigations
Types of email investigations
- Emails carrying worms and viruses – suspicious emails
- Examining emails as evidence in a crime – e.g., homicide
Types of suspicious emails
- Phishing emails – they are in HTML format and redirect to suspicious web sites
- Nigerian scam
- Spoofing emails
Client/Server Roles
Client-Server architecture
Email servers run the email server programs – example: Microsoft Exchange Server
Client computers run the email client program – example: Outlook
Identification/authentication is used for the client to access the server
Intranet/Internet email servers
- Intranet – local environment
- Internet – public; examples: Yahoo, Hotmail, etc.
Email Crimes and Violations
Goal is to determine who is behind the crime such as who
sent the email
Steps to email forensics
Examine email message
- Copy email message – also forward email
- View and examine email header: tools available for Outlook and other email clients
- Examine additional files such as address books
Trace the message using various Internet tools
- Examine network logs (netflow analysis)
- Note: UTD Netflow tools SCRUB are in SourceForge
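Example: examining an email header. A minimal sketch of the "view and examine email header" step using Python's standard `email` package. The raw message, host names, and addresses below are made up for illustration, and the function name `received_hops` is hypothetical. Each relay prepends its own Received line, so reading them in reverse gives the path from the origin to the destination. Note that headers can be forged, so they should be corroborated with server logs.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw message; hosts and addresses are made up for illustration.
RAW = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
\tby mx.example.com with ESMTP id abc123;
\tWed, 5 Nov 2008 10:15:02 -0600
Received: from [198.51.100.23] (unknown [198.51.100.23])
\tby mail.example.org with SMTP id xyz789;
\tWed, 5 Nov 2008 10:14:58 -0600
From: "Alice" <alice@example.org>
To: bob@example.com
Subject: Quarterly report
Date: Wed, 5 Nov 2008 10:14:55 -0600
Message-ID: <1234@example.org>

Body text.
"""


def received_hops(raw_message):
    """Return (hop_text, timestamp) pairs ordered from origin to destination."""
    msg = message_from_string(raw_message)
    hops = []
    # Relays prepend their header, so the last Received line is nearest the sender.
    for value in reversed(msg.get_all("Received", [])):
        text = " ".join(value.split())          # undo header folding
        stamp = text.rsplit(";", 1)[1].strip()  # the date follows the last ';'
        hops.append((text, parsedate_to_datetime(stamp)))
    return hops


hops = received_hops(RAW)
for number, (text, when) in enumerate(hops, 1):
    print(number, when.isoformat(), text)
```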
Email Servers
Need to work with the network administrator on how to
retrieve messages from the server
Understand how the server records and handles the
messages
How are the email logs created and stored?
How are deleted email messages handled by the server? Are copies of the messages still kept?
Chapter 12 discusses email servers from UNIX, Microsoft, and Novell
Email Forensics Tools
Several tools for Outlook Express, Eudora, Exchange, Lotus Notes
Tools for log analysis and recovering deleted emails
Examples:
- AccessData FTK
- FINALeMAIL
- EDBXtract
- MailRecovery
Worm Detection: Introduction
What are worms?
Self-replicating program; Exploits software vulnerability on a victim;
Remotely infects other victims
Evil worms
Severe effect; Code Red epidemic cost $2.6 Billion
Goals of worm detection
Real-time detection
Issues
Substantial Volume of Identical Traffic, Random Probing
Methods for worm detection
Count number of sources/destinations; Count number of failed connection
attempts
Worm Types
Email worms, Instant Messaging worms, Internet worms, IRC worms, File-sharing network worms
Automatic signature generation possible
EarlyBird System (S. Singh, UCSD); Autograph (H.-A. Kim, CMU)
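Example: counting failed connection attempts. A minimal sketch of the counting method listed above, assuming connection records are available as `(source, destination, succeeded)` tuples. The record layout, the function name `flag_scanners`, and the threshold value are all assumptions for illustration, not taken from any particular tool. A source that fails against many distinct destinations is consistent with random probing.

```python
from collections import defaultdict


def flag_scanners(events, max_failures=3):
    """Return {source: failed_destination_count} for sources over the threshold.

    events: iterable of (source_ip, dest_ip, succeeded) tuples (assumed layout).
    """
    failed = defaultdict(set)
    for src, dst, succeeded in events:
        if not succeeded:
            failed[src].add(dst)  # count distinct destinations, not retries
    return {src: len(dsts) for src, dsts in failed.items()
            if len(dsts) > max_failures}


# Made-up traffic: one host probing many addresses, one behaving normally.
events = [
    ("10.0.0.5", "192.0.2.1", False),
    ("10.0.0.5", "192.0.2.2", False),
    ("10.0.0.5", "192.0.2.3", False),
    ("10.0.0.5", "192.0.2.4", False),
    ("10.0.0.9", "192.0.2.1", True),
    ("10.0.0.9", "192.0.2.8", False),
    ("10.0.0.9", "192.0.2.1", True),
]
suspects = flag_scanners(events)
print(suspects)
```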
Email Worm Detection using Data Mining
Task:
given some training instances of both
“normal” and “viral” emails,
induce a hypothesis to detect “viral” emails.
We used:
Naïve Bayes
SVM
[Diagram labels: outgoing emails; feature extraction; training data; test data; machine learning; the model; classifier; output "Clean or infected?"]
Assumptions
Features are based on outgoing emails.
Different users have different “normal” behaviour.
Analysis should be per-user basis.
Two groups of features
- Per email (# of attachments, HTML in body, text/binary attachments)
- Per window (mean words in body, variance of words in subject)
Total of 24 features identified
Goal: Identify “normal” and “viral” emails based on these features
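Example: per-email features. A minimal sketch of extracting a few of the per-email features named above with Python's standard `email` package. The feature names, the function `per_email_features`, and the sample message are hypothetical. This covers only a handful of the 24 features and is not the implementation used in the UTD work.

```python
from email.message import EmailMessage


def per_email_features(msg):
    """Compute a small, illustrative subset of per-email features."""
    features = {
        "num_attachments": 0,
        "has_html": False,
        "has_binary_attachment": False,
        "has_text_attachment": False,
        "subject_words": len((msg["Subject"] or "").split()),
        "body_words": 0,
    }
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.is_attachment():
            features["num_attachments"] += 1
            if part.get_content_maintype() == "text":
                features["has_text_attachment"] = True
            else:
                features["has_binary_attachment"] = True
        elif part.get_content_type() == "text/html":
            features["has_html"] = True
        elif part.get_content_type() == "text/plain":
            features["body_words"] += len(part.get_content().split())
    return features


# Made-up outgoing message with an HTML alternative and one binary attachment.
sample = EmailMessage()
sample["Subject"] = "Hello there"
sample.set_content("Please see the attached file")
sample.add_alternative(
    "<html><body><p>Please see the attached file</p></body></html>",
    subtype="html",
)
sample.add_attachment(b"\x00\x01", maintype="application",
                      subtype="octet-stream", filename="a.exe")

features = per_email_features(sample)
print(features)
```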
Feature sets
Per email features
- Binary-valued features: presence of HTML; script tags/attributes; embedded images; hyperlinks; presence of binary, text attachments; MIME types of file attachments
- Continuous-valued features: number of attachments; number of words/characters in the subject and body
Per window features
- Number of emails sent; number of unique email recipients; number of unique sender addresses
- Average number of words/characters per subject, body; average word length
- Variance in number of words/characters per subject, body; variance in word length
- Ratio of emails with attachments
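Example: per-window features. A minimal sketch of a few per-window features computed over a group of one user's outgoing emails. The record layout (dicts with `recipients`, `subject`, `body`, `attachments`) and the function name `per_window_features` are assumptions for illustration. How the window is defined (by count or by time) is not stated on the slide and is left to the caller. Population variance is used here, which is also an assumption.

```python
from statistics import mean, pvariance


def per_window_features(window):
    """Summarize a non-empty list of email records from one user."""
    body_words = [len(e["body"].split()) for e in window]
    subject_words = [len(e["subject"].split()) for e in window]
    recipients = {r for e in window for r in e["recipients"]}
    with_attachments = sum(1 for e in window if e["attachments"] > 0)
    return {
        "emails_sent": len(window),
        "unique_recipients": len(recipients),
        "mean_body_words": mean(body_words),
        "var_body_words": pvariance(body_words),
        "mean_subject_words": mean(subject_words),
        "var_subject_words": pvariance(subject_words),
        "attachment_ratio": with_attachments / len(window),
    }


# Made-up window of two outgoing emails.
window = [
    {"recipients": ["a@example.com", "b@example.com"],
     "subject": "status update", "body": "all good here today",
     "attachments": 0},
    {"recipients": ["a@example.com"],
     "subject": "re status update", "body": "thanks for that",
     "attachments": 1},
]
summary = per_window_features(window)
print(summary)
```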
Data Mining Approach
[Diagram labels: test instance; SVM classifier; Naïve Bayes classifier; each outputs clean or infected]
Data set
Collected from UC Berkeley.
- Contains instances for both normal and viral emails.
Six worm types:
- bagle.f, bubbleboy, mydoom.m, mydoom.u, netsky.d, sobig.f
Originally six sets of data:
- training instances: normal (400) + five worms (5x200)
- testing instances: normal (1200) + the sixth worm (200)
Problem: Not balanced, no cross validation reported
Solution: re-arrange the data and apply cross-validation
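Example: re-arranging data for cross-validation. A minimal sketch of one way to re-arrange labelled instances into stratified folds so that each fold keeps roughly the same normal/viral mix. The slide does not say how many folds were used or how the data was re-arranged, so `k=5`, the round-robin assignment, and the function names are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of indices with each class spread evenly across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Made-up labels: 10 normal and 5 viral instances.
labels = ["normal"] * 10 + ["viral"] * 5
splits = list(cross_validation_splits(labels, k=5))
for train, test in splits:
    print(len(train), len(test), sum(labels[i] == "viral" for i in test))
```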
Our Implementation and Analysis
Implementation
- Naïve Bayes: Assume “Normal” distribution of numeric and real data; smoothing applied
- SVM: with the parameter settings: one-class SVM with the radial basis function using “gamma” = 0.015 and “nu” = 0.1.
Analysis
- NB alone performs better than other techniques
- SVM alone also performs better if parameters are set correctly
- mydoom.m and VBS.Bubbleboy data sets are not sufficient (very low detection accuracy in all classifiers)
- The feature-based approach seems to be useful only when we have
  identified the relevant features
  gathered enough training data
  implemented classifiers with best parameter settings
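Example: Naïve Bayes with a normal distribution. A minimal sketch of a Naïve Bayes classifier that models each numeric feature with a per-class normal distribution, as described under Implementation. The slide does not say which smoothing was applied. Here a small constant added to each variance (`var_smoothing`) stands in for it, which is an assumption. The class name, the two-feature layout, and the training values are made up for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    def __init__(self, var_smoothing=1e-3):
        # Assumed form of smoothing: keeps variances away from zero.
        self.var_smoothing = var_smoothing
        self.stats = {}

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        for label, rows in rows_by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            # Sum of per-feature normal log-densities (independence assumption).
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.stats, key=log_posterior)


# Made-up instances: [number of attachments, words in body].
X = [[0, 20], [0, 25], [1, 22], [3, 2], [4, 1], [3, 3]]
y = ["clean", "clean", "clean", "infected", "infected", "infected"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([0, 24]), model.predict([4, 2]))
```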
Mobile Device/System Forensics
Mobile device forensics overview
Acquisition procedures
Summary
Mobile Device Forensics Overview
What is stored in cell phones
- Incoming/outgoing/missed calls
- Text messages
- Short messages
- Instant messaging logs
- Web pages
- Pictures
- Calendars
- Address books
- Music files
- Voice records
Mobile Phones
Multiple generations
- Analog, digital personal communications, third generation (increased bandwidth and other features)
Digital networks
- CDMA, GSM, TDMA
- Proprietary OSs
SIM cards (Subscriber Identity Module)
- Identifies the subscriber to the network
- Stores personal information, address books, etc.
PDAs (Personal Digital Assistants)
- Combine mobile phone and laptop technologies
Acquisition procedures
Mobile devices have volatile memory, so need to retrieve RAM before losing power
Isolate device from incoming signals
- Store the device in a special bag
- Need to carry out forensics in a special lab (e.g., SAIAL)
Examine the following
- Internal memory, SIM card, other external memory cards, system server
- May also need information from the service provider to determine the location of the person who made the call
Mobile Forensics Tools
Reads SIM Card files
Analyzes file content (text messages, etc.)
Recovers deleted messages
Manages PIN codes
Generates reports
Archives files with MD5, SHA-1 hash values
Exports data to files
Supports international character sets
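Example: hashing an archived file. A minimal sketch of computing MD5 and SHA-1 values for an acquired file with Python's standard `hashlib`, so that the archived copy can later be checked for changes. The function name `hash_file` and the chunk size are assumptions. Commercial tools may record hashes differently.

```python
import hashlib


def hash_file(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at path, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Reading in chunks avoids loading a large image into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Usage: call `hash_file` on the exported file at acquisition time, store both values in the report, and recompute them before analysis to confirm the file is unchanged.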
Papers to discuss: November 10, 2008
FORZA – Digital forensics investigation framework that incorporates legal issues
- http://dfrws.org/2006/proceedings/4-Ieong.pdf
A cyber forensics ontology: Creating a new approach to studying
cyber forensics
- http://dfrws.org/2006/proceedings/5-Brinson.pdf
Arriving at an anti-forensics consensus: Examining how to define
and control the anti-forensics problem
- http://dfrws.org/2006/proceedings/6-Harris.pdf
Papers to discuss November 17, 2008
Forensic feature extraction and cross-drive analysis
- http://dfrws.org/2006/proceedings/10-Garfinkel.pdf
md5bloom: Forensic file system hashing revisited (OPTIONAL)
- http://dfrws.org/2006/proceedings/11-Roussev.pdf
Identifying almost identical files using context triggered
piecewise hashing (OPTIONAL)
- http://dfrws.org/2006/proceedings/12-Kornblum.pdf
A correlation method for establishing provenance of timestamps in
digital evidence
- http://dfrws.org/2006/proceedings/13-%20Schatz.pdf