Transcript Lecture22
A summary of the report written by W. Alink, R.A.F.
Bhoedjang, P.A. Boncz, and A.P. de Vries.
The Problem
Large amount of data – possibly terabytes
Limited amount of time
Higher chance of missing traces
Diversity of data
Too many specialized tools
Difficult to integrate results
Time constraints
Knowledge constraints
Solution
Separate feature extraction from analysis:
Feature Extraction: The extraction of useful features
from raw data (includes more than just file data)
Analysis: Browsing, querying and correlating.
One output format for forensic analysis tools (based on
XML)
XML for storing and querying the output of the tools.
Automate feature extraction
Various current projects in law enforcement community
related to automated feature extraction
XIRAF
Prototype system that uses this approach
“XML Information Retrieval Approach to digital
Forensics”
Automatic feature extraction from one or more disk images
Stores data in XML database
Uses XQuery (XML query language) to access the
database and the data from the disk-image.
Framework
3 components:
Tool repository: feature extraction tools
Feature extraction manager: manages the invocation of
the tools, merges output and stores it in storage
subsystem.
Storage subsystem: composed of raw evidence (binary
large objects) and extracted features (XML)
General Overview of process
Image fed to system
(binary data)
Feature Extraction Manager extracts useful features
(uses tool repository)
Feature Extraction Manager stores features in single
XML document (in form of a tree).
The Feature Extraction Manager can then run other
tools on the extracted data and add their output to the
XML document.
Data is stored in the storage subsystem, where both the
binary data and the XML tree can be accessed
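The process above can be sketched in a few lines of Python. This is only an illustration, not XIRAF's actual implementation: the two tools (string_tool, email_tool), the function extract_features and the element names are all hypothetical, and the "image" is just a short byte string.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Assumption: a "tool" is any function that takes raw bytes and
# returns XML elements describing the features it found.
Tool = Callable[[bytes], List[ET.Element]]


def string_tool(data: bytes) -> List[ET.Element]:
    """Hypothetical tool: report each non-empty line with its byte offset."""
    features = []
    offset = 0
    for line in data.split(b"\n"):
        if line:
            el = ET.Element("string", {"offset": str(offset), "length": str(len(line))})
            el.text = line.decode("ascii", errors="replace")
            features.append(el)
        offset += len(line) + 1  # +1 for the newline that split() removed
    return features


def email_tool(data: bytes) -> List[ET.Element]:
    """Hypothetical tool: flag whitespace-separated tokens that contain '@'."""
    return [
        ET.Element("email", {"address": tok.decode("ascii", errors="replace")})
        for tok in data.split()
        if b"@" in tok
    ]


def extract_features(image: bytes, tools: List[Tool]) -> ET.Element:
    """Stand-in for the feature extraction manager: run every tool on the
    raw image and merge all output into a single XML tree."""
    root = ET.Element("image", {"size": str(len(image))})
    for tool in tools:
        wrapper = ET.SubElement(root, "tool", {"name": tool.__name__})
        wrapper.extend(tool(image))
    return root


image = b"hello world\nmail alice@example.org now\n"
tree = extract_features(image, [string_tool, email_tool])
print(ET.tostring(tree, encoding="unicode"))
```

The byte offsets stored on each element are one possible way to link an XML feature back to the raw binary data it came from.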
Forensic Applications
Timeline browser
Mainstream tools do file-system browsing (which relies on file-system metadata)
This application of XIRAF can get all XML fragments
with a timestamp, gathered from different tools (which
could include things like chat logs).
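As a rough sketch of the timeline idea, the Python below collects every XML fragment that carries a timestamp, whichever tool produced it, and sorts the results. XIRAF itself uses XQuery; this example substitutes the limited XPath support of Python's standard library. The sample document, the attribute name timestamp and the function timeline are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical merged output of three different tools.
FEATURES = """
<image>
  <tool name="filesystem">
    <file name="report.doc" timestamp="2006-03-01T09:15:00"/>
    <file name="notes.txt"/>
  </tool>
  <tool name="chatlog">
    <message sender="alice" timestamp="2006-03-01T08:59:30"/>
  </tool>
  <tool name="exif">
    <photo camera="ExampleCam" timestamp="2006-02-27T17:42:10"/>
  </tool>
</image>
"""


def timeline(root: ET.Element):
    """Return (time, tag, attributes) for every element with a timestamp,
    sorted chronologically."""
    events = []
    for el in root.iterfind(".//*[@timestamp]"):
        when = datetime.fromisoformat(el.get("timestamp"))
        events.append((when, el.tag, dict(el.attrib)))
    return sorted(events, key=lambda event: event[0])


root = ET.fromstring(FEATURES)
for when, tag, attrs in timeline(root):
    print(when.isoformat(), tag)
```

The file without a timestamp is simply skipped, while the chat message and the photo appear in the same timeline as the file-system entry.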
Photo search
Finds digital images that meet desired conditions
Can consider camera model, date and time of recording,
image resolution and more.
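A photo search of this kind might look like the following Python sketch. The element and attribute names (photo, camera, taken, width, height), the sample data and the function find_photos are invented for illustration and are not taken from XIRAF.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical extracted photo metadata.
PHOTOS = """
<image>
  <photo path="/dcim/001.jpg" camera="ExampleCam A1" taken="2006-02-27T17:42:10" width="2048" height="1536"/>
  <photo path="/dcim/002.jpg" camera="ExampleCam A1" taken="2006-03-05T10:00:00" width="640" height="480"/>
  <photo path="/tmp/x.jpg" camera="OtherCam" taken="2006-02-28T12:00:00" width="3000" height="2000"/>
</image>
"""


def find_photos(root, camera=None, after=None, before=None, min_pixels=0):
    """Return paths of photos matching every condition that was given."""
    hits = []
    for photo in root.iter("photo"):
        taken = datetime.fromisoformat(photo.get("taken"))
        pixels = int(photo.get("width")) * int(photo.get("height"))
        if camera is not None and photo.get("camera") != camera:
            continue
        if after is not None and taken < after:
            continue
        if before is not None and taken > before:
            continue
        if pixels < min_pixels:
            continue
        hits.append(photo.get("path"))
    return hits


root = ET.fromstring(PHOTOS)
matches = find_photos(
    root,
    camera="ExampleCam A1",
    after=datetime(2006, 2, 1),
    min_pixels=1_000_000,
)
print(matches)
```

Here only /dcim/001.jpg matches: the second photo is too small and the third was taken with a different camera.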
Child Pornography Detection
Uses hashes of files that are known to contain
child pornography
Matches files against a database of hashes
The hash database is converted to XML and preloaded
into XIRAF's XML database.
The comparison is done during the feature extraction
phase.
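The matching step can be illustrated with a small Python sketch that uses harmless placeholder data. The summary does not say which hash algorithm is used, so SHA-1 is an assumption here, as are the function names build_hash_db and annotate_files and the XML layout.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_hash_db(known_contents):
    """Convert a collection of known file contents into an XML hash database."""
    db = ET.Element("hashdb")
    for content in known_contents:
        ET.SubElement(db, "entry", {"sha1": hashlib.sha1(content).hexdigest()})
    return db


def annotate_files(files, db):
    """During feature extraction, hash each file and record whether the
    hash appears in the preloaded database."""
    known = {entry.get("sha1") for entry in db.iter("entry")}
    root = ET.Element("image")
    for name, content in files.items():
        digest = hashlib.sha1(content).hexdigest()
        ET.SubElement(
            root,
            "file",
            {"name": name, "sha1": digest, "known": str(digest in known).lower()},
        )
    return root


# Placeholder data standing in for the known-file database and a disk image.
db = build_hash_db([b"known file contents"])
files = {"a.bin": b"something else", "b.bin": b"known file contents"}
features = annotate_files(files, db)

flagged = [f.get("name") for f in features.iter("file") if f.get("known") == "true"]
print(flagged)
```

Because the match result is stored as an attribute at extraction time, later queries only need to filter on it and never re-hash the files.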
Conclusion/future work
Too early to draw definitive conclusions (just a
prototype)
An increasing number of tools have started producing
output in XML.
Mobile phone queries
More knowledge bases
References
W. Alink, R.A.F. Bhoedjang, P.A. Boncz, and A.P. de
Vries. “XIRAF – XML-based indexing and querying
for digital forensics”. Available at:
http://dfrws.org/2006/proceedings/7-Alink.pdf