Transcript Lecture22

A summary of the report written by W. Alink, R.A.F.
Bhoedjang, P.A. Boncz, and A.P. de Vries.
The Problem
 Large amount of data – possibly terabytes
 Limited amount of time
 Higher chance of missing traces
 Diversity of data
 Too many specialized tools
 Difficult to integrate results


Together these amount to two kinds of constraint: time
constraints and knowledge constraints.
Solution
 Separate feature extraction from analysis:
 Feature Extraction: The extraction of useful features
from raw data. Includes more than just file data.
 Analysis: Browsing, querying and correlating.
 One output format for forensic analysis tools (based on
XML)
 XML for storing and querying the output of the tools.
 Automate feature extraction
 Various current projects in law enforcement community
related to automated feature extraction
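As a rough illustration of the "one output format" idea, the sketch below wraps the findings of two made-up extraction tools in the same XML shape using Python's standard library. The element and attribute names (`features`, `feature`, `tool`, `type`) and the tool names are assumptions for this example only, not the format defined in the report.

```python
import xml.etree.ElementTree as ET

def to_feature_xml(tool_name, features):
    """Wrap a tool's findings in one common XML shape (hypothetical schema)."""
    root = ET.Element("features", {"tool": tool_name})
    for feat in features:
        node = ET.SubElement(root, "feature", {"type": feat["type"]})
        # Each field of the finding becomes a child element with text content.
        for key, value in feat["fields"].items():
            child = ET.SubElement(node, key)
            child.text = str(value)
    return root

# Two unrelated (made-up) tools now produce output of the same shape,
# so a single query layer can treat them uniformly.
exif = to_feature_xml("exif-reader", [
    {"type": "photo",
     "fields": {"camera": "ExampleCam", "taken": "2006-01-05T10:00:00"}},
])
chat = to_feature_xml("chat-log-parser", [
    {"type": "message",
     "fields": {"sender": "alice", "sent": "2006-01-05T10:02:00"}},
])

xml_text = ET.tostring(exif, encoding="unicode")
```

Because both results share one structure, the analysis side does not need to know which tool produced a given fragment.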
XIRAF
 Prototype system that uses this approach
 “XML Information Retrieval Approach to digital
Forensics”
 Automatic feature extraction from one or more disk images
 Stores the extracted features in an XML database
 Uses XQuery (XML query language) to access the
database and the data from the disk image.
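XIRAF itself uses XQuery. Python's standard library has no XQuery engine, so the sketch below uses the limited XPath support in `xml.etree.ElementTree` as a stand-in, just to show what querying a feature tree might look like. The document layout (`image`, `file`, `exif`) is invented for the example and is not XIRAF's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature document; XIRAF's real layout is not reproduced here.
DOC = """
<image name="disk01">
  <file path="/home/u/a.jpg" size="2048">
    <exif camera="ExampleCam"/>
  </file>
  <file path="/home/u/notes.txt" size="120"/>
  <file path="/tmp/b.jpg" size="4096">
    <exif camera="OtherCam"/>
  </file>
</image>
"""

root = ET.fromstring(DOC)

# Roughly what an XQuery such as
#   for $f in //file where $f/exif/@camera = 'ExampleCam' return $f/@path
# would express: paths of files whose EXIF data names a given camera.
paths = [f.get("path") for f in root.findall(".//file[exif]")
         if f.find("exif").get("camera") == "ExampleCam"]

# A second query: paths of all files larger than 1000 bytes.
large = [f.get("path") for f in root.findall(".//file")
         if int(f.get("size")) > 1000]
```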
Framework
 Three components:
 Tool repository: feature extraction tools
 Feature extraction manager: manages the invocation of
the tools, merges their output and stores it in the storage
subsystem.
 Storage subsystem: composed of raw evidence (binary
large objects) and extracted features (XML)
General Overview of Process
 Image (binary data) is fed to the system
 Feature Extraction Manager extracts useful features
(uses the tool repository)
 Feature Extraction Manager stores the features in a single
XML document (in the form of a tree).
 The Feature Extraction Manager can then run other
tools on the found data and add their output to the XML document.
 Data is stored in the storage subsystem, where the binary
data or the XML tree can be accessed
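The process above can be sketched as a small loop. This is a toy model under stated assumptions, not the XIRAF implementation: the two tools, the `TOOLS` registry keyed by element tag, and the `blobs` dictionary standing in for the storage subsystem are all hypothetical.

```python
import xml.etree.ElementTree as ET

def split_files(blob):
    """Toy tool: treat the image as 'files' separated by a '|' byte."""
    for i, part in enumerate(blob.split(b"|")):
        yield ET.Element("file", {"name": f"file{i}"}), part

def detect_jpeg(blob):
    """Toy tool: flag data starting with the JPEG magic bytes FF D8."""
    if blob.startswith(b"\xff\xd8"):
        # No further raw data to hand on, so the second item is None.
        yield ET.Element("jpeg", {"size": str(len(blob))}), None

# Tool repository: which tools to run on which kind of node (assumed design).
TOOLS = {
    "image": [split_files],
    "file": [detect_jpeg],
}

def run_manager(image_bytes):
    """Run tools on the image, then on whatever data those tools found."""
    root = ET.Element("image")
    blobs = {root: image_bytes}      # storage stand-in: node -> raw bytes
    queue = [root]
    while queue:
        node = queue.pop(0)
        for tool in TOOLS.get(node.tag, []):
            for child, child_bytes in tool(blobs[node]):
                node.append(child)   # merge output into the single tree
                if child_bytes is not None:
                    blobs[child] = child_bytes
                    queue.append(child)  # found data gets its own tool pass
    return root, blobs

tree, blobs = run_manager(b"\xff\xd8abc|hello")
```

After the call, `tree` holds one XML tree with two `file` nodes, the first of which carries a `jpeg` child, while `blobs` still gives access to the raw bytes behind each node.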
Forensic Applications
 Timeline browser
 Mainstream tools offer file-system browsing, which relies on file-system metadata
 This application of XIRAF can get all XML fragments
with a timestamp, gathered from different tools (which
could include things like chat logs).
 Photo search
 Finds digital images that meet desired conditions
 Can consider camera model, date and time of recording,
image resolution and more.
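Both applications come down to queries over the feature tree. The sketch below is one way they might look, assuming a made-up document in which every tool records times in a `timestamp` attribute and photos carry `camera`, `width` and `height` attributes. None of these names come from the report.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical merged output of several tools.
DOC = """
<image>
  <file path="/docs/report.doc" timestamp="2006-03-01T09:30:00"/>
  <chat sender="alice" timestamp="2006-03-01T09:10:00"/>
  <photo path="/pics/1.jpg" camera="ExampleCam" width="1600" height="1200"
         timestamp="2006-02-27T18:45:00"/>
  <photo path="/pics/2.jpg" camera="OtherCam" width="640" height="480"
         timestamp="2006-03-02T08:00:00"/>
</image>
"""
root = ET.fromstring(DOC)

def timeline(tree):
    """All elements with a timestamp, whatever tool produced them, oldest first."""
    stamped = tree.findall(".//*[@timestamp]")
    return sorted(stamped,
                  key=lambda e: datetime.fromisoformat(e.get("timestamp")))

def photo_search(tree, camera=None, min_width=0):
    """Paths of photos matching the given camera model and minimum width."""
    hits = []
    for photo in tree.iter("photo"):
        if camera is not None and photo.get("camera") != camera:
            continue
        if int(photo.get("width")) < min_width:
            continue
        hits.append(photo.get("path"))
    return hits

events = [e.tag for e in timeline(root)]
by_camera = photo_search(root, camera="ExampleCam")
high_res = photo_search(root, min_width=1000)
```

Note that the timeline mixes file, chat and photo entries in one ordering, which is the point of gathering timestamps from different tools.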
 Child Pornography Detection
 Uses hashes of files that are known to contain
child pornography
 Matches files against a database of hashes
 The hash database is converted to XML and preloaded
into XIRAF's XML database.
 The comparison is done during the feature extraction
phase.
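A minimal sketch of hash matching during extraction, assuming SHA-256 as the hash function and a made-up XML layout (`hashdb`, `hash`) for the preloaded database. The slides do not say which algorithm or layout XIRAF uses, so both are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

def load_known_hashes(xml_text):
    """Read the set of known hashes from a hypothetical XML hash database."""
    root = ET.fromstring(xml_text)
    return {h.text.strip().lower() for h in root.iter("hash")}

def annotate_matches(files, known):
    """Hash each file during extraction and record whether it is known.

    files: dict mapping path -> raw bytes.
    Returns an XML element with one <file> child per input file.
    """
    root = ET.Element("files")
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        ET.SubElement(root, "file", {
            "path": path,
            "sha256": digest,
            "known": "true" if digest in known else "false",
        })
    return root

# Build a one-entry database from placeholder bytes so the example is
# self-contained; a real database would be supplied externally.
known_xml = "<hashdb><hash>%s</hash></hashdb>" % hashlib.sha256(
    b"flagged sample").hexdigest()

known = load_known_hashes(known_xml)
result = annotate_matches(
    {"/a.bin": b"flagged sample", "/b.bin": b"harmless"}, known)
flags = {f.get("path"): f.get("known") for f in result}
```

Storing the match flag in the tree at extraction time means later queries only need to filter on that attribute rather than rehash any data.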
Conclusion/future work
 Too early to draw definitive conclusions (just a
prototype)
 An increasing number of tools have started producing
output in XML.
 Mobile phone queries
 More knowledge bases
References
 W. Alink, R.A.F. Bhoedjang, P.A. Boncz, and A.P. de
Vries. “XIRAF – XML-based indexing and querying
for digital forensics”. Available at:
http://dfrws.org/2006/proceedings/7-Alink.pdf