Example: Data Mining for the NBA - The University of Texas at Dallas


Digital Forensics
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Intelligent Digital Forensics
November 2, 2011
Reading for Lecture
 http://dfrws.org/2006/proceedings/7-Alink.pdf
 XIRAF – XML-based indexing and querying for digital forensics
 http://dfrws.org/2006/proceedings/8-Turner.pdf
 Selective and intelligent imaging using digital evidence bags
 http://dfrws.org/2006/proceedings/9-Lee.pdf
 Detecting false captioning using common-sense reasoning
Abstract of Paper 1
 This paper describes a novel, XML-based approach towards
managing and querying forensic traces extracted from digital
evidence. This approach has been implemented in XIRAF, a
prototype system for forensic analysis. XIRAF systematically
applies forensic analysis tools to evidence files (e.g., hard disk
images). Each tool produces structured XML annotations that can
refer to regions (byte ranges) in an evidence file. XIRAF stores
such annotations in an XML database, which allows us to query
the annotations using a single, powerful query language (XQuery).
XIRAF provides the forensic investigator with a rich query
environment in which browsing, searching, and predefined query
templates are all expressed in terms of XML database queries.
Introduction
 Framework for forensic analysis called XIRAF
 A clean separation between feature extraction and analysis
- Features extracted are stored in XML format
 A single, XML-based output format for forensic analysis tools
 The use of XML database technology for storing and querying the
XML output of analysis tools.
XIRAF Framework
 Consists of three components
 Feature extraction manager
- Features are extracted from BLOBs (Binary large objects)
using feature extraction tools
- The output of the tools is encoded in XML for the forensic
analyzer
 Tool repository
- Tools are wrapped (e.g., object wrappers)
 Storage subsystem
- Stores BLOBs and XML annotations
- XQuery used to query XML data
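The idea of XML annotations that point at byte ranges in a BLOB can be sketched as follows. This is a minimal illustration, not the XIRAF schema: the element names, attribute names, and tool names are hypothetical, and Python's ElementTree (which supports only a small XPath subset) stands in for the XQuery engine that XIRAF actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document: element, attribute, and tool names are
# illustrative only and are not taken from the XIRAF schema.
ANNOTATIONS = """
<evidence blob="disk01.img">
  <file name="report.doc" start="4096" end="12287" tool="fs-parser"/>
  <file name="holiday.jpg" start="20480" end="53247" tool="fs-parser"/>
  <exif camera="ExampleCam" start="20480" end="21503" tool="exif-reader"/>
</evidence>
"""

def regions_by_tool(xml_text, tool):
    """Return (tag, start, end) for every annotation produced by `tool`."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath is a stand-in here; XIRAF itself uses XQuery.
    hits = root.findall(f".//*[@tool='{tool}']")
    return [(e.tag, int(e.get("start")), int(e.get("end"))) for e in hits]

def annotations_overlapping(xml_text, offset):
    """Return the tags of all annotations whose byte range contains `offset`."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter()
            if e.get("start") is not None
            and int(e.get("start")) <= offset <= int(e.get("end"))]

print(regions_by_tool(ANNOTATIONS, "fs-parser"))
print(annotations_overlapping(ANNOTATIONS, 21000))
```

Because every tool writes to the same annotation format, one query can combine the output of different tools, here a file-system parser and an EXIF reader that both describe the same region of the image.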
Forensic Applications
 The authors have implemented the following applications:
- Timeline browser: through a web browser, the examiner can
look at the date/time of interest
- Photo search: search for images satisfying certain conditions
- Child pornography detection: matching is carried out using
hashing
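Hash-based matching of files against a set of known items can be sketched as below. The slides say only that hashing is used for matching; the choice of SHA-256, exact-match comparison, and the function names are assumptions made for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of a file's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()

def match_known(files: dict, known_hashes: set) -> list:
    """Return the names of files whose digest appears in `known_hashes`."""
    return sorted(name for name, data in files.items()
                  if sha256_hex(data) in known_hashes)

# Usage with made-up content: one file matches the known set, one does not.
known = {sha256_hex(b"known-bad")}
files = {"a.jpg": b"known-bad", "b.jpg": b"other"}
print(match_known(files, known))
```

Exact hashing only finds byte-identical copies; a file that has been re-encoded or cropped would not match under this sketch.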
Summary and Directions
 The separation of feature extraction and analysis brings benefits to
both phases. XIRAF extracts features automatically, which is
essential when processing large input sets.
 The use of XML as a common, intermediate output format for
tools allows the integration of the output of diverse, independent
tools that produce similar information. This handles both the
heterogeneity present in the input data (e.g., different browser
types) and the diversity of forensic analysis tools.
 These benefits are demonstrated both by the timeline browser and
by the child pornography detection program.
 By storing extracted features in an XML database system one can
analyze those features using a single, general-purpose, powerful
query language. In addition, we benefit automatically from
advances that are made in the area of XML database systems.
 Directions: Use semantic web technologies?
Abstract of Paper 2
 This paper defines what selective imaging is, and the types of
selective imaging that can be performed. This is contrasted with
intelligent imaging and the additional capabilities that have to be
built into an imager for it to be ‘intelligent’. A selective information
capture scenario is demonstrated using the digital evidence bag
(DEB) storage format. A DEB is a universal container for digital
evidence from any source that allows the provenance to be
recorded and continuity to be maintained throughout the life of the
investigation. The paper concludes by defining the ‘ultimate test’
for an intelligent and selective imager.
Selective Imaging
 Selective imaging is a term that is generally associated with the
decision not to acquire all the possible information during the
capture process.
 It is now recognized that ‘partial or selective file copying may be
considered as an alternative’ when it may not be practical to
acquire everything.
 Techniques include manual selection, semi-automatic selection,
and automatic selection.
Intelligent Imaging
 Include the domain experts in the imaging process
 How do you capture the knowledge of technical experts, who are
familiar with the complexities of digital technology, and of
legal domain experts, and then combine the two?
 How do you know that you have captured everything relevant to
the case under investigation or have not missed evidence of other
offences?
Digital Evidence Bags
 Both selective and intelligent imaging techniques offer many more options
and capabilities than current bit stream imaging.
 There are currently no commercial tools that perform selective imaging and
adequately record the provenance of the selected information.
 Furthermore, no method has existed that captured the criteria or method
used by the examiner in deciding what to acquire. For example, was an
arbitrary manual selection used, or was information captured based on
category of information, file extension, file signature, or hash set?
 The authors' solution to these problems is the digital evidence bag
(DEB) format. A DEB is a universal container for digital information from any
source. It allows the provenance of digital information to be recorded and
continuity to be maintained throughout the life of the exhibit.
 Additionally, DEBs may be encapsulated within other DEBs. This feature
differentiates the DEB structure from that used by current monolithic formats
commonly in use.
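The properties listed above (recorded provenance, recorded selection criteria, continuity, and nesting) can be sketched with a small container class. This is a toy stand-in and not the actual DEB file format: the class name, field names, and the use of SHA-256 for integrity are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvidenceBag:
    """Toy stand-in for a digital evidence bag; not the real DEB format."""
    source: str                # where the data came from (provenance)
    selection_criteria: str    # how the examiner chose what to acquire
    payload: bytes = b""
    children: List["EvidenceBag"] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # continuity record

    def __post_init__(self):
        # Record an integrity digest and the first continuity entry.
        self.digest = hashlib.sha256(self.payload).hexdigest()
        self.log.append(
            f"created from {self.source} using {self.selection_criteria}")

    def add(self, child: "EvidenceBag") -> None:
        """Encapsulate another bag inside this one and log the action."""
        self.children.append(child)
        self.log.append(f"encapsulated bag from {child.source}")

    def verify(self) -> bool:
        """Check this bag's payload and, recursively, every nested bag."""
        ok = hashlib.sha256(self.payload).hexdigest() == self.digest
        return ok and all(c.verify() for c in self.children)

inner = EvidenceBag("usb-stick", "file extension: .jpg", b"image bytes")
outer = EvidenceBag("case-folder", "manual selection")
outer.add(inner)
print(outer.verify(), outer.log)
```

Storing the selection criteria next to the data is what lets a later reviewer see why a given item was acquired, which is the gap the slide describes.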
“The Ultimate Test”
 The method and storage container used must be able to store
sufficient information about the provenance of the information
captured such that when the information is restored it is identical to
that which would have been acquired should a bit stream image
have been taken.
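One possible reading of this test can be sketched as a check that each selectively captured byte range, restored at its recorded offset, equals the same range in a full bit stream image. The function names and the (offset, length) representation are assumptions, not the authors' procedure.

```python
def capture_selected(disk: bytes, ranges):
    """Copy only the chosen (offset, length) ranges, keeping their offsets."""
    return [(off, disk[off:off + length]) for off, length in ranges]

def passes_ultimate_test(captured, bitstream_image: bytes) -> bool:
    """True if every captured range equals the same bytes in a full image."""
    return all(bitstream_image[off:off + len(data)] == data
               for off, data in captured)

# Usage with a made-up 1 KiB "disk".
disk = bytes(range(256)) * 4
captured = capture_selected(disk, [(10, 5), (300, 20)])
print(passes_ultimate_test(captured, disk))
```

The check only works because the offsets are stored with the data, which is the kind of provenance information the test requires the container to hold.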
Summary and Directions
 The methodology described and demonstrated by the authors is
claimed to be a big improvement over bit stream imaging methods
currently used.
 Directions
- Better, more accurate selection methods?
Abstract of Paper 3
 Detecting manipulated images has become an important problem in many domains
(including medical imaging, forensics, journalism and scientific publication) largely due
to the recent success of image synthesis techniques and the accessibility of image
editing software. Many previous signal-processing techniques are concerned with
finding forgery through simple transformation (e.g. resizing, rotating, or scaling), yet
little attention is given to examining the semantic content of an image, which is the main
issue in recent image forgeries. Here, the authors present a complete workflow for
finding the anomalies within images by combining the methods known in computer
graphics and artificial intelligence. They first find perceptually meaningful regions using
an image segmentation technique and classify these regions based on image statistics.
They then use AI common-sense reasoning techniques to find ambiguities and anomalies
within an image as well as perform reasoning across a corpus of images to identify a
semantically based candidate list of potential fraudulent images. Their method
introduces a novel framework for forensic reasoning, which allows detection of image
tampering, even with nearly flawless mathematical techniques.
Introduction
 Detecting manipulated images has become an important problem
in many domains
 Many previous signal-processing techniques are concerned with
finding forgery through simple transformation (e.g. resizing,
rotating, or scaling).
 Need to examine the semantic content of an image
 Authors present a complete workflow for finding the anomalies
within images by combining the methods known in computer
graphics and artificial intelligence
Introduction
 In photo fakery, manipulation techniques may fall into four
categories:
 Deletion of details: removing scene elements
 Insertion of details: adding scene elements
 Photomontage: combining multiple images
 False captioning: misrepresenting image content
Technical Approach
 Authors find perceptually meaningful regions using an image
segmentation technique and classify these regions based on
image statistics.
 They then use AI common-sense reasoning techniques to find
ambiguities and anomalies within an image as well as perform
reasoning across a corpus of images to identify a semantically
based candidate list of potential fraudulent images.
 They claim their method introduces a novel framework for forensic
reasoning, which allows detection of image tampering, even with
nearly flawless mathematical techniques.
Technical Approach
 Image Segmentation
- Segment the source into regions of importance
- Compare across images in a corpus
 Classification
- Segment-based classification
 Common sense reasoning
- Handles classification ambiguities
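The last step, using common-sense knowledge to resolve classification ambiguities and to flag captions that do not fit the image, can be illustrated with a toy example. The slides do not describe the authors' reasoning system, so the fact table, labels, and function names below are entirely hypothetical, and segmentation and classification are assumed to have already produced candidate labels per region.

```python
# Hypothetical common-sense facts: pairs of labels that plausibly co-occur.
PLAUSIBLE_TOGETHER = {
    frozenset({"boat", "water"}),
    frozenset({"car", "road"}),
    frozenset({"sky", "water"}),
    frozenset({"sky", "road"}),
}

def resolve(regions):
    """regions: one list of candidate labels per segmented region.

    Regions with a single candidate act as context. An ambiguous region
    keeps a candidate only if it plausibly co-occurs with some context
    label; if that leaves exactly one candidate, it is chosen.
    """
    context = {c[0] for c in regions if len(c) == 1}
    resolved = []
    for candidates in regions:
        if len(candidates) == 1:
            resolved.append(candidates[0])
            continue
        ok = [label for label in candidates
              if any(frozenset({label, ctx}) in PLAUSIBLE_TOGETHER
                     for ctx in context)]
        resolved.append(ok[0] if len(ok) == 1 else None)  # None: unresolved
    return resolved

def caption_anomalies(caption_terms, labels):
    """Caption terms with no supporting region label (possible false caption)."""
    return sorted(t for t in caption_terms if t not in labels)

labels = resolve([["water"], ["sky"], ["boat", "car"]])
print(labels, caption_anomalies({"car", "road"}, set(labels)))
```

In this example the third region is ambiguous between "boat" and "car"; the surrounding water makes "boat" the plausible reading, and a caption mentioning a car on a road is then flagged as unsupported by the image content.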
Summary and Directions
 Introduces a hybrid method for image forensics.
- Given a subset of a corpus as a suspicious candidate set,
analyze the candidates through specific metrics that are
optimized to find fakery given the image’s qualitative
classification. This use of common-sense reasoning goes
 Directions
- To integrate the facts discovered in a photo corpus to help
identify what evidence may be missing as well as what fact
might be unique to this scenario.
