Files
Chapter 4
What is a File?
A computer file is a block of arbitrary information, or resource for storing information,
which is available to a computer program and is usually based on some kind of
durable storage. A file is durable in the sense that it remains available for programs to
use after the current program has finished. Computer files can be considered as the
modern counterpart of paper documents which traditionally are kept in offices' and
libraries' files, and this is the source of the term.
--Wikipedia
• Questions:
– How are files stored?
– How do we retrieve them?
Definition obtained from: http://en.wikipedia.org/wiki/Computer_file
How are files stored?
• At a low level, files are stored as a bunch of
bytes on a hard drive (or other storage media)
• However, hard drives don’t understand “files”
– Hard drives simply contain a huge collection of
bytes of data
• We need some way to retrieve the bytes
composing our file from the hard drive
– That’s where filesystems come in
Human vs. Hard Drive View of a File
• Left Side: Hard Drive View
– Actually, data would be further converted from
hex to binary (1 and 0)
• Right Side: Human View
– Data is converted into human readable
characters
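The two views can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `hex_view` and the sample bytes are assumptions made for the example, not part of the slides. Each output row shows the offset, the hex value of every byte (hard drive view), and the printable ASCII characters (human view), with a dot for bytes that have no printable character.

```python
def hex_view(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of: offset, hex pairs, printable ASCII (dots elsewhere)."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." pairs
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample content; a real tool would read bytes from a file or drive image.
sample = b"Hello, file!\n"
print(hex_view(sample))
```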
Files
HEX is useful when attempting to view a file that is partially
deleted. This leads us to two questions:
1. Why would a partially deleted file have difficulties being opened or
viewed normally?
2. What parts of a file does a HEX editor allow us to see, which
otherwise would not be visible?
Files, File Structures, and File Formats
• To answer the questions on the other slide, we need to
investigate the basics of a file, file structure, and file
format.
• A partially deleted file in many cases may be missing
part of its formatting data, the data that identifies the
file.
• It is the formatting data that identifies the file to its
parent or native software.
• If a file doesn’t contain the formatting information, the
software or Operating System will most likely not be
able to access or execute the file.
• It is this formatting information that uniquely identifies
a file.
Different Formats
• There are hundreds of different formats for data.
• There are also formats for executable programs
on different platforms. (Windows, Linux, Mac,
Unix, etc.…)
• Each format defines how the sequence of bits
and bytes is laid out, with ASCII-based text files
being one of the simplest formats for humans to
decipher.
Other Formats
• Some file formats are designed to store very
particular sorts of data:
– JPEG format – designed to store photographic images.
– GIF format – designed for both images
and animation.
– QuickTime format – can act as a container for
many different types of multimedia.
Text File Formats
• A text file is simply one that stores text.
– It uses an encoding such as ASCII or UTF-8, with few if any control
characters.
– Other file formats, such as HTML, or the source code
of some particular programming language, are in fact
also text files, but follow stricter rules for specific
purposes.
• Parent program – the program or
software that is used to create, execute, or
otherwise access the file.
• In most cases a file will contain data, its file
signature, from which its parent software will be
able to identify the file and handle its operation.
File Signatures
• File Signature – contained in the file header.
• File Header – Not seen by the user of the software, but
very important for the file to function as designed.
– It is this data contained within the file header that is used
to identify the format of the file.
• File Headers – may also contain data regarding the
integrity of the file as well as information about itself
and its contents. This data is often referred to as
Metadata.
File Format Structures
• There is no one specific file format structure
that fits all file types.
• File formats will vary as well as file content.
• The contents of an image, as well as its
format, for example, will be different from the
contents and format of a word processing
document.
File Extensions
• File formats are easily identified by file extensions.
• The Windows Operating System uses file extensions to bind
an application to a specific file type.
– Example: Windows binds Adobe Reader to the .PDF file
extension, and MS Word to the .DOC or .DOCX file
extension.
• File extensions are central to how the Windows Operating
System handles files; without an extension, Windows
would not know how to open,
process, or handle a file.
Question:
What would occur if the file extension of an
executable (.EXE) file was changed to that of an
Adobe file extension (.PDF)?
ANSWER:
Windows would look at the file extension and
see that it’s a .PDF; it would therefore hand that
file over to Adobe to open. Adobe would
attempt to launch or open the file and report an
error since the file, regardless of its name, is not
actually an Adobe file.
Registry
• Windows stores this application binding
information in a section of the Operating System
(OS) called the registry.
• Each file type has a corresponding file
extension; this correlation, stored within the
registry, tells the OS what type of program is
needed to access a certain file type. This is
Windows' way of matching the many different
types of files to their corresponding software.
OS
• When the OS identifies an extension say .CSV
(Comma Separated Values), the OS looks to
the registry and finds which application is
bound to this extension. In most cases, MS
Excel is bound to CSVs, so Windows will hand
it over to Excel.
• A file extension and/or its corresponding
registry information can be manipulated by a
savvy user.
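The lookup described above can be sketched as follows. This is a simplified model, not how Windows is implemented: the `ASSOCIATIONS` table is a hypothetical stand-in for the registry, and `handler_for` is a name invented for the example. The point is that only the file name is consulted, never the file's contents.

```python
import os

# Hypothetical association table standing in for the registry bindings.
ASSOCIATIONS = {
    ".pdf": "Adobe Reader",
    ".doc": "MS Word",
    ".docx": "MS Word",
    ".csv": "MS Excel",
}


def handler_for(filename: str) -> str:
    """Pick an application from the extension alone; the contents are never read."""
    _, ext = os.path.splitext(filename)
    return ASSOCIATIONS.get(ext.lower(), "unknown (no association)")


print(handler_for("report.CSV"))
# An executable renamed to .pdf would still be handed to the PDF reader.
print(handler_for("setup.pdf"))
```

Editing the table (or, on a real system, the registry) changes which application receives the file; renaming the file changes which table entry is used. Neither touches the file's bytes.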
Changing File Extensions
• Suppose a change was made to the registry so that the .CSV
file extension was associated with, and therefore opened by,
an image viewer such as Windows Picture Viewer.
• This will cause an error because the file is an Excel file
and not an image.
• A file with an incorrect file extension would open as long as
the Windows Registry had that “incorrect” file extension
associated with the correct software.
• Remember, changing or renaming a file extension does not
change the content of the file; it only changes the way in
which Windows OS handles the file (i.e. which application
the file is sent to).
Computer Criminals
• So why is the way the OS handles the
interpretation of a file’s extension important
to a cyber forensic investigator?
• Computer criminals can use file extensions to
hide files simply by changing the file
extension.
Changing A File’s Extension To Evade
Detection
• The process to change a file’s extension to
evade detection is quite simple:
– Step 1: Create a legitimate looking folder into
which you wish to place your files. Use a name
that will not be conspicuous.
Creating a file extension to evade
detection
• Step 2:
– Open the folder that you
created
– Select Organize menu, select
layout and select Menu Bar
• Step 3:
– Open the Tools tab and select
Folder Options, and select the
View Tab
Removing the file extension
• Step 4:
– Uncheck “Hide extensions for known file types”
– File extension type is revealed
• Step 5:
– Right-click on the file name to rename the file,
providing any valid file extension type
(.doc, .xls, .exe, .txt). The file name is changed based
upon the extension provided. (Do this to 4 images.)
Removing the file extension
• Step 6:
– Check “Hide extensions for known file types” to
hide the new file extensions.
• Notice that where there were once 10 image files,
there now appear to be only six.
• Scanning simply for image files will result in
missing the four files with modified
extensions!
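One way an examiner might catch such renamed files is to compare each file's leading bytes with its extension. The sketch below is illustrative only: the `SIGNATURES` table holds just a few widely documented signatures, and `extension_mismatch` is a hypothetical helper, not a function from any forensic tool.

```python
import os

# A few widely documented signatures (at offset 0) and the extensions they normally carry.
SIGNATURES = {
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
}


def extension_mismatch(filename: str, head: bytes) -> bool:
    """True if the leading bytes match a known signature but the extension does not."""
    ext = os.path.splitext(filename)[1].lower()
    for magic, expected in SIGNATURES.items():
        if head.startswith(magic):
            return ext not in expected
    return False  # unknown signature: nothing to compare against


# Synthetic headers; in practice `head` would be the first bytes read from each file.
print(extension_mismatch("holiday.doc", b"GIF89a" + b"\x00" * 10))
print(extension_mismatch("holiday.gif", b"GIF89a" + b"\x00" * 10))
```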
Notes about Hiding Files
• Remember, Windows looks at a file’s extension
first, and hands that file over to the
appropriate application to open. Microsoft
Word, handed a JPEG or TIFF image that has been
renamed with a .DOC extension, would attempt
to open the file and report an error since the file,
regardless of its name, is not actually a
Microsoft Word file.
File Signature
• File Signature – also known as the “Magic
Number”.
• File Signature – is the binary that identifies a
particular file: the data that will aid in the
identification of the file to its native or parent
software.
HEX Editor
• For common file formats, the file signatures
conveniently represent the names of the file
types.
– Example: The GIF (Graphics Interchange Format) image signature GIF87a in HEX equals
0x474946383761; GIF89a in HEX equals
0x474946383961.
– The signature occupies the first 6 bytes of the file.
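The correspondence between the hex values and the readable names can be checked directly. The sketch below decodes the two six-byte GIF signatures to ASCII; `is_gif` is a hypothetical helper name chosen for this example.

```python
# Decode the two six-byte GIF signatures from hex to their ASCII form.
for hex_signature in ("474946383761", "474946383961"):
    raw = bytes.fromhex(hex_signature)
    print(hex_signature, "->", raw.decode("ascii"))


def is_gif(head: bytes) -> bool:
    """Check whether the first six bytes of a file are a GIF signature."""
    return head[:6] in (b"GIF87a", b"GIF89a")
```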
JPEG
• JPEG – A Joint Photographic Experts Group image
file contains 0x4A464946, which is the ASCII
equivalent of JFIF (JPEG File Interchange
Format)
– The JFIF marker begins at the seventh byte of the
file.
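A minimal sketch of that check follows. The header bytes are synthetic, built to resemble the start of a typical JFIF file, and `has_jfif_marker` is a name invented for the example. The seventh byte sits at zero-based offset 6.

```python
JFIF_OFFSET = 6  # zero-based offset of the seventh byte


def has_jfif_marker(head: bytes) -> bool:
    """Check for the ASCII text 'JFIF' starting at the seventh byte."""
    return head[JFIF_OFFSET:JFIF_OFFSET + 4] == b"JFIF"


# Synthetic header: six leading bytes, then the JFIF identifier.
head = bytes.fromhex("ffd8ffe00010") + b"JFIF\x00"
print(has_jfif_marker(head))
print(b"JFIF".hex())
```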
Files and The Hex Editor
• Back to our case: a forensic investigator will have to
look at millions of pieces of data for potential evidence.
• These files can be renamed and moved deep into the
logical folder structure.
• Logical folder structure – A way in which to store your
files.
– Assists in the orderly storage of your files.
– Makes it easier to find your files.
– Aids in managing your files.
– Simpler way to archive your files.
• Remember, there can be hundreds if not thousands of
folders and even more files, all of which may seem
inconsequential as they are scattered and stored
throughout an individual’s hard drive.
File Signature
• File Signature also known as the magic number.
• Magic numbers are referred to as magic because
the purpose and significance of their values are
not apparent without some additional
knowledge.
• A file signature is the binary that identifies a
particular file: the data that will aid in the
identification of the file to its native or parent
software.
• For common file formats, the file signatures
conveniently represent the names of the file
types.
Gifs and Jpegs
• The GIF file signature occupies the first six bytes
of the file.
• The JPEG (JFIF) file signature starts at the seventh byte.
• The MS Word document signature is represented
by d0 cf 11 e0, which looks like “docfile”.
ASCII is Not Text or HEX
• There may not always be an ASCII equivalent
to a file type; this is one reason to use HEX.
• ASCII has limitations; remember that Unicode
extends ASCII. We use HEX
because ASCII cannot represent all the characters
of other languages.
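A short sketch of the limitation: every character below can be written out as hex byte values, but only the first is ASCII. UTF-8 is assumed as the encoding purely for illustration, and `hex_bytes` is a hypothetical helper.

```python
def hex_bytes(text: str) -> str:
    """Show the UTF-8 bytes of a string as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


# "A" is ASCII; the other two (e with acute accent, a CJK character) are not.
for text in ("A", "\u00e9", "\u65e5"):
    print(repr(text), "| hex:", hex_bytes(text), "| ASCII:", text.isascii())
```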
Value of File Signature
• We see that even when a file extension has been
changed, we can still view the file contents.
• If we search the entire drive for a binary
representation of company “XYZ”, we will be able
to find it even with the file signature deleted or
changed.
• Even if a file has been deleted or its file
signature changed, and some data has
been overwritten, there may be remnants of
the file that can be retrieved.
• A forensic examiner cannot always depend on
having an intact file or file with a signature.
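Such a search works on raw bytes and does not depend on any header or signature. The sketch below is a simplified illustration: the "drive image" is synthetic, `find_keyword` is a hypothetical helper, and searching in both ASCII and UTF-16LE is an assumption about how the text might have been stored.

```python
def find_keyword(raw: bytes, keyword: str) -> dict:
    """Return the byte offsets of a keyword, searched as ASCII and as UTF-16LE."""
    hits = {}
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.encode(encoding)
        offsets = []
        start = raw.find(needle)
        while start != -1:
            offsets.append(start)
            start = raw.find(needle, start + 1)
        hits[encoding] = offsets
    return hits


# Synthetic image: zeroed bytes where a header used to be, then surviving content.
image = b"\x00" * 8 + b"memo for XYZ Corp" + b"\xff" * 4 + "XYZ".encode("utf-16-le")
print(find_keyword(image, "XYZ"))
```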
File Signature Database
• There are many different file signatures. Too
many to remember.
• An Internet search is the best way to find a
signature.
• http://www.filesignatures.net/ is a good place to look for a file signature.
Complex Files: Compound,
Compressed, and Encrypted Files
• We will discuss just the basics of the above topics.
• A compound file – is a file format that consists of
numerous files. The compound file itself is little more
than a container for those files. The structure within a
compound file is similar to that of a real file system
consisting of a hierarchy of storage with one parent
directory.
• There is a root directory folder, children contained
within and files (data streams) contained therein.
Compound files are sometimes associated with
Microsoft’s Compound File Binary Format (CFBF) file.
Compound File
• All allocations of space within a Compound File
are done in chunks or units called sectors.
• The size of a sector is definable at creation time
of a Compound File, and those sectors are usually
512 bytes in size.
• A virtual stream is made up of a sequence of
sectors.
• At its simplest, the Compound File Binary Format
is a container, with little restriction on what can
be stored within it.
• More loosely, a compound file is
any file that may contain a directory structure.
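The sector arithmetic can be sketched as below. The details are assumptions drawn from the published CFBF layout rather than from these slides: the full eight-byte signature, a 16-bit "sector shift" field at byte offset 30 of the header, and sector n starting at (n + 1) × sector size. The header used here is synthetic, and the function names are hypothetical.

```python
import struct

# Assumption: the full eight-byte CFBF signature (it begins with d0 cf 11 e0).
CFBF_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def sector_size(header: bytes) -> int:
    """Read the sector shift (assumed at offset 30) and return 2 ** shift."""
    if header[:8] != CFBF_SIGNATURE:
        raise ValueError("not a compound file header")
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    return 1 << sector_shift


def sector_offset(sector_number: int, size: int) -> int:
    """Byte offset of a sector, assuming the header occupies the first sector-sized slot."""
    return (sector_number + 1) * size


# Synthetic header: signature, 16 unused bytes, then four 16-bit fields ending in shift = 9.
header = (CFBF_SIGNATURE + bytes(16) + struct.pack("<HHHH", 0x3E, 3, 0xFFFE, 9)).ljust(512, b"\x00")
size = sector_size(header)
print(size, sector_offset(0, size), sector_offset(3, size))
```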
Compound File Signature
• As with other files, the file header of a compound file will
contain a file signature, identifying the file; it will also contain
information required to interpret the rest of the file such as
file’s size and storage location.
• It is this metadata that allows the software to reconstruct the
file into the appropriate file format that will display the file’s
specific information (i.e. size, creation date, change date,
etc.).
• The file therefore needs to be “reconstructed” by its parent
software in order for the data to be legible or otherwise
accessible.
Example
• We think data storage is linear. Example: Company
XYZ Corp. We think X comes before Y and Y
before Z.
• What would we see if it is nonlinear? Maybe
“oZpYCrX”
• If that same data is noncontiguous, other data
may be intertwined. (e.g.,
…?>>o…Z^qLp…77Ymn….C@qwerbsbdX…)
• Thus XYZ Corp is not easily discernible now.
• We would need an instruction set to reconstruct
this data.
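A toy version of such an instruction set is sketched below. The raw bytes, the (offset, length) pairs, and the function name `reconstruct` are all invented for illustration; real formats keep this kind of map in their header or allocation tables.

```python
# Synthetic raw data: fragments of "XYZ Corp" scattered among filler bytes.
raw = b"##rp##XY##Z Co##"

# The "instruction set": (offset, length) of each fragment, in logical order.
instructions = [(6, 2), (10, 4), (2, 2)]


def reconstruct(raw: bytes, instructions: list) -> bytes:
    """Reassemble the logical data by reading the fragments in the listed order."""
    return b"".join(raw[offset:offset + length] for offset, length in instructions)


print(b"XYZ Corp" in raw)              # a plain search of the raw bytes
print(reconstruct(raw, instructions))  # the same bytes read with the instructions
```

Without the instruction list, the fragments are still present in the raw bytes, but nothing says which ones belong together or in what order.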
Why Do Compound Files Exist?
• Files have become more complex and need to
contain a lot of information.
• Many files contain Object Linking and
Embedding (OLE) technology, in which one file
may contain many files.
OLE
• Allows users to integrate data from different
applications.
• Object linking allows users to share a single source
of data for a particular object.
• The document contains the name of the file
containing the data, along with a picture of the
data.
• When the source is updated, all the documents
using the data are updated as well.
Object Embedding
• With object embedding, one application (referred to as
the source) provides data or an image that will be
contained in the document of another application
(referred to as the destination). The destination
contains the data or graphic image, but does not
understand it or have the ability to edit it.
• It simply displays, prints, and/or plays the embedded
item.
• To edit or update the embedded object, it must be
opened in the source application that created it.
• This occurs by double-clicking the object or choosing the
appropriate edit command when highlighted.
Embedding
• While embedding doesn’t allow users to have a
single source of data, it does make it easier to
integrate applications.
• An embedded object contains the actual data
for the object, the name of the application
that created it, and a picture of the data.
Example
• MS Word document may contain a JPG image;
a file within a file.
• Compound files allow for incremental access,
allowing for individual components to be
accessed without the need of the entire file.
• This can save time and resources by not
having to load an entire file, only the piece or
pieces desired.
Compressed Files
• Compressed files are essentially compound files
that are compressed.
• Contained within the compound files are
compression instructions.
.ZIP
• A common file extension associated with
compressed files is .zip.
• Other compression tools and formats include WinZip, 7-Zip,
gzip, and rzip.
• The file format of a compressed file
changes depending upon its compression
algorithm.
Questions
• What happens when an application is
upgraded (example: going from MS Office 2003
to MS Office 2007)? How might this affect the
application’s file signature?
The file signature has changed.
Questions
• What is the importance to a cyber forensic
investigation and what does this mean?
It means that the file is a compound file
consisting of other files. If we would view the
entirety of the file with our Hex editor we would
not uncover any legible ASCII characters.
Question
• Why?
The file structure and assembly instructions are
contained within the file; thus, the file would
need to be mounted (the process of making it
ready for use by the OS) by its native software in
order for the contents to be viewed.
Mount
• Viewing and, more importantly, searching the
contents of these “complex” files are possible
once they are mounted. Forensic tools
incorporate the software to mount these so
that searching is possible.
• If these complex files are not mounted then
no search results will be obtained.
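The effect can be sketched with a small ZIP container built in memory. Everything here is synthetic: the member name, the memo text, and the helper `search_mounted` are assumptions for illustration, and Python's standard `zipfile` module stands in for the mounting step a forensic tool would perform.

```python
import io
import zipfile

keyword = b"XYZ Corp"

# Build a small compressed container in memory (synthetic evidence).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", b"Payment schedule for XYZ Corp, second quarter.\n" * 20)
container = buffer.getvalue()


def search_mounted(container: bytes, keyword: bytes) -> list:
    """Open the container and search each member's decompressed contents."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        for name in archive.namelist():
            if keyword in archive.read(name):
                hits.append(name)
    return hits


print("keyword found in raw container bytes:", keyword in container)
print("members containing keyword once opened:", search_mounted(container, keyword))
```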
Forensics and Encrypted Files
• Encrypted files are also complex but differ in that an
encryption key is required to decrypt an encrypted file.
• Encryption uses an algorithm (cipher) to alter or
transform the data in an attempt to prevent
reconstruction by those without the instruction set,
also known as the encryption key.
• Decryption refers to the reverse process of making the
data readable or otherwise accessible.
Encrypted Files
• Encryption – is a method by which confidentiality
of data can be protected.
• An encrypted file cannot be decrypted without the
encryption key (aka password).
• The encryption process uses an algorithm or
cipher to mathematically transform the plaintext
along with the encryption key (password),
thereby encoding it in such a manner that it is
illegible or indecipherable.
Encrypted Files
• With the correct decryption key (password)
the data is then run through its associated
cipher (algorithm) and converted back to
clear text, which is, by definition, decrypted.
Remember, this entire process occurs in
binary, as 0’s and 1’s.
• It is the cipher that actually changes the files;
the password is just a set of data which are
used to “mathematically mix” and set the
process in motion, turning the plaintext data
into an unreadable end product.
The Structure of Cipher
• The structure of ciphers depends upon the cipher’s
type. Types of ciphers vary but generally they can be
categorized by the following:
– Block or Stream – Block ciphers generally work on fixed-length
groups of bits called blocks. The cipher may, for example, take a
256-bit block of data at a time. In a stream cipher, the
plaintext bits are encrypted one at a time along with the
encryption key.
– Symmetric or Asymmetric –
• Symmetric encryption – the same encryption key or password is
used for both encryption and decryption.
• Asymmetric encryption (public-key cryptography) – different keys
(public & private) are used for encryption and decryption. Data is
encrypted using a person’s public key, one in which everyone may
have access to or even be distributed. However, data can only be
decrypted using the person’s private key, one which is kept secret
by the individual.
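The symmetric, stream-style case can be illustrated with a toy cipher. This is a teaching sketch only and must not be used to protect real data: the keystream construction (SHA-256 over the key and a counter) and the names `keystream` and `xor_cipher` are assumptions made for the example. It shows the same key both encrypting and decrypting, and a wrong key failing to recover the plaintext.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of the key plus a counter, repeated until long enough."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


plaintext = b"XYZ Corp ledger"
ciphertext = xor_cipher(plaintext, b"correct horse")
print(ciphertext.hex())                                   # unreadable without the key
print(xor_cipher(ciphertext, b"correct horse"))           # same key restores the plaintext
print(xor_cipher(ciphertext, b"wrong key") == plaintext)  # a different key does not
```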
Advanced Encryption Standard (AES)
• Standard adopted by the United States government and one of the
most popular encryption methods available and in use today.
• There are many other encryption algorithms or formats available
and many books on them. We will not cover them in this class.
• They all contain some level or form of instruction needed to
reconstruct the file.
• If the instructional data needed to reconstruct a compound file is
missing, overwritten, destroyed, or compromised, the file may not
be recoverable, even though the data containing the evidence may
still be contained within the file itself.
Summing it up
• It may be possible to reconstruct a complex
file which has been partially overwritten.
Forensic analysts are creative, cutting edge,
innovative, and very intelligent; they have
developed solutions for some of the most
complex problems.
• However, recovering the data with normal
“point and click” methods may not always be
possible.