Transcript 슬라이드 1
Introduction to the course
Objectives of the course
To provide a solid introduction to the topic of file structures design.
To discuss a number of advanced data structure concepts that are necessary
for achieving high efficiency in file operations.
To develop important programming skills in and object-oriented language
such as C++ or Java.
Required Textbook
File Structures, An Object-Oriented Approach with C++
by
Michael J. Folk, Bill Zoellick
and Greg Riccardi
Publisher: Addison Wesley
C++를 사용한 객체지향 접근방식 파일구조
박석 역
그린출판사
Introduction to the Design and
Specification of File Structures
Outline
What are File Structures?
Why Study File Structure Design
Overview of File Structure Design
Definition
A File Structure is a combination of representations for data in files and of
operations for accessing the data.
A File Structure allows applications to read, write and modify data. It might
also support finding the data that matches some search criteria or reading
through the data in some particular order.
Why Study File Structure Design?
I. Data Storage
Computer data can be stored in three kinds of locations:
Primary Storage ==> Memory [Computer Memory]
Our
Focus
Secondary Storage [Online Disk/ Tape/ CDRom that can be
accessed by the computer]
Tertiary Storage ==> Archival Data [Offline Disk/Tape/
CDRom not directly available to the computer.]
Why Study File Structure Design?
II. Memory versus Secondary Storage
Secondary storage such as disks can pack thousands of megabytes in a small
physical location.
Computer Memory (RAM) is limited.
However, relative to Memory, access to secondary storage is extremely slow
[E.g., getting information from slow RAM takes 120. 10-9 seconds (= 120 nanoseconds)
while getting information from Disk takes 30. 10-3 seconds (= 30 milliseconds)]
Why Study File Structure Design?
III. How Can Secondary Storage Access Time be Improved?
By improving the File Structure.
Since the details of the representation of the data and the implementation of
the operations determine the efficiency of the file structure for particular
applications, improving these details can help improve secondary storage
access time.
Overview of File Structure Design
I. General Goals
Get the information we need with one access to the disk.
If that’s not possible, then get the information with as few accesses as
possible.
Group information so that we are likely to get everything we need with
only one trip to the disk.
Overview of File Structure Design
II. Fixed versus Dynamic Files
It is relatively easy to come up with file structure designs that meet the
general goals when the files never change.
When files grow or shrink when information is added and deleted, it is
much more difficult.
History of File Structures
I. Early Work
Early work assumed that files were on tape.
Access was sequential and the cost of access grew in direct proportion to
the size of the file.
History of File Structures
II. The emergence of Disks and Indexes
As files grew very large, unaided sequential access was not a good solution.
Disks allowed for direct access.
Indexes made it possible to keep a list of keys and pointers in a small file
that could be searched very quickly.
With the key and pointer, the user had direct access to the large, primary
file.
History of File Structures
III. The emergence of Tree Structures
As indexes also have a sequential flavor, when they grew too much,
they also became difficult to manage.
The idea of using tree structures to manage the index emerged in the
early 60’s.
However, trees can grow very unevenly as records are added and
deleted, resulting in long searches requiring many disk accesses to find
a record.
History of File Structures
IV. Balanced Trees
In 1963, researchers came up with the idea of AVL trees for data in memory.
AVL trees, however, did not apply to files because they work well when tree nodes are
composed of single records rather than dozens or hundreds of them.
In the 1970’s came the idea of B-Trees which require an O(logk N) access time where
N is the number of entries in the file and k, the number of entries indexed in a single
block of the B-Tree structure B-Trees can guarantee that one can find one file entry
among millions of others with only 3 or 4 trips to the disk.
History of File Structures
V. Hash Tables
Retrieving entries in 3 or 4 accesses is good, but it does not reach the
goal of accessing data with a single request.
From early on, Hashing was a good way to reach this goal with files that
do not change size greatly over time.
Recently, Extendible Dynamic Hashing guarantees one or at most two
disk accesses no matter how big a file becomes.
Taxonomy of File Structures
Single-key files
Index-based (Tree Data Structure)
Indexed Sequential File B-Tree B+-Tree
Hashing-based (Address Computation)
Hashing File Extendible Hashing File
Multi-key files (multidimensional)
K-D-B tree
Grid file
R-tree
Multimedia Indexing Techniques
Audio, Image, Video
Conceptual Toolkit
: File
Structure Literacy
Objective of conceptual toolkit
Fundamental file concepts
Generic file operations
Conceptual tools in this book
basic tools + evolution of basic tools
– basic tools : Chapter 2 ~ 6
– evolution of basic tools : Chapter 7 ~ 12
B-trees, B+trees, hashed indexes, and extensible dynamic
hashed files
Object-Oriented Toolkit: Making File Structures Usable
Object-Oriented Toolkit
making file structures usable requires turning conceptual toolkit into
collections (classes) of data types and operations
Major problem
complicated and progressive
– often modified and extended from other classes
and details of classes become more complex
Using objects in C++(1)
Features of object in C++
class definition
– data members(attributes) + methods
constructor
– provide a guarantee for initialization of every object &
called in creation time of object
public, private & protected sections
– public label specifies that any users can freely access
– private & protected label are restrict access
Using objects in C++(2)
operator overloading
– allows a particular symbol to have more than one meaning
other features
– inheritance, virtual function, and templates
– explained in later chapters