Transcript 슬라이드 1
Pusan National University
em
Spatiotemporal Database Laboratory
File Processing : Storage Media
2004, Spring
Pusan National University
Ki-Joune Li
Pusan National University
em
Major Functions of Computer
Spatiotemporal Database Laboratory
Computation
Storage
Communication
Presentation
Pusan National University
em
Storage of Data
Major Challenges
How to store and manage a large amount of data
Spatiotemporal Database Laboratory
Example : more than 100 peta bytes for EOS Project
How to represent sophisticated data
Pusan National University
em
Modeling and Representation
of Real World
Example
Building DB about Korean History
Spatiotemporal Database Laboratory
Real World
Computer World
Very complicated and Depending on viewpoint
Database Course : 2004 Fall semester
Pusan National University
em
Managing Large Volume of Data
Large Volume of Data
Cost for Storage Media
Processing Time
Spatiotemporal Database Laboratory
Not very important and negligible
Comparison between main memory and disk access time
RAM : several nanoseconds (10-9 sec)
Disk : several milliseconds (10-3 sec)
Time is the most valuable resource
Example
Retrieving a piece of data from 100 peta bytes DB
Pusan National University
em
Managing Large Volume of Data
Management of Data
Secure Management
Spatiotemporal Database Laboratory
From hacking
From any kinds of disasters
Consistency of Data
Example
Failure during a flight reservation transaction
Concurrent transaction
Pusan National University
Spatiotemporal Database Laboratory
em
Goals of File Systems
To provide with
1. efficient Data Structures for storing large and complex data
2. Access Methods for rapid search
3. Query Processing Methods
4. Robust Management of Transactions
Pusan National University
em
Memory Hierarchy
Large Data Volume
Spatiotemporal Database Laboratory
Not be stored in main memory
But in secondary memory
Memory Hierarchy
Faster
Cache Memory
256 K bytes
Main Memory
512 M bytes
Secondary Memory
40 G bytes
Tertiary Memory
100 Tera bytes
Cheaper
Pusan National University
em
Flash Memory
Non-Volatile
Data survives power failure, but
Data can be written at a location only once, but location can
be erased and written to again
Spatiotemporal Database Laboratory
Speed
Can support only a limited number of write/erase cycles.
Erasing of memory has to be done to an entire bank of memory
Reads are roughly as fast as main memory
But writes are slow (few microseconds), erase is slower
Cost per unit of storage roughly similar to main memory
Widely used in embedded devices such as digital cameras
Pusan National University
em
Optical Storage
Non-volatile :
Spatiotemporal Database Laboratory
Speed
data is read optically from a spinning disk using a laser
CD-ROM (640 MB), DVD (4.7 to 17 GB), CD-R, DVD-R
CD-RW, DVD-RW, and DVD-RAM
Reads and writes are slower than with magnetic disk
Juke-box systems
Large numbers of removable disks,
Few drives, and
Mechanism for automatic loading/unloading of disks
For storing large volumes of data
Pusan National University
em
Tape
Non-volatile
Speed
Spatiotemporal Database Laboratory
Sequential access : much slower than disk
Cost
Primarily Used for backup
Very high capacity (40 to 300 GB tapes available)
Tape can be removed from drive
Drives are expensive
Tape jukeboxes
hundreds of terabytes to even a petabyte
Pusan National University
em
Data Access with Secondary Memory
Get Data
Hit Ratio rh = nh / na
Access Request
Spatiotemporal Database Laboratory
How to increase hit ratio ?
If in main
memory
Get Data
Main
Memory
If not in main
memory
Load on main memory
Access to Disk
Disk
Pusan National University
em
Why Hit Ratio is so important ?
Example
Spatiotemporal Database Laboratory
for(int i=0;i<1000;i++)
Nbytes=read(fd,buf,100);
when rh = 0
1000 * 10-2 sec = 10 sec
1000 disk accesses ?
when rh = 1
1000 * 10-8 sec = 10-5 sec
Pusan National University
em
Physical Structure of Disk
200~400 sectors
Spatiotemporal Database Laboratory
512 bytes
2 * nDF
Pusan National University
em
Disk Access Time
Disk Access Time
t = tS + tR + tT , where
tS : Seek Time
Spatiotemporal Database Laboratory
tR : Rotational Latency
Time to reposition the head over the correct track
Average seek time is 1/2 the worst case seek time
4 to 10 milliseconds on typical disks
Time to reposition the head over the correct sector
Average rotational latency : ½ r (to find index point) + ½ r = r
In case of 15000 rpm : r =1*60sec/15000 = 4 msec
tT : Transfer Time
Time to transfer data from disk to main memory via channel
Proportional to the number of sectors to read
Real transfer time is negligible
Pusan National University
em
Block-Oriented Disk Access
Example
for(int i=0;i<1000;i++)
Nbytes=read(fd,buf,10);
10 bytes
Spatiotemporal Database Laboratory
1000 times
100 times
Buffer in
main memory
10 times
Number of Disk Accesses
1 block (e.g. 1024 bytes)
1024 bytes
Pusan National University
em
Disk Block
Unit of Disk Access
Block Size
Spatiotemporal Database Laboratory
Normally multiple of sectors
1K, 4K, 16K or 64K bytes depending on configuration
Why not large block ?
Limited by the size of available main memory
Too large : unnecessary accesses of sectors
e.g. only 100 bytes, when block size is given as 64K
1 block : 128 sectors (about ½ track, ½ rotation, 2 msec)
Too wasteful
Pusan National University
em
Buffer
Buffer
Spatiotemporal Database Laboratory
Temporary memory to transfer a chunk of data
1 buffer : multiple blocks
Page
A piece of buffer (main memory) corresponding with block
Page Replacement when buffer is full
Pusan National University
em
Buffer Management : Read
Get Data
Access Request
Spatiotemporal Database Laboratory
Read Request
If in main
memory
Get Data
Main
Memory
If not in main
memory
Load on main memory
and Replacement
Access to Disk
Disk
Pusan National University
em
Buffer Management : Write
Write Data
Access Request
Spatiotemporal Database Laboratory
Write Request
If in main
memory
Write Data
Write Data on Disk
Main
Memory
If not in main
memory
Load on main memory
and Replacement
Access to Disk
Disk
Pusan National University
em
Buffer Manager : Replacement Policy
LRU
Spatiotemporal Database Laboratory
Replace the block least recently used
Most operating system and buffer management
Idea behind LRU – use past pattern of block references as a
predictor of future references
Prediction of future reference
Queries have well-defined access patterns (such as
sequential scans), and a database system can use the
information in a user’s query to predict future references
Pusan National University
em
Buffer Manager : Replacement Policy
Pinned block :
Toss-immediate strategy :
Spatiotemporal Database Laboratory
memory block that is not allowed to be written back to disk.
frees the space occupied by a block as soon as the final
tuple of that block has been processed
Most recently used (MRU) strategy :
system must pin the block currently being processed. After
the final tuple of that block has been processed, the block is
unpinned, and it becomes the most recently used block.
Pusan National University
em
Logical Structure of File
Field
Field
Field
Record (Tuple)
Fixed Size Record
Record
Spatiotemporal Database Laboratory
Record
Block
File
Variable Size Record
Pusan National University
em
Fixed Size Record
Fixed Size
Spatiotemporal Database Laboratory
Disk Address
Fixed Number of Fields, and
Fixed Size of each Field
Easy to implement
(n-1)*srecord
Deletion of a record
Like Array but no movement
Free Record List or
Pointer to Next Record
Pusan National University
em
Variable Length Record
Variable Length due to
Spatiotemporal Database Laboratory
Variable Number of Fields, or
Variable Size of each Field
Complicated to implement
Implementation
Delimiter (, size, or pointer)
Slotted Page
Fixed Length
Overflow Area
Reserved Space
Pusan National University
em
Delimiters
Spatiotemporal Database Laboratory
Record
Record
…
Delimiters
Record
Record
Record
Pointer/Size
• Difficult to handle deletions and insertions
…
Record
Pusan National University
Spatiotemporal Database Laboratory
em
Slotted Page
Pointer to Record
Records can be moved around within a page
to keep them contiguous with no empty space between them
entry in the header must be updated.
Pointers should not point directly to record
But to the entry for the record in header.
Spatiotemporal Database Laboratory
Pusan National University
em
Reserved Space
Maximum # of Fields
Spatiotemporal Database Laboratory
Pusan National University
em
Overflow Area
First field of record
Rest records
Pusan National University
em
Binary Large Object Block (BLOB)
If size (field) > size (block)
e.g. Image or Video
Spatiotemporal Database Laboratory
Name
ID#
Photo
Block size
BLOB : Type of field where its size is greater than block size
cf. CLOB : Text rather than binary
Name
ID#
Contiguous Reserved
Block for BLOB
Pusan National University
em
File System
Example
fd=open(”data.txt”,O_RDONLY,0);
Nbytes=read(fd,buf,100);
Spatiotemporal Database Laboratory
How to process these functions in OS ?
Pusan National University
em
i (index)–node : information about file
i-node
Spatiotemporal Database Laboratory
Attributes
Pointers to data block
Data Block
Name
Type : directory, data, special
Permission
Ownership
Last updated date/time
Created date/time
Pusan National University
em
i-node : Pointer to data block
Attributes
Data Block
Pointers to data block
(0-9: up to 40K bytes)
...
Spatiotemporal Database Laboratory
Single direct Pointer
Double direct Pointer
Data Block
Pointer Block
(1024 blocks)
Data Block
...
Data Block
Pusan National University
em
Block configuration for i-node
0
1
2
Spatiotemporal Database Laboratory
3
Boot Block
Super Block
i-node 1 ~ 40
i-node 41 ~ 80
…
Data block
Data block
…
Data block
Reserved Block
Given by formatting
User space
Pusan National University
em
Implementation of File Hierarchy
Root directory
block
i-node 1
Spatiotemporal Database Laboratory
Attributes
i-node for
root directory
1
1
4
7
14
9
6
8
.
..
bin
dev
lib
etc
usr
tmp
Data block
for /usr/lik/data.txt
i-node 6
Attributes
i-node for
/usr
i-node 107
Attributes
i-node for
/usr/lik/data.txt
Directory block
for /usr
6
1
19
30
54
.
..
lik
kimmk
parksh
Directory block
for /usr/lik
19 .
6 ..
107 data.txt
i-node 19
Attributes
i-node for
/usr/lik
Pusan National University
em
FAT (File Allocation Table)
Spatiotemporal Database Laboratory
DOS or MS-Windows 98
Same purpose of i-node in UNIX
Pusan National University
em
fd=open(”data.txt”,O_RDONLY,0);
Nbytes=read(fd,buf,100);
Step 1 : Find i-node for “data.txt” via i-node
Spatiotemporal Database Laboratory
Step 2 : Check owner and access right
Step 3 : Register it to OpenFileTable
from root or current directory
Initialize entry values : e.g. offset, mode
fd : array index of this table
Some entries : reserved for stdio, stderr, etc..
Step 4 : Check ownership and right
Step 5 : Read 100 bytes to buf
open
Read 100 bytes from the OpenFileTable[fd].offset
OpenFileTable[fd].offset += 100;
write
Pusan National University
em
Data Dictionary : What does it contain ?
Data dictionary (also called system catalog) stores metadata
Information about relations
Spatiotemporal Database Laboratory
User and accounting information, including passwords
Statistical and descriptive data
names of relations
names and types of attributes of each relation
names and definitions of views
number of tuples in each relation
Physical file organization information
How relation is stored (sequential/hash/…)
Physical location of relation
operating system file name or
disk addresses of blocks containing records of the relation
Information about indices
Pusan National University
em
Data Dictionary : How to Represent it
Data structure
Spatiotemporal Database Laboratory
specialized data structures designed for efficient access
a set of relations, with existing system features used to
ensure efficient access
The latter alternative is usually preferred
Relation-metadata (relation-name, number-of-attributes, storage-organization, location)
Attribute-metadata (attribute-name, relation-name, domain-type, position, length)
User-metadata (user-name, encrypted-password, group)
Index-metadata (index-name, relation-name, index-type, index-attributes)
View-metadata (view-name, definition)
Pusan National University
em
Persistent Object
Objects in C++ program
Persistent Object
Spatiotemporal Database Laboratory
Volatile Object : Disappears with the termination of program
Non-Volatile Object : Keeps its status despite of its
termination
A Necessary Condition for Object-Oriented Databases
Object vs. Record
Pusan National University
em
OID : Object Identifier
ID given by system
Spatiotemporal Database Laboratory
the only way to identify object
one ID per an object
Logical OID vs. Physical OID
Logical OID
No direct specification from OID to physical location
Need an index that maps an OID to the object’s actual location.
Physical OID
encodes physical location of the object
Physical OIDs typically have the following parts:
a volume or file identifier
a page identifier within the volume or file
an offset within the page
Pusan National University
em
Pointer Swizzling
Pointer
Spatiotemporal Database Laboratory
Object
Pointer Swizzling
OID
Object
Main Memory
Disk Space