슬라이드 1

Download Report

Transcript 슬라이드 1

Pusan National University
em
Spatiotemporal Database Laboratory
File Processing : Storage Media
2004, Spring
Pusan National University
Ki-Joune Li
Pusan National University
em
Major Functions of Computer



Spatiotemporal Database Laboratory

Computation
Storage
Communication
Presentation
Pusan National University
em
Storage of Data

Major Challenges

How to store and manage a large amount of data

Spatiotemporal Database Laboratory

Example : more than 100 peta bytes for EOS Project
How to represent sophisticated data
Pusan National University
em
Modeling and Representation
of Real World

Example

Building DB about Korean History
Spatiotemporal Database Laboratory
Real World


Computer World
Very complicated and Depending on viewpoint
Database Course : 2004 Fall semester
Pusan National University
em
Managing Large Volume of Data

Large Volume of Data

Cost for Storage Media


Processing Time

Spatiotemporal Database Laboratory
Not very important and negligible
Comparison between main memory and disk access time




RAM : several nanoseconds (10-9 sec)
Disk : several milliseconds (10-3 sec)
Time is the most valuable resource
Example

Retrieving a piece of data from 100 peta bytes DB
Pusan National University
em
Managing Large Volume of Data

Management of Data

Secure Management


Spatiotemporal Database Laboratory

From hacking
From any kinds of disasters
Consistency of Data

Example


Failure during a flight reservation transaction
Concurrent transaction
Pusan National University
Spatiotemporal Database Laboratory
em
Goals of File Systems

To provide with
1. efficient Data Structures for storing large and complex data
2. Access Methods for rapid search
3. Query Processing Methods
4. Robust Management of Transactions
Pusan National University
em
Memory Hierarchy

Large Data Volume


Spatiotemporal Database Laboratory

Not be stored in main memory
But in secondary memory
Memory Hierarchy
Faster
Cache Memory
256 K bytes
Main Memory
512 M bytes
Secondary Memory
40 G bytes
Tertiary Memory
100 Tera bytes
Cheaper
Pusan National University
em
Flash Memory

Non-Volatile


Data survives power failure, but
Data can be written at a location only once, but location can
be erased and written to again
Spatiotemporal Database Laboratory



Speed




Can support only a limited number of write/erase cycles.
Erasing of memory has to be done to an entire bank of memory
Reads are roughly as fast as main memory
But writes are slow (few microseconds), erase is slower
Cost per unit of storage roughly similar to main memory
Widely used in embedded devices such as digital cameras
Pusan National University
em
Optical Storage

Non-volatile :


Spatiotemporal Database Laboratory


Speed


data is read optically from a spinning disk using a laser
CD-ROM (640 MB), DVD (4.7 to 17 GB), CD-R, DVD-R
CD-RW, DVD-RW, and DVD-RAM
Reads and writes are slower than with magnetic disk
Juke-box systems




Large numbers of removable disks,
Few drives, and
Mechanism for automatic loading/unloading of disks
For storing large volumes of data
Pusan National University
em
Tape

Non-volatile


Speed
Spatiotemporal Database Laboratory


Sequential access : much slower than disk
Cost




Primarily Used for backup
Very high capacity (40 to 300 GB tapes available)
Tape can be removed from drive
Drives are expensive
Tape jukeboxes

hundreds of terabytes to even a petabyte
Pusan National University
em
Data Access with Secondary Memory
Get Data
Hit Ratio rh = nh / na
Access Request
Spatiotemporal Database Laboratory
How to increase hit ratio ?
If in main
memory
Get Data
Main
Memory
If not in main
memory
Load on main memory
Access to Disk
Disk
Pusan National University
em
Why Hit Ratio is so important ?

Example
Spatiotemporal Database Laboratory
for(int i=0;i<1000;i++)
Nbytes=read(fd,buf,100);
when rh = 0
1000 * 10-2 sec = 10 sec
1000 disk accesses ?
when rh = 1
1000 * 10-8 sec = 10-5 sec
Pusan National University
em
Physical Structure of Disk
200~400 sectors
Spatiotemporal Database Laboratory
512 bytes
2 * nDF
Pusan National University
em
Disk Access Time

Disk Access Time
t = tS + tR + tT , where
 tS : Seek Time

Spatiotemporal Database Laboratory



tR : Rotational Latency




Time to reposition the head over the correct track
Average seek time is 1/2 the worst case seek time
4 to 10 milliseconds on typical disks
Time to reposition the head over the correct sector
Average rotational latency : ½ r (to find index point) + ½ r = r
In case of 15000 rpm : r =1*60sec/15000 = 4 msec
tT : Transfer Time



Time to transfer data from disk to main memory via channel
Proportional to the number of sectors to read
Real transfer time is negligible
Pusan National University
em
Block-Oriented Disk Access

Example
for(int i=0;i<1000;i++)
Nbytes=read(fd,buf,10);
10 bytes
Spatiotemporal Database Laboratory
1000 times
100 times
Buffer in
main memory
10 times
Number of Disk Accesses
1 block (e.g. 1024 bytes)
1024 bytes
Pusan National University
em
Disk Block


Unit of Disk Access
Block Size

Spatiotemporal Database Laboratory


Normally multiple of sectors
1K, 4K, 16K or 64K bytes depending on configuration
Why not large block ?


Limited by the size of available main memory
Too large : unnecessary accesses of sectors

e.g. only 100 bytes, when block size is given as 64K


1 block : 128 sectors (about ½ track, ½ rotation, 2 msec)
Too wasteful
Pusan National University
em
Buffer

Buffer


Spatiotemporal Database Laboratory

Temporary memory to transfer a chunk of data
1 buffer : multiple blocks
Page


A piece of buffer (main memory) corresponding with block
Page Replacement when buffer is full
Pusan National University
em
Buffer Management : Read
Get Data
Access Request
Spatiotemporal Database Laboratory
Read Request
If in main
memory
Get Data
Main
Memory
If not in main
memory
Load on main memory
and Replacement
Access to Disk
Disk
Pusan National University
em
Buffer Management : Write
Write Data
Access Request
Spatiotemporal Database Laboratory
Write Request
If in main
memory
Write Data
Write Data on Disk
Main
Memory
If not in main
memory
Load on main memory
and Replacement
Access to Disk
Disk
Pusan National University
em
Buffer Manager : Replacement Policy

LRU


Spatiotemporal Database Laboratory


Replace the block least recently used
Most operating system and buffer management
Idea behind LRU – use past pattern of block references as a
predictor of future references
Prediction of future reference

Queries have well-defined access patterns (such as
sequential scans), and a database system can use the
information in a user’s query to predict future references
Pusan National University
em
Buffer Manager : Replacement Policy

Pinned block :


Toss-immediate strategy :
Spatiotemporal Database Laboratory


memory block that is not allowed to be written back to disk.
frees the space occupied by a block as soon as the final
tuple of that block has been processed
Most recently used (MRU) strategy :

system must pin the block currently being processed. After
the final tuple of that block has been processed, the block is
unpinned, and it becomes the most recently used block.
Pusan National University
em
Logical Structure of File
Field
Field
Field
Record (Tuple)
Fixed Size Record
Record
Spatiotemporal Database Laboratory
Record
Block
File
Variable Size Record
Pusan National University
em
Fixed Size Record

Fixed Size



Spatiotemporal Database Laboratory

Disk Address


Fixed Number of Fields, and
Fixed Size of each Field
Easy to implement
(n-1)*srecord
Deletion of a record

Like Array but no movement


Free Record List or
Pointer to Next Record
Pusan National University
em
Variable Length Record

Variable Length due to


Spatiotemporal Database Laboratory


Variable Number of Fields, or
Variable Size of each Field
Complicated to implement
Implementation



Delimiter (, size, or pointer)
Slotted Page
Fixed Length


Overflow Area
Reserved Space
Pusan National University
em
Delimiters
Spatiotemporal Database Laboratory
Record
Record
…
Delimiters
Record
Record
Record
Pointer/Size
• Difficult to handle deletions and insertions
…
Record
Pusan National University
Spatiotemporal Database Laboratory
em
Slotted Page
Pointer to Record
 Records can be moved around within a page
 to keep them contiguous with no empty space between them
 entry in the header must be updated.
 Pointers should not point directly to record
 But to the entry for the record in header.
Spatiotemporal Database Laboratory
Pusan National University
em
Reserved Space
Maximum # of Fields
Spatiotemporal Database Laboratory
Pusan National University
em
Overflow Area
First field of record
Rest records
Pusan National University
em
Binary Large Object Block (BLOB)

If size (field) > size (block)

e.g. Image or Video
Spatiotemporal Database Laboratory
Name
ID#
Photo
Block size

BLOB : Type of field where its size is greater than block size

cf. CLOB : Text rather than binary
Name
ID#
Contiguous Reserved
Block for BLOB
Pusan National University
em
File System

Example
fd=open(”data.txt”,O_RDONLY,0);
Nbytes=read(fd,buf,100);
Spatiotemporal Database Laboratory
 How to process these functions in OS ?
Pusan National University
em
i (index)–node : information about file
i-node
Spatiotemporal Database Laboratory
Attributes
Pointers to data block
Data Block
Name
Type : directory, data, special
Permission
Ownership
Last updated date/time
Created date/time
Pusan National University
em
i-node : Pointer to data block
Attributes
Data Block
Pointers to data block
(0-9: up to 40K bytes)
...
Spatiotemporal Database Laboratory
Single direct Pointer
Double direct Pointer
Data Block
Pointer Block
(1024 blocks)
Data Block
...
Data Block
Pusan National University
em
Block configuration for i-node
0
1
2
Spatiotemporal Database Laboratory
3
Boot Block
Super Block
i-node 1 ~ 40
i-node 41 ~ 80
…
Data block
Data block
…
Data block
Reserved Block
Given by formatting
User space
Pusan National University
em
Implementation of File Hierarchy
Root directory
block
i-node 1
Spatiotemporal Database Laboratory
Attributes
i-node for
root directory
1
1
4
7
14
9
6
8
.
..
bin
dev
lib
etc
usr
tmp
Data block
for /usr/lik/data.txt
i-node 6
Attributes
i-node for
/usr
i-node 107
Attributes
i-node for
/usr/lik/data.txt
Directory block
for /usr
6
1
19
30
54
.
..
lik
kimmk
parksh
Directory block
for /usr/lik
19 .
6 ..
107 data.txt
i-node 19
Attributes
i-node for
/usr/lik
Pusan National University
em
FAT (File Allocation Table)

Spatiotemporal Database Laboratory

DOS or MS-Windows 98
Same purpose of i-node in UNIX
Pusan National University
em
fd=open(”data.txt”,O_RDONLY,0);
Nbytes=read(fd,buf,100);

Step 1 : Find i-node for “data.txt” via i-node


Spatiotemporal Database Laboratory

Step 2 : Check owner and access right
Step 3 : Register it to OpenFileTable





from root or current directory
Initialize entry values : e.g. offset, mode
fd : array index of this table
Some entries : reserved for stdio, stderr, etc..
Step 4 : Check ownership and right
Step 5 : Read 100 bytes to buf


open
Read 100 bytes from the OpenFileTable[fd].offset
OpenFileTable[fd].offset += 100;
write
Pusan National University
em
Data Dictionary : What does it contain ?

Data dictionary (also called system catalog) stores metadata

Information about relations


Spatiotemporal Database Laboratory



User and accounting information, including passwords
Statistical and descriptive data


names of relations
names and types of attributes of each relation
names and definitions of views
number of tuples in each relation
Physical file organization information


How relation is stored (sequential/hash/…)
Physical location of relation



operating system file name or
disk addresses of blocks containing records of the relation
Information about indices
Pusan National University
em
Data Dictionary : How to Represent it

Data structure


Spatiotemporal Database Laboratory

specialized data structures designed for efficient access
a set of relations, with existing system features used to
ensure efficient access
The latter alternative is usually preferred
Relation-metadata (relation-name, number-of-attributes, storage-organization, location)
Attribute-metadata (attribute-name, relation-name, domain-type, position, length)
User-metadata (user-name, encrypted-password, group)
Index-metadata (index-name, relation-name, index-type, index-attributes)
View-metadata (view-name, definition)
Pusan National University
em
Persistent Object

Objects in C++ program


Persistent Object

Spatiotemporal Database Laboratory
Volatile Object : Disappears with the termination of program


Non-Volatile Object : Keeps its status despite of its
termination
A Necessary Condition for Object-Oriented Databases
Object vs. Record
Pusan National University
em
OID : Object Identifier

ID given by system


Spatiotemporal Database Laboratory

the only way to identify object
one ID per an object
Logical OID vs. Physical OID

Logical OID



No direct specification from OID to physical location
Need an index that maps an OID to the object’s actual location.
Physical OID


encodes physical location of the object
Physical OIDs typically have the following parts:



a volume or file identifier
a page identifier within the volume or file
an offset within the page
Pusan National University
em
Pointer Swizzling
Pointer
Spatiotemporal Database Laboratory
Object
Pointer Swizzling
OID
Object
Main Memory
Disk Space