Transcript Storage

Storage and File Structure
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
DBMS and Storage/File Structure

Why do we need to know about storage/file structure


Many database technologies are developed to utilize the
storage architecture/hierarchy
Data in the database needs to be organized and
stored/retrieved efficiently
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Storage Hierarchy
Volatile
primary storage
Secondary
storage
Cache
Memory
unit price
Flash Memory
Magnetic Disk
Non-volatile
Tertiary
storage
Optical Disk
Magnetic Tape
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
speed
Primary Storage (Volatile)

Cache


Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds)
Capacity:



A typical PC level 2 cache 64KB-2 MB.
Within processors, level 1 cache usually ranges in size from 8
KB to 64 KB.
Main memory:


Speed: 10s to 100s of nanoseconds;
Capacity: Up to a few Gigabytes widely used currently

1/14/2005
per-byte costs have decreased roughly factor of 2 every 2 to 3
years)
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Secondary Storage (Non-volatile)

Flash memory

Speed:





read speed similar to main memory, write is much slower
Capacity: 32M to 512M currently
Forms: SmartMedia, memory stick, secure digital, BIOS
Cost: roughly same as main memory
Magnetic-disk

Capacities: up to roughly 100 GB currently

1/14/2005
Growing constantly and rapidly with technology
improvements (factor of 2 to 3 every 2 years)
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Tertiary Storage (Non-volatile)

Optical storage




CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular
forms
CD-RW, DVD-RW, and DVD-RAM
Reads and writes are slower than with magnetic disk
Juke-box systems, with large numbers of removable disks, a
few drives, and a mechanism for automatic loading/unloading
of disks available for storing large volumes of data
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Tertiary Storage (Non-volatile)

Tape storage




non-volatile, used primarily for backup (to recover from disk
failure), and for archival data
sequential-access – much slower than disk
very high capacity (40 to 300 GB tapes available)
Tape jukeboxes available for storing massive amounts of data

1/14/2005
hundreds of terabytes (1 terabyte = 109 bytes) to even a petabyte (1
petabyte = 1012 bytes)
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Between Memory and Disk



The permanent residency of database is mostly on disk
In database, cost is usually measured by the number of disk I/O
But disks are too slow and we need memory to be the buffers …
but memory is volatile

this introduced a number of issues
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Lets talk about disk
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Disk Subsystem
Disk interface standards families
• ATA (AT adaptor) range of standards
• SCSI (Small Computer System Interconnect) range of
standards
• Several variants of each standard (different speeds and
capabilities)
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Disk Speed
Discuss ways to improve
disk reading speed
Rotation time/latency
milliseconds
Data-transfer rate
(4-8MB/sec)
Typical numbers:
 16,000 tracks per platter
 sectors per track: 200 – 400
 512 bytes per sector
Seek time
(milliseconds)
Access time = seek time + latency
1/14/2005
 4-16KB per block
 5,400 - 15,000 r p m
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
RAID

RAID: Redundant Arrays of Independent Disks


high capacity and high speed by using multiple disks in
parallel, and
high reliability by storing data redundantly
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Mean time to failure (MTTF)

Average time the disk is expected to run continuously without
any failure.


Typically 3 to 5 years (1 year = 8,760 hours)
MTTF = 30,000 to 1,200,000 hours for a new disk


an MTTF of 1,200,000 hours for a new disk means that given 1000
relatively new disks, on an average one will fail every 1200 hours
When number of disks increase, the chance of some disk failure
increase proportionally
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Parallelism

Two main goals of parallelism in a disk system:
1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.

Basic strategy: Stripping

Compare and contrast bit stripping and byte stripping
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Redundancy


store extra information that can be used to rebuild information lost
in a disk failure
Basic strategy: mirroring, parity
Data
10010010

Parity
1
Mean time to data loss depends on mean time to failure,
and mean time to repair

E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives
mean time to data loss of 500*106 hours (or 57,000 years) for a
mirrored pair of disks (ignoring dependent failure modes)
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
RAID levels
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Choice of RAID Levels

Level 1 provides much better write performance than level 5



Level 5 requires at least 2 block reads and 2 block writes to write a
single block, whereas Level 1 only requires 2 block writes
Level 1 preferred for high update environments such as log disks
Level 1 had higher storage cost than level 5



disk drive capacities increasing rapidly (50%/year) whereas disk
access times have decreased much less (x 3 in 10 years)
I/O requirements have increased greatly, e.g. for Web servers
When enough disks have been bought to satisfy required rate of I/O,
they often have spare storage capacity



so there is often no extra monetary cost for Level 1!
Level 5 is preferred for applications with low update rate,
and large amounts of data
Level 1 is preferred for all other applications
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Buffer Management


Database can not fit entirely in memory, needs
memory as a buffer for speed reasons
LRU is used in many OS


Spatial and temporal locality due to loops
Database has a more predictable behavior

Example: join
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
DBMS Buffer Management Strategies




Pinned block – memory block that is not allowed to be written
back to disk.
Toss-immediate strategy – frees the space occupied by a block
as soon as the final tuple of that block has been processed
Most recently used (MRU) strategy – system must pin the block
currently being processed. After the final tuple of that block has
been processed, the block is unpinned, and it becomes the most
recently used block.
Statistical information based


E.g., the data dictionary is frequently accessed. Heuristic: keep
data-dictionary blocks in main memory buffer
Buffer managers also support forced output of blocks for the
purpose of recovery
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure
Exercises
1/14/2005
Yan Huang - CSCI5330 Database
Implementation – Storage and File Structure