
CS 15-447: Computer Architecture
Lecture 24
Disk IO and RAID
November 12, 2007
Nael Abu-Ghazaleh
[email protected]
http://www.qatar.cmu.edu/~msakr/15447-f08
Interfacing Processor with peripherals
[Figure: the processor, with split L1 instruction and data caches and a unified L2 cache, connects through its bus interface to the front side bus, aka system bus; the memory bus attaches main memory, and an I/O bridge leads out to the I/O devices.]
Another view
Disk Access
• Seek: position the head over the proper track (5 to 15 ms avg.)
• Rotate: wait for the desired sector; on average half a rotation, i.e. (0.5 / RPM) x 60 s. Drives currently spin at 5,400-15,000 RPM
• Transfer: read the data (30-100 MB/s)
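A quick worked example with illustrative numbers (a 10,000 RPM drive with a 9 ms average seek and a 60 MB/s transfer rate, reading 4 KB; none of these values are from the slide):

    # Rough access time for one small read; all numbers are assumptions.
    seek_ms     = 9.0                        # average seek
    rpm         = 10_000
    rotate_ms   = 0.5 / rpm * 60 * 1000      # half a rotation on average = 3 ms
    transfer_ms = 4 / (60 * 1024) * 1000     # 4 KB at 60 MB/s, about 0.07 ms

    total_ms = seek_ms + rotate_ms + transfer_ms
    print(f"{total_ms:.2f} ms")              # about 12.07 ms, dominated by seek and rotation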
[Figure: disk geometry, showing the platters, the tracks on a platter, and the sectors within a track.]
Manufacturing Advantages of Disk Arrays
[Figure: conventional disk product families require 4 disk designs (3.5", 5.25", 10", 14") spanning low end to high end; a disk array needs only 1 disk design (3.5").]
RAID: Redundant Array of Inexpensive Disks
• RAID 0: Striping (misnomer: non-redundant)
• RAID 1: Mirroring
• RAID 2: Striping + Error Correction
• RAID 3: Bit striping + Parity Disk
• RAID 4: Block striping + Parity Disk
• RAID 5: Block striping + Distributed Parity
• RAID 6: Multiple parity checks
Non-Redundant Array
• Striped: write sequential blocks across the disk array
• High performance
• Poor reliability:
  MTTF_Array = MTTF_Disk / N
  MTTF_Disk = 50,000 hours (about 6 years)
  N = 70 disks
  MTTF_Array = 700 hours (about 1 month)
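The arithmetic behind those numbers, as a small check using the values from the slide:

    # With no redundancy, the array fails as soon as any one disk fails.
    mttf_disk_hours = 50_000
    n_disks = 70
    mttf_array_hours = mttf_disk_hours / n_disks
    print(f"{mttf_array_hours:.0f} hours, about {mttf_array_hours / 24:.0f} days")  # ~714 hours, ~30 days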
[Figure: sequential blocks alternate across two disks, odd blocks on one and even blocks on the other.]
Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
• When disks fail, contents are reconstructed from data redundantly stored in the array
• High reliability comes at a cost:
  – Reduced storage capacity
  – Lower performance
RAID 1: Mirroring
• Each disk is fully duplicated onto its "shadow" → very high availability
• Bandwidth sacrifice on writes: one logical write = two physical writes
• Reads may be optimized: either copy can serve a read
• Most expensive solution: 100% capacity overhead
Used in high I/O rate, high availability environments
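A minimal sketch of why reads can be optimized: with two identical copies, each read can be sent to whichever disk is less busy. The Mirror class and the disk interface (read, write, queue_len) are hypothetical, for illustration only.

    class Mirror:
        def __init__(self, disk_a, disk_b):
            self.disks = [disk_a, disk_b]

        def write(self, block, data):
            for d in self.disks:             # one logical write = two physical writes
                d.write(block, data)

        def read(self, block):
            d = min(self.disks, key=lambda d: d.queue_len())  # pick the less busy copy
            return d.read(block)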
RAID 3: bit striping + parity
• A parity bit for every bit in the striped data
• Parity is relatively easy to compute: it is just the XOR of the corresponding bits on the data disks
• How does it perform for small reads/writes?
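A minimal sketch of computing the parity and rebuilding a lost disk from it, using the byte values from the figure on the next slide (byte-level XOR here; a real array does this per bit position across the stripe):

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    data_disks = [b"\x93", b"\xcd", b"\x93"]      # 10010011, 11001101, 10010011
    parity = reduce(xor_bytes, data_disks)        # parity disk = XOR of all data disks

    # If disk 1 fails, rebuild its contents from the survivors plus the parity:
    rebuilt = reduce(xor_bytes, [data_disks[0], data_disks[2], parity])
    assert rebuilt == data_disks[1]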
Redundant Arrays of Disks
RAID 3: Parity Disk
[Figure: a logical record (10010011 11001101 10010011 ...) is striped bit by bit across the data disks as physical records; the parity disk P holds the XOR of the corresponding bits.]
• Parity computed across recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – Wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
• Arms logically synchronized, spindles rotationally synchronized: logically a single high capacity, high transfer rate disk
Targeted for high bandwidth applications: Scientific, Image Processing
RAID 4 (Block interleaved parity)
Redundant Arrays of Disks RAID 5+:
High I/O Rate Parity
• A logical write becomes four physical I/Os: read the old data, read the old parity, write the new data, write the new parity (see the sketch below)
• Independent writes possible because of interleaved parity
• Reed-Solomon codes ("Q") for protection during reconstruction
• Targeted for mixed applications
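A minimal sketch of the small-write path that causes those four I/Os: the new parity is derived from just the old data, the old parity, and the new data, so the other disks in the stripe are not touched. The disk objects and helper names are hypothetical.

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(data_disk, parity_disk, block, new_data):
        old_data   = data_disk.read(block)       # physical I/O 1
        old_parity = parity_disk.read(block)     # physical I/O 2
        new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
        data_disk.write(block, new_data)         # physical I/O 3
        parity_disk.write(block, new_parity)     # physical I/O 4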
[Figure: data blocks D0-D23 are laid out across the disk columns in stripes; the parity block P rotates to a different disk in each stripe, logical disk addresses increase along the stripes, and a stripe unit is one block on one disk.]
Nested RAID levels
• RAID 01 and 10 combine mirroring and striping
  – Combine high performance (striping) and reliability (mirroring)
  – Get reliability without having to compute parities: higher performance and less complex controller
• RAID 05 and 50 (also called 53)
Operating System can help (1)
Reducing access time
• Disk defragmentation: why does that work?
• Disk scheduling: the operating system can reorder requests
  – How does it work? Reduce seek time
• Example algorithms: Mean seek distance first, Elevator algorithm, Typewriter algorithm
  – Let's do an example (see the sketch after this list)
• Log structured file systems
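A minimal sketch of the elevator (SCAN) idea: service pending requests in track order while sweeping in one direction, then reverse. The request numbers are made up.

    def elevator(head, requests, moving_up=True):
        above = sorted(r for r in requests if r >= head)
        below = sorted((r for r in requests if r < head), reverse=True)
        return above + below if moving_up else below + above

    # Head at track 50, pending requests scattered across the disk:
    print(elevator(50, [10, 95, 52, 180, 33, 70]))
    # -> [52, 70, 95, 180, 33, 10]  (one outward sweep, then back)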
Log structured file systems
• Idea: most reads to disk are serviced from the cache – locality!
• But what about writes? → They have to go to disk; if the system crashes mid-update, the file system is compromised
• How can we make updates perform better?
  – Save them in a log (sequentially) instead of at their original location; why does that help?
  – Tricky to manage
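A minimal sketch of the log-structured idea under simple assumptions: every update is appended to the tail of one on-disk log (so writes are sequential), and an in-memory map tracks where the newest copy of each block lives. The class is illustrative, not the actual LFS design.

    class LogFS:
        def __init__(self):
            self.log = []         # append-only on-"disk" log of (block_id, data)
            self.where = {}       # block_id -> index of its newest copy in the log

        def write(self, block_id, data):
            self.where[block_id] = len(self.log)   # all writes go to the tail: sequential I/O
            self.log.append((block_id, data))

        def read(self, block_id):
            return self.log[self.where[block_id]][1]

    fs = LogFS()
    fs.write("inode_7", b"v1")
    fs.write("inode_7", b"v2")    # the old copy becomes garbage to clean up later
    assert fs.read("inode_7") == b"v2"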
Operating System can help (2)
Reliability
• RAID protects against disk failures, not CPU failures/software bugs
  – If the CPU writes corrupt data to all the redundant disks, what can we do?
• Backups
• Reliability in the operating system
How are files allocated on disk?
• An index block has pointers to the other blocks in the file (sketched below)
• Alternatives: linked allocation
• Data and metadata are both stored on disk
• What do we do for bigger files?
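A minimal sketch of indexed allocation with made-up block numbers: the index block holds only pointers, so reaching any data block costs one extra lookup.

    # Toy disk: block number -> block contents.
    disk = {
        7:  [21, 22, 23],            # block 7 is the index block: pointers to data blocks
        21: b"first 4 KB of the file",
        22: b"second 4 KB of the file",
        23: b"third 4 KB of the file",
    }

    def read_file_block(index_block_no, i):
        pointers = disk[index_block_no]   # one extra read to fetch the index block
        return disk[pointers[i]]          # then go straight to data block i

    assert read_file_block(7, 1) == b"second 4 KB of the file"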
Unix Inodes
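The classic Unix inode handles bigger files by mixing 12 direct pointers with single, double, and triple indirect pointers. A back-of-the-envelope sketch of the reachable file size, assuming 4 KB blocks and 4-byte block pointers (these parameters vary by file system):

    block = 4 * 1024                     # bytes per data block
    ptrs_per_block = block // 4          # 4-byte pointers -> 1024 pointers per block

    direct = 12 * block                  # 48 KB
    single = ptrs_per_block * block      # 4 MB
    double = ptrs_per_block**2 * block   # 4 GB
    triple = ptrs_per_block**3 * block   # 4 TB

    max_file = direct + single + double + triple
    print(f"{max_file / 2**40:.2f} TiB reachable")   # about 4 TiB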
Disk reliability
• Any update to disk changes both data and metadata
  – requires several writes
• The operating system may reorder them, as we saw
• What happens if there is a crash?
  – Let's look at examples
• Solution: journaling file system
  – Update the journal before updating the file system (sketched below)
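A minimal sketch of the write-ahead journaling idea: the whole update is described and committed in the journal first, and only then applied to the real file system structures, so after a crash a complete journal entry can be replayed and an incomplete one is simply ignored. The journal and filesystem objects are hypothetical.

    def journaled_update(journal, filesystem, changes):
        entry = journal.append(changes)   # 1. describe every change in the journal (sequential writes)
        journal.commit(entry)             # 2. commit record: the entry is now complete

        for block, data in changes:       # 3. only now touch the real data and metadata
            filesystem.write(block, data)

        journal.free(entry)               # 4. reclaim the entry once it has been applied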
Flash Memory
• Emerging technology for non-volatile storage – a competitor to hard disks, especially for the embedded market
  – Can be used as a cache for the disk (much larger than RAM disks for the same price, and persistent)
• Floating gate transistors: semiconductor technology (like microprocessors and memory) – we know how to build them big (or small!) and cheap
  – Faster, lower power than disk drives
  – ...but still more expensive, and has some limitations
• Two types of flash memory: NAND and NOR
NOR Flash
• NOR is accessed like regular memory and has faster read times
  – Used for executables/firmware that don't need to change often (code for PDAs, cellphones, etc.)
  – Can be executed in place
• Bad write/erase performance (2 seconds to erase a block!)
• Bad wear properties (100,000 writes average lifetime)
NAND Flash
• Accessed like a block device (like a disk drive)
  – Higher density, lower cost
• Faster write/erase time; longer write life expectancy
• Well suited for cameras, mp3 players, USB drives...
• Less reliable than NOR (requires error correction codes)
Different properties from Disks
• Flash memory has quite different properties from disks – the emphasis on seek time is gone
• Needs to erase a segment before writing (small writes are expensive! – see the sketch below)
  – Slow... (especially NOR erase/write and NAND random-access reads)
  – Must be done in large segments (10s of KBytes)
  – Can only be rewritten a limited number of times
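A minimal sketch of why small in-place writes are expensive, assuming a 128 KB erase block and 4 KB pages (sizes are illustrative): updating one page means rewriting the whole erase block.

    ERASE_BLOCK = 128 * 1024
    PAGE        = 4 * 1024

    def rewrite_page_in_place(block_bytes, page_index, new_page):
        # Naive in-place update: read the whole erase block, erase it, write it all back.
        pages = [block_bytes[i:i + PAGE] for i in range(0, ERASE_BLOCK, PAGE)]
        pages[page_index] = new_page
        # an erase of all 128 KB would happen here, costing one wear cycle
        return b"".join(pages)            # then all 32 pages are programmed again

    # Writing 4 KB ends up moving 128 KB: roughly 32x write amplification.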
Summary of Flash, circa 2006