The Google File System
Presentation by: Eric Frohnhoefer
CS5204 – Operating Systems
Assumptions
Built from inexpensive commodity components
  Cheap components frequently fail
Modest number of large files
  Few million files, each 100 MB or larger
Support for large streaming reads and small random reads
Files written once then appended
High sustained bandwidth favored over low latency
Design Decisions
Single master, multiple chunkservers
File structure
  Fixed-size 64 MB chunks
  Each chunk divided into 64 KB blocks
  32-bit checksum computed for each block
  Each chunk replicated across 3+ chunkservers
Familiar interface
  Create, delete, open, close, read, and write
  Snapshot and record append
No caching
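
Because chunk and block sizes are fixed, translating a byte offset in a file into a chunk index and a checksum block is plain integer arithmetic. A minimal sketch; the function name `locate` is hypothetical and not part of GFS:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk, from the slide
BLOCK_SIZE = 64 * 1024         # 64 KB checksum block, from the slide


def locate(offset):
    """Map a byte offset in a file to (chunk index, block index in chunk, byte in block)."""
    chunk_index, within_chunk = divmod(offset, CHUNK_SIZE)
    block_index, within_block = divmod(within_chunk, BLOCK_SIZE)
    return chunk_index, block_index, within_block
```

With these sizes each chunk holds 1024 checksum blocks, and a client only needs to send the file name and chunk index to the master to find the right chunkservers.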
Architecture
Single Master
  Manages namespace and locking
  Manages chunk placement, creation, re-replication, and rebalancing
  Garbage collection
Chunkserver
  Serves chunks directly to clients
  Stores 64 MB chunks and a checksum for each 64 KB block
  Reports chunks contained on server to master
  Verifies contents during idle periods
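
The per-block checksums and idle-time verification can be sketched as follows. The slides only say the checksum is 32 bits; using CRC-32 from `zlib` is an assumption, and both function names are hypothetical:

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB block, from the slide


def block_checksums(chunk: bytes) -> list:
    """Compute one 32-bit CRC per 64 KB block of a chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def corrupt_blocks(chunk: bytes, stored: list) -> list:
    """Return the indices of blocks whose checksum no longer matches the stored one."""
    current = block_checksums(chunk)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]
```

A chunkserver could run something like `corrupt_blocks` over its chunks while idle and report any non-empty result to the master, so the damaged replica can be replaced from a healthy one.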
Metadata
Namespace
  Logical mapping from files to locations on chunkserver
  Kept up to date with heartbeat messages from chunkserver
Metadata stored in memory
  Quick access
  64 bytes of metadata for each 64 MB chunk
Operations log
  Historical record of changes made to metadata
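
The 64-bytes-per-chunk figure makes it easy to estimate how much master memory a given amount of stored data needs. A rough sketch that counts chunk metadata only (namespace entries are ignored); the function name is hypothetical:

```python
CHUNK_SIZE = 64 * 2**20   # 64 MB chunk, from the slide
METADATA_PER_CHUNK = 64   # bytes of metadata per chunk, from the slide


def master_memory_bytes(stored_bytes):
    """Estimate chunk-metadata memory on the master for a given amount of file data."""
    chunks = -(-stored_bytes // CHUNK_SIZE)  # ceiling division: a partial chunk still counts
    return chunks * METADATA_PER_CHUNK
```

For example, 1 PiB of file data is 2^24 chunks, which needs 2^24 x 64 bytes = 1 GiB of chunk metadata. This is why keeping all metadata in memory on a single master is practical.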
Dennis Kafura – CS5204 – Operating Systems
Consistency Model
States:
  Consistent – all replicas have the same value
  Defined – consistent, and clients see the mutation's data in its entirety
Namespace mutations are atomic and serializable
Client requires additional logic
  Add checksums and unique identifiers to records
  Remove inconsistent records
  Remove duplicate records
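
The client-side logic can be sketched as a writer that tags each record and a reader that filters what it reads back. This is an illustration only: the record layout, the use of CRC-32, and both function names are assumptions, not the format Google's applications use.

```python
import zlib


def make_record(record_id: int, payload: bytes):
    """Writer side: attach a unique id and a checksum covering id and payload."""
    body = record_id.to_bytes(8, "big") + payload
    return (record_id, payload, zlib.crc32(body))


def valid_unique_records(records):
    """Reader side: drop records that fail their checksum, then drop repeats."""
    seen = set()
    payloads = []
    for record_id, payload, checksum in records:
        body = record_id.to_bytes(8, "big") + payload
        if zlib.crc32(body) != checksum:
            continue  # inconsistent region or garbage: skip it
        if record_id in seen:
            continue  # duplicate left behind by a retried append
        seen.add(record_id)
        payloads.append(payload)
    return payloads
```

The checksum handles inconsistent regions, and the unique identifier handles records that were written more than once.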
Mutation Operation
Write operation:
1. Client asks the master for the locations of the primary and secondary chunkservers.
2. Master assigns a primary chunkserver and replies to the client.
3. Client pushes all data to the replicas. Data is stored in an LRU buffer.
4. Client sends the write request to the primary chunkserver.
5. Primary assigns a serial number and forwards the request to all secondary chunkservers.
6. Secondary servers reply to the primary with operation status.
7. Primary replies to the client with operation status.
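
Steps 3 to 7 of the write path can be sketched as a small in-process simulation. It skips the master lookup (steps 1 and 2), uses a plain dictionary where the slide describes an LRU buffer, and all class and function names are hypothetical:

```python
class Replica:
    def __init__(self):
        self.buffer = {}   # data id -> bytes pushed by the client, not yet written
        self.applied = []  # (serial number, data) in the order mutations were applied

    def push(self, data_id, data):
        self.buffer[data_id] = data  # step 3: data arrives before the write request

    def apply(self, serial, data_id):
        self.applied.append((serial, self.buffer.pop(data_id)))
        return True  # operation status reported back (step 6)


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        serial = self.next_serial  # step 5: the primary chooses the order
        self.next_serial += 1
        self.apply(serial, data_id)
        statuses = [s.apply(serial, data_id) for s in self.secondaries]
        return all(statuses)  # step 7: reply to the client


def client_write(primary, data_id, data):
    for replica in [primary] + primary.secondaries:
        replica.push(data_id, data)   # step 3
    return primary.write(data_id)     # step 4
```

Separating the data push from the write request means the serial number assigned by the primary is the only thing that determines mutation order, so every replica applies mutations in the same sequence.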
Atomic record append:
Similar to O_APPEND mode in Unix without race condition due to multiple writers.
Record written at least once.
Same logic flow as write except primary appends the record and tells secondary chunkservers the exact location.
Used heavily by Google applications.
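
The key difference from a normal write is that the primary, not the client, chooses the offset. A minimal sketch under that assumption; chunk boundaries and failure handling are left out, and the names are hypothetical:

```python
class AppendReplica:
    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, record):
        # Pad if this replica is shorter than the chosen offset, then place the record.
        if len(self.data) < offset:
            self.data.extend(b"\0" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record


def record_append(primary, secondaries, record):
    """The primary picks the offset and every replica writes at that exact location."""
    offset = len(primary.data)
    primary.write_at(offset, record)
    for secondary in secondaries:
        secondary.write_at(offset, record)
    return offset  # returned to the client
```

If a client does not receive a success reply and retries, the same record is appended again at a new offset. That is why the guarantee is "at least once" and why readers must be prepared to drop duplicates.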
Snapshot operation:
1. Master receives snapshot request and revokes outstanding leases.
2. After leases revoked the master logs the operation.
3. In-memory copy of file or directory metadata created.
4. Copy created on same chunkserver only when chunk is mutated.
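
Steps 3 and 4 amount to copy-on-write. One way to sketch it is with a reference count per chunk, so the master can tell when a chunk is shared between the original and the snapshot. The class and its methods are hypothetical, and a fresh handle stands in for the local copy made on the chunkserver:

```python
class SnapshotMaster:
    def __init__(self):
        self.files = {}       # file name -> list of chunk handles
        self.refcount = {}    # chunk handle -> number of files that reference it
        self.next_handle = 0

    def _new_handle(self):
        handle = self.next_handle
        self.next_handle += 1
        self.refcount[handle] = 1
        return handle

    def create(self, name, n_chunks):
        self.files[name] = [self._new_handle() for _ in range(n_chunks)]

    def snapshot(self, src, dst):
        # Step 3: duplicate metadata only; both files now point at the same chunks.
        self.files[dst] = list(self.files[src])
        for handle in self.files[src]:
            self.refcount[handle] += 1

    def chunk_for_write(self, name, index):
        # Step 4: copy a chunk only when a shared chunk is about to be mutated.
        handle = self.files[name][index]
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            handle = self._new_handle()
            self.files[name][index] = handle
        return handle
```

The snapshot itself is cheap because no chunk data moves. The cost is deferred to the first write on each shared chunk.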
Master’s Responsibilities
Namespace management
  Each entry has an associated read-write lock
  Allows for concurrent mutations in same directory
Snapshot of /home/user to /save/user:
1. Read locks acquired on /home and /save
2. Write locks acquired on /save/user and /home/user
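
The lock set for the example above follows a simple rule: read locks on every ancestor directory, a write lock on the full path. A minimal sketch that only computes which locks are needed and does no actual locking; the function names are hypothetical:

```python
def locks_for(path):
    """Read locks on each ancestor directory, write lock on the full path."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    return {"read": ancestors, "write": ["/" + "/".join(parts)]}


def snapshot_locks(src, dst):
    """Combine the lock sets of the source and destination paths in sorted order."""
    a, b = locks_for(src), locks_for(dst)
    return {
        "read": sorted(set(a["read"] + b["read"])),
        "write": sorted(set(a["write"] + b["write"])),
    }
```

Because a file creation in `/home/user` would need a read lock on `/home/user`, it conflicts with the snapshot's write lock on that path and is held back until the snapshot finishes. Two creations in the same directory only need read locks on it, so they can proceed concurrently.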
Periodic communications with chunkservers
  Collects state, tracks cluster health
Replica placement
  Maximize reliability and maximize bandwidth utilization
  Distribute chunks between multiple racks
Chunk Creation
  New replicas on chunkservers with below-average disk space utilization
  Limit number of recent creations on chunkserver
  Replicate across racks
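
The three chunk-creation criteria can be combined into a simple selection routine. This sketch treats them as hard filters, which is a simplification, and the dictionary keys, the `max_recent` threshold, and the function name are all hypothetical:

```python
def choose_chunkservers(servers, replicas=3, max_recent=2):
    """Pick chunkservers for new replicas.

    servers: list of dicts with keys name, rack, disk_used (fraction 0..1),
    and recent_creations (count of recently created replicas).
    """
    average = sum(s["disk_used"] for s in servers) / len(servers)
    candidates = sorted(
        (s for s in servers
         if s["disk_used"] < average and s["recent_creations"] < max_recent),
        key=lambda s: s["disk_used"],
    )
    chosen, racks = [], set()
    for server in candidates:
        if server["rack"] not in racks:  # spread replicas across racks
            chosen.append(server["name"])
            racks.add(server["rack"])
        if len(chosen) == replicas:
            break
    return chosen
```

Limiting recent creations matters because a freshly created chunk is usually about to receive heavy write traffic, so placing many new chunks on one server would overload it.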
Re-replication
  Occurs when number of replicas falls below user-specified goal
  Re-replication is prioritized
Rebalance
  Master examines the current replica distribution and moves replicas for better disk space and load balancing.
Garbage collection
  Master logs deletion immediately
  File is renamed and given a deletion timestamp
  Files actually deleted later, after a user-specified interval
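
The lazy deletion scheme can be sketched as a rename followed by a periodic sweep. This assumes the retention period is given as a number of seconds; the hidden-name format and the class are hypothetical:

```python
class Namespace:
    def __init__(self, grace_seconds):
        self.files = {}             # file name -> contents placeholder
        self.grace = grace_seconds  # how long a deleted file is kept

    def delete(self, name, now):
        """Rename the file to a hidden name that carries the deletion timestamp."""
        hidden = ".deleted.%s.%d" % (name, now)
        self.files[hidden] = self.files.pop(name)
        return hidden

    def sweep(self, now):
        """Periodic scan: really remove hidden files older than the grace period."""
        for name in list(self.files):
            if name.startswith(".deleted."):
                stamp = int(name.rsplit(".", 1)[1])
                if now - stamp >= self.grace:
                    del self.files[name]
```

Until the sweep removes it, the hidden file can still be read or renamed back, which protects against accidental deletion.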
High Availability
Fast recovery
Chunk replication
  Default 3 replicas
  Distribute across multiple racks
Shadow Master
  Master state is fully replicated.
  Mutations only committed once log has been written on all replicas.
  Provides read-only access even when master is down
Performance
Cluster characteristics
Cluster performance
Amazon S3
RESTful and SOAP style interface
BitTorrent for distributed download
99.999999999% durability and 99.99% uptime
  Replicated 3 times across 2 datacenters
Cost
  Storage: $0.14 / GB / Month
  Bandwidth: $0.10 / GB
  Requests: $0.01 / 1000 Requests
Permissions controlled by Access Control List (ACL)
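
The three prices on this slide combine into a simple monthly estimate. A small sketch using only the figures quoted above, which date from the time of the presentation; the function name is hypothetical:

```python
from decimal import Decimal

STORAGE_PER_GB_MONTH = Decimal("0.14")  # from the slide
BANDWIDTH_PER_GB = Decimal("0.10")      # from the slide
PER_1000_REQUESTS = Decimal("0.01")     # from the slide


def monthly_cost(stored_gb, transferred_gb, requests):
    """Estimated monthly bill in dollars for storage, bandwidth, and requests."""
    return (STORAGE_PER_GB_MONTH * stored_gb
            + BANDWIDTH_PER_GB * transferred_gb
            + PER_1000_REQUESTS * Decimal(requests) / 1000)
```

For example, 100 GB stored, 50 GB transferred, and one million requests come to $14 + $5 + $10 = $29 for the month.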
Conclusions
Simple solution
Seamlessly handles hardware failures
Purpose-built for Google's needs
  Large files
  High read throughput
  Record appends
Reference
Cluster Computing and MapReduce Lecture 3
http://www.youtube.com/watch?v=5Eib_H_zCEY
http://courses.cs.vt.edu/cs5204/fall10-kafuraNVC/Papers/FileSystems/GoogleFileSystem.pdf
http://communication.howstuffworks.com/googlefile-system.htm