PNUTS: Yahoo!'s Hosted Data Serving Platform
Alireza Angabini
Advanced DB class
Dr. M.Rahgozar
Fall 88
Introduction
PNUTS Overview
Functionality
Architecture
Applications
Experimental Results
Conclusion
A.Angabini - PNUTS
Main requirements of Web Apps
Scalability
Response Time and Geographic Scope
High Availability & Fault Tolerance
Relaxed Consistency Guarantees
PNUTS is
A massively parallel, geographically distributed database system
Designed by Yahoo!
Used by Yahoo!'s web applications
Shared between several applications
Data Model & Features
Fault Tolerance
Pub-Sub Message System
Hosting
Data & Query Model
Consistency Model
Simplified relational data model
Organizes data into tables of records with attributes
Allows arbitrary structure inside a record (“blob” attributes)
Schemas are flexible
New attributes can be added without halting query or update activity
Records are not required to have values for all attributes
Query language
Supports selection and projection from a single table
Updates & deletes by primary key only
Hides the complexity of replication
Sits between general serializability & eventual consistency
Per-record timeline consistency
“All replicas of a given record apply all updates to the record in the same order”
Supports a range of API calls with different levels of consistency
Read-any
Read-critical(required_version)
Read-latest
Write
Test-and-set-write(required_version)
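The five calls above can be illustrated with a toy, in-memory model of one record. This is only a sketch under stated assumptions: the class names `Replica` and `Record`, the `propagate` helper, and the choice of replica 0 as master are hypothetical and are not the PNUTS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a single record; it may lag behind the master."""
    version: int = 0
    value: dict = field(default_factory=dict)


class Record:
    """Hypothetical single-record store; replicas[0] plays the master."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    @property
    def master(self):
        return self.replicas[0]

    def read_any(self, replica=-1):
        # May return a stale (but previously valid) version.
        r = self.replicas[replica]
        return r.version, dict(r.value)

    def read_latest(self):
        # Reflects every write that has succeeded so far.
        return self.master.version, dict(self.master.value)

    def read_critical(self, required_version):
        # Any replica at least as new as required_version is acceptable.
        for r in reversed(self.replicas):
            if r.version >= required_version:
                return r.version, dict(r.value)
        raise RuntimeError("no replica is new enough")

    def write(self, value):
        # Writes go to the master and bump the record version.
        self.master.value = dict(value)
        self.master.version += 1
        return self.master.version

    def test_and_set_write(self, required_version, value):
        # Apply the write only if the record is still at required_version.
        if self.master.version != required_version:
            return None
        return self.write(value)

    def propagate(self):
        # Stand-in for asynchronous replication catching up.
        for r in self.replicas[1:]:
            r.version, r.value = self.master.version, dict(self.master.value)
```

In this model a `read_any` issued right after a `write` can still see the old version, while `read_latest` and `read_critical` with the new version number cannot.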
Data tables are horizontally partitioned into groups of records called tablets
Ordered table
Primary-key space of a table is divided into intervals
Each interval corresponds to one tablet
The router stores the interval mapping
For a given primary key, binary search is used to find the tablet
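A minimal sketch of the router lookup for an ordered table, assuming the interval mapping is kept as a sorted list of lower bounds. The boundary values, tablet names, and the function name `route_ordered` are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical interval mapping: tablet i holds keys in
# [ORDERED_BOUNDS[i], ORDERED_BOUNDS[i + 1]); "" is the smallest possible key.
ORDERED_BOUNDS = ["", "f", "m", "t"]
ORDERED_TABLETS = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]


def route_ordered(key):
    """Binary-search the sorted lower bounds for the interval holding key."""
    i = bisect_right(ORDERED_BOUNDS, key) - 1
    return ORDERED_TABLETS[i]
```

For example, `route_ordered("apple")` falls in the first interval and `route_ordered("mango")` in the third. The lookup costs O(log T) comparisons for T tablets.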
Hash-organized table
n-bit hash function H(), 0 ≤ H() < 2^n
[0, 2^n) is divided into intervals
Each interval corresponds to a single tablet
To map a key to a tablet,
1. Hash the key
2. Search the set of intervals using binary search
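The two steps can be sketched as follows. The value n = 16, the use of SHA-256 as H(), the four equal intervals, and all names are assumptions chosen for illustration; the slides do not say which hash function PNUTS uses.

```python
import hashlib
from bisect import bisect_right

N_BITS = 16  # assumed n; the hash space is [0, 2**N_BITS)

# Hypothetical mapping: four equal intervals given by their lower bounds.
HASH_BOUNDS = [0, 16384, 32768, 49152]
HASH_TABLETS = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]


def h(key):
    """Step 1: hash the key into [0, 2**N_BITS)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** N_BITS)


def tablet_for_hash(hash_value):
    """Step 2: binary-search the interval lower bounds."""
    return HASH_TABLETS[bisect_right(HASH_BOUNDS, hash_value) - 1]


def route_hashed(key):
    return tablet_for_hash(h(key))
```

Because the router only compares hash values against interval bounds, the second step is the same search used for ordered tables.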
The system uses asynchronous replication
To ensure low-latency updates
Yahoo! Message Broker (YMB)
Used for replication & logging because:
1. YMB takes multiple steps to ensure messages are not lost before they are applied to the DB
2. YMB is designed for wide-area replication
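A toy illustration of asynchronous replication through a pub-sub broker, assuming one ordered queue per subscribing region. The `Broker` class and its methods are hypothetical and are not the YMB API; the sketch only shows that publishing returns before remote replicas apply the update, and that each replica applies updates in publish order.

```python
from collections import deque


class Broker:
    """Toy stand-in for a pub-sub broker: one ordered queue per subscriber."""

    def __init__(self):
        self.queues = {}

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, update):
        # In this sketch the writer is done once the update is queued.
        for q in self.queues.values():
            q.append(update)

    def deliver(self, name, replica):
        # Runs later and independently for each region, in publish order.
        q = self.queues[name]
        while q:
            key, value = q.popleft()
            replica[key] = value


broker = Broker()
regions = {"west": {}, "east": {}}
for name in regions:
    broker.subscribe(name)

broker.publish(("user:1", "v1"))
broker.publish(("user:1", "v2"))
broker.deliver("west", regions["west"])  # east has not caught up yet
```

At this point the west replica holds `v2` while the east replica is still empty; once `deliver` runs for east, both agree.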
Recovery from failure (3 steps)
1. The tablet controller requests a copy from the source tablet
2. A “checkpoint message” is published to YMB
3. The source tablet is copied to the destination region
User Database
Social Applications
Content Meta-Data
Listings Management
Session Data
Three PNUTS regions
2 west coast, 1 east coast
5 storage units, 2 message brokers, 1 router
West: Dual 2.8 GHz Xeon, 4GB RAM, 6 disk RAID 5 array
East: Quad 2.13 GHz Xeon, 4GB RAM, 1 SATA disk
Workload
1200-3600 requests/second
0-50% writes
80% locality
Storage engine for hash tables
“Yahoo! proprietary disk-based hash table”
Storage engine for ordered tables
MySQL using InnoDB
The following experiments show the impact of several factors on the average request latency
Rich database functionality and low latency at massive scale.
Trade-offs between functionality, performance, and scalability.
Chooses asynchronous replication to ensure low write latency.
Delivers data management as a hosted service.
B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, "PNUTS: Yahoo!'s hosted data serving platform," Proceedings of the VLDB Endowment, vol. 1, 2008, pp. 1277–1288.
Technical report, Raghu Ramakrishnan, Yahoo! Research and Platform Engineering Team
Thanks For Your Attention
?