Amazon’s Dynamo - Northwestern University
Download
Report
Transcript Amazon’s Dynamo - Northwestern University
Amazon’s Dynamo
Simple Cloud Storage
Foundations
• 1970 – E.F. Codd “A Relational Model of Data
for Large Shared Data Banks”
– Idea of tabular data
– SQL Foundations
• Codd’s 12 rules
– How database structured and what is available to
user
– Application not dependent on physical or logical
levels of database
– Insert, Update, Delete operators
Foundations, continued
• First Relational database
management systems
(RDBMS)
– Oracle in 1979, first SQL
based system
– Microsoft SQL server, etc
– Open source software
would follow later (mySQL)
• Follows Codd’s ideas
– Complexity on the server
side, let the query do all the
work
– Very stringent requirements
Drawbacks
• Licensing fees on a per processor rate
– High end Oracle system is mind-numbingly expensive
• Load Distribution requires specific nodes to
handle
– Some servers have specific roles, failure point in
network
• Complexity on servers creates difficultly with
maintenance, upgrades
– Upgrades often all at once as result, not incremental
A New Direction
• Simplify services that the database provides
– Easier scaling and error handling
• Take a more pragmatic approach
– Tailor system to sacrifice some aspects of the
traditional RDBMS to gain performance in others
– Systems less general, specific end requirements in
mind when creating
Examples
• Amazon Dynamo
– Simple primary key
– Highly available, end user
based model
– Low cost virtualized nodes
• Facebook Cassandra
– Similar goals to Amazon’s
Dynamo
– Highly avaiable, incremental
scalablilty
• Google File System
– Master node
– Data distributed across low
cost nodes
Dynamo Goals
• Scale – adding systems to network causes minimal
impact
• Symmetry – No special roles, all features in all nodes
• Decentralization – No Master node(s)
• Highly Available – Focus on end user experience
• SPEED – A system can only be as fast as the lowest
level
• Service Level Agreements – System can be adapted to
an application’s specific needs, allows flexibility
Dynamo Assumptions
• Query Model – Simple interface exposed to application level
– Get(), Put()
– No Delete()
• Atomicity, Consistency, Isolation, Durability
– Operations either succeed or fail, no middle ground
– System will be eventually consistent, no sacrifice of availability to
assure consistency
– Conflicts can occur while updates propagate through system
– System can still function while entire sections of network are down
• Efficiency – Measure system by the 99.9th percentile
– Important with millions of users, 0.1% can be in the 10,000s
• Non Hostile Environment - No need to authenticate query, speed
boost
Wanted Results
• Deliver requests in a
bounded time
• Always writable
– Highly available to users
• No dedicated roles
• Work split between
nodes fairly
Techniques
Partitioning
• Consistent Hashing
– Changing the number of slots in hash table results
in only a small number of keys to remap
– More info
• A ring of virtual nodes
– Node responsible for region between it and its
predecessor
Virtual Node
• Physical Machine has #
of virtual nodes based
on performance
• Can adapt load more
easily if a machine goes
down
• Likewise, assign nodes
to a new machine in
network
Replication
• Application provided parameter N
• Replication on different physical nodes
– Data still available if nodes go down
– Makes part of preference list for query
Versioning and Vector Clocks
• Updates propagate
asynchronously, need a
way of distinguishing
conflicts
– Possible reason for
absence of Delete()
• Vector Clock
– List of (node, counter)
– Limited size, limit
overhead for data
– If all fields are less than
or equal, first can be
updated by second
Sloppy Quorum and Hinted
Handoff
• W and R parameter set min # of nodes in a
read or write
• Read and write on the first N healthy nodes,
no strict membership, can vary over time
• Hint in metadata for intended node, will
update once that node is again available
• Allows for temporary failure in nodes or entire
networks
Synchronization and Gossip
• Merkle Trees - Info
• Use common key values between two nodes
– Traverse tree and check vector clocks to see if updates
are needed
– Exchange information on most current version of the
data if inconsistencies are found
• Gossip
– Nodes select neighbors at random and reconcile
membership change histories
• Use seed nodes to initialize
• Detect failures
Routing get() and put()
• Two Techniques
– Route request through a load balancer
• Slower
• Simpler application level code
– Partition aware client, route directly to appropriate nodes
• Faster
• More complicated application level
• First node routed to is “coordinator” node
– Generates vector clock for put and gives data to N highest
healthy nodes
– Queries N highest nodes for all versions, returned all
versions found
Implementation
• Java based
– Hardware independent, JVM
• Allows different back-end systems to be used,
based on size of data needed to be stored
– Berkeley Database Transactional Data Store
– BDB Java Edition
– MySQL, can handle large objects
• Coordinator node is a state machine for
read/writes for client
– Coordinator for a write determined by fastest read
Flexibility
• Changing W, R, N
– Business logic specific reconciliation
• Data replicated over nodes
• Application level reconciliation fro conflicting objects
– Timestamp Reconciliation
• Similar to above, last write wins
– High performance read engine
• By setting R = 1, W = N
• Reads fast and numerous, few updates
Observed Results - Speed
Observed Results – Load
Balancing
• Higher traffic causes
load to be balanced
more evenly
– Requests of popular keys
let system to balance
more easily
• In lower traffic, less
important to balance
load
Observed Results - Coordination
• Client coordination can
provide a speed boost
• Read and write latency
nearly identical
• Results as expected
Observed Results - Versions
• Measured over 24 hour period for shopping
cart
– 99.94% of users saw 1 version
– 0.00057% saw 2 versions
– 0.00047% saw 3 versions
– 0.00009% saw 4 versions
• Increase caused by increase in number of
concurrent writers, most likely
Conclusions
• Dynamo allows Amazon’s customers to have a
consistent experience even in face of server
and network errors
• Gives a scalable solution with millions of data
points to be queried quickly and efficiently
• Offloads complexity to the application to
provide a simple, flexible, and fast server-side
implementation
Thanks for listening!