Databases Used In Facebook

By: Chris Hayes
Facebook
 Today, Facebook is the most commonly used social
networking site for people to connect with one another
online.
 People of all ages use Facebook to stay connected with
friends and family, meet new people, and share pictures,
videos and status updates that let others know how they
are doing.
Databases
 A database is a structured set of data held in a computer.
It is accessible in various ways and is managed to allow
fast storage and retrieval of that data.
 There are many different databases within Facebook that
make all its unique features possible.
 Facebook has close to half a billion active users, 570
billion page views per month, three billion photos
uploaded per month, more than 25 billion pieces of
content such as status updates and comments, and over
30,000 servers.
 Facebook uses systems such as MySQL, Memcached,
Haystack, Cassandra, Hadoop and Hive, and Scribe to
keep the site up and running smoothly.
MySQL
 MySQL is an open source relational database management system.
 Facebook uses MySQL primarily as a key-value store in which its
data is randomly distributed across a large set of logical
instances.
 MySQL is also used as structured data storage for wall posts,
statuses, user information and so on.
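
A minimal sketch of the key-value idea above, not Facebook's actual code. It assumes Python's built-in sqlite3 as a stand-in for MySQL, a hypothetical count of four logical instances, and a CRC32 hash to spread keys evenly while still letting a lookup find the right instance again.

```python
import sqlite3
import zlib

NUM_INSTANCES = 4  # hypothetical number of logical instances


def make_instance():
    # Each in-memory SQLite database stands in for one MySQL instance.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    return conn


instances = [make_instance() for _ in range(NUM_INSTANCES)]


def instance_for(key):
    # CRC32 scatters keys pseudo-randomly, but the same key
    # always maps to the same instance.
    return instances[zlib.crc32(key.encode("utf-8")) % NUM_INSTANCES]


def put(key, value):
    conn = instance_for(key)
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    conn.commit()


def get(key):
    row = instance_for(key).execute(
        "SELECT v FROM kv WHERE k = ?", (key,)
    ).fetchone()
    return row[0] if row else None


put("user:1:status", "Hello from the wall")
put("user:2:status", "Another status")
print(get("user:1:status"))
```

The relational engine is used only for single-key reads and writes here, which is what treating it as a key-value store means in practice.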
Memcached
 Memcached is one of the best-known pieces of software on the
internet. It is a distributed memory caching system.
 Facebook uses Memcached as a caching layer between the
web servers and MySQL servers.
 Facebook runs thousands of Memcached servers with tens of
terabytes of cached data at any one point in time.
Haystack
 Haystack is a high performance storage and retrieval system for
photos used by Facebook.
 Haystack is an object store, so it is not necessarily limited to
storing photos.
 Haystack is one of the busiest systems used in Facebook because
there are about twenty billion photos uploaded to Facebook, each
stored in four different resolutions, resulting in eighty billion
photos that Haystack has to work on.
Cassandra
 Cassandra is a distributed storage system designed to have no
single point of failure.
 It has been released as open source, and Facebook uses it for
its Inbox search.
Hadoop and Hive
 Hadoop and Hive work together as one data platform, but they
have separate functions.
 Hadoop is an open source map-reduce implementation that
makes it possible for websites to perform calculations on massive
amounts of data.
 Facebook uses Hadoop for data analysis, which is important
because Facebook handles massive amounts of data.
 Hive originated at Facebook and makes it possible to run SQL
queries against Hadoop, making Hadoop easier for
non-programmers to use.
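
To show what a map-reduce calculation looks like, here is a toy version in plain Python. It is a sketch of the idea only and does not use Hadoop; the sample records and function names are made up for illustration. A real job would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict

# Hypothetical sample data: (user, type of content posted)
records = [
    ("alice", "status"),
    ("bob", "photo"),
    ("alice", "comment"),
    ("alice", "photo"),
    ("bob", "status"),
]


def map_phase(record):
    # Emit one (key, value) pair per input record.
    user, _kind = record
    yield (user, 1)


def shuffle(pairs):
    # Group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the grouped values into one result per key.
    return key, sum(values)


def run_job(input_records):
    pairs = [pair for record in input_records for pair in map_phase(record)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run_job(records)
print(counts)

# With Hive, the same question could be asked in SQL instead
# (table and column names are hypothetical):
#   SELECT user, COUNT(*) FROM updates GROUP BY user;
```

The SQL comment at the end shows why Hive helps: one query replaces the hand-written map, shuffle and reduce steps.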
Scribe
 Scribe is a flexible logging system that Facebook uses for a
multitude of purposes internally.
 It’s been built to be able to handle logging at the scale of
Facebook, and automatically handles new logging categories
as they show up.
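
The idea of handling new logging categories automatically can be sketched as follows. This is not Scribe's real API; the class and method names are hypothetical, and messages are kept in memory rather than shipped to log servers.

```python
from collections import defaultdict


class CategoryLogger:
    """Toy logger that stores messages per category."""

    def __init__(self):
        # A new category gets its own list the first time it is
        # logged to, with no setup step beforehand.
        self._streams = defaultdict(list)

    def log(self, category, message):
        self._streams[category].append(message)

    def categories(self):
        return sorted(self._streams)

    def messages(self, category):
        # .get() avoids creating an empty category on a read.
        return list(self._streams.get(category, []))


logger = CategoryLogger()
logger.log("web_errors", "timeout on page load")
logger.log("ad_clicks", "user clicked banner")
logger.log("web_errors", "missing image")
print(logger.categories())
```

Nothing has to be configured before a new category such as "ad_clicks" appears; it is created on first use.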
Wrap-Up
 It is databases like these that keep Facebook up and running
smoothly, making it possible for the site to serve close to half a
billion active users. All these users send a lot of data into
Facebook, so Facebook has to make sure its databases are
functioning properly in order to carry out the site's daily
functions.
THANK YOU