Comment - Meetup
Search with a KeyValue Store
Intro to NoSQL
• Key-value store
• Schemaless
• Distributed
• Eventually Consistent
Key-Value
• Single unique key for each value in the
database
• Extremely fast look-up
• Easy distribution (no such thing as
joins)
Schemaless
• Critical for extremely large data sets
• No alter table commands, each value
has no pre-defined fields
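A minimal sketch of what "schemaless" means in practice, using a plain Ruby Hash as a stand-in for a collection. The names `STORIES`, `put_story` and `tags_for` are hypothetical and exist only for illustration; no real database is involved.

```ruby
# Hypothetical in-memory collection keyed by id (stand-in for a real store).
STORIES = {}

def put_story(id, fields)
  STORIES[id] = fields
end

put_story("a1", { headline: "First story", content: "Example body" })
# A later record adds a field the first one never had; no ALTER TABLE needed.
put_story("b2", { headline: "Second story", content: "Example body", tags: ["ruby", "nosql"] })

# Readers must tolerate missing fields, since nothing enforces a schema.
def tags_for(id)
  STORIES.fetch(id).fetch(:tags, [])
end
```

The trade-off is that field defaults and validation move out of the database and into application code.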
Distributed
• Data set is designed to be shared
across multiple machines
• Typically makes use of commodity
servers with enough RAM to keep the
entire data set in memory
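One common way to spread keys across machines is to hash the key and take the result modulo the number of shards. The sketch below assumes three shards and uses one Hash per shard in place of a server; `SHARD_COUNT`, `shard_for`, `shard_put` and `shard_get` are hypothetical names, and this is a generic illustration rather than the partitioning scheme of any particular store.

```ruby
require "zlib"

SHARD_COUNT = 3 # assumed number of machines

# Pick a shard from a stable hash of the key, so every client agrees.
def shard_for(key)
  Zlib.crc32(key) % SHARD_COUNT
end

# One Hash per shard stands in for one server's memory.
SHARDS = Array.new(SHARD_COUNT) { {} }

def shard_put(key, value)
  SHARDS[shard_for(key)][key] = value
end

def shard_get(key)
  SHARDS[shard_for(key)][key]
end

shard_put("story:dgi3ck", { headline: "Example headline" })
```

Because a lookup touches exactly one shard and there are no joins, adding machines mostly adds capacity.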
Eventually Consistent
• Replica nodes are not notified of
changes before a success response is
returned to the client
• Makes NoSQL problematic for highly
sensitive transactions (finance, etc.)
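The behaviour described above can be simulated in a few lines. This is a toy model under stated assumptions: `Node` and `Cluster` are hypothetical classes, there is one primary and one replica, and replication only happens when `replicate!` is called.

```ruby
class Node
  attr_reader :data

  def initialize
    @data = {}
  end
end

class Cluster
  attr_reader :primary, :replica

  def initialize
    @primary = Node.new
    @replica = Node.new
    @pending = [] # writes not yet copied to the replica
  end

  # Acknowledge as soon as the primary has the write.
  def write(key, value)
    @primary.data[key] = value
    @pending << [key, value]
    :ok
  end

  # Runs later, for example from a background process.
  def replicate!
    @pending.each { |key, value| @replica.data[key] = value }
    @pending.clear
  end
end
```

Between `write` returning `:ok` and `replicate!` running, a read from the replica still sees the old value, which is why this model is risky for transactions such as account balances.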
Database Design in
NoSQL
• Denormalization is your friend
• Think of collections as views on a data
set that
A News Site Using SQL
• Users: id, user_name, birthday
• Comment: id, story_id, user_id, content
• Stories: id, date, headline, content
Loading a Story with SQL
SELECT * FROM stories WHERE id = x;
SELECT * FROM comments
LEFT JOIN users ON users.id = comments.user_id
LEFT JOIN comments children ON children.parent_id = comments.id
WHERE comments.story_id = x;
Redesigned in a NoSQL
Data Store
Story #dgi3ck
  date
  headline
  content
  comments
    Comment #la529
      content
      username
      user_image_url
      user_id
      children
        Comment #5bg26
          content
          username
          user_image_url
          user_id
          children
            Comment #mn34i
              content
              username
              user_image_url
              user_id
Loading a Story with NoSQL
Stories::get("dgi3ck")
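To make the single-lookup idea concrete, here is a sketch in plain Ruby where a Hash stands in for the Stories collection. The field values are placeholders, and `STORY_COLLECTION`, `get_story` and `comment_ids` are hypothetical names, not part of any library.

```ruby
# Hypothetical in-memory stand-in for the Stories collection.
STORY_COLLECTION = {
  "dgi3ck" => {
    headline: "Example headline",
    content: "Example body",
    comments: [
      { id: "la529", content: "First", username: "alice", user_id: 1,
        children: [
          { id: "5bg26", content: "Reply", username: "bob", user_id: 2,
            children: [
              { id: "mn34i", content: "Reply to reply", username: "alice",
                user_id: 1, children: [] }
            ] }
        ] }
    ]
  }
}

# One key lookup returns the story with every comment already embedded.
def get_story(id)
  STORY_COLLECTION.fetch(id)
end

# Walk the embedded comment tree depth-first; no joins or extra queries.
def comment_ids(comments)
  comments.flat_map { |c| [c[:id]] + comment_ids(c[:children]) }
end
```

Rendering the page then becomes a tree walk over data already in hand, instead of a join across three tables.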
Some Design
Considerations
• What is the context in which we will
access this data?
• What data do we need to access
outside of this context?
• How often does the data change?
Embedded Data
• NoSQL can support foreign keys
• Some data is more appropriately stored
“embedded” in a parent context
• E.g. Comments are rarely (if ever)
accessed outside of their parent Story
Cached Data
• Data from an object that needs to be
accessed outside of the current context
can be cached
• Keep in mind that it may need to be
updated
• E.g. when a user changes their username,
the copies cached in Comments can be updated
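A sketch of what such an update might look like when comments are embedded with nested replies. It assumes each comment is a Hash with `:user_id`, `:username` and `:children` keys; `refresh_username!` is a hypothetical helper, not a library call.

```ruby
# Rewrite the cached username on every comment (and nested reply)
# written by the given user. Mutates the comments in place.
def refresh_username!(comments, user_id, new_name)
  comments.each do |comment|
    comment[:username] = new_name if comment[:user_id] == user_id
    refresh_username!(comment.fetch(:children, []), user_id, new_name)
  end
end

story = {
  comments: [
    { user_id: 1, username: "alice", children: [
      { user_id: 2, username: "bob", children: [] },
      { user_id: 1, username: "alice", children: [] }
    ] }
  ]
}
refresh_username!(story[:comments], 1, "alice_b")
```

The cost of denormalizing shows up here: one rename touches every story the user commented on, so it pays off only when the cached value changes rarely.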
Several common
NoSQL Stores
• Memcached
• BigTable
• SimpleDB
• MongoDB
Why we chose
MongoDB
• Auto-sharding and easy setup for
distribution
• JavaScript API
• Powerful indexing capabilities
MongoDB Libraries
• ORM: mongo_mapper
• https://github.com/jnunemaker/mongomapper
• Underlying Connection: mongo
• https://github.com/mongodb/mongo-ruby-driver
• BSON support: bson_ext
• http://rubygems.org/gems/bson_ext
Lifebooker’s Availability
Search
Searches across Services
Filters
• Time/Date
• Geographical Zone
• Service Category
• Practitioner Gender
• Concurrent Availability
• (and several more)
Services, Discounts
and Practitioners
• Services are offered by Providers
• Providers have Practitioners
(Employees)
• Discounts are applied to Providers for a
Service in a given time
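The relationships above can be sketched with plain Ruby Structs. This is an illustrative model only: the field names (`service_name`, `percent`, `starts_at`, `ends_at`, and so on) and the `discounts_for` helper are assumptions, not the actual Lifebooker schema.

```ruby
Discount     = Struct.new(:service_name, :percent, :starts_at, :ends_at)
Practitioner = Struct.new(:name, :gender)
Service      = Struct.new(:name, :category)

# A Provider embeds its services, practitioners and discounts.
Provider = Struct.new(:name, :services, :practitioners, :discounts) do
  # Discounts for one service that are active at the given time.
  def discounts_for(service_name, at)
    discounts.select do |d|
      d.service_name == service_name && at >= d.starts_at && at < d.ends_at
    end
  end
end

salon = Provider.new(
  "Example Salon",
  [Service.new("Haircut", "Hair")],
  [Practitioner.new("Sam", "female")],
  [Discount.new("Haircut", 20, Time.utc(2011, 1, 10, 9), Time.utc(2011, 1, 10, 12))]
)
```

Keeping everything inside the provider means one read returns all the data an availability search needs for that provider.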
Modeling this Data in MongoDB
Embedding with
MongoMapper
Indexing and Searching
• Mongo offers powerful indexing
capabilities
• Arrays are “first-class citizens”
• Complex indices allow for great
performance
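Treating arrays as first-class citizens means an index on an array field gets one entry per element. The idea can be sketched as an inverted index in plain Ruby; `build_multikey_index` is a hypothetical helper and says nothing about how MongoDB stores its indexes internally.

```ruby
# Build an inverted index: each array element points back to the
# ids of the documents whose field contains it.
def build_multikey_index(docs, field)
  index = Hash.new { |hash, key| hash[key] = [] }
  docs.each do |id, doc|
    Array(doc[field]).each { |value| index[value] << id }
  end
  index
end

SERVICE_DOCS = {
  "s1" => { categories: ["hair", "color"] },
  "s2" => { categories: ["hair"] }
}
CATEGORY_INDEX = build_multikey_index(SERVICE_DOCS, :categories)
```

A query for one category is then a single index lookup rather than a scan over every document's array.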
Creating Meta-Data
• With complex data structures, creating
meta-data before_save will allow you to
make that data easily searchable
• E.g. the maximum discount on a given
day for a service
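As a sketch of the "maximum discount on a given day" example, the helper below derives a small lookup table from the embedded discounts. It assumes each discount is a Hash with `:date` and `:percent` keys; `add_max_discount_by_day!` is a hypothetical function that one would call from a before-save hook.

```ruby
require "date"

# Hypothetical hook body: derive searchable meta-data before persisting.
def add_max_discount_by_day!(service)
  service[:max_discount_by_day] =
    service[:discounts]
      .group_by { |d| d[:date].to_s }
      .transform_values { |ds| ds.map { |d| d[:percent] }.max }
  service
end
```

The derived field is flat and cheap to index, so a search can filter on it without unpacking the nested discounts at query time.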
Creating Indices
Querying
• Uses DataMapper/Arel Syntax
• Chains conditions, ordering and offset
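The chaining style can be illustrated with a small query object in plain Ruby. This `Query` class is a hypothetical stand-in written for this example; it mimics the shape of a chainable API but is not the MongoMapper implementation.

```ruby
class Query
  def initialize(records, filters = [], sort_key = nil, skip = 0)
    @records = records
    @filters = filters
    @sort_key = sort_key
    @skip = skip
  end

  # Each call returns a new Query, so calls can be chained in any order.
  def where(conditions)
    Query.new(@records, @filters + [conditions], @sort_key, @skip)
  end

  def order(key)
    Query.new(@records, @filters, key, @skip)
  end

  def offset(count)
    Query.new(@records, @filters, @sort_key, count)
  end

  # Nothing is evaluated until the results are requested.
  def all
    rows = @records.select do |record|
      @filters.all? { |conds| conds.all? { |k, v| record[k] == v } }
    end
    rows = rows.sort_by { |record| record[@sort_key] } if @sort_key
    rows.drop(@skip)
  end
end
```

A call such as `Query.new(records).where(category: "hair").order(:price).offset(1).all` reads top to bottom as filter, sort, skip.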
Filtering Complex
Data Structures
• MongoDB offers a JavaScript API for
MapReduce
• Map - transform and filter data
• Reduce - combine multiple rows into a
single record
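MongoDB's MapReduce functions are written in JavaScript; the sketch below mirrors only the shape of the two phases in plain Ruby. The row fields (`:service`, `:price`, `:available`) and the function names are assumptions made for illustration.

```ruby
# Map: emit [key, value] pairs, skipping rows that fail the filter.
def map_phase(rows)
  rows.each_with_object([]) do |row, emitted|
    next unless row[:available]
    emitted << [row[:service], row[:price]]
  end
end

# Reduce: collapse all values emitted for one key into a single record
# (here, the lowest price per service).
def reduce_phase(emitted)
  emitted
    .group_by { |key, _value| key }
    .map { |key, pairs| [key, pairs.map { |_k, value| value }.min] }
    .to_h
end
```

Filtering happens in the map step (rows that emit nothing disappear), while the reduce step decides how the surviving rows for each key are combined.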
A Simple Use-Case
Using MapReduce to
Filter
The Results
• Scheduled to go live within 2 weeks
• With sharding/distribution, tests show
almost no dip in response time with
more than 10x the current data set
• 20x faster than MySQL implementation
• 100ms vs 2000ms (or more)