A Social Blog Using MongoDB
ITEC-810 Final Presentation
Lucero Soria - 42403871
Supervisor: Dr. Jian Yang
Agenda
• Introduction
• Methodology
• Outcomes
• Blog implementation
• MongoDB vs. Relational databases
• Conclusions
2
Problem Specification
Relational Database Management Systems (RDBMSs), such as
MySQL, do not provide the flexibility and scalability needed to
manage social media data.
NoSQL databases, such as MongoDB, emerged to provide the
features that modern applications demand, such as flexibility,
scalability and productivity.
4
Project Aim
Analyse the differences between MongoDB
and relational databases, especially in
supporting social media data
5
Background Sources
MongoDB
• MongoDB Online Manual
• Online articles
Relational databases
• MySQL 5.5 reference manual
• Social Media Management Handbook by Robert Wollan
• Online articles
6
Project Approach
This project combines analysis and development tasks:
• Research MongoDB, social media data and relational databases
• Implement a social blog using MongoDB
• Based on the implementation and research, analyse the differences
between MongoDB and relational databases
8
Methodology
An incremental methodology was used to implement the social
blog
• Combines the waterfall model with iterations
9
A social blog with MongoDB
Features implemented:
• Log in with Facebook to create the user's profile in MongoDB
• Create, edit and delete posts (text, photos or videos)
• Add comments
• Search by tags
• Sort blogs by number of comments (most commented first)
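The tag search and comment-count sort can be illustrated without a database. The following is a minimal sketch in plain JavaScript, assuming posts shaped like the blog documents in these slides; the sample posts and the helper names `searchByTag` and `sortByCommentCount` are hypothetical and are not taken from the project's code.

```javascript
// Hypothetical in-memory posts, shaped like the blog documents in these slides.
const posts = [
  { title: 'First post', tags: ['example'],
    comments: [{ author: 'jim', comment: 'I disagree' }] },
  { title: 'Second post', tags: ['example', 'mongodb'],
    comments: [{ author: 'nancy', comment: 'Good post' },
               { author: 'jim', comment: 'Agreed' }] },
  { title: 'Third post', tags: ['travel'], comments: [] },
];

// Search by tag: keep the posts whose tags array contains the tag.
function searchByTag(allPosts, tag) {
  return allPosts.filter((post) => post.tags.includes(tag));
}

// Sort by comment count, most commented first (works on a copy of the input).
function sortByCommentCount(allPosts) {
  return [...allPosts].sort((a, b) => b.comments.length - a.comments.length);
}

console.log(searchByTag(posts, 'example').map((p) => p.title));
console.log(sortByCommentCount(posts).map((p) => p.title));
```

Because tags and comments are embedded arrays, both operations read a single document per post and need no join.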
11
Analysis
Based on our experience implementing the social blog, the
most relevant features for managing social media data are:
• Handle irregular data
• Handle large binary objects (videos, photos)
  • Operations
  • Metadata
• Manage huge volumes of data
• Handle geospatial queries
12
Relational data model
• Fixed schema
• Assumes well-structured data with a fixed number of
fields (columns) and relationships
• Minimizes redundancy and dependency → Normalization
13
Source: http://blog.jruby.org/
Terminology
RDBMS    MongoDB
Table    Collection
Row      JSON document
Index    Index
Join     Embedding & Linking
14
Document-oriented data model
MongoDB uses a document-oriented model organised into collections
Main characteristics:
• Schema-less
• Collections can be created on-the-fly when first referenced
• Capped collections: fixed size; the oldest records are dropped once
the limit is reached
• Collections store documents
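The capped-collection behaviour described above can be sketched in a few lines. This is an illustration in plain JavaScript, not MongoDB's implementation: the helper `createCappedStore` is hypothetical, and it caps by document count only, whereas a real capped collection is sized in bytes.

```javascript
// Hypothetical helper: an insertion-ordered store with a fixed capacity.
// Assumption for illustration: the limit is a document count, not a byte size.
function createCappedStore(maxDocs) {
  const docs = [];
  return {
    insert(doc) {
      docs.push(doc);
      // Once the limit is exceeded, the oldest document is dropped.
      if (docs.length > maxDocs) docs.shift();
    },
    all() {
      return [...docs];
    },
  };
}

const log = createCappedStore(3);
for (let i = 1; i <= 5; i++) log.insert({ event: i });
console.log(log.all()); // only the three most recent events remain
```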
15
MongoDB Document
Main characteristics:
• Documents are represented in a format called BSON (Binary JSON)
• Data is denormalized
• No joins → Embedding & Linking
{
  author: 'Lucero',
  created: new Date('2012-06-06'),
  title: 'Yet another blog post',
  text: 'Here is the text...',
  tags: ['example', 'lucero'],
  comments: [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' }
  ]
}
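To make the difference between embedding and linking concrete, here is a minimal sketch in plain JavaScript. The `users` array, the `authorId` field and the `resolveAuthor` helper are hypothetical names chosen for illustration; they are not part of the blog's actual schema.

```javascript
// Embedding: the comments live inside the post document itself,
// so one read returns the post together with its comments.
const embeddedPost = {
  title: 'Yet another blog post',
  comments: [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' },
  ],
};

// Linking: the post stores only a reference to a document kept elsewhere.
const users = [{ _id: 1, name: 'Lucero' }]; // hypothetical users collection
const linkedPost = { title: 'Yet another blog post', authorId: 1 };

// With linking, the application follows the reference with a second lookup.
function resolveAuthor(post, userList) {
  return userList.find((user) => user._id === post.authorId) || null;
}

console.log(embeddedPost.comments.length);
console.log(resolveAuthor(linkedPost, users).name);
```

Embedding suits data that is always read together with its parent, while linking avoids duplicating data that is shared by many documents.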
16
Storing irregular data
Example: Different information in user profiles
MongoDB
• Each document can have different information
doc1 = { name: "Joe", age: "20", interest: "football" }
doc2 = { name: "Michele" }
Relational database
• One table with a column for every attribute
• NULL values in columns where data was not provided
Result: special queries are needed to handle NULL values → Expensive
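The contrast can be sketched in plain JavaScript, using the two profiles from this slide. The helpers `withField` and `toFixedRow` are hypothetical and only illustrate the idea; they do not reproduce how either database stores data.

```javascript
// The two profiles from the slide: each document carries only what it has.
const profiles = [
  { name: 'Joe', age: '20', interest: 'football' },
  { name: 'Michele' },
];

// Document style: select the profiles that actually provide a field.
function withField(docs, field) {
  return docs.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

// Fixed-schema style: every row has every column, padded with null.
function toFixedRow(doc, columns) {
  return Object.fromEntries(
    columns.map((column) => [column, column in doc ? doc[column] : null])
  );
}

console.log(withField(profiles, 'interest').map((doc) => doc.name));
console.log(toFixedRow(profiles[1], ['name', 'age', 'interest']));
```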
17
Managing large binary data
MongoDB
• Divide a large file among multiple documents (GridFS)
• Include metadata with large files
• Search files based on their content
• Retrieve only the first N bytes of a video
Relational database
• Use BLOBs (binary large objects)
• Inefficient at manipulating rich media
• BLOBs cannot be searched or manipulated using standard
database commands
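The chunking idea behind GridFS can be sketched in plain JavaScript (Node.js). This is an illustration of the technique only, not the GridFS API: `CHUNK_SIZE`, `splitIntoChunks` and `readFirstBytes` are hypothetical, and the chunk size is deliberately tiny so the example is easy to follow.

```javascript
// Hypothetical chunk size for the demo; real chunks are far larger.
const CHUNK_SIZE = 4;

// Split a Buffer into numbered chunk documents plus one file document
// that carries the length, the chunk size and user-supplied metadata.
function splitIntoChunks(fileId, data, metadata) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += CHUNK_SIZE, n++) {
    chunks.push({ fileId, n, data: data.subarray(offset, offset + CHUNK_SIZE) });
  }
  const file = { _id: fileId, length: data.length, chunkSize: CHUNK_SIZE, metadata };
  return { file, chunks };
}

// Read only the first `count` bytes, touching only the chunks that are needed.
function readFirstBytes(stored, count) {
  const wanted = Math.min(count, stored.file.length);
  const needed = Math.ceil(wanted / stored.file.chunkSize);
  const parts = stored.chunks.slice(0, needed).map((chunk) => chunk.data);
  return Buffer.concat(parts).subarray(0, wanted);
}

const stored = splitIntoChunks('video-1', Buffer.from('0123456789'),
  { contentType: 'video/mp4' });
console.log(stored.chunks.length);
console.log(readFirstBytes(stored, 6).toString());
```

Keeping the metadata in its own small document is what lets a file be found by its attributes without loading any of its binary chunks.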
18
Geospatial Indexes
Queries to find the N nearest points to the current location
MongoDB
• Embedded Geospatial features
Relational database
• Spatial extensions
• MySQL implements a subset of the "SQL with Geometry Types"
environment proposed by the Open Geospatial Consortium (OGC)
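What a nearest-N query computes can be shown with a brute-force sketch in plain JavaScript. It sorts every point by great-circle distance, which a geospatial index is designed to avoid; the place names and coordinates are hypothetical, and `haversineKm` and `nearest` are illustrative helpers rather than database functions.

```javascript
// Great-circle distance in kilometres between two { lat, lon } points
// (haversine formula, mean Earth radius of 6371 km).
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Brute force: sort a copy of all places by distance and keep the first n.
function nearest(places, here, n) {
  return [...places]
    .sort((p, q) => haversineKm(here, p.loc) - haversineKm(here, q.loc))
    .slice(0, n);
}

// Hypothetical places and current location.
const places = [
  { name: 'Airport', loc: { lat: -33.940, lon: 151.175 } },
  { name: 'Library', loc: { lat: -33.775, lon: 151.113 } },
  { name: 'Station', loc: { lat: -33.777, lon: 151.118 } },
];
const here = { lat: -33.774, lon: 151.112 };

console.log(nearest(places, here, 2).map((place) => place.name));
```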
19
Managing huge volume of data
MongoDB
• High performance
  • Embedding instead of joins makes reads and writes fast
  • Indexes, including indexes on keys from embedded documents and
  arrays
• Horizontal scalability
  • Automatic sharding (auto-partitioning of data across servers)
Relational database
• Have shown poor performance in certain data-intensive
applications and in delivering streaming media → Case study:
Foursquare
• Difficult to scale to multiple servers
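The idea of partitioning data across servers by a shard key can be sketched in plain JavaScript. The shard names, key ranges and the `routeToShard` helper are hypothetical; in MongoDB the ranges are managed by the database itself rather than written by hand like this.

```javascript
// Hypothetical layout: each shard owns a half-open range [min, max)
// of the shard key (here, the lower-cased author name).
const shards = [
  { name: 'shardA', min: '', max: 'k' },
  { name: 'shardB', min: 'k', max: 'n' },
  { name: 'shardC', min: 'n', max: '\uffff' },
];

// Route a document to the shard whose range contains its key value.
function routeToShard(doc, shardKey) {
  const key = String(doc[shardKey]).toLowerCase();
  const shard = shards.find((s) => key >= s.min && key < s.max);
  if (!shard) throw new Error('No shard owns key: ' + key);
  return shard.name;
}

const comments = [
  { author: 'jim', comment: 'I disagree' },
  { author: 'Lucero', comment: 'Thanks for reading' },
  { author: 'nancy', comment: 'Good post' },
];

console.log(comments.map((doc) => routeToShard(doc, 'author')));
```

Because each document is routed by its key alone, more servers can be added by splitting ranges, without changing how the application reads or writes.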
20
Conclusions
Benefits that MongoDB offers over relational databases:
• Flexible schema
• High performance
• Manipulation of large object files out of the box
• Embedded geospatial features
However,
• MongoDB does not replace relational databases
• MongoDB and relational databases can coexist
22
Thank You!
Q&A
23