MongoDB - EnhanceEdu

Big Data and NoSQL
Cloud Computing : Module 4
Objectives
How Big Is Big Data?
What can we do with it?
What is NoSQL?
Big Data
Kilo- Mega- Giga- Tera- Peta- ??
Structured and Unstructured
Should be mined for benefit of organizations
http://mashable.com/2012/06/22/data-created-every-minute/
Where does this data come from?
• 1. Machine-generated/sensor data, e.g. logs, call records
• 2. Social data, e.g. Facebook, Twitter
• 3. Traditional enterprise data, e.g. web store transactions
3 V’s of Big Data
Volume
• Terabytes of tweets -> product sentiment analysis
• Annual meter readings -> predict power consumption
Velocity
• Scrutinize 5 million trade events created each day to identify potential fraud
• Analyze 500 million daily call detail records in real time to predict customer churn faster
Variety
• Monitor hundreds of live video feeds from surveillance cameras to target points of interest
• Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Benefits
• Determine root causes of failures, issues and defects in near-real time,
potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while they
are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and clear
inventory.
• Generate retail coupons at the point of sale based on the customer's current
and past purchases.
• Send tailored recommendations to mobile devices while customers are in
the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior.
Big Data Technologies and Platforms
Hadoop and Hadoop Stack
HDFS, MapReduce, Pig & Hive
Hadoop Distributions – MapR, Cloudera, HortonWorks
NoSQL Databases – MongoDB, Neo4J, Cassandra
Deployment Options – Microsoft Azure, Amazon Elastic MapReduce
Scale Up vs Scale Out
Scale Up
Adding resources to a single node, i.e. adding more CPU, more RAM, etc. to a single computer.
Advantages:
• Lower infrastructure (Ethernet) costs
• Less power consumption than running multiple servers
• Fewer servers to manage
Disadvantages:
• Higher hardware (server) costs
• Big hardware impact in case of failure of a node
Scale Out
Adding more nodes or servers to the system, i.e. if a system has one computer, scaling out
means adding more computers to it.
Advantages:
• Lower hardware (server) costs
• More flexible
• Fault tolerance
Disadvantages:
• Many hypervisor images to maintain
• Higher infrastructure (Ethernet) costs
• High power consumption
NoSQL
Why NoSQL?
Simple, Scalable, Cheap
Can handle petabytes of data while keeping that data highly available
No Schema Required
• A NoSQL database does not require a rigid schema to be defined before data is inserted. The format of
the data can change at any time without disrupting the application, which gives the application
flexibility.
Replication
• Multiple copies of the data can be stored across the cluster and across data centers. If a disaster strikes, there
is a high probability that the data can be recovered, which ensures high availability of data.
Distributed
• NoSQL database systems support distributed queries, i.e. the expressive power of a query is not reduced
when the data is distributed across hundreds or thousands of servers.
Integrated Caching
• NoSQL databases transparently cache data in system memory, reducing latency and increasing sustained
data throughput. This behaviour is transparent to the application developer and the operations team.
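The "No Schema Required" point above can be illustrated with a minimal sketch in plain JavaScript. A plain array stands in for a collection here (an assumption made purely for illustration; no database is involved), and the names `collection`, `insert`, and the `email` field are hypothetical.

```javascript
// Toy stand-in for a schemaless collection: just an array of objects.
// Hypothetical illustration only; this is not a MongoDB API.
const collection = [];

function insert(doc) {
  // No schema check: a document of any shape is accepted.
  collection.push(doc);
}

// Two documents with different fields live side by side.
insert({ name: "mongo" });
insert({ x: 3 });

// Later the application starts recording a new field; no migration step is needed.
insert({ name: "alice", email: "alice@example.com" });

// Readers must tolerate missing fields, since nothing guarantees their presence.
const emails = collection.map((doc) =>
  doc.email !== undefined ? doc.email : "(none)"
);
console.log(emails);
```

The flexibility comes at a cost that the sketch also shows: because nothing enforces the presence of a field, the reading code has to handle documents that lack it.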
NoSQL Database Types
• Document databases pair each key with a complex data structure
known as a document. Documents can contain many different key-value
pairs, or key-array pairs, or even nested documents.
• Graph stores are used to store information about networks, such as
social connections. Graph stores include Neo4J and HyperGraphDB.
• Key-value stores are the simplest NoSQL databases. Every single item
in the database is stored as an attribute name (or "key"), together
with its value. Examples of key-value stores are Riak and Voldemort.
Some key-value stores, such as Redis, allow each value to have a
type, such as "integer", which adds functionality.
• Wide-column stores such as Cassandra and HBase are optimized for
queries over large datasets, and store columns of data together,
instead of rows.
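As a rough illustration of the four types above, the sketch below expresses similar user data in each model using plain in-memory JavaScript structures. All names and values are made up for illustration, and real products store and index data very differently.

```javascript
// Similar "user" data expressed in each of the four models (toy in-memory shapes).
// All names and values here are made up for illustration.

// Document: one key maps to a nested structure with arrays and sub-documents.
const documentStore = {
  u1: { name: "Ann", tags: ["admin", "dev"], address: { city: "Pune" } },
};

// Key-value: one value per key; the store does not look inside the value.
const keyValueStore = new Map([["user:u1:name", "Ann"]]);

// Wide-column: values are grouped by column, so one column can be scanned alone.
const columnStore = {
  name: { u1: "Ann", u2: "Bob" },
  city: { u1: "Pune", u2: "Delhi" },
};

// Graph: nodes plus edges that describe relationships between them.
const graph = {
  nodes: ["u1", "u2"],
  edges: [{ from: "u1", to: "u2", type: "FOLLOWS" }],
};

// Reading a whole column touches only that column's data.
const allCities = Object.values(columnStore.city);
console.log(documentStore.u1.address.city);
console.log(keyValueStore.get("user:u1:name"));
console.log(allCities);
console.log(graph.edges.length);
```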
Why Not RDBMS?
Reading Data
•A caching tier accelerates only data reads.
•Cold cache thrash – Caches are temporary. Whenever an application needs some data, it first looks
in the caching tier; when the data is not there, it is forced to read from the RDBMS, which delays
both reads and writes.
•Another tier to manage – With an RDBMS, caching is deployed as a separate infrastructure tier, and
inserting another infrastructure tier into the existing architecture adds more complexity.
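The cold-cache behaviour described above can be sketched with a simple cache-aside read path. This is a minimal sketch under stated assumptions: `slowDatabase` is an in-memory Map standing in for the RDBMS, and `readUser` and `stats` are hypothetical names, not part of any real driver.

```javascript
// Cache-aside read path: check the cache first, fall back to the database on a miss.
// `slowDatabase` and `readUser` are hypothetical stand-ins, not a real driver API.
const slowDatabase = new Map([
  ["u1", { name: "Ann" }],
  ["u2", { name: "Bob" }],
]);
const cache = new Map();
const stats = { hits: 0, misses: 0 };

function readUser(id) {
  if (cache.has(id)) {
    stats.hits += 1;
    return cache.get(id);
  }
  // Miss: the request pays for the cache lookup and for the database read.
  stats.misses += 1;
  const row = slowDatabase.get(id);
  cache.set(id, row);
  return row;
}

readUser("u1"); // miss (cold cache)
readUser("u1"); // hit
cache.clear();  // cache restart or eviction: the cache is cold again
readUser("u1"); // miss again
console.log(stats);
```

After a restart or eviction every read falls through to the database until the cache warms up again, which is the situation the slide calls cold cache thrash.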
Partitioning(Sharding)
•The application needs to be partition-aware.
•When you fill a shard, it is highly disruptive to re-shard.
•Relationships are broken, i.e. referential integrity is no longer enforced across shards.
•You lose some of the most important benefits of the relational model.
•You have to create and maintain a schema on every server.
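To see why re-sharding is disruptive, consider a naive scheme that assigns each key to a shard by hash modulo the shard count. The sketch below is an assumption-laden illustration: `hashKey` is a simple string hash chosen for readability, not the partitioning function of any particular database, and the key names are invented.

```javascript
// Naive sharding: shard index = hash(key) mod shardCount.
// `hashKey` is a simple illustrative string hash, not what any particular database uses.
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Count how many keys land on a different shard after the shard count changes.
function countMoved(keys, oldCount, newCount) {
  return keys.filter((k) => shardFor(k, oldCount) !== shardFor(k, newCount)).length;
}

const keys = Array.from({ length: 1000 }, (_, i) => `user${i}`);
const moved = countMoved(keys, 3, 4);
console.log(`${moved} of ${keys.length} keys change shard when going from 3 to 4 shards`);
```

Under this scheme a large share of keys typically maps to a different shard once a shard is added, so the corresponding rows would have to be copied between servers while the application keeps running.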
Schema
•RDBMS technology requires the strict definition of a "schema" before any data is stored in the
database. The schema is integral because it defines the structure of the database. In an RDBMS, changes
such as capturing new information or changing the data formats and content of the application are
extremely disruptive and are therefore frequently avoided.
MongoDB – Quick Tutorial
What is MongoDB
Document Database
{
  id: "00e8da9d",
  type: "Film",
  pricing: { ... },
  details: {
    title: "The Matrix",
    director: [ "Andy Wachowski", "Larry Wachowski" ],
    writer: [ "Andy Wachowski", "Larry Wachowski" ],
    ...,
    aspect_ratio: "1.66:1"
  },
  ...
}
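Nested fields in a document like the one above are usually addressed with a dotted path; MongoDB queries use a similar dot notation for embedded fields. The following is a minimal sketch in plain JavaScript: `getPath` is a hypothetical helper written for this example, and the elided parts of the document are simply left out.

```javascript
// A document shaped like the example above, with the elided parts left out.
const film = {
  id: "00e8da9d",
  type: "Film",
  details: {
    title: "The Matrix",
    director: ["Andy Wachowski", "Larry Wachowski"],
    aspect_ratio: "1.66:1",
  },
};

// Hypothetical helper: resolve a dotted path such as "details.title".
// Returns undefined as soon as any step of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === undefined || value === null ? undefined : value[key]),
    doc
  );
}

console.log(getPath(film, "details.title"));      // The Matrix
console.log(getPath(film, "details.director.0")); // Andy Wachowski
console.log(getPath(film, "pricing.amount"));     // undefined
```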
Installing MongoDB
Download the binary files for the desired release of MongoDB.
Download the binaries from https://www.mongodb.org/downloads.
Extract the files from the downloaded archive.
tar -zxvf mongodb-linux-x86_64-2.6.1.tgz
Copy the extracted archive to the target directory.
Copy the extracted folder to the location from which MongoDB will run.
mkdir -p mongodb
cp -R -n mongodb-linux-x86_64-2.6.1/ mongodb
Ensure the location of the binaries is in the PATH variable.
The MongoDB binaries are in the bin/ directory of the archive. To ensure that the binaries are
in your PATH, you can modify your PATH.
For example, you can add the following line to your shell’s rc file (e.g. ~/.bashrc):
export PATH=<mongodb-install-directory>:$PATH
Replace <mongodb-install-directory> with the path to the MongoDB binaries.
Running MongoDB
Create the data directory.
The following example command creates the default /data/db directory:
mkdir -p /data/db
Set permissions for the data directory.
Before running mongod for the first time, ensure that the user account running mongod has
read and write permissions for the directory.
Run MongoDB.
To run MongoDB, run the mongod process at the system prompt. If necessary, specify the
path of the mongod or the data directory. See the following examples.
Run without specifying paths
If your system PATH variable includes the location of the mongod binary and if you use the
default data directory (i.e., /data/db), simply enter mongod at the system prompt:
mongod
Stop MongoDB as needed.
To stop MongoDB, press Control+C in the terminal where the mongod instance is running.
Where to Go Further?
http://docs.mongodb.org/manual/tutorial/
https://university.mongodb.com/
Handling Databases
Connect to a mongod
mongo
From the mongo shell, display the list of databases, with the following operation:
show dbs
Switch to a new database named mydb, with the following operation:
use mydb
Confirm that your session has the mydb database as context, by checking the value of the db
object, which returns the name of the current database, as follows:
db
Inserting Data to Collections
j = { name : "mongo" }
k = { x : 3 }
db.testData.insert( j )
db.testData.insert( k )
Dropping a Database
db.dropDatabase()
> { "dropped" : "mydb", "ok" : 1 }
Inserting Data
SQL
INSERT INTO post (title, description, tags, likes) VALUES ('MongoDB Overview',
'MongoDB is no sql database', 'database', 100)
MongoDB
db.post.insert([
  {
    title: 'MongoDB Overview',
    description: 'MongoDB is no sql database',
    tags: 'database',
    likes: 100
  }
])
Retrieving Data
SQL SELECT Statements and the equivalent MongoDB find() Statements
SQL:     SELECT * FROM users
MongoDB: db.users.find()
SQL:     SELECT id, user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )
SQL:     SELECT user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )
SQL:     SELECT * FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" } )
SQL:     SELECT user_id, status FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
SQL:     SELECT * FROM users WHERE status != "A"
MongoDB: db.users.find( { status: { $ne: "A" } } )
SQL:     SELECT * FROM users WHERE status = "A" AND age = 50
MongoDB: db.users.find( { status: "A", age: 50 } )
SQL:     SELECT * FROM users WHERE status = "A" OR age = 50
MongoDB: db.users.find( { $or: [ { status: "A" }, { age: 50 } ] } )
SQL:     SELECT * FROM users WHERE age > 25
MongoDB: db.users.find( { age: { $gt: 25 } } )
SQL:     SELECT * FROM users WHERE age < 25
MongoDB: db.users.find( { age: { $lt: 25 } } )
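To make the meaning of these filter documents concrete, here is a toy matcher for a small subset of them (equality, $ne, $gt, $lt and $or), run against an in-memory array. It is a sketch for illustration only: `matches` and `find` are hypothetical helpers, the sample users are invented, and this is not how MongoDB evaluates queries internally.

```javascript
// Toy matcher for a small subset of find() filters: equality, $ne, $gt, $lt, and $or.
// Hypothetical helper for illustration; it is not MongoDB's implementation.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") {
      // $or holds an array of sub-filters; at least one must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $gt: 25 }; every operator must hold.
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$ne") return value !== arg;
        if (op === "$gt") return value > arg;
        if (op === "$lt") return value < arg;
        throw new Error(`unsupported operator: ${op}`);
      });
    }
    return value === cond; // plain equality; several fields are ANDed together
  });
}

// Invented sample data.
const users = [
  { user_id: "a1", status: "A", age: 50 },
  { user_id: "b2", status: "B", age: 50 },
  { user_id: "c3", status: "A", age: 20 },
];

const find = (filter) => users.filter((u) => matches(u, filter)).map((u) => u.user_id);

console.log(find({ status: "A", age: 50 }));                // [ 'a1' ]
console.log(find({ $or: [{ status: "A" }, { age: 50 }] })); // [ 'a1', 'b2', 'c3' ]
console.log(find({ age: { $gt: 25 } }));                    // [ 'a1', 'b2' ]
```

Listing several fields in one filter behaves like SQL AND, while OR has to be spelled out with $or, which mirrors the pairs in the comparison above.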
Retrieving Data
db.mycol.find({"tags":"mongodb","title": "MongoDB Overview"}).pretty()
{
"_id": ObjectId("7df78ad8902c"),
"title": "MongoDB Overview",
"description": "MongoDB is no sql database",
"tags": ["mongodb", "database", "NoSQL"],
"likes": "100"
}
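Note that the query above asks for "tags": "mongodb" while the stored tags field is an array. In MongoDB an equality condition on an array field matches when any element equals the value. A toy version of that rule, with the hypothetical helper `fieldEquals`, might look like this:

```javascript
// Toy version of equality matching on a field that may hold an array.
// `fieldEquals` is a hypothetical helper, not part of any MongoDB API.
function fieldEquals(fieldValue, wanted) {
  return Array.isArray(fieldValue)
    ? fieldValue.includes(wanted) // array field: any element may match
    : fieldValue === wanted;      // scalar field: direct comparison
}

const post = {
  title: "MongoDB Overview",
  tags: ["mongodb", "database", "NoSQL"],
};

console.log(fieldEquals(post.tags, "mongodb"));           // true
console.log(fieldEquals(post.title, "MongoDB Overview")); // true
console.log(fieldEquals(post.tags, "sql"));               // false
```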
That’s All Folks!