Introduction to NoSQL - csns - California State University, Los Angeles

CS422 Principles of Database Systems
Introduction to NoSQL
Chengyu Sun
California State University, Los Angeles
The Need for NoSQL
Big Data
Semi-structured Data
Large-scale Parallel Processing
Is CSNS Big?
Database
  112 tables
  1.6 million records
  260 MB (including indexes)
Files
  327,985
  71.4 GB
Data collected on 11/29/2015
Some Data Are Definitely Big
Google processed 24 PB of data per day in 2009
Facebook had 1.5 PB of photos and 60 billion images in 2009
As of 7/1/2015, the Internet Archive Wayback Machine contains 23 PB of data and grows at a rate of 50-60 TB per week
How Big Is Big?
Currently any data set over a few TB is considered big data
  Data size large enough to span several storage units
  Traditional RDBMS starts to show signs of stress
Semi-Structured Data
Data that has some structure but does not conform strictly to a schema
  Data may be irregular or incomplete
  Structure may change rapidly or unpredictably
Semi-Structured Data Example: HTML Pages
Many HTML pages have a structure: header, footer, menu, title, content …
  But many don’t
  And those that do implement the structure in all kinds of different ways
HTML5 introduces many new tags
  New data, e.g. <svg>
  Same data now under different tags, e.g. <section> vs. <div>
The “Web Scale”
Google (2012)
  3.3 billion searches per day
Twitter (2013)
  500 million tweets per day
Facebook (4/2015)
  936 million daily active users
Scalability
The ability of a system to increase throughput with the addition of resources to address load increases
Vertical Scaling – faster CPUs, more memory, bigger hard drives …
Horizontal Scaling – add more nodes to server clusters
Why Not RDBMS?
Its strengths are also its weaknesses
Schema
  Strength: Clearly defines data and relationships; ensures data quality and integrity.
  Weakness: Not suitable for semi-structured data.
ACID
  Strength: Guarantees the correctness of the operations and the durability of the data.
  Weakness: Makes it very difficult to scale.
SQL
  Strength: One language for all data and all RDBMS.
  Weakness: Impedes rapid application development (RAD) due to the mismatch between SQL and the application languages.
NoSQL
No SQL, Not Relational, Not Only SQL …
A term that describes a class of data storage and manipulation technologies and products that do not follow the RDBMS principles and focus on large datasets, performance, scalability, and agility.
Types of NoSQL Databases
Key-Value Stores
Document Databases
Column Family Stores
Graph Databases
Key-Value Stores
Simple, fast, scalable
Product: Used By
Redis: Twitter, GitHub, Snapchat, Craigslist
Dynamo: EA, New York Times, HTC
Cassandra: Facebook, Twitter, Reddit
Voldemort: LinkedIn
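To make "simple" concrete, here is a minimal in-memory sketch of the interface a key-value store exposes: put a value under a key, get it back, delete it. The class name KeyValueStore, its method names, and the sample key are hypothetical and do not reflect the API of any product listed above; real stores add persistence, replication, and distribution across nodes.

```javascript
// Hypothetical in-memory stand-in for a key-value store (illustration only).
class KeyValueStore {
  constructor() {
    this.data = new Map(); // key -> opaque value
  }

  put(key, value) {
    this.data.set(key, value);
  }

  get(key) {
    // Returns undefined when the key is absent.
    return this.data.get(key);
  }

  delete(key) {
    // Returns true if the key existed.
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();

// The store does not interpret the value; here it is a serialized object.
store.put("session:42", JSON.stringify({ user: "jdoe", cart: [101, 102] }));

const session = JSON.parse(store.get("session:42"));
console.log(session.user);
```

The only lookup path is the key itself, which is what makes this model easy to make fast and to spread across many nodes.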
Document Databases
A document in a document database consists of a loosely structured set of key-value pairs.
Product: Used By
MongoDB: Facebook, Craigslist, Adobe
CouchDB: Apple, BBC
Document Example
{
  "first_name": "John",
  "last_name": "Doe",
  "age": 20,
  "address": {
    "street": "123 Main",
    "city": "Los Angeles",
    "state": "CA"
  }
}
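A short sketch of what "loosely structured" means in practice: documents in the same collection need not share the same fields, so code that reads them has to tolerate missing ones. The second document (Jane, with an email field) and the helper cityOf are hypothetical additions for illustration.

```javascript
// The document from the slide, written as a JavaScript object.
const john = {
  first_name: "John",
  last_name: "Doe",
  age: 20,
  address: { street: "123 Main", city: "Los Angeles", state: "CA" },
};

// Hypothetical second document: no age, no address, but an extra field.
const jane = { first_name: "Jane", email: "jane@example.com" };

// Both can sit side by side because no schema forces a common shape.
const users = [john, jane];

// Read a nested field, returning null when the document lacks it.
function cityOf(doc) {
  return doc.address ? doc.address.city : null;
}

for (const user of users) {
  console.log(user.first_name, cityOf(user));
}
```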
Column Family Stores
Data is stored in a column-oriented way as opposed to the row-oriented format in RDBMS
Product: Used By
BigTable: Google
HBase: Facebook, Yahoo, Hulu and others
Hypertable: Baidu, Rediff
Column and Column Family
Columns: first_name, last_name, gender, occupation, zip_code
Column families
  name: first_name, last_name
  profile: gender, occupation
  location: zip_code
Column families typically need to be predefined, while new columns can be added at any time
Units of Data
row-key: 1
  first_name: John
  last_name: Doe
  gender: male
  zip_code: 10001
row-key: 2
  first_name: Jane
  zip_code: 10002
Data Storage
name
  row-key: 1
    first_name: John
    last_name: Doe
  row-key: 2
    first_name: Jane
profile
  row-key: 1
    gender: male
location
  row-key: 1
    zip_code: 10001
  row-key: 2
    zip_code: 10002
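The regrouping from units of data to per-family storage can be sketched in a few lines. The rows and the column-to-family mapping come from the slides above; the function name groupByFamily and the in-memory layout are assumptions made for illustration, not how any particular column family store lays data out on disk.

```javascript
// Column families and their columns, as defined on the earlier slide.
const families = {
  name: ["first_name", "last_name"],
  profile: ["gender", "occupation"],
  location: ["zip_code"],
};

// The two units of data, keyed by row-key.
const rows = [
  { rowKey: 1, first_name: "John", last_name: "Doe", gender: "male", zip_code: "10001" },
  { rowKey: 2, first_name: "Jane", zip_code: "10002" },
];

// Split each row into one fragment per column family.
function groupByFamily(rows, families) {
  const storage = {};
  for (const [family, columns] of Object.entries(families)) {
    storage[family] = [];
    for (const row of rows) {
      const cells = {};
      for (const column of columns) {
        if (column in row) cells[column] = row[column];
      }
      // A row with no values in this family stores nothing there.
      if (Object.keys(cells).length > 0) {
        storage[family].push({ rowKey: row.rowKey, ...cells });
      }
    }
  }
  return storage;
}

const storage = groupByFamily(rows, families);
console.log(JSON.stringify(storage, null, 2));
```

Row 2 has no gender or occupation, so it does not appear under profile at all; missing values cost no storage.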
Graph Databases
Store vertices (i.e. entities) and edges (i.e. relationships between vertices)
Optimized for graph storage and processing
Product: Used By
Neo4j: InfoJobs, Adidas
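A minimal sketch of the vertex and edge model, assuming a tiny hypothetical data set (alice, bob, acme) and a hypothetical helper neighbors. It is plain JavaScript, not Neo4j's query language; it only illustrates why indexing edges by vertex makes following relationships cheap.

```javascript
// Hypothetical sample data: vertices are entities, edges are labeled relationships.
const vertices = {
  alice: { type: "person" },
  bob: { type: "person" },
  acme: { type: "company" },
};

const edges = [
  { from: "alice", to: "bob", label: "KNOWS" },
  { from: "alice", to: "acme", label: "WORKS_AT" },
  { from: "bob", to: "acme", label: "WORKS_AT" },
];

// Index edges by source vertex so a traversal does not scan every edge.
const outgoing = new Map();
for (const edge of edges) {
  if (!outgoing.has(edge.from)) outgoing.set(edge.from, []);
  outgoing.get(edge.from).push(edge);
}

// Vertices reachable from `vertex` over one edge with the given label.
function neighbors(vertex, label) {
  return (outgoing.get(vertex) || [])
    .filter((edge) => edge.label === label)
    .map((edge) => edge.to);
}

console.log(neighbors("alice", "KNOWS"));
console.log(neighbors("bob", "WORKS_AT"));
```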
NoSQL Database Example: MongoDB
http://www.indeed.com/jobtrends
MongoDB Server
[Diagram: a MongoDB server hosts multiple databases (DB); each database contains multiple collections.]
MongoDB Shell
> mongo
A command line client that provides an interactive JavaScript interface to MongoDB
Basic MongoDB Shell Commands
help
show dbs
use <db>
  Switch to database <db>
  <db> won’t be created until some data is inserted into it
show collections
db.dropDatabase()
Some Collection Methods
db.<collection>.insert()
db.<collection>.update()
db.<collection>.save()
db.<collection>.find()
db.<collection>.remove()
https://docs.mongodb.org/manual/reference/method/js-collection/
Basic CRUD Operations
Create a database test1
Create two documents (i.e. objects or records) John and Jane
Save the two documents to a collection users
Query the collection
Using find()
find( query, projection )
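Both query and projection are documents
  A query maps fields to values or to operator expressions
  A projection has the form { field1: <boolean>, field2: <boolean> … } and selects which fields are returned
Query examples:
  {firstName: {$eq: "John"}}
  {age: {$gt: 20, $lte: 30}}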
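To show how a query document and a projection document are interpreted, here is a small in-memory sketch that evaluates the two query examples against an array of objects. It is not MongoDB's implementation: the functions matches, project, and find are hypothetical, the sample users are made up, only $eq, $gt, and $lte are handled, and the projection supports inclusion flags only (real MongoDB also returns _id by default and supports exclusion).

```javascript
// Hypothetical sample documents.
const users = [
  { firstName: "John", lastName: "Doe", age: 25 },
  { firstName: "Jane", lastName: "Doe", age: 35 },
];

// The comparison operators this sketch understands; MongoDB has many more.
const operators = new Map([
  ["$eq", (actual, expected) => actual === expected],
  ["$gt", (actual, bound) => actual > bound],
  ["$lte", (actual, bound) => actual <= bound],
]);

// True when doc satisfies every condition in query (conditions are ANDed).
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isOperatorDoc = condition !== null && typeof condition === "object";
    if (!isOperatorDoc) return doc[field] === condition; // plain value means equality
    return Object.entries(condition).every(([op, arg]) => {
      const test = operators.get(op);
      if (!test) throw new Error("unsupported operator: " + op);
      return test(doc[field], arg);
    });
  });
}

// Keep only the fields whose projection flag is true; no projection keeps all.
function project(doc, projection) {
  if (!projection) return { ...doc };
  const out = {};
  for (const [field, include] of Object.entries(projection)) {
    if (include && field in doc) out[field] = doc[field];
  }
  return out;
}

function find(docs, query = {}, projection) {
  return docs.filter((doc) => matches(doc, query)).map((doc) => project(doc, projection));
}

const twenties = find(users, { age: { $gt: 20, $lte: 30 } }, { firstName: true });
console.log(twenties);
```

Multiple operators on one field, as in {age: {$gt: 20, $lte: 30}}, must all hold, which is how a range is expressed.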
Programming Language Support
Drivers for various server-side programming languages – https://docs.mongodb.org/ecosystem/drivers/
REST web service for client-side JavaScript
  Example: mongo.html
Readings
Professional NoSQL by Shashank Tiwari
MongoDB Manual https://docs.mongodb.org/manual/