CS186: Introduction to Database Systems

Download Report

Transcript CS186: Introduction to Database Systems

CS 405G: Introduction to
Database Systems
24 NoSQL
Reuse some slides of Jennifer Widom
Chen Qian
University of Kentucky
Summary



Tree-based indexes: O(logN) for search and update,
support range queries
Hash-based indexes: best for equality searches O(1),
cannot support range searches.
Static and dynamic
7/16/2015
Chen Qian @ University of Kentucky
2
NoSQL Systems:
Motivation
NoSQL: The Name
 “SQL” = Traditional relational DBMS
 Recognition over past decade or so:
Not every data management/analysis problem is best
solved using a traditional relational DBMS
 “NoSQL” = “No SQL” =
Not using traditional relational DBMS
 “No SQL”  Don’t use SQL language
NoSQL Systems:
Motivation
NoSQL: The Name
 “SQL” = Traditional relational DBMS
 Recognition over past decade or so:
Not every data management/analysis problem is best
solved using a traditional relational DBMS
 “NoSQL” = “No SQL” =
Not using traditional relational DBMS
 “No SQL”  Don’t use SQL language
 “NoSQL” = “Not Only SQL”
NoSQL Systems:
Motivation
Not every data management/analysis
problem is best solved using a traditional DBMS
Database Management System (DBMS)
provides….
efficient, reliable, convenient, and safe
multi-user storage of and access to massive
amounts of persistent data.
…
NoSQL Systems:
Motivation
NoSQL Systems
Alternative to traditional relational DBMS
+
+
+
+
Flexible schema
Quicker/cheaper to set up
Massive scalability
Relaxed consistency  higher performance & availability
– No declarative query language  more programming
– Relaxed consistency  fewer guarantees
NoSQL Systems:
Motivation
Example #1: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Task: Load into database system
Data cleaning
Data extraction
Verification
Schema
Nothing above is needed for noSQL!
NoSQL Systems:
Motivation
Example #1: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Task: Find all records for…
 Given UserID
 Given URL
 Given timestamp
 Certain construct appearing in additional-info
NoSQL Systems:
Motivation
Example #1: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Separate records: UserID, name, age, gender, …
Task: Find average age of user accessing given URL
May not require strict consistency.
NoSQL Systems:
Motivation
Example #2: Social-network graph
Each record: UserID1, UserID2
Separate records: UserID, name, age, gender, …
Task: Find all friends of friends of friends of … friends of given user
Large number of joins?
Not efficient at all!
Specially designed graph database may be better
NoSQL Systems:
Motivation
Example #3: Wikipedia pages
Large collection of documents
Combination of structured and unstructured data
Task: Retrieve introductory paragraph of all pages about U.S.
presidents before 1900
Mix of structured and unstructured data
NoSQL Systems:
Motivation
NoSQL Systems
Alternative to traditional relational DBMS
+
+
+
+
Flexible schema
Quicker/cheaper to set up
Massive scalability
Relaxed consistency  higher performance & availability
– No declarative query language  more programming
– Relaxed consistency  fewer guarantees
NoSQL
Systems
Overview
NoSQL Systems:
Overview
NoSQL Systems
Several incarnations




MapReduce framework: OLAP
Key-value stores: OLTP
Document stores
Graph database systems
NoSQL Systems:
Overview
MapReduce Framework
Originally from Google, open source Hadoop
 No data model, data stored in files
 User provides specific functions
map() reduce()
 System provides data processing “glue”, fault-tolerance,
scalability
NoSQL Systems:
Overview
Map and Reduce Functions
Map: Divide problem into subproblems
Reduce: Do work on subproblems, combine results
NoSQL Systems:
Overview
MapReduce Architecture
NoSQL Systems:
Overview
MapReduce Example: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Task: Count number of accesses for each domain (inside
URL)
NoSQL Systems:
Overview
MapReduce Example (modified #1)
Each record: UserID, URL, timestamp, additional-info
Task: Total “value” of accesses for each domain based on
additional-info
NoSQL Systems:
Overview
MapReduce Framework
 No data model, data stored in files
 User provides specific functions
 System provides data processing “glue”, fault-tolerance,
scalability
NoSQL Systems:
Overview
MapReduce Framework
Schemas and declarative queries are missed
Hive – schemas, SQL-like query language
Pig – more imperative but with relational operators
 Both compile to “workflow” of Hadoop (MapReduce) jobs
NoSQL Systems:
Overview
Key-Value Stores
Extremely simple interface
 Data model: (key, value) pairs
 Operations: Insert(key,value), Fetch(key),
Update(key), Delete(key)
Implementation: efficiency, scalability, fault-tolerance
 Records distributed to nodes based on key
 Replication
 Single-record transactions, “eventual consistency”
NoSQL Systems:
Overview
Key-Value Stores
Extremely simple interface
 Data model: (key, value) pairs
 Operations: Insert(key,value), Fetch(key),
Update(key), Delete(key)
 Some allow (non-uniform) columns within value
 Some allow Fetch on range of keys
Example systems
 Google BigTable, Amazon Dynamo, Cassandra,
Voldemort, HBase, …
NoSQL Systems:
Overview
Document Stores
Like Key-Value Stores except value is document
 Data model: (key, document) pairs
 Document: JSON, XML, other semistructured formats
 Basic operations: Insert(key,document), Fetch(key),
Update(key), Delete(key)
 Also Fetch based on document contents
Example systems
 CouchDB, MongoDB, SimpleDB, …
NoSQL Systems:
Overview
Graph Database Systems
 Data model: nodes and edges
 Nodes may have properties (including ID)
 Edges may have labels or roles
NoSQL Systems:
Overview
Graph Database Systems
 Interfaces and query languages vary
 Single-step versus “path expressions” versus full recursion
 Example systems
Neo4j, FlockDB, Pregel, …
 RDF “triple stores” can map to graph databases
NoSQL Systems:
Overview
NoSQL Systems
 “NoSQL” = “Not Only SQL”
Not every data management/analysis problem
is best solved exclusively using a traditional DBMS
 Current incarnations
– MapReduce framework
– Key-value stores
– Document stores
– Graph database systems