Database Systems: Design, Implementation, and Management

BTM 382 Database Management
Chapter 2: Data models
Chapter 12.12-13: CAP and Hadoop
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
Models and data models
What is a model?
• A model is a simplified way to describe or
explain a complex reality
• A model helps people communicate and work
simply yet effectively when talking about and
manipulating complex real-world phenomena
Scientific models
Sources:
http://www.redorbit.com/education/reference_library/space_1/universe/2574692/geocentric_model/
http://hendrianusthe.wordpress.com/2012/06/21/heliocentric-vs-geocentric/
Conceptual models
Sources:
http://info563.malagaclasses.info/strategy-it-2/
http://fivewhys.wordpress.com/2012/05/22/business-model-innovation/
Importance of Data Models
Communication tool
Give an overall view of the database
Organize data for various users
Are an abstraction for the creation of a good
database
The Evolution of Data Models
Obsolete models:
Hierarchical and network models
The Relational Model
• Uses key concepts from mathematical relations (tables)
– “Relational” in “relational model” means “tables” (mathematical
relations), not “relationships”
• Tables (relations)
– Matrix consisting of row/column intersections
• Relations have well defined methods (queries) for combining
their data members
– Selecting (reading) and joining (combining) data is defined based
on rigorous mathematical principles
• Relational database management system (RDBMS)
– The relational model was originally too demanding for 1970s computing
power
– As computing power increased, simplicity of the model prevailed
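As a minimal sketch of selecting and joining, the following uses Python's built-in sqlite3 module with an in-memory database. The student and enrollment tables, their columns, and the sample rows are illustrative assumptions, not taken from the slides.

```python
import sqlite3

# In-memory database; table names, columns, and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrollment VALUES (1, 'BTM 382'), (2, 'BTM 200'), (1, 'BTM 200');
""")

# Selecting: read only the rows that satisfy a condition.
selected = conn.execute(
    "SELECT name FROM student WHERE student_id = 1"
).fetchall()

# Joining: combine rows from two tables on a shared column.
joined = conn.execute("""
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    ORDER BY s.name, e.course
""").fetchall()

print(selected)
print(joined)
```

Both queries state what data is wanted rather than how to fetch it, which is the simplicity the slide refers to.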
The Entity Relationship Model
• Very detailed specification of relationships
and their properties
• Enhancement of the relational model
– Relations (tables) become entities
• Entity relationship diagram (ERD)
– Uses graphic representations to model database
components
• Many variations for notation exist; we will use
the Crow’s Foot notation
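One way to picture how a one-to-many relationship from an ERD might end up in a relational database is a foreign key. The sketch below assumes two hypothetical entities, department and employee, where one department has many employees; the names and data are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- The "many" side carries a foreign key that points at the "one" side.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Ben', 1);
""")

# An employee that refers to a nonexistent department violates the relationship.
try:
    conn.execute("INSERT INTO employee VALUES (12, 'Cy', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 1"
).fetchone()[0]
print(orphan_rejected, employees_in_sales)
```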
The Object-Oriented Data Model (OODM)
• Addresses the “impedance mismatch” problem of the ER model
– The ER model’s view of data (tables) and programmers’ view of data
(objects in OOP) are completely different
– This mismatch makes database programming painful, especially for very
complex data structures
• OODM uses object-oriented programming concepts to store data
– Objects represent nouns (entities or records)
– Objects have attributes (properties or fields) with values (data)
– Objects have methods (operations or functions)
– Classes group similar objects using a hierarchy and inheritance
• In an OODBMS, the data retrieval and storage closely mirrors the data
structures that programmers use, and so programming complex objects
is much easier than with the ER model
• More advanced forms support the Extended Relational Data Model,
Object/Relational DBMS, and XML data structures
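The impedance mismatch can be illustrated with a small sketch. The Order and LineItem classes and the to_rows helper below are hypothetical names chosen for this example: a programmer works with one nested object, but storing it in tables means flattening it into separate rows linked by a key.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)

    def total_quantity(self) -> int:
        # A method: behaviour lives with the data in the object view.
        return sum(item.quantity for item in self.items)


def to_rows(order: Order):
    """Flatten one nested object into rows for two tables (the relational view)."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.product, i.quantity) for i in order.items]
    return order_row, item_rows


order = Order(1, "Ana", [LineItem("pen", 2), LineItem("book", 1)])
order_row, item_rows = to_rows(order)
print(order.total_quantity())
print(order_row, item_rows)
```

With tables, code like to_rows (and its inverse) has to be written for every complex structure; an object-oriented database aims to store the object more or less as it is.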
OODBMS vs. RDBMS
https://youtu.be/kORTgvfHl4g
Big Data and NoSQL
Explaining Big Data
https://youtu.be/7D1CQ_LOizA
Big Data
• Volume
– Huge amounts of data (terabytes and petabytes),
especially from the Internet
• Velocity
– Organizations need to process the huge amounts
of data rapidly, just as with smaller databases
• Variety
– Wide variety of data, much of it unstructured and
even changing in structure
Big data’s solutions
and RDBMS’s failure
• Scale up: use more powerful servers
– RDBMS is very computing intensive
– More data requires much faster, more capable,
expensive computers, and even that’s not good
enough for big data
• Scale out: use many cheap distributed
servers
– RDBMS doesn’t work rapidly with distributed
processing
– Consistency is the biggest problem: guaranteeing
consistency (which RDBMS is great at) is slow,
too slow for big data
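Scaling out can be sketched as hash partitioning (sharding): each key is assigned to one of several servers, so no single machine holds all the data. This is a simplified illustration, not how any particular product works; the shard_for, put, and get helpers and the three dictionaries standing in for servers are assumptions.

```python
import hashlib


def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


# Three plain dictionaries stand in for three cheap servers.
servers = [dict() for _ in range(3)]


def put(key: str, value) -> None:
    servers[shard_for(key, len(servers))][key] = value


def get(key: str):
    return servers[shard_for(key, len(servers))].get(key)


for n in range(100):
    put(f"user:{n}", {"visits": n})

print([len(s) for s in servers])  # how many keys each server ended up holding
print(get("user:42"))
```

Reading or writing a single key touches only one server, which is fast. A query that must combine or keep consistent data spread across servers needs coordination between them, which is where the slowdown described above comes from.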
What is NoSQL?
https://www.youtube.com/watch?v=qUV2j3XBRHc
NoSQL Databases to the Big Data rescue
• “NoSQL” means:
– Non-relational or non-RDBMS
– Also “Not only SQL”—a few do support SQL
• It is not one model; it is many different models that are not
relational
• High scalability
– Support distributed database architectures
• High availability
– Rapid performance for big data, including unstructured and sparse data
• Fault tolerance
– Continue to work even if some servers in the cluster fail
• Geared toward performance rather than transaction consistency
• Many store data in key-value stores
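Fault tolerance in a key-value store can be sketched with simple replication. The Node and ReplicatedStore classes below are hypothetical and heavily simplified: every write goes to all live nodes, so a read still succeeds after one node fails.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Toy key-value store: writes go to all live nodes, reads to the first live one."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def get(self, key):
        for node in self.nodes:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


store = ReplicatedStore([Node("a"), Node("b"), Node("c")])
store.put("user:1", {"name": "Ana"})

store.nodes[0].alive = False  # simulate a server failure in the cluster
print(store.get("user:1"))    # still answered by a surviving replica
```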
Disadvantages of NoSQL
• Complex programming is required
– “NoSQL” means you lose the ease-of-use and structural
independence of SQL
– There is often no relationship support in the database—you
have to program relationships in code
• There is often no transaction integrity support
– The data you retrieve at any given moment might be
wrong… but it will eventually become OK
– This is the price to pay for rapid performance in a distributed
database
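To illustrate what "program relationships in code" can mean, the sketch below performs by hand the lookup that a SQL join would do. The key naming scheme ("customer:1", "order:10"), the field names, and the data are assumptions made for the example.

```python
# A dictionary stands in for a key-value store with no relationship support.
kv = {
    "customer:1": {"name": "Ana"},
    "customer:2": {"name": "Ben"},
    "order:10": {"customer_key": "customer:1", "total": 25.0},
    "order:11": {"customer_key": "customer:2", "total": 40.0},
    "order:12": {"customer_key": "customer:1", "total": 15.0},
}


def orders_with_customer_names(store):
    """Hand-written 'join': follow each order's customer_key to the customer record."""
    results = []
    for key, value in sorted(store.items()):
        if key.startswith("order:"):
            customer = store.get(value["customer_key"])
            # The store does not guarantee the customer exists; the code must check.
            name = customer["name"] if customer else None
            results.append((key, name, value["total"]))
    return results


report = orders_with_customer_names(kv)
print(report)
```

In an RDBMS the same result is a one-line JOIN, and a foreign key would rule out orders that point at missing customers.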
The CAP theorem for distributed databases
• CAP stands for:
– Consistency: All nodes see the same data
– Availability: A request always gets a response (success or failure)
– Partition tolerance: Even if nodes fail or lose contact with each
other, the system can still function
• A distributed database can guarantee only two of the three
CAP characteristics, never all three at the same time
– However, over time, it might be able to provide all three
• NoSQL databases are distributed, and so the CAP
theorem restricts them to providing BASE, not ACID
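The trade-off can be sketched with a toy two-replica store. The TwoNodeStore class and its "CP" and "AP" modes are hypothetical and greatly simplified (all writes arrive at replica a): during a partition, the CP store refuses writes to stay consistent, while the AP store accepts them and lets the replicas diverge until the partition heals.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    def __init__(self, mode: str):
        self.mode = mode            # "CP" or "AP" (labels assumed for this sketch)
        self.a, self.b = Replica(), Replica()
        self.partitioned = False    # True when a and b cannot communicate

    def write(self, key, value) -> bool:
        """A client writes through replica a; returns True if the write is accepted."""
        if self.partitioned:
            if self.mode == "CP":
                return False        # give up availability to keep replicas identical
            self.a.data[key] = value  # AP: accept locally, b is now stale
            return True
        self.a.data[key] = value
        self.b.data[key] = value
        return True

    def heal(self):
        """The partition ends and b catches up with a (eventual consistency)."""
        self.partitioned = False
        self.b.data.update(self.a.data)


cp = TwoNodeStore("CP")
cp.write("x", 1)
cp.partitioned = True
cp_accepted = cp.write("x", 2)

ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap_accepted = ap.write("x", 2)
stale_read = ap.b.data["x"]   # replica b still returns the old value
ap.heal()
healed_read = ap.b.data["x"]

print(cp_accepted, ap_accepted, stale_read, healed_read)
```

The stale read followed by a correct read after healing is the "eventually consistent" behaviour that BASE describes.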
ACID versus BASE
• A relational database guarantees the ACID
properties:
– Atomicity, Consistency, Isolation, Durability
– In short, a set of SQL statements (called a transaction) will
either all work, or all fail—no half way success, and the
result will not corrupt the database
– A price to pay: results might be somewhat slow
• NoSQL databases guarantee only BASE properties:
– Basically Available, Soft-state, Eventual consistency
– In short, at any given moment, not everything might be
consistent, but the database will eventually get consistent
– In return, these imperfect results are delivered fast
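The "all work, or all fail" behaviour of a transaction can be sketched with Python's built-in sqlite3 module. The account table, the CHECK constraint, and the transfer function are assumptions made for this example; the point is that when the second statement fails, the first one is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # balances may never go negative
)
conn.execute("INSERT INTO account VALUES ('ana', 100), ('ben', 50)")
conn.commit()


def transfer(conn, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates succeed or neither does."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


too_much = transfer(conn, "ana", "ben", 500)  # debit would break the CHECK
after_failure = dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "ana", "ben", 30)
after_success = dict(conn.execute("SELECT name, balance FROM account"))

print(too_much, after_failure)
print(ok, after_success)
```

After the failed transfer, ben's credit has been rolled back along with ana's rejected debit, so the database never shows a half-finished transfer.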
Table 12.8: Distributed Database Spectrum
[Table not reproduced. Surviving annotation: “Sacrifices availability to ensure
consistency and isolation”]
Historical outline of data models
Which data model should you use?
• Hierarchical or network models
– Obsolete—no one uses these any longer
• Entity-relationship model
– Continuation or enhancement of the relational model
– 90% or more of professional database situations
• Object-oriented database
– When you have very complex data structures, you need rapid
performance, and it makes business sense (Source: Barry & Associates, Inc)
– Data structures are so complex that organizing data as tables causes
headaches in programming retrieval and storage
• NoSQL
– Vast amounts of unstructured data where you need rapid
performance
– Speed is more important than data consistency
Sources
• Most of the slides are adapted from
Database Systems: Design, Implementation, and Management
by Carlos Coronel and Steven Morris, 11th edition
(2015), published by Cengage Learning.
ISBN-13: 978-1-285-19614-5
• Other sources are noted on the slides
themselves