Transcript ppt

CS351 Introduction to
Database systems
[email protected]
1
About me
Quan Wang
–
developer at Oracle Corp.
•
•
•
JDBC
Web Services
Cloud
2
Content
Introduction
–
Why databases?
Data Model
3
Why databases?
It used to be about boring stuff:
employee records, bank records, etc.
Today, the field covers all the largest
sources of data, with many new ideas.
 Web search.
 Data mining.
 Scientific and medical databases.
 Integrating information.
4
Why databases?
You may not notice it, but databases
are behind almost everything you do on
the Web.
 Google searches.
 Queries at Amazon, eBay, etc.
5
Why databases?
Databases often have unique
concurrency-control problems.
 Many activities (transactions) at the
database at all times.
 Must not confuse actions, e.g., two
withdrawals from the same account must
each debit the account.
6
Why databases?
Changing landscape
 System R: Oracle, MySQL, IBM, MS
 NoSQL: MangoDB
 NewSQL: VoltDB
 Cloud: Amazon RDS, Google Bigquery
 Bigdata: Hadoop, MapR
7
Why databases?
Job perspectives
–
–
Startups
Massive industry
8
Content
Introduction
–
–
Why database?
What to cover?
Data Model
9
What to cover?
Database
–
–
a set of inter-related data pertaining
to some organization
airline, university, store
Database Management System (DBMS)
–
–
one or more databases
algorithms accessing the databases
10
11
What to cover?
Database Theory
–
–
relational model and algebra
norm forms
Design of databases
–
–
normalization
E/R, UML
12
What to cover?
Database programming
–
SQL, stored procedures.
Integration?
–
–
ODBC, JDBC
O-R Mapping
13
What not covered?
Not covered
–
–
–
–
DB tuning
DB administration(DBA)
DB implementation
Programming languages
•
May need to learn languages and the
DB related aspects
14
Content
Introduction
–
–
–
Why database?
What to cover?
Logistics
Data Model
15
Logistics


Email: [email protected]
Office hour
–
–
–
Tuesday & Thursday
3 pm to 5pm
201L
16
Logistics

No TA


No detailed comments on assignments
Distribute answer keys when
appropriate


Check discrepancies
Raise issues
17
Logistics
Course page:
–
–
–
Http://encs.vancouver.wsu.edu/~quwa
ng
weekly schedule
assignments
•
•
•
due Thursdays before class
paper submission preferred
email submission ok
18
Logistics




Assignments 30%
Team project 20%
Midterms 25%
Final 25%
19
Logistics

Team project
–
–
–

mini-ebay
Max 3 people each team
Single grade for each team
Assignments and project
–
Late submission 10% penalty
20
Logistics

Databases
–
–
MySQL
SQLite
•
/usr/bin/sqlite3
21
Logistics

Exams
–
–
–
Lectures, textbook, and assigned
readings
Only areas covered in the lectures
Not curved
22
Content
Introduction
Why database?
What to cover?
Logistics



Data Model
–
flat files
23
Flat files

data
savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

operations
–
–
–
check balance
deposit or withdraw
change address
24
Flat files

data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
–
code coupled with data format
•
changing delimiter would require code
change
25
Flat files

data
Savings.data:
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
Checkings.data:
David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001!
Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0

problems
–
redundancy leads to inconsistency
26
Flat files

data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
–
concurrency
•
withdraw calls for file lock
–
–
withdraw $100 from the same account at
the same time, synchronized
withdraw $100 from different accounts at
the same time, synchronized unnecessarily
27
Flat files

data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
–
user interface
•
file based algorithms
28
Flat files

Why not flat files?
–
–
–
–
code coupled with data
redundancy leading to inconsistency
low concurrency
unfriendly user interface
29
Relational data model

database as abstraction



data model hides data format
SQL hides algorithms
runtime handles concurrency
30
Content
Introduction
Data Model
–
–
flat files
relational data model
•
•
•
relations
schema
keys
31
What is a Data Model?
1. Underlying structure of a database.
2. Mathematical representation of data.
 Examples:
•
•
•
•
flat file model = files
relational model = tables;
semistructured model = trees/graphs;
key-value model=key value pairs.
3. Operations on data.
4. Constraints.
32
A Relation is a Table
Attributes
(column
headers)
Tuples
(rows)
name
manf
Winterbrew Pete’s
Bud Lite
Anheuser-Busch
Beers
Relation
name
33
Schemas
Relation schema = relation name and
attribute list.
 Optionally: types of attributes.
 Example: Beers(name, manf) or
Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation
schemas in the database.
34
Why Relations?
Very simple model.
Often matches how we think about
data.
Abstract model that underlies SQL, the
most important database language
today.
35
Example
Sells(
Joe’s
Joe’s
Sue’s
Sue’s
bar,
Bud
Miller
Bud
Coors
beer, price )
2.50
2.75
2.50
3.00
Bars( name, addr
Joe’s Maple St.
Sue’s River Rd.
37
)
Database Schemas in SQL
SQL is primarily a query language, for
getting information from a database.
But SQL also includes a data-definition
component, DDL, for describing
database schemas.
38
Creating (Declaring) a Relation
Simplest form is:
CREATE TABLE <name> (
<list of elements>
);
To delete a relation:
DROP TABLE <name>;
39
Elements of Table Declarations
Most basic element: an attribute and its
type.
The most common types are:
 INT or INTEGER (synonyms).
 REAL or FLOAT (synonyms).
 CHAR(n ) = fixed-length string of n
characters.
 VARCHAR(n ) = variable-length string of up
to n characters.
40
Example: Create Table
CREATE TABLE Sells (
bar
CHAR(20),
beer
VARCHAR(20),
price REAL
);
41
SQL Values
Integers and reals are represented as
you would expect.
Strings are too, except they require
single quotes.
 Two single quotes = real quote, e.g.,
’Joe’’s Bar’.
Any value can be NULL.
42
Dates and Times
DATE and TIME are types in SQL.
The form of a date value is:
DATE ’yyyy-mm-dd’
 Example: DATE ’2007-09-30’ for Sept.
30, 2007.
43
Times as Values
The form of a time value is:
TIME ’hh:mm:ss’
with an optional decimal point and
fractions of a second following.
 Example: TIME ’15:30:02.5’ = two
and a half seconds after 3:30PM.
44
45
Declaring Keys
An attribute or list of attributes may be
declared PRIMARY KEY or UNIQUE.
Either says that no two tuples of the
relation may agree in all the attribute(s)
on the list.
There are a few distinctions to be
mentioned later.
46
Declaring Single-Attribute Keys
Place PRIMARY KEY or UNIQUE after the
type in the declaration of the attribute.
Example:
CREATE TABLE Beers (
name CHAR(20) UNIQUE,
manf CHAR(20)
);
47
Declaring Multiattribute Keys
A key declaration can also be another
element in the list of elements of a
CREATE TABLE statement.
This form is essential if the key consists
of more than one attribute.
 May be used even for one-attribute keys.
48
Example: Multiattribute Key
The bar and beer together are the key for Sells:
CREATE TABLE Sells (
bar
CHAR(20),
beer VARCHAR(20),
price REAL,
PRIMARY KEY (bar, beer)
);
49
PRIMARY KEY vs. UNIQUE
1. There can be only one PRIMARY KEY
for a relation, but several UNIQUE
attributes.
2. No attribute of a PRIMARY KEY can
ever be NULL in any tuple. But
attributes declared UNIQUE may have
NULL’s, and there may be several
tuples with NULL.
50
Content
Introduction
Data Model
flat files
relational data model
–
–



–
relations
schema
keys
relations vs files
51
relations vs files

data
savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
52
relations vs files

schema
Savings( name CHAR(50),
addr CHAR(50),
balance REAL,
opened DATE,
interest REAL);
53
relations vs files


relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
operations
–
–
–
check balance
deposit or withdraw
change address
54
relations vs files


relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
code coupled with data format?
55
relations vs files
relation


name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
concurrency
•
simultaneous withdraw calls
56
relations vs files

relations
Savings(name, addr, balance, opened, interest)
Checkings(name, addr, balance, opened, interest)

advantage
–
redundancy?
57
relations vs files
relations

Savings
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
Checkings

name
addr
balance
opened
interest
David
123123 Sandy Blvd
1500
1998
0.001
Lisa
432 Alberta St
3500
1990
0
advantage
–
redundancy?
58
relations vs files

relations
Customers(cust_id, name, addr)
Savings(cust_id, balance, opened, interest)
Checkings(cust_id, balance, opened, interest)

advantage
–
redundancy?
59
relations vs files
Savings

relations
cust_id
balance
opened
interest
c1
500.75
2010
0.005
c2
4000.00
2001
0.006
Customers
cust_id
name
addr
c1
David
123 NW Flander St
Checkings
c2
Lisa

432 Alberta St
advantage
–
cust_id
balance
opened
interest
c1
1200
1998
0.001
c2
3500
1990
0
redundancy?
60
relations vs files
relation


name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
user interface
•
natural language like support desirable
61
relations vs files


relation
NAME
ADDR
BALANCE OPENED
David
123 NW
500.75
Flander St
2010
0.005
Lisa
432
Alberta St
2001
0.006
4000.00
INTEREST
advantage
–
user interface SELECT name, addr
FROM savings
WHERE balance>2000
62
relations vs files


relation
NAME
ADDR
BALANCE OPENED
David
123 NW
500.75
Flander St
2010
0.005
Lisa
432
Alberta St
2001
0.006
4000.00
INTEREST
advantage
–
SELECT name, addr, balance
FROM savings
user interface ORDER BY balance
63
Content
Introduction
Data Model
–
–
–
flat files
relational data model
relations vs files
64
Content
Introduction
Data Model
–
–
–
–
flat files
relational data model
relations vs files
semi-structured data models
•
XML and JSON
65
XML
<Book>
<Title>Parsing Techniques</Title>
<Authors>
<Author>Dick Grune</Author>
<Author>Ceriel J.H. Jacobs</Author>
</Authors>
<Date>2007</Date>
<Publisher>Springer</Publisher>
</Book>
66
JSON
{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
"Date": "2007",
"Publisher": "Springer"
}
}
67
XML and JSON, side-by-side
{
<Book>
<Title>Parsing Techniques</Title>
<Authors>
<Author>Dick Grune</Author>
<Author>Ceriel J.H. Jacobs</Author>
</Authors>
<Date>2007</Date>
<Publisher>Springer</Publisher>
</Book>
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
"Date": "2007",
"Publisher": "Springer"
}
}
68
XML Schema
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string" />
<xs:element name="Authors">
<xs:complexType>
<xs:sequence>
<xs:element name="Author" type="xs:string" maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Date" type="xs:gYear" />
<xs:element name="Publisher" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Springer" />
<xs:enumeration value="MIT Press" />
<xs:enumeration value="Harvard Press" />
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
69
JSON Schema
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
70
{
String type
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
<xs:element name="Title" type="xs:string" />
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
71
List type
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
<xs:element name="Authors">
<xs:complexType>
"properties": {
<xs:sequence>
"Book": {
<xs:element name="Author" type="xs:string" maxOccurs="5"/>
"type": "object",
</xs:sequence>
</xs:complexType>
"properties": {
</xs:element>
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
72
Year type
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
<xs:element name="Date" type="xs:gYear" />
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
73
{
Enumeration type
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
<xs:element name="Publisher" minOccurs="0">
}
<xs:simpleType>
},
<xs:restriction base="xs:string">
"required": ["Book"],
<xs:enumeration value="Springer" />
"additionalProperties": false
<xs:enumeration value="MIT Press" />
}
<xs:enumeration value="Harvard Press" />
</xs:restriction>
</xs:simpleType>
</xs:element>
74
Semi-Structured Data Models
XML Adoption
–
–
Web Services (SOAP)
XML DB
JSON Adoption
–
–
Web Services (REST)
MongoDB
75
Content
Introduction
Data Model
–
–
–
–
–
flat files
relational data model
relations vs files
semi-structured data models
history
76
History

Hierarchical (IMS): late 1960’s and 1970’s

Network (CODASYL): 1970’s

Relational: 1970’s and early 1980’s

Entity-Relationship: 1970’s

Object-oriented: late 1980’s and early 1990’s

Object-relational: late 1980’s and early 1990’s

Semi-structured (XML): late 1990’s to the present
77