Transcript ppt
CS351 Introduction to
Database systems
[email protected]
1
About me
Quan Wang
–
developer at Oracle Corp.
•
•
•
JDBC
Web Services
Cloud
2
Content
Introduction
–
Why databases?
Data Model
3
Why databases?
It used to be about boring stuff:
employee records, bank records, etc.
Today, the field covers all the largest
sources of data, with many new ideas.
Web search.
Data mining.
Scientific and medical databases.
Integrating information.
4
Why databases?
You may not notice it, but databases
are behind almost everything you do on
the Web.
Google searches.
Queries at Amazon, eBay, etc.
5
Why databases?
Databases often have unique
concurrency-control problems.
Many activities (transactions) at the
database at all times.
Must not confuse actions, e.g., two
withdrawals from the same account must
each debit the account.
6
Why databases?
Changing landscape
System R: Oracle, MySQL, IBM, MS
NoSQL: MangoDB
NewSQL: VoltDB
Cloud: Amazon RDS, Google Bigquery
Bigdata: Hadoop, MapR
7
Why databases?
Job perspectives
–
–
Startups
Massive industry
8
Content
Introduction
–
–
Why database?
What to cover?
Data Model
9
What to cover?
Database
–
–
a set of inter-related data pertaining
to some organization
airline, university, store
Database Management System (DBMS)
–
–
one or more databases
algorithms accessing the databases
10
11
What to cover?
Database Theory
–
–
relational model and algebra
norm forms
Design of databases
–
–
normalization
E/R, UML
12
What to cover?
Database programming
–
SQL, stored procedures.
Integration?
–
–
ODBC, JDBC
O-R Mapping
13
What not covered?
Not covered
–
–
–
–
DB tuning
DB administration(DBA)
DB implementation
Programming languages
•
May need to learn languages and the
DB related aspects
14
Content
Introduction
–
–
–
Why database?
What to cover?
Logistics
Data Model
15
Logistics
Email: [email protected]
Office hour
–
–
–
Tuesday & Thursday
3 pm to 5pm
201L
16
Logistics
No TA
No detailed comments on assignments
Distribute answer keys when
appropriate
Check discrepancies
Raise issues
17
Logistics
Course page:
–
–
–
Http://encs.vancouver.wsu.edu/~quwa
ng
weekly schedule
assignments
•
•
•
due Thursdays before class
paper submission preferred
email submission ok
18
Logistics
Assignments 30%
Team project 20%
Midterms 25%
Final 25%
19
Logistics
Team project
–
–
–
mini-ebay
Max 3 people each team
Single grade for each team
Assignments and project
–
Late submission 10% penalty
20
Logistics
Databases
–
–
MySQL
SQLite
•
/usr/bin/sqlite3
21
Logistics
Exams
–
–
–
Lectures, textbook, and assigned
readings
Only areas covered in the lectures
Not curved
22
Content
Introduction
Why database?
What to cover?
Logistics
Data Model
–
flat files
23
Flat files
data
savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
operations
–
–
–
check balance
deposit or withdraw
change address
24
Flat files
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems
–
code coupled with data format
•
changing delimiter would require code
change
25
Flat files
data
Savings.data:
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
Checkings.data:
David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001!
Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0
problems
–
redundancy leads to inconsistency
26
Flat files
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems
–
concurrency
•
withdraw calls for file lock
–
–
withdraw $100 from the same account at
the same time, synchronized
withdraw $100 from different accounts at
the same time, synchronized unnecessarily
27
Flat files
data
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems
–
user interface
•
file based algorithms
28
Flat files
Why not flat files?
–
–
–
–
code coupled with data
redundancy leading to inconsistency
low concurrency
unfriendly user interface
29
Relational data model
database as abstraction
data model hides data format
SQL hides algorithms
runtime handles concurrency
30
Content
Introduction
Data Model
–
–
flat files
relational data model
•
•
•
relations
schema
keys
31
What is a Data Model?
1. Underlying structure of a database.
2. Mathematical representation of data.
Examples:
•
•
•
•
flat file model = files
relational model = tables;
semistructured model = trees/graphs;
key-value model=key value pairs.
3. Operations on data.
4. Constraints.
32
A Relation is a Table
Attributes
(column
headers)
Tuples
(rows)
name
manf
Winterbrew Pete’s
Bud Lite
Anheuser-Busch
Beers
Relation
name
33
Schemas
Relation schema = relation name and
attribute list.
Optionally: types of attributes.
Example: Beers(name, manf) or
Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation
schemas in the database.
34
Why Relations?
Very simple model.
Often matches how we think about
data.
Abstract model that underlies SQL, the
most important database language
today.
35
Example
Sells(
Joe’s
Joe’s
Sue’s
Sue’s
bar,
Bud
Miller
Bud
Coors
beer, price )
2.50
2.75
2.50
3.00
Bars( name, addr
Joe’s Maple St.
Sue’s River Rd.
37
)
Database Schemas in SQL
SQL is primarily a query language, for
getting information from a database.
But SQL also includes a data-definition
component, DDL, for describing
database schemas.
38
Creating (Declaring) a Relation
Simplest form is:
CREATE TABLE <name> (
<list of elements>
);
To delete a relation:
DROP TABLE <name>;
39
Elements of Table Declarations
Most basic element: an attribute and its
type.
The most common types are:
INT or INTEGER (synonyms).
REAL or FLOAT (synonyms).
CHAR(n ) = fixed-length string of n
characters.
VARCHAR(n ) = variable-length string of up
to n characters.
40
Example: Create Table
CREATE TABLE Sells (
bar
CHAR(20),
beer
VARCHAR(20),
price REAL
);
41
SQL Values
Integers and reals are represented as
you would expect.
Strings are too, except they require
single quotes.
Two single quotes = real quote, e.g.,
’Joe’’s Bar’.
Any value can be NULL.
42
Dates and Times
DATE and TIME are types in SQL.
The form of a date value is:
DATE ’yyyy-mm-dd’
Example: DATE ’2007-09-30’ for Sept.
30, 2007.
43
Times as Values
The form of a time value is:
TIME ’hh:mm:ss’
with an optional decimal point and
fractions of a second following.
Example: TIME ’15:30:02.5’ = two
and a half seconds after 3:30PM.
44
45
Declaring Keys
An attribute or list of attributes may be
declared PRIMARY KEY or UNIQUE.
Either says that no two tuples of the
relation may agree in all the attribute(s)
on the list.
There are a few distinctions to be
mentioned later.
46
Declaring Single-Attribute Keys
Place PRIMARY KEY or UNIQUE after the
type in the declaration of the attribute.
Example:
CREATE TABLE Beers (
name CHAR(20) UNIQUE,
manf CHAR(20)
);
47
Declaring Multiattribute Keys
A key declaration can also be another
element in the list of elements of a
CREATE TABLE statement.
This form is essential if the key consists
of more than one attribute.
May be used even for one-attribute keys.
48
Example: Multiattribute Key
The bar and beer together are the key for Sells:
CREATE TABLE Sells (
bar
CHAR(20),
beer VARCHAR(20),
price REAL,
PRIMARY KEY (bar, beer)
);
49
PRIMARY KEY vs. UNIQUE
1. There can be only one PRIMARY KEY
for a relation, but several UNIQUE
attributes.
2. No attribute of a PRIMARY KEY can
ever be NULL in any tuple. But
attributes declared UNIQUE may have
NULL’s, and there may be several
tuples with NULL.
50
Content
Introduction
Data Model
flat files
relational data model
–
–
–
relations
schema
keys
relations vs files
51
relations vs files
data
savings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
52
relations vs files
schema
Savings( name CHAR(50),
addr CHAR(50),
balance REAL,
opened DATE,
interest REAL);
53
relations vs files
relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
operations
–
–
–
check balance
deposit or withdraw
change address
54
relations vs files
relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
code coupled with data format?
55
relations vs files
relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
concurrency
•
simultaneous withdraw calls
56
relations vs files
relations
Savings(name, addr, balance, opened, interest)
Checkings(name, addr, balance, opened, interest)
advantage
–
redundancy?
57
relations vs files
relations
Savings
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
Checkings
name
addr
balance
opened
interest
David
123123 Sandy Blvd
1500
1998
0.001
Lisa
432 Alberta St
3500
1990
0
advantage
–
redundancy?
58
relations vs files
relations
Customers(cust_id, name, addr)
Savings(cust_id, balance, opened, interest)
Checkings(cust_id, balance, opened, interest)
advantage
–
redundancy?
59
relations vs files
Savings
relations
cust_id
balance
opened
interest
c1
500.75
2010
0.005
c2
4000.00
2001
0.006
Customers
cust_id
name
addr
c1
David
123 NW Flander St
Checkings
c2
Lisa
432 Alberta St
advantage
–
cust_id
balance
opened
interest
c1
1200
1998
0.001
c2
3500
1990
0
redundancy?
60
relations vs files
relation
name
addr
balance
opened
interest
David
123 NW Flander St
500.75
2010
0.005
Lisa
432 Alberta St
4000.00
2001
0.006
advantage
–
user interface
•
natural language like support desirable
61
relations vs files
relation
NAME
ADDR
BALANCE OPENED
David
123 NW
500.75
Flander St
2010
0.005
Lisa
432
Alberta St
2001
0.006
4000.00
INTEREST
advantage
–
user interface SELECT name, addr
FROM savings
WHERE balance>2000
62
relations vs files
relation
NAME
ADDR
BALANCE OPENED
David
123 NW
500.75
Flander St
2010
0.005
Lisa
432
Alberta St
2001
0.006
4000.00
INTEREST
advantage
–
SELECT name, addr, balance
FROM savings
user interface ORDER BY balance
63
Content
Introduction
Data Model
–
–
–
flat files
relational data model
relations vs files
64
Content
Introduction
Data Model
–
–
–
–
flat files
relational data model
relations vs files
semi-structured data models
•
XML and JSON
65
XML
<Book>
<Title>Parsing Techniques</Title>
<Authors>
<Author>Dick Grune</Author>
<Author>Ceriel J.H. Jacobs</Author>
</Authors>
<Date>2007</Date>
<Publisher>Springer</Publisher>
</Book>
66
JSON
{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
"Date": "2007",
"Publisher": "Springer"
}
}
67
XML and JSON, side-by-side
{
<Book>
<Title>Parsing Techniques</Title>
<Authors>
<Author>Dick Grune</Author>
<Author>Ceriel J.H. Jacobs</Author>
</Authors>
<Date>2007</Date>
<Publisher>Springer</Publisher>
</Book>
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
"Date": "2007",
"Publisher": "Springer"
}
}
68
XML Schema
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string" />
<xs:element name="Authors">
<xs:complexType>
<xs:sequence>
<xs:element name="Author" type="xs:string" maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Date" type="xs:gYear" />
<xs:element name="Publisher" minOccurs="0">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Springer" />
<xs:enumeration value="MIT Press" />
<xs:enumeration value="Harvard Press" />
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
69
JSON Schema
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
70
{
String type
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
<xs:element name="Title" type="xs:string" />
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
71
List type
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
<xs:element name="Authors">
<xs:complexType>
"properties": {
<xs:sequence>
"Book": {
<xs:element name="Author" type="xs:string" maxOccurs="5"/>
"type": "object",
</xs:sequence>
</xs:complexType>
"properties": {
</xs:element>
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
72
Year type
{
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
<xs:element name="Date" type="xs:gYear" />
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
}
},
"required": ["Book"],
"additionalProperties": false
}
73
{
Enumeration type
"$schema": "http://json-schema.org/draft-04/schema
"type": "object",
"properties": {
"Book": {
"type": "object",
"properties": {
"Title": {"type": "string"},
"Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
"Date": {"type": "string", "pattern": "^[0-9]{4}$"},
"Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
},
"required": ["Title", "Authors", "Date"],
"additionalProperties": false
<xs:element name="Publisher" minOccurs="0">
}
<xs:simpleType>
},
<xs:restriction base="xs:string">
"required": ["Book"],
<xs:enumeration value="Springer" />
"additionalProperties": false
<xs:enumeration value="MIT Press" />
}
<xs:enumeration value="Harvard Press" />
</xs:restriction>
</xs:simpleType>
</xs:element>
74
Semi-Structured Data Models
XML Adoption
–
–
Web Services (SOAP)
XML DB
JSON Adoption
–
–
Web Services (REST)
MongoDB
75
Content
Introduction
Data Model
–
–
–
–
–
flat files
relational data model
relations vs files
semi-structured data models
history
76
History
Hierarchical (IMS): late 1960’s and 1970’s
Network (CODASYL): 1970’s
Relational: 1970’s and early 1980’s
Entity-Relationship: 1970’s
Object-oriented: late 1980’s and early 1990’s
Object-relational: late 1980’s and early 1990’s
Semi-structured (XML): late 1990’s to the present
77