product - hutchison

Download Report

Transcript product - hutchison

Using Databases in ICS4U
Grant Hutchison
TDSB
Agenda
•
•
•
•
Motivation
Why teach Database Technologies?
Teaching DBMS using Apache Derby?
Alternative Data Technologies
– Big Data / NoSQL / R
Motivation
There is a Data Explosion Occurring…
and is NOT about to slowdown!!
Volume
Velocity
Variety
Where is the data coming from?
and many more…
Curriculum Integration – ICS4U
A3.1 demonstrate the ability to read from, and
write to, an external file (e.g., text file, binary
file, database, XML file) from within a
computer program;
Pre-DB Units in ICS4U
• Object-Oriented Design
– UML, CRC – Software Engineering Principles
– Implementation
• Java, C++, Python, C# - typical languages
• Algorithms
– design : sort, search
– analysis : predict and test runtime expectations
• Data
– Files : reading, writing, parsing (basics only)
– Structures : Lists, Stacks, Queues, Trees (basics only)
– Scalable data serving components : RDBMS, NoSQL, etc…
Data (Preservation and Access)
File based solutions
Objects can be stored in flat files where each
line of data represents an instance of an
object.
Problem : data (a simple order)
Bob, Smith, 2 Main Street, Toronto, G,
"Jan 15, 2012",C, Hammer, 2, 9.25, Lumber
(4ft), 3, 10.85, .13, 57.69
When to use a Database in
student projects?
Applications that need to :
store and retrieve data quickly and reliability
while users interact with the provided user
interface (UI).
Problems with simply using Files
for Objects
• no query based access
• highly structured
Object Design
Would you design a Class with these attributes?
•
•
•
•
•
•
•
•
•
cust_name
cust_street
cust_city
order_num
item_num
description
cost
quantity
total
Customer
Composition
Item
Order
Relational Database
1980
New and Hot
Quick adoption
1990
Defacto
Standard DBMS
Object
Relational
2000
Relational /
XML Hybrid
DB
2010
++
Object Oriented
DBMS
Parallel
RDBMS
Scalable
Transactions
(OLTP)
Data warehouse
In Memory DBs
Open Source
DBs
NoSQL
Analytical DBs
Big Data
• Volume
• Velocity
• Variety
Introduction to SQL
Declarative programming language designed
for querying and managing data stored in
RDBMS (Relational DBMS).
Declarative ?
Describe the what and not the how !!
SQL Query
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
• Not case-sensitive
• Must know the schema of the database
• Tools usually used to manage
• Entity-Relationship Diagrams (ER Diagrams)
SQL
Data Definition Language (DDL)
– Create/alter/delete tables and their attributes
Data Manipulation Language (DML)
– Access one or more tables
– Insert/delete/modify tuples/records in tables
CRUD – SQL operations
Operation
C
R
U
D
Create
Read
Update
Delete
SQL
statement
INSERT
SELECT
UPDATE
DELETE
SCHEMA - Table Structure
Table
Name
Attribute /
Column
PRODUCT
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Data type
CREATE TABLE PRODUCT (
DESCRIPTION VARCHAR(40), PRICE DECIMAL(9,2),
CATEGORY VARCHAR (40), MANUFACTURER VARCHAR (40))
Table Structure
PRODUCT
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Hammer
$19.99
Hand Tools
Stanley
Lumber (4 ft)
$29.99
Raw Materials
ForestWorld
INSERT INTO PRODUCT VALUES (
"Hammer", 19.99, "Hand Tools", "Stanley")
INSERT INTO PRODUCT VALUES (
"Lumber (4 ft) ", 29.99, "Raw Materials",
"ForestWorlds")
Tables Explained
The schema of a table is the table name and its
attributes.
A key is an attribute whose values are used to
ensure uniqueness of the tuple or assist in
defining relationships with out tables.
Data Types in SQL
Strings
– CHAR, VARCHAR, LONG VARCHAR, CLOB
Numeric
– SMALLINT, INT, BIGINT, DECIMAL, REAL,
FLOAT,DOUBLE
Dates and Times
– DATE, TIME, TIMESTAMP
Binary
– CHAR FOR BIT DATA, VARCHAR FOR BIT
DATA, LONG VARCHAR FOR BIT DATA, BLOB
SQL Query
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
• Not case-sensitive
• Must know the schema of the database
• Tools usually used to manage
• Entity-Relationship Diagrams (ER Diagrams)
Simple SQL Query
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Hammer
$19.99
Hand Tools
Stanley
Lumber (4 ft)
$29.99
Raw Materials
ForestWorld
SELECT *
FROM
PRODUCT
WHERE CATEGORY="Hand Tools"
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Hammer
$19.99
Hand Tools
Stanley
Simple SQL Query
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Hammer
$19.99
Hand Tools
Stanley
Lumber (4 ft)
$29.99
Raw Materials
ForestWorld
SELECT DESCRIPTION
FROM
PRODUCT
WHERE PRICE > 20
DESCRIPTION
PRICE
CATEGORY
MANUFACTURER
Lumber (4 ft)
$29.99
Raw Materials
ForestWorld
The LIKE operator
SELECT *
FROM
PRODUCTS
WHERE DESCRIPTION LIKE ‘%Lumber%’
% = any sequence of characters
_ = any single character
Ordering the Results
SELECT DESCRIPTION, PRICE
FROM PRODUCT
WHERE CATEGORY=‘Hand Tools’ AND PRICE > 20
ORDER BY PRICE
Ordering is ascending, unless you append the DESC keyword.
Normalization
Process of designing tables to avoid data
duplication and maintain consistency.
1NF – Remove multiple items for each tuple/row
2NF – Remove duplicate data for each tuple/row
3NF – Ensure attributes are dependant on the key
and nothing but the key
4NF – Not usually needed
Object Design
Would you design a Class with these attributes?
•
•
•
•
•
•
•
•
•
cust_name
cust_street
cust_city
order_num
item_num
description
cost
quantity
total
Customer
Aggregate
Item
Order
Joins
Bob, Smith, 2 Main, Toronto, G,
"Jan 15, 2012",C, .13, 57.69,
"Hammer", 2, 9.25,
"Lumber (4ft)", 3, 10.85
CUST (cid, fname, lname, street, city, type)
ORDER (oid, cid, o_date, status, tax_r, total )
LINEITEM (oid, iid, prod_id, quantity)
PRODUCT (pid, description, price)
Complex Joins – Answer Questions
CUST (cid, name, street, city, type)
ORDER (oid, cid, o_date, status, tax_r, total )
LINEITEM (oid, iid, prod_id, quantity)
PRODUCT (pid, description, price)
Find the Average Order for preferred customers in 2012.
SELECT
FROM
WHERE
avg(o.total)
CUST c, ORDER o
c.type = "P" AND
c.cid = o.cid AND
YEAR(o.o_date) = "2012"
SQL Aggregation Functions
SELECT avg(price)
FROM
product
WHERE maker=“Toyota”
SELECT count(*)
FROM product
WHERE year > 1995
There are many more aggregation functions.
Purchase
Simple Aggregations
Product
Date
Price
Quantity
Bagel
10/21
1
20
Banana
10/3
0.5
10
Banana
10/10
1
10
Bagel
10/25
1.50
20
SELECT Sum(price * quantity)
FROM
Purchase
WHERE product = ‘bagel’
$50 ($20+$30)
Deletions
DELETE FROM CUST
WHERE name = "Hutchison"
Careful as DELETE could remove:
• NO ROWS
• ONE ROW
• ALL ROWS
Updates
UPDATE
SET
WHERE
PRODUCT
price = price * .5
category = "Hand Tools"
Apache Derby
Apache derby is an open source, small
footprint, SQL standards-based Java relational
database engine.
How Small?
• only 2 MB (derby.jar)
History of
Cloudscape / Apache Derby
1997
Cloudscape
Inc.
1999
Acquired by
Informix
2001
Acquired by
IBM
2004
Open Source
Apache
Derby
Derby Modes
Embedded or Networked
Networked Mode
The Derby server can be started in
its own JVM and then multiple
Java applications can use the same
database.
Connecting Java Apps to Derby
• Add derby.jar to your CLASSPATH
• Use Standard JDBC APIs to access data
• JDBC documentation is in the JDK
Alternative Data Technologies
Big Data Systems
• clustered file system
(Hadoop)
• analysis usually using
MapReduce programs
(not SQL)
• Batch oriented
• Analytics not transactions
NoSQL
• document style of data
query and storage (JSON
usually)
• scalable across servers
• complex queries can be
challenging
• Used primarily for web
apps