Transcript ppt

CSE 190: Internet E-Commerce
Lecture 12: SQL, DB Design
SQL: Simple Calculations
• SQL allows not only selection from database tables, but
also basic calculations over all the rows of those tables
• Select count(*) from customers
where total_loans < 5500;
– Returns the count of customers who have loans of less than
5500
• Select sum(total_loans) from customers;
– Returns the sum of all the loans outstanding for all customers
• Select avg(total_loans) from customers;
– Returns the average of total_loans over all customers
• Reference
– http://www.w3schools.com/sql/default.asp
SQL: GROUP BY
Have Query results collated by the value of specified field
Syntax: SELECT column, calc(column) FROM table GROUP BY column
Example:
SELECT company, sum(amount) from sales GROUP by company
Sales Table
Company
Amount
ABC
120
XYZ
50
ABC
100
Yields:
Company
Amount
ABC
220
XYZ
50
SQL: Views
• Views: A named table that results from
applying criteria to a SELECT clause.
• Example:
– CREATE VIEW YOUNG_PEOPLE AS
SELECT * FROM PEOPLE WHERE DOB >
DateValue('1.1.1980');
– May then be used in subsequent selects:
• SELECT * FROM YOUNG_PEOPLE;
DB Design
• DB Design involves two activities
– Logical design
– Physical design
• Logical design: describing the schemas, tables,
and relationships amongst the tables
– Typically done by application developer
• Physical design: Deciding what disks tables are
located on, how to create indexes, which queries
to optimize on
DB Design: Entity Relationships
• Entity: Logical object captured within a table
• Relationships between other entities are of certain category:
– 1:1
• Implies entities should be merged
– 1:M
• Commonly used to represent lists
– M:1
• Entity is part of some list of the other entity
– M:M
• Avoid this kind of relationship
• Creating these Entity-Relationship diagrams known as Database
modeling; basically similar to object-oriented analysis
• Object oriented databases: reduce the effort to map objects to
relational tables
– However, still immature compared to relational databases
DB Design: Normal Forms
• Normalization: step-wise refinement of our
database schema to eliminate redundancy
and reduce the costs of change
• Good DB Design usually reaches 3rd
Normal Form
DB Design: Zero Normal Form
• Zero normal form
– No normal form rules have been applied yet
• Example (Users table)
Name
Company
Company_
Address
Url1
Url2
Jack
ABC
1 Infinite
Loop
Abc.com
Xyz.com
Jill
XYZ
1 Microsoft
Way
Abc.com
Xyz.com
• Problem: List of URLs, capacity is hard coded to be two.
Need to add a new column to support three URLs!
DB Design: First Normal Form
Rules:
1.
2.
3.
Eliminate repeating groups in individual tables
Create a new table for each set of related data
Identify each set of related data with a primary key
UserId
Name
Company
Company
Address
URL
1
Jack
ABC
1 Infinite Loop
Abc.com
1
Jack
ABC
1 Infinite Loop
Xyz.com
2
Jill
XYZ
1 Microsoft
Way
Abc.com
2
Jill
XYZ
1 Microsoft
Way
Xyz.com
Problem: Adding new URL means we have to copy the same user,
company and company address values. This is redundant and could lead
to inconsistent values.
DB Design: Second Normal Form
Rules:
1.
Create separate tables for sets of values that apply to multiple records
2.
Relate these tables with a foreign key
Users:
URLs
UserId
Name
Company
Company
Address
UrlId
relUserId
URL
1
Jack
ABC
1 Infinite
Loop
1
1
Abc.com
2
Jill
XYZ
1
Microsoft
Way
2
1
Xyz.com
3
2
Abc.com
4
2
Xyz.com
Problem: Adding new employee means we have to copy address information,
leading to redundancy
DB Design: Third Normal Form
Rules:
1.
Eliminate fields which do not depend on the primary key
Note: Once you’ve applied 3rd Normal Form, your database schema is in
relatively good health
UserId
Name
relCompId
UrlId
relUserId
URL
1
Jack
1
1
1
Abc.com
2
Jill
2
2
1
Xyz.com
3
2
Abc.com
4
2
Xyz.com
CompanyId
Company
Company
Address
1
ABC
1 Infinite
Loop
2
XYZ
1 Microsoft
Way
Problem: Note how Abc.com and
Xyz.com are copies. If this comes from
open data entry, then nothing to do
about this. But if it’s from a fixed set of
choices, more may be done.
DB Design: 4th, 5th Normal Form
4th Normal Form: Independent entities
may not be stored in the same
table
UrlRelationId
RelatedUrlId
RelatedUrlUserId
1
1
1
2
1
2
3
2
1
4
2
2
5th Normal Form: Original table must be reconstructed from the tables it
has been decomposed into.
DB Design: Inheritance
• What about subclasses of objects, with
subsets (or supersets) of properties
• Two solutions:
– Master table of properties with object id
selecting for property ID
– One new table for each subclass
• Second generally better for relational
integrity