Transcript week5

Indexing and Joins
Indexes
• Increase database performance
• must be explicitly defined
• once defined, are transparent to the user
• once created, the system maintains it
• more than one can exist on a given table
Creating an Index
• Syntax
CREATE [UNIQUE] INDEX index_name
ON table_name (column_name)
• Example
create index auind
on authors (au_id)
Composite Index
•Used when columns have a logical
relationship and would be searched as a
unit
•Example
create index au_name_ind
on authors (au_lname, au_fname)
•order not important, but performance is
better when primary search col is first
2 Kinds of Indexes
• Unique Index
• Clustered Index
Unique Index
•
•
•
•
No 2 rows are permitted to have the same value
system checks data upon creation and data addition
rejects duplicates and returns an error
should only be created on a column that requires
uniqueness eg. ssn, acct code
• can be created as a composite or single column
• helps in maintaining data integrity
• boosts search performance
Clustered Index
•System sorts rows on an ongoing basis so that
the physical order is the same as the indexed
order
•only 1 can exist per table
•should only be created for a column that is most
often retrieved in order
•greatly increases performance when searching
for contiguous key values… especially a range
•slows down data updates due to the sorting
involved
Things to Consider
• Indexes greatly increase query response time
• every index requires system resources to store
and maintain
• indexes can actually slow down the
performance of UPDATES, INSERTS, and
DELETES due to index maintenance
So… don’t over index
What Should We Index?
• Any column frequently used in retrieval
• primary key columns
• columns that are often queried in a sorted
order
• columns that are used in joins
• columns that are often searched for ranges
We Should NOT Index…
• Columns rarely used in queries
• columns with 2 or 3 possible values
eg. Male or Female
• small tables
SQL-92 Create Table Constraints
• PRIMARY KEY
– rejects duplicates and nulls
• UNIQUE
– rejects duplicates, allows nulls
• DEFAULT
– inserts the default value when no value is entered
• CHECK
– validates data format
• FOREIGN KEY and REFERENCES
– ties foreign key to the primary key it references
Put it on paper!
Column
Datatype
Null?
Key
title_id
title
char(6)
varchar(80)
not null
not null
primary, unique
unique
type
pub_id
price
advance
char(12)
char(4)
money
money
Default
Reference
2 letter then 2 dig
unclass
null
null
Check
business,
mod_cook,
trad_cook
publishers,
pub_id
Then write your SQL
create table title
(title_id char(6) not null
constraint tididx primary key
constraint tidcheck check
(title_id like ‘[A-Z] [A-Z] [0-9] [0-9]…’),
title varchar(80) not null
constraint titleidx unique,
type char(12)
default ‘unclassified’ null
constraint typechk check
(type in(‘business’, ‘mod_cook’, ‘trad_cook’)),
pub_id char(4) null
reference publishers (pub_id),
price money null,
advance money null)
Changing a Table
• Syntax
– ALTER table table_name
add column_name datatype null|not null
Removing Objects
• Database
– DROP DATABASE db_name
– deletes ALL tables and data within it!!
• Table
– DROP TABLE table_name
– deletes table and its contents
• Index
– DROP INDEX table_name.index_name
– deletes named index on named table
Joins
In order to maintain normalization in the
database design it is necessary to break
up data into separate tables. The data
can then be re-associated through the
use of a join.
Joins
•
•
•
•
•
•
What columns do I need?
What tables have these columns?
Are all the tables related in some way?
If not, are there other tables that can relate them?
How are they all related?
Link them together by setting their common fields
equal in the WHERE clause.
• Restrict the WHERE clause to the record(s) of
interest.
What to join?
• Key columns are the best since these
were created for the purpose of existing
as a reference.
• Should have similar data
• Should be the same datatype
• nulls will not join since their value is not
known.
Syntax
• Usually best to put the join conditions first in
the WHERE clause
• Use of aliases greatly simplifies the
statement.
• Any logical operator can be used.
• A self-join can be performed on the same
table by qualifying it twice.
Self Join
Which authors in Oakland have the same zip code?
Select distinct au1.au_fname , au1.au_lname, au1.zip
from authors au1, authors au2
where au1.city = “Oakland”
and au1.zip = au2.zip
and au1.au_id != au2.au_id
How a Join is Processed
• First the system obtains the Cartesian Product of all
tables in join
Cartesian Product - the matrix of all possible
combinations that could satisfy the join
• The select list is used to restrict the columns
returned
• The WHERE clause is then used to restrict the rows
return that satisfy the query
2 ways of looking at a Join
• Looking at all the tables, linking them
together and treating them like one big
table.
• Setting the main search criteria and then
linking the common fields to the data
that is of interest.