ra-slides - St. Olaf College

Download Report

Transcript ra-slides - St. Olaf College

Relational Algebra
“The beginning…”
What is a query?
Why use a theoretical basis for designing
‘queries’?
What is a relation?
What is relational algebra (RA)?
Start with some unary operators in RA
Then the basic binary operators
A query
What do we mean by a query to a database?
A query
What do we mean by a query to a database?
Given one or more relations, apply operations that
ultimately return a new relation.
Usually we assume the new relation will have different
information than the original input relations and will
combine the original relations in meaningful ways.
We can write a query accurately in English
We can accurately name the resulting relation in
English
We can draw queries as a way of planning them out

SQL is the language used for talking to a
relational database management system

Allows




But the querying part of SQL isn't very good



changing the schema (“data definition”),
changing the data (“data modification”), and
querying
Confusing syntax that is hard to read
Major differences from one DBMS to the next
We teach you relational algebra (RA) for querying


We use a graphical representation of RA
It's easy to convert RA into SQL

A relational database is a set of relations

A relation is a set of rows and a set of columns

A relation has a name

A relation has an identifier which is a set of
columns


We consistently underline columns in the identifier
Rules:

A row has a value for each column in the identifier


(A row may or may not have values for columns that are
not in the identifier)
No two rows have the same values in all the
identifying columns

Is this a relation?
First name Last name Award
Movie
Jeff
Bridges
Crazy Heart
Sandra
Bullock
Kathryn
Bigelow
Best
Actor
Best
Actress
Director
The Blind
Side
The Hurt
Locker

Is this a relation?
Oscar
First
name
Jeff
Last
Award
name
Bridges Best
Actor
Sandra Bullock Best
Actress
Kathryn Bigelow Director
Movie
Crazy Heart
The Blind Side
The Hurt
Locker

Is this a relation?
Oscar
First name Last name Award
Movie
Jeff
Sandra
Bridges
Bullock
Crazy Heart
The Blind Side
Kathryn
Paul
Bigelow
Ottosson
Paul
Ottosson
Best Actor
Best
Actress
Director
Sound
Editing
Sound
Mixing
The Hurt Locker
The Hurt Locker
The Hurt Locker

Is this a relation?
Oscar
First name
Last name
Award
Movie
Jeff
Bridges
Best Actor
Crazy Heart
Sandra
Bullock
Kathryn
Bigelow
Paul
Ottosson
Paul
Ottosson
Best Actress The Blind
Side
Director
The Hurt
Locker
Sound
The Hurt
Mixing
Locker
Sound
Spider Man
Mixing

Are these two relations the same (both named
Oscar)?
First name
Award
Movie
Jeff
Last
name
Bridges
Best Actor
Crazy Heart
Sandra
Bullock
Best Actress The Blind Side
Kathryn
Bigelow
Director
Last name
Award
Bigelow
Director
Bridges
Bullock
The Hurt Locker
Movie
The Hurt
Locker
Best Actor
Crazy Heart
Best Actress The Blind
Side
First name
Kathryn
Jeff
Sandra
More about relations
No two rows of data are duplicated
implied by our rule about identifying columns
No two columns have the same name
No two relations in a given database have the
same name
Relational Algebra
Inputs:
one or two relations
Operators:
work on those relations (or more explicitly
columns or rows)
Output:
one relation
RA Operators

We'll look at unary operators first
These take a relation and return a relation




project
filter
reduce
Group
Project


Project removes and adds nonidentifying columns
 Adds computed columns –
arithmetic, string manipulation,
etc.
result:

as wide as number of columns named,




which must include the identifying columns
exactly as tall as original relation
identifier is same
SQL:

Select columns from relation;

Project example
First name
Jeff
Sandra
Kathryn
Last name
Bridges
Bullock
Bigelow
Award
Best Actor
Best Actress
Director
Movie
Crazy Heart
The Blind Side
The Hurt Locker
oscar
Project
Id: ???
Carry: Movie
Compute:
age = currentYear - BirthYear
?????
BirthYear
1949
1964
1951

Project example
First name
Jeff
Sandra
Kathryn
Last name
Bridges
Bullock
Bigelow
Award
Best Actor
Best Actress
Director
Movie
Crazy Heart
The Blind Side
The Hurt Locker
BirthYear
1949
1964
1951
oscar
Project
Id: First_name, Last_name, Award
Carry: Movie
Compute:
age = currentYear - BirthYear
First_name, Last_name, Award, Movie, Actor_age
of Oscar
Filter


Filter removes rows based on a condition
result:




as wide as the original relation
shorter or exactly as tall as the original
relation
identifier is same
SQL:

Select * from relation where condition;
First name Last name Award
Jeff
Bridges
Best Actor
Sandra
Bullock
Best
Actress
Kathryn
Bigelow
Director

Movie
Crazy Heart
The Blind
Side
The Hurt
Locker
Birth year
1949
1964
1951
Filter example:

Find each oscar award for the movie “The Blind
Side”

What algorithm do you think Filter uses?

What algorithm do you think Filter uses?

Trick question: depends on the DBMS and the
configuration!

We often combine project and filter

An activity you could try on your own

Go to sqlzoo.net


Try tutorial 0
Reverse engineer the example queries into relational
algebra
Reduce


Reduce changes the identifier,
result:




SQL:


narrower or the same width as the original relation
possibly shorter than the original relation
unsafe!
Select distinct new_identifier from original relation
Let’s try it with the tiny database:
Practice Query #3

Reduce example
First name
Last name Award
Movie
Jeff
Bridges
Best Actor
Crazy Heart
Sandra
Bullock
Best Actress
The Blind Side
Kathryn
Bigelow
Director
The Hurt Locker
Paul
Paul
Ottosson
Ottosson
Sound Editing
Sound Mixing
The Hurt Locker
The Hurt Locker
Group


Group changes the identifier, combining duplicate
rows using an aggregate function
result:




narrower, the same width, or wider than the original relation
possibly shorter than the original relation
unsafe!
SQL (2 basic types):
Select distinct new_identifier, aggregate_function
from relation
group by new_identifier
Select aggregate_function
from relation

Group example
First name
Last name
Award
Movie
Birth year
Jeff
Bridges
Best Actor
Crazy Heart
1949
Sandra
Bullock
Best Actress
The Blind Side
1964
Kathryn
Bigelow
Director
The Hurt Locker
1951

Group example 2
First name
Last name
Award
Birth year
Jeff
Bridges
Best Actor
1949
Sean
Sandra
Penn
Bullock
Best Actor
Best Actress
1960
1964
Kate
Winslet
Best Actress
1975
Tiny database
SQL (2 basic types):
Select distinct new_identifier, aggregate_function
from relation
group by new_identifier
Select aggregate_function
from relation
Practice queries 12, 16