Transcript Lecture 3

INFO 340
Lecture 3
Relational Databases
Relational Databases
• Based on the relational model,
grounded in mathematic set theories.
• Three basic elements: Relation, Tuple,
and Attributes (in SQL called Tables,
Rows, Columns)
Core Relation Model
Attributes
Relation
Name
Departme
nt
Phone
DOB
Sex
John
White
Managem
ent
x1209
1-Oct-45
M
Ann
Beech
Payroll
x4000
10-Nov60
F
David
Ford
Finance
X3000
12-Jul-72 M
Tuple
Other Elements in the
Relational Model
• Domain – Specifies the constraints placed
upon an attribute or attributes.
• Degree – The number of attributes in a
relation.
• Cardinality – Number of tuples in a
relation.
And finally….
• Relational database – normalized
collection of distinctly named relations.
Relational Keys
Main goal is to ensure that each tuple can be
uniquely identified.
Assume there are 10,000 records in this
database. What would be a good way to
uniquely identify each row?
Relational Keys
How about now?
Other ways to uniquely identify -Composite Key
Relational Keys
• Superkey - Any attribute set that uniquely
identifies
• Candidate key – Uniquely identifies
– Irreducibility - can’t make it any smaller & still work
• Primary key - The candidate key that you
choose to use
• Foreign key - An attribute (or set) in a table
that matches the candidate key of another
table. (Can be the same table).
Relational Keys
Mathematical Definition of Relation
• Consider two sets, D1 & D2, where D1 = {2, 4} and D2 = {1, 3, 5}.
• Cartesian product, D1 X D2, is set of all ordered pairs, where first
element is member of D1 and second element is member of D2.
D1 X D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
• Alternative way is to find all combinations of elements with first from D1
and second from D2.
Mathematical Definition of Relation
•Any
subset of Cartesian product is a relation; e.g.
• R = {(2, 1), (4, 1)}
•May
specify which pairs are in relation using some condition
•for selection; e.g.
–second element is 1:
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
–first element is always twice the second:
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y}
Relational database model means
relations between attributes
A common misconception is that the
“relational” in relational databases refers
to the way one table relates to another
table via foreign keys. This is not the
case! Instead, it refers to the relation
between attributes in a relation (table in
DB speak).
Integrity Constraints
• Domain constraints - every attribute has
an associated domain
• Integrity Rules
• Entity Integrity
• No attribute of primary key can be null
• Referential Integrity
• If a foreign key exists in a relation, it must
‘point’ to something -- ie there must be a
candidate value in the home relation
Examples of Attribute Domains
NULL
• Used to indicate value unknown. This
means it doesn’t work in normal
comparisons.
• Null is not Zero or a blank string.
• Debate rages over whether or not to
include NULL.
– Opens up a relational model.. No longer
closed.
• Was in E.F. Codd’s original specification
Null example
• Consider the following:
Name
HourlyRate
Bob
8.50
John
12
Sue
NULL
Alice
8.50
• What would the result of asking for rows where
HourlyRate != 8.50 return?
• What about HourlyRate = 8.50 or HourlyRate !=
8.50?
Donald Rumsfeld, Feb 2002
news briefing
• “As we know, there are known knowns.
There are things we know we know.
We also know there are known
unknowns. That is to say we know
there are some things we do not know.
But there are also unknown unknowns,
the ones we don’t know we don’t know.”
Examples
Cyber-Crime Attribution Database
Law enforcement work is being done to identify relationships & patterns of
activity with global malware-writers. A database needs to be built that
tracks known malware writers, their locations, the nearest educational
facility/institute, known aliases, known affiliations, etc. It also must
track known malware & its types, ie trojan, worm, rootkit, its size etc. It
must track which computers have been attacked, what types they are,
what their OS’s are, where they were located at the time of attack, the
time of attack itself, the IP addresses and domain names of the boxes.
Were the victims firewalled? What type of firewall – built in, external
hardware, etc. What was the patch level at the time of attack? Were
updates running? How many search engine references are there to this
code name? Break this down by specific search engine.
Class Activity
• Divide into 5 groups (approx 6 - 7 people)
• Give yourself a group name
• Take 15 minutes to come up with scenario requiring a
database. Write it down legibly.
– Name 3 canned queries that the database must
be able to solve. Write these down as well.
• At the end of 15 minutes, the first group will give their
scenario to the next group. The receiving group will:
– On the board, create an ERD with:
• field names & data types
• An identified primary key
Homework 1
Due Tue 1/14/08
• Page 87 -- Questions:
– 3.3, 3.4, 3.5, 3.8
• Page 110 – Questions:
– 4.1, 4.8, 4.9