Transcript PowerPoint

Databases
CS 110
Fall 2005
Homework


Read “Introduction to Access” from
text
For Tuesday: turn in index card with
three questions about reading
Two great issues in computing
Programs
• Instructions on how to manipulate given
data
Data
• Representations of information
Representations


Data are a specific slice of reality
Many representations can organize data
Unstructured Text
Steve
Alan
Mary
Damon
Ordered Text
Alan
Damon
Mary
Steve
Graph
I
H
G
F
Tom
Raleigh
NC
Sally
Easton
MD
Mary
C D
E
Tables
Hierarchy
B
A
John
Liz
Laura
Representations

Same information representable in
many ways
• Representation affects ease with which
certain operations can be done

The ease with which programs/algorithms
can execute
Representational decisions are crucial
Databases are a Representation

Databases store relationships
• Names are paired with phone numbers
• Children are matched with parents

Algorithms act on databases to
extract information
• Find all the Republican soccer moms
• Is the sequence ATTATAA in the gene?
Basic Operations on Databases





Find/Query
Display
Add/Insert
Delete
Change/Modify
Example: File system



Stores “files”
Uses hierarchical representation
Operations
• Display
• Find
• Add
• Delete
• Modify
Example: File system



Stores “files”
Uses hierarchical representation
Operations
• Display
• Find
• Add
• Delete
• Modify
Why this representation?
What’s good/bad about this?
Hierarchical File Systems

Metaphor for file cabinet with folders
• Hides complexity but makes it available



List files in this directory
Go up to parent directory
Go down into child directory
• Easy to describe where to find something

Go up two directories… go to directory foo..
then bar…
Hierarchical File Systems

Problems with hierarchical file
systems
• Where’s my file?!?
• File can’t be in two places at once
• Reorganization not cheap
Google Desktop
Other Representations


What do Google Desktop and
Microsoft “Search” do?
Superimpose another representation
on top of hierarchical one
Time to Find File “foo”

You have to look at every file you
have
• If you have n files, it will require n
“lookups”
• Order: O(n)
Time to Find “foo”

Order all the files first
• Sorting takes some time: nearly O(n2)
• Finding file still takes O(n)
• Why?
Time to Find “foo”

Use Indexing
• Create a smaller list that “points” to
special landmarks in data

These are the tabbed pages in dictionaries
• First find appropriate tab
• Find “foo” in subsection of data
• Order of search is function of number of
tabs and size of subsections
Indexed Databases

Modern databases keep multiple
indexes
• “find” is faster
• “add” and “delete” may be more
expensive if reindexing is required
Finding Database Records

Remember the Relational Model?
• Multiple tables with relationships between them

Finding appropriate records requires
operations
• If they have compatible sizes



Find records in one table but not the other
Find records common to both tables
Combine two tables
• Find records in table that meet requirements


Less than 3
Equal the word “simpleton”
Operators on Relational DBs

Join ( )
• Join table R with S subject to:

A
1
4
7
Column B < Column D
B
2
5
8
R
C
3
6
9
D
3
6
E
1
2
S
A
1
1
4
B
2
2
5
C
3
3
6
D
3
6
6
E
1
2
2
R joined with S
Return Tests
Test
Section
Mean
1
67
Median Standard
Deviation
68
16
2
67
68
14
3
65
66
16
Histograms
cs110-1 Midterm Scores
30
24
Number of Students
25
20
20
15
17
11
10
7
5
5
0
cs110-2 Midterm Scores
30
25
Poitns Scored
25
24
20
15
14
13
13
10
5
2
0
cs110-3 Midterm scores
Number of Students
30
27
25
20
21
17
18
15
15
10
5
0
3