Transcript PowerPoint
Databases
CS 110
Fall 2005
Homework
Read “Introduction to Access” from
text
For Tuesday: turn in index card with
three questions about reading
Two great issues in computing
Programs
• Instructions on how to manipulate given
data
Data
• Representations of information
Representations
Data are a specific slice of reality
Many representations can organize data
Unstructured Text
Steve
Alan
Mary
Damon
Ordered Text
Alan
Damon
Mary
Steve
Graph
I
H
G
F
Tom
Raleigh
NC
Sally
Easton
MD
Mary
C D
E
Tables
Hierarchy
B
A
John
Liz
Laura
Representations
Same information representable in
many ways
• Representation affects ease with which
certain operations can be done
The ease with which programs/algorithms
can execute
Representational decisions are crucial
Databases are a Representation
Databases store relationships
• Names are paired with phone numbers
• Children are matched with parents
Algorithms act on databases to
extract information
• Find all the Republican soccer moms
• Is the sequence ATTATAA in the gene?
Basic Operations on Databases
Find/Query
Display
Add/Insert
Delete
Change/Modify
Example: File system
Stores “files”
Uses hierarchical representation
Operations
• Display
• Find
• Add
• Delete
• Modify
Example: File system
Stores “files”
Uses hierarchical representation
Operations
• Display
• Find
• Add
• Delete
• Modify
Why this representation?
What’s good/bad about this?
Hierarchical File Systems
Metaphor for file cabinet with folders
• Hides complexity but makes it available
List files in this directory
Go up to parent directory
Go down into child directory
• Easy to describe where to find something
Go up two directories… go to directory foo..
then bar…
Hierarchical File Systems
Problems with hierarchical file
systems
• Where’s my file?!?
• File can’t be in two places at once
• Reorganization not cheap
Google Desktop
Other Representations
What do Google Desktop and
Microsoft “Search” do?
Superimpose another representation
on top of hierarchical one
Time to Find File “foo”
You have to look at every file you
have
• If you have n files, it will require n
“lookups”
• Order: O(n)
Time to Find “foo”
Order all the files first
• Sorting takes some time: nearly O(n2)
• Finding file still takes O(n)
• Why?
Time to Find “foo”
Use Indexing
• Create a smaller list that “points” to
special landmarks in data
These are the tabbed pages in dictionaries
• First find appropriate tab
• Find “foo” in subsection of data
• Order of search is function of number of
tabs and size of subsections
Indexed Databases
Modern databases keep multiple
indexes
• “find” is faster
• “add” and “delete” may be more
expensive if reindexing is required
Finding Database Records
Remember the Relational Model?
• Multiple tables with relationships between them
Finding appropriate records requires
operations
• If they have compatible sizes
Find records in one table but not the other
Find records common to both tables
Combine two tables
• Find records in table that meet requirements
Less than 3
Equal the word “simpleton”
Operators on Relational DBs
Join ( )
• Join table R with S subject to:
A
1
4
7
Column B < Column D
B
2
5
8
R
C
3
6
9
D
3
6
E
1
2
S
A
1
1
4
B
2
2
5
C
3
3
6
D
3
6
6
E
1
2
2
R joined with S
Return Tests
Test
Section
Mean
1
67
Median Standard
Deviation
68
16
2
67
68
14
3
65
66
16
Histograms
cs110-1 Midterm Scores
30
24
Number of Students
25
20
20
15
17
11
10
7
5
5
0
cs110-2 Midterm Scores
30
25
Poitns Scored
25
24
20
15
14
13
13
10
5
2
0
cs110-3 Midterm scores
Number of Students
30
27
25
20
21
17
18
15
15
10
5
0
3