Part - Courses - University of California, Berkeley

Download Report

Transcript Part - Courses - University of California, Berkeley

Database Design: Normalization
and SQL
University of California, Berkeley
School of Information Management and
Systems
SIMS 202: Information Organization and
Retrieval
12/5/2000
Information Organization and Retrieval
Review
• Database design Process
• Entity-Relationship Diagrams
• Designing a database
12/5/2000
Information Organization and Retrieval
Database Design Process
Application 1
External
Model
Application 2
Application 3
Application 4
External
Model
External
Model
External
Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Application 3
Conceptual
requirements
Conceptual
Model
Logical
Model
Application 4
Conceptual
requirements
12/5/2000
Information Organization and Retrieval
Internal
Model
ER Diagrams: Entity
• An Entity is an object in the real world (or
even imaginary worlds) about which we
want or need to maintain information
– Persons (e.g.: customers in a business,
employees, authors)
– Things (e.g.: purchase orders, meetings, parts,
companies)
Employee
12/5/2000
Information Organization and Retrieval
ER Diagrams: Attributes
• Attributes are the significant properties or
characteristics of an entity that help identify
it and provide the information needed to
interact with it or use it. (This is the
Metadata for the entities.)
Birthdate
First
Middle
Age
Name
Employee
Last
12/5/2000
SSN
Projects
Information Organization and Retrieval
ER Diagrams: Relationships
Student
Attends
Class
Project
Supplier
12/5/2000
Supplies
project
parts
Information Organization and Retrieval
Part
ACME Widget Co. Entities
•
•
•
•
•
•
•
•
Customer
Invoice
Employee
Inventory
Supplier
Account
Sales Rep
Parts
12/5/2000
• Timecard
• Check
Information Organization and Retrieval
ACME Widget Co. Functional
areas
•
•
•
•
•
•
•
Ordering
Inventory
Supplies
Shipping
Personnel
Payroll
We will concentrate on Ordering and
Inventory
12/5/2000
Information Organization and Retrieval
ACME Widget
Ordering Normalization
Rep#
Sales-Rep
Part#
Cust#
Customer
Writes
Orders
Invoice
Invoice#
Rep#
Cust#
12/5/2000
Information Organization and Retrieval
Invoice#
Contains
Quantity
Line-Item
Emp#
Wage
Employee
ISA
Hourly
ACME Widget
ER Model
Company#
Sales
Sales-Rep
Part#
Part# Quantity
Company#
Cust#
Customer
Writes
Orders
Invoice
Invoice# Rep#
Invoice# Quantity
Contains
Line-Item
Ordered
Part
Has
Contains
Cust#
On-Order
Supplied
Part
Part
Part#
Count
Price
12/5/2000
Supplier
Information Organization and Retrieval
Supplies
Part#
Cost
Company#
Mapping to a Relational Model
• Each entity in the ER Diagram becomes a
relation.
• A properly normalized ER diagram will
indicate where intersection relations for
many-to-many mappings are needed.
• Relationships are indicated by common
columns (or domains) in tables that are
related.
• We will examine the tables for the Acme
Widget Company derived from the ER
diagram
12/5/2000
Information Organization and Retrieval
Employee
SSN
123-76-3423
342-88-7865
486-87-6543
843-36-7659
…
12/5/2000
Lastname
Jones
Smith
Hendersen
Martinez
…
Firstname
Janet
Thomas
Charles
Roberto
…
Middlename Birthdate Address
City
Mary
6/25/63 234 State Berkeley
Frederick
8/4/70 12 Lambert Oakland
Robert
9/23/61 44 Central Berkeley
Garcia
7/8/58 76 Highland Berkeley
…
…
…
…
Information Organization and Retrieval
Sales-Rep
SSN
Rep # Sales
123-76-3423
1 $12,345.45
843-36-7659
2 $231,456.75
Hourly
SSN
Wage
342-88-7865
$12.75
486-87-6543
$20.50
12/5/2000
Information Organization and Retrieval
Customer
Cust #
COMPANY
STREET1
Integrated Standards
1 Ltd.
35 Broadway
STREET2
STATE
ZIPCODE
New York
NY
02111
34 Bureaucracy Plaza Floors 1-172
3 Control Elevation
Cyber Assicates
Place
Center
Phildelphia
PA
03756
Cyberoid
NY
08645
35 Libra Plaza
Nashua
NH
09242
1 Broadway
Middletown
IN
32467
88 Oligopoly Place
3 Independence
Parkway
Sagrado
TX
78798
Rivendell
CA
93456
8 Little Mighty Micro
34 Last One Drive
Orinda
CA
94563
9 SportLine Ltd.
38 Champion Place
Compton
CA
95328
2 MegaInt Inc.
3 Cyber Associates
General
4 Consolidated
Consolidated
5 MultiCorp
Internet Behometh
6 Ltd.
Consolidated
7 Brands, Inc.
12/5/2000
Floor 12
CITY
Suite 882
Information Organization and Retrieval
Invoice
Invoice # Cust #
93774
84747
88367
88647
776879
65689
12/5/2000
Rep #
3
4
5
9
2
6
Information Organization and Retrieval
1
1
2
1
2
2
Line-Item
Invoice # Part #
Quantity
93774
3
10
84747
23
1
88367
75
2
88647
4
3
776879
22
5
65689
76
12
93774
23
10
88367
34
2
12/5/2000
Information Organization and Retrieval
Part
Part #
1
2
3
4
5
6
7
8
9
12/5/2000
Name
Price
Count
Big blue widget
3.76
2
Small blue Widget
7.35
4
Tiny red widget
5.25
7
large red widget
157.23
23
double widget rack
10.44
12
Small green Widget
30.45
58
Big yellow widget
7.96
1
Tiny orange widget
81.75
42
Big purple widget
55.99
9
Information Organization and Retrieval
Joins
Part #
Invoice # Part #
Quantity
93774
3
10
84747
23
1
88367
75
2
88647
4
3
776879
22
5
65689
76
12
93774
23
10
88367
34
2
1
2
3
4
5
6
7
8
9
Cust #
COMPANY
STREET1
Integrated Standards
1 Ltd.
35 Broadway
12/5/2000
Rep #
3
4
5
9
2
6
1
1
2
1
2
2
STREET2
STATE
ZIPCODE
NY
02111
34 Bureaucracy Plaza Floors 1-172
3 Control Elevation
Cyber Assicates
Place
Center
Phildelphia
PA
03756
Cyberoid
NY
08645
35 Libra Plaza
Nashua
NH
09242
1 Broadway
Middletown
IN
32467
88 Oligopoly Place
3 Independence
Parkway
Sagrado
TX
78798
Rivendell
CA
93456
8 Little Mighty Micro
34 Last One Drive
Orinda
CA
94563
9 SportLine Ltd.
38 Champion Place
Compton
CA
95328
3 Cyber Associates
General
4 Consolidated
Consolidated
5 MultiCorp
Internet Behometh
6 Ltd.
Consolidated
7 Brands, Inc.
Information Organization and Retrieval
Floor 12
CITY
New York
2 MegaInt Inc.
Invoice # Cust #
93774
84747
88367
88647
776879
65689
Name
Price
Count
Big blue widget
3.76
2
Small blue Widget
7.35
4
Tiny red widget
5.25
7
large red widget
157.23
23
double widget rack
10.44
12
Small green Widget
30.45
58
Big yellow widget
7.96
1
Tiny orange widget
81.75
42
Big purple widget
55.99
9
Suite 882
Today
•
•
•
•
Normalization
Relational Algebra and Calculus
SQL
Effectiveness and Efficiency criteria for
database designs
• Advantages and failings of DBMS
technology
12/5/2000
Information Organization and Retrieval
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets of
relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base: Structured Techniques for
Design, Performance, and Management.
12/5/2000
Information Organization and Retrieval
Normal Forms
•
•
•
•
•
•
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
12/5/2000
Information Organization and Retrieval
Normalization
No transitive
dependency
between
nonkey
attributes
All
determinants
are candidate
keys - Single
multivalued
dependency
12/5/2000
BoyceCodd and
Higher
Information Organization and Retrieval
Functional
dependencyof
nonkey
attributes on
the primary
key - Atomic
values only
Full
Functional
dependencyof
nonkey
attributes on
the primary
key
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column
12/5/2000
Information Organization and Retrieval
Unnormalized Relation
Patient #
Surgeon #
145
1111 311
Surg. date
Patient Name
Jan 1,
1995; June
12, 1995
John White
Patient Addr Surgeon
15 New St.
New York,
NY
243
1234 467
2345 189
Jan 8,
1996
Charles Brown
4876 145
Nov 5,
1995
Hal Kane
5123 145
May 10,
1995
Paul Kosher
Charles
Field
10 Main St. Patricia
Rye, NY
Gold
Dogwood
Lane
Harrison,
David
NY
Rosen
55 Boston
Post Road,
Chester,
CN
Beth Little
Blind Brook
Mamaronec
k, NY
Beth Little
6845 243
Apr 5,
1994 Dec
15, 1984
Ann Hood
Hilton Road
Larchmont, Charles
NY
Field
12/5/2000
Postop drug
Drug side effects
Gallstone
s removal;
Beth Little Kidney
Michael
stones
Penicillin,
Diamond removal
none-
Apr 5,
1994 May
10, 1995
Mary Jones
Surgery
Information Organization and Retrieval
rash
none
Eye
Cataract
removal
Thrombos Tetracyclin Fever
is removal e none
none
Open
Heart
Surgery
Cholecyst
ectomy
Gallstone
s
Removal
Eye
Cornea
Replacem
ent Eye
cataract
removal
Cephalosp
orin
none
Demicillin
none
none
none
Tetracyclin
e
Fever
First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each row
and column.
– No repeating groups
– A column or set of columns is called a
Candidate Key when its values can uniquely
identify the row in the relation.
12/5/2000
Information Organization and Retrieval
First Normal Form
Patient #
Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name
1111
145
01-Jan-95 John White
1111
311
12-Jun-95 John White
15 New St.
New York,
NY
15 New St.
New York,
NY
1234
243
05-Apr-94 Mary Jones
10 Main St.
Rye, NY
1234
467
10-May-95 Mary Jones
2345
4876
5123
6845
6845
12/5/2000
189
145
145
243
243
Charles
08-Jan-96 Brown
10 Main St.
Rye, NY
Dogwood
Lane
Harrison,
NY
05-Nov-95 Hal Kane
55 Boston
Post Road,
Chester,
CN
05-Apr-94 Ann Hood
15-Dec-84 Ann Hood
Hilton Road
Larchmont,
NY
Drug adminSide Effects
Charles Field
Gallstone
s removal
Kidney
stones
removal
Eye
Cataract
removal
Patricia Gold
Thrombos
is removal none
none
David Rosen
Open
Heart
Surgery
none
Beth Little
Cholecyst
ectomy
Demicillin
Beth Little
Michael
Diamond
Blind Brook
Mamaronec
10-May-95 Paul Kosher k, NY
Beth Little
Hilton Road
Larchmont,
NY
Surgery
Penicillin
rash
none
none
Tetracyclin
e
Fever
Cephalosp
orin
Charles Field
Gallstone
s
Removal
none
Eye
Cornea
Replacem Tetracyclin
ent
e
Charles Field
Eye
cataract
removal
Information Organization and Retrieval
none
none
none
Fever
none
1NF Storage Anomalies
• Insertion: A new patient has not yet undergone
surgery -- hence no surgeon # -- Since surgeon # is
part of the key we can’t insert.
• Insertion: If a surgeon is newly hired and hasn’t
operated yet -- there will be no way to include that
person in the database.
• Update: If a patient comes in for a new procedure,
and has moved, we need to change multiple
address entries.
• Deletion (type 1): Deleting a patient record may
also delete all info about a surgeon.
• Deletion (type 2): When there are functional
dependencies (like side effects and drug) changing
one item eliminates other information.
12/5/2000
Information Organization and Retrieval
Second Normal Form
• A relation is said to be in Second Normal
Form when every nonkey attribute is fully
functionally dependent on the primary key.
– That is, every nonkey attribute needs the full
primary key for unique identification
12/5/2000
Information Organization and Retrieval
Second Normal Form
Patient #
1111
1234
2345
4876
5123
6845
12/5/2000
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
Information Organization and Retrieval
Second Normal Form
Surgeon #
Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
12/5/2000
Information Organization and Retrieval
Second Normal Form
Patient # Surgeon # Surgery Date
1111
1111
1234
1234
2345
4876
12/5/2000
Surgery
Drug Admin Side Effects
145
Gallstones
01-Jan-95 removal
Kidney
Penicillin
rash
311
stones
12-Jun-95 removal
none
none
243
Eye Cataract
05-Apr-94 removal
Tetracycline Fever
467
Thrombosis
10-May-95 removal
189
Open Heart
08-Jan-96 Surgery
Cephalospori
n
none
145
Cholecystect
05-Nov-95 omy
Demicillin
none
none
none
none
none
5123
145
6845
243
6845
243
Gallstones
10-May-95 Removal
Eye cataract
15-Dec-84 removal
Eye Cornea
05-Apr-94 Replacement
Information Organization and Retrieval
none
none
Tetracycline Fever
1NF Storage Anomalies
Removed
• Insertion: Can now enter new patients without
surgery.
• Insertion: Can now enter Surgeons who haven’t
operated.
• Deletion (type 1): If Charles Brown dies the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
• Update: If John White comes in for third time, and
has moved, we only need to change the Patient
table
12/5/2000
Information Organization and Retrieval
2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug and
side effect are entered, we lose the information
that penicillin can cause a rash
• Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.
12/5/2000
Information Organization and Retrieval
Third Normal Form
• A relation is said to be in Third Normal
Form if there is no transitive functional
dependency between nonkey attributes
– When one nonkey attribute can be determined
with one or more nonkey attributes there is said
to be a transitive functional dependency.
• The side effect column in the Surgery table
is determined by the drug administered
– Side effect is transitively functionally
dependent on drug so Surgery is not 3NF
12/5/2000
Information Organization and Retrieval
Third Normal Form
Patient # Surgeon # Surgery Date
12/5/2000
Surgery
Drug Admin
1111
145
1111
311
01-Jan-95 Gallstones removal
Kidney stones
12-Jun-95 removal
1234
243
05-Apr-94 Eye Cataract removal Tetracycline
1234
467
10-May-95 Thrombosis removal
2345
189
08-Jan-96 Open Heart Surgery
Cephalosporin
4876
145
05-Nov-95 Cholecystectomy
Demicillin
5123
145
10-May-95 Gallstones Removal
none
6845
243
none
6845
243
15-Dec-84 Eye cataract removal
Eye Cornea
05-Apr-94 Replacement
Information Organization and Retrieval
Penicillin
none
none
Tetracycline
Third Normal Form
Drug Admin
12/5/2000
Side Effects
Cephalosporin
none
Demicillin
none
none
none
Penicillin
rash
Tetracycline
Fever
Information Organization and Retrieval
2NF Storage Anomalies
Removed
• Insertion: We can now enter the fact that a
particular drug has a particular side effect in
the Drug relation.
• Deletion: If John White recieves some other
drug as a result of the rash from penicillin,
but the information on penicillin and rash is
maintained.
• Update: The side effects for each drug
appear only once.
12/5/2000
Information Organization and Retrieval
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF
relations.
• A 3NF relation is NOT in BCNF if:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some
attributes in the keys are common
12/5/2000
Information Organization and Retrieval
Most 3NF Relations are also
BCNF
Patient #
1111
1234
2345
4876
5123
6845
12/5/2000
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
Information Organization and Retrieval
Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued dependencies
are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
12/5/2000
Information Organization and Retrieval
Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by the
keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate the
original relation.
12/5/2000
Information Organization and Retrieval
Relational Calculus
• Relational Algebra provides a set of explicit
operations (select, project, join, etc) that can be
used to build some desired relation from the
database.
• Relational Calculus provides a notation for
formulating the definition of that desired relation
in terms of the relations in the database without
explicitly stating the operations to be performed
• SQL is based on the relational calculus.
12/5/2000
Information Organization and Retrieval
Relational Algebra Operations
•
•
•
•
•
•
•
•
Select
Project
Product
Union
Intersect
Difference
Join
Divide
12/5/2000
Information Organization and Retrieval
Select
• Extracts specified tuples (rows) from a
specified relation (table).
12/5/2000
Information Organization and Retrieval
Project
• Extracts specified attributes(columns) from
a specified relation.
12/5/2000
Information Organization and Retrieval
Join
• Builds a relation from two specified
relations consisting of all possible
concatenated pairs of, one from each of the
two relations, such that in each pair the two
tuples satisfy some condition.
A1 B1
A2 B1
A3 B2
12/5/2000
B1 C1
B2 C2
B3 C3
(Natural
or Inner)
Join
Information Organization and Retrieval
A1 B1 C1
A2 B1 C1
A3 B2 C2
Outer Join
• Outer Joins are similar to PRODUCT -- but
will leave NULLs for any row in the first
table with no corresponding rows in the
second.
Outer
Join
A1
A2
A3
A4
12/5/2000
B1
B1
B2
B7
B1 C1
B2 C2
B3 C3
Information Organization and Retrieval
A1 B1 C1
A2 B1 C1
A3 B2 C2
A4 * *
SQL
• Structured Query Language
• SEQUEL from IBM San Jose
• ANSI 1992 Standard is the version used by
most DBMS today (SQL92)
• Basic language is standardized across
relational DBMSs. Each system may have
proprietary extensions to standard.
12/5/2000
Information Organization and Retrieval
SQL Uses
• Database Definition and Querying
– Can be used as an interactive query language
– Can be imbedded in programs
• Relational Calculus combines Select,
Project and Join operations in a single
command. SELECT.
12/5/2000
Information Organization and Retrieval
SELECT
• Syntax:
– SELECT [DISTINCT] attr1, attr2,…, attr3
FROM rel1 r1, rel2 r2,… rel3 r3 WHERE
condition1 {AND | OR} condition2 ORDER
BY attr1 [DESC], attr3 [DESC]
12/5/2000
Information Organization and Retrieval
SELECT Conditions
•
•
•
•
•
•
= equal to a particular value
>= greater than or equal to a particular value
> greater than a particular value
<= less than or equal to a particular value
<> not equal to a particular value
LIKE “*term*” (may be other wild cards in other
systems)
• IN (“opt1”, “opt2”,…,”optn”)
• BETWEEN val1 AND val2
• IS NULL
12/5/2000
Information Organization and Retrieval
Relational Algebra Selection
using SELECT
• Syntax:
– SELECT * WHERE condition1 {AND | OR}
condition2
12/5/2000
Information Organization and Retrieval
Relational Algebra Projection
using SELECT
• Syntax:
– SELECT [DISTINCT] attr1, attr2,…, attr3
FROM rel1 r1, rel2 r2,… rel3 r3
12/5/2000
Information Organization and Retrieval
Relational Algebra Join using
SELECT
• Syntax:
– SELECT * FROM rel1 r1, rel2 r2 WHERE
r1.linkattr = r2.linkattr
12/5/2000
Information Organization and Retrieval
Sorting
• SELECT BIOLIFE.[Common Name],
BIOLIFE.[Length (cm)]
FROM BIOLIFE
ORDER BY BIOLIFE.[Length (cm)]
DESC;
12/5/2000
Information Organization and Retrieval
Subqueries
• SELECT SITES.[Site Name],
SITES.[Destination no]
FROM SITES
WHERE sites.[Destination no] IN (SELECT
[Destination no] from DEST where [avg
temp (f)] >= 78);
• Can be used as a form of JOIN.
12/5/2000
Information Organization and Retrieval
Aggregate Functions
•
•
•
•
•
Count
Avg
SUM
MAX
MIN
12/5/2000
Information Organization and Retrieval
Using Aggregate functions
• SELECT attr1, Sum(attr2) AS name
FROM tab1, tab2 ...
GROUP BY attr1, attr3 HAVING condition;
12/5/2000
Information Organization and Retrieval
Using an Aggregate Function
• SELECT DIVECUST.Name, Sum([Price]*[qty]) AS
Total
FROM (DIVECUST INNER JOIN DIVEORDS ON
DIVECUST.[Customer No] = DIVEORDS.[Customer
No]) INNER JOIN DIVEITEM ON DIVEORDS.[Order
No] = DIVEITEM.[Order No]
GROUP BY DIVECUST.Name
HAVING (((DIVECUST.Name) Like "*Jazdzewski"));
12/5/2000
Information Organization and Retrieval
GROUP BY
• SELECT DEST.[Destination Name], Count(*) AS
Expr1
FROM DEST INNER JOIN DIVEORDS ON
DEST.[Destination Name] =
DIVEORDS.Destination
GROUP BY DEST.[Destination Name]
HAVING ((Count(*))>1);
• Provides a list of Destinations with the number of
orders going to that destination
12/5/2000
Information Organization and Retrieval
Create Table
• CREATE TABLE table-name (attr1 attrtype PRIMARYKEY, attr2 attrtype,…,attrN attr-type);
• Adds a new table with the specified
attributes (and types) to the database.
12/5/2000
Information Organization and Retrieval
Access Data Types
•
•
•
•
•
•
•
•
•
Numeric (1, 2, 4, 8 bytes, fixed or float)
Text (255 max)
Memo (64000 max)
Date/Time (8 bytes)
Currency (8 bytes, 15 digits + 4 digits decimal)
Autonumber (4 bytes)
Yes/No (1 bit)
OLE (limited only by disk space)
Hyperlinks (up to 64000 chars)
12/5/2000
Information Organization and Retrieval
Access Numeric types
• Byte
– Stores numbers from 0 to 255 (no fractions). 1 byte
• Integer
– Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes
• Long Integer
• Single
(Default)
– Stores numbers from –2,147,483,648 to 2,147,483,647 (no
fractions). 4 bytes
– Stores numbers from -3.402823E38 to –1.401298E–45 for
negative values and from 1.401298E–45 to 3.402823E38 for
positive values.
4 bytes
• Double
– Stores numbers from –1.79769313486231E308 to –
4.94065645841247E–324 for negative values and from
1.79769313486231E308 to 4.94065645841247E–324 for
positive values.
15
8 bytes
• Replication ID
– Globally unique identifier (GUID)
12/5/2000
Information Organization and Retrieval
N/A
16 bytes
Effectiveness and Efficiency
Issues for DBMS
• Focus on the relational model
• Any column in a relational database can be
searched for values.
• To improve efficiency indexes using storage
structures such as BTrees and Hashing are
used
• But many useful functions are not indexable
and require complete scans of the the
database
12/5/2000
Information Organization and Retrieval
Example: Text Fields
• In conventional RDBMS, when a text field
is indexed, only exact matching of the text
field contents (or Greater-than and Lessthan).
– Can search for individual words using pattern
matching, but a full scan is required.
• Text searching is still done best (and fastest)
by specialized text search programs (Search
Engines) that we will look at more later.
12/5/2000
Information Organization and Retrieval
Normalizing to death
• Normalization splits database information
across multiple tables.
• To retrieve complete information from a
normalized database, the JOIN operation
must be used.
• JOIN tends to be expensive in terms of
processing time, and very large joins are
very expensive.
12/5/2000
Information Organization and Retrieval
Advantages of RDBMS
• Possible to design complex data storage and
retrieval systems with ease (and without
conventional programming).
• Support for ACID transactions
–
–
–
–
12/5/2000
Atomic
Consistent
Independent
Durable
Information Organization and Retrieval
Advantages of RDBMS
• Support for very large databases
• Automatic optimization of searching (when
possible)
• RDBMS have a simple view of the
database that conforms to much of the data
used in businesses.
• Standard query language (SQL)
12/5/2000
Information Organization and Retrieval
Disadvantages of RDBMS
• Until recently, no support for complex
objects such as documents, video, images,
spatial or time-series data. (ORDBMS are
adding support these).
• Often poor support for storage of complex
objects. (Disassembling the car to park it in
the garage)
• Still no efficient and effective integrated
support for things like text searching within
fields.
12/5/2000
Information Organization and Retrieval