I. Terminology
Download
Report
Transcript I. Terminology
Chapter 12
File Processing and Data
Management Concepts
Presentation Outline
I. Terminology
II. Database Technology
III. The Architecture of a Database
Management System (DBMS)
IV. The Database Administrator
I. Terminology
A. Field
B. Data Occurrences
C. Fixed vs. Variable Length Records
D. Record Key
E. Sort Keys
A. Field
A field is the smallest block of data that will be
stored and retrieved in the information system.
Other names for field include data item, attribute,
or element.
Field 1
Field 2
B. Data Occurrences
A specific set of data values for a record in a file.
1
2
3
4
5
The above table contains 5 occurrences of account records for the
general ledger account file.
C. Fixed vs. Variable Length Records
Fixed Length Records
Both the number of fields
and the length of each field
are fixed.
Strength: Easier to
manipulate records.
Weakness: Must
accommodate maximum
sizes.
Variable Length Records
Both the number of fields
and the length of each field
are variable. (See Fig. 15-1
on p. 603)
Strength: Less waste of
memory when maximum
sizes do not have to be
accomodated.
Weakness: Record
manipulation is more
difficult.
D. Record Key
1110
1500
2105
2110
A record key is a field or combination of fields that
uniquely identifies a particular record in a file.
E. Sort Keys
Primary sort key – The first field used to sort the data
occurrences in a record set.
Secondary sort key – A field used to determine relative
position among a set of data occurrences in a record set.
Tertiary sort key – Additional fields beyond primary and
secondary sort keys that are required to uniquely identify data
occurrences in a record set.
Last Name
Adams
Jones
Jones
Jones
Young
First Name
Tom
Alisa
Julie
Julie
Sam
Age
25
36
19
21
22
II. Database Technology
A. The Problem of Redundancy
B. The Components of a Database
A. The Problem of Redundancy
That is not
what we show
per our
records.
Redundancy occurs
when different areas of
an organization use the
information system to
store the same
information in more
than one place.
Results in update
anomaly.
B. The Components of a Database
Management System
1. Data Description Language (DDL)
2. Data Manipulation Language (DML)
3. Data Query Language (DQL)
1. Data Description Language (DDL)
Defines the logical structure
of the database (known as
the schema). Defines the
following:
Name of data fields.
Type of data (numeric,
alphabetic, etc.)
Number of positions
(length of field).
May also define
subschema (i.e., individual
user views)
2. Data Manipulation Language (DML)
Pull a trial balance.
The DML consists of the
commands for updating,
editing, manipulating, and
extracting data.
Structured query language
(SQL) is a common DML
in relational settings.
Structure Query Language (SQL)
3. Data Query Language
Query by
Example (QBE)
A data query language is a user friendly language or
interface that allows the user to request information by
simply filling in blanks. Represents a special type of
DML.
III. The Architecture of a Database
Management System (DBMS)
A. The Database Architecture
B. The Conceptual Architecture and EntityRelationship (ER) Diagrams
C. Logical Data Structures
D. The Physical Structure
A. The Database Architecture
Conceptual
Level
Database contents
Uses of database
Desired reports
Information to be viewed
Logical Level
Logical data structures:
Tree
Network
Relational
Physical Level
Access Methods:
Sequential Access
Indexed Files
B. The Conceptual Architecture and EntityRelationship (ER) Diagrams
PART_NO
NAME
COST
PART
STORED AT
LOCATION
WHSE
ADDRESS
Square boxes are
used for entities
(separate tables).
Ellipses are used for
attributes (table
columns).
Diamond shaped
boxes depict
relationships.
C. Logical Data Structures
1. Tree or Hierarchical Structure
2. Network Structures
3. Relational Structure
a. Selection
b. Projection
c. Join
1. Tree or Hierarchical Structure
A parent record can have many children. However a child
record can have only one parent.
Can only model 1:1 (one-to-one) and 1:* (one-to-many)
relationships.
Commonly used with accounting data. Can only access data
by going from a parent to child.
Balance Sheet
Assets
Liabilities
Current Assets
Current Liabilities
Revenues
Long-term Assets
Long-term Liabilities
Expenses
Equity
2. Network Structure
Eliminates the distinction of parent and child
records. A parent can have many children and a
child can have many parents.
Can model 1:1 (one-to-one), 1:* (one-to-many),
and *:* (many-to-many) relationships.
Must know the physical structure of the data in
order to access it.
3. Relational Structure
Relational databases organize and store data
in two dimensional tables consisting of
rows and columns.
Relationships among tables are represented
by common data values in different tables.
Straight forward in terms of organizing and
searching the data.
Possesses ad hoc search capabilities.
3a. Selection
Produces a horizontal subset (includes entire row) of
a relation which satisfies a boolean predicate.
Name
John
Bill
Mary
Joe
Savings Table
Acct #
123
205
707
127
Balance
35.75
3.95
7.95
4.05
(Savings)
Balance < 5.00
Name
Bill
Joe
Acct #
205
127
Balance
3.95
4.05
3b. Projection
Constructs a vertical subset of a relation. The subset is
obtained by selecting specified attributes and removing
others.
Savings Table
Acct #
123
205
707
127
Name
John
Bill
Mary
Joe
Balance
35.75
3.95
7.95
4.05
(Savings)
Name
Balance < 5.00
Name
Bill
Joe
3c. Join
A join is used to combine 2 tables. The attribute
used to join must be in both tables.
Table R
A
B
C
a1
a2
a2
a4
c1
c2
c3
c2
b1
b2
b2
b2
R |X| S
Table S
C
c2
c3
c2
D
d1
d2
d1
E
e1
e3
e2
A
a2
B
b2
C
c2
D
d1
E
e1
a2
a2
a4
b2
b2
b2
c2
c3
c2
d1
d2
d1
e2
e3
e1
a4
b2
c2
d1
e2
D. The Physical Structure
1. Sequential Access
2. Indexed Files
1. Sequential Access
Records can only be
accessed in a predefined
sequence. For example,
if there are 100 records
in a file, one must access
the first 99 records
before accessing the last
record.
Generally useful for
batch processing when
nearly all records must
be accessed.
2. Indexed Files
Any attribute can be
extracted from the records in
a primary file and used to
build a new file whose
purpose is to provide an
index to the original file.
First, the index is searched to
find a specified value of an
attribute such as an customer
account number.
Second, the disk addresses
are used to directly retrieve
the desired records
See Fig. 12-13 on p. 427.
IV. The Database Administrator
This is not
quite what we
need.
The database
administrator is a
person who
coordinates data
management activities
such as approving the
physical contents and
user views of the
database.
Summary
Fields and keys
Three Components of a DBMS
Three Types of Database Architecture
The Database Administrator