+ 1 - web page for staff

Download Report

Transcript + 1 - web page for staff

File Concepts
This chapter devoted to the design of files and databases
for information systems. We begin this chapter with a
review of file terminology and concepts. Then, we
examine the different types of system files. Alter
discussing file access methods, we review the various
media used for system files and databases. Finally, we
examine each type of file organization, and then discuss
the advantages and disadvantages of each.
1
Terminology and concepts
In your previous programming courses, you
probably created and accessed many files, so you are
already familiar with most, if not all, of the terminology
and concepts that relate to files. This and the next chapter
are based on a basic understanding of system files, so
let’s take time first to review file terminology and
concepts.
2
• File components
The smallest amount of data that can be stored is
the bit. Bits are grouped into bytes (or characters).
Collections of bytes form a field. A field is an individual
element of data or a fact about a person, place, thing, or
event. A field is also called a data element or a data item.
A file is a collection of logical records and contains
data about an information system entity. An entity is a
person, place, thing, or event for which data is collected.
For example, PRODUCT is an entity in an inventory
system.
3
ORGANIZING DATA IN A TRADITIONAL FILE ENVIRONMENT
Data Hierarchy in a Computer System
4
file
A file consists of blocks, which consist of logical records, which consist
of related fields.
5
• Entity: Person, place, thing, event about which
information is maintained
• Attribute: Description of a particular entity
• Key field: Identifier field used to retrieve, update,
sort a record
6
Entitities and Attributes
7
• Types of files
An IS uses one or more of each of the following
types of files:
• Master files
• Table files
• Transaction files
• Work files
• Security files
• History files
• Key fields
• Primary keys A primary key is the field or
combination of fields in a master file or table file
8
that minimally and uniquely identifies a
particular entity.
• Candidate keys Sometimes you do have a
choice of two or more fields or field combinations
for the primary key. Any field that could serve as the
primary key is called a candidate key. Only one of
the candidate keys is designated as the primary key;
typically, you select the field that is smallest and
easiest with which to work.
• Foreign keys A foreign key is a field or
combination of fields in a file that must match a
9
primary key value in some other file, thereby
establishing a relationship between the two files.
Foreign keys are found in all types of files.
• Secondary keys
• Data storage
Data can be stored in five basic data storage
formats: EBCDIC, ASCII, packed decimal, binary, and
floating point. Floating point, which is an inexact
representation of a numeric value, is used primarily in
scientific applications. Business systems analysts are
most concerned with the other four.
10
11
Primary Key
12
Foreign Key
13
Foreign Key
14
Access methods
File access concerns how the system’s programs
access (read or write) the file records: sequentially or
randomly.
File media summary
The media usually used for system files are
magnetic tape, magnetic disks and diskettes, mass
storage devices, and optical disks.
File organization
The three file organization techniques available are
sequential, direct, and indexed.
15
Problems with the Traditional File Environment
•
•
•
•
•
Data redundancy
Program-Data dependence
Lack of flexibility
Poor security
Lack of data-sharing and availability
16
File and Database design
Many of your file and database design efforts are
actually completed during the systems analysis phase.
During systems analysis, you identify and describe all
the data elements in the information system, create data
flow diagrams, designate data stores, assign the data
elements to those data stores, and normalize the data
store designs. During the systems design phase, you
evaluate and refine your data store designs.
17
Entity-relationship diagrams
We have previously defined an entity as a person,
place, thing, or event for which data is collected. A
relationship is a logical association between entities. A
relationship exists between the entities PRODUCT and
WAREHOUSE, for example, because products are
stored in warehouses. The logical relationships between
entities in an IS are graphically represented in an entityrelationship diagram (ERD). The basic format of an ERD
for two related entities.
18
19
ENTITY 1
Example
DOCTOR
IS
RELATED
TO
ENTITY 2
A patient
is treated by
a doctor
TREATS
A doctor
treats a
patient
PATIENT
20
Entities ...
ENTITY NAME
- entity id
- attribute 1
- attribute 2
- …………..
- attribute n
CUSTOMER
- Customer_ID
- Cust_Name
- Cust_Address
- Cust_Phone
21
Cardinality
Cardinality specifies how many instances of an entity
relate to one instance of another entity. Ordinality is also
closely linked to cardinality. While cardinality specifies
the occurances of a relationship, ordinality describes the
relationship as either mandatory or optional. In other
words, cardinality specifies the maximum number of
relationships and ordinality specifies the absolute
minimum number of relationships. When the minimum
number is zero, the relationship is usually called
optional and when the minimum number is one or more,
the relationship is usually called mandatory.
22
23
24
25
26
Three types of relationships can exist between
entities. A one-to-one relationship, abbreviated as 1:1,
exists when there is exactly one of the second entity for
each of the first entity.
OFFICE
MANAGER
1
HEADS
1 BRANCH
OFFICE
A one-to-many relationship, abbreviated as 1:M,
DEPARTMENT 1
EMPLOYS
M EMPLOYEE
27
One-to-One Relationships
28
One-to-Many Relationships
29
A many-to-many relationship, abbreviated as M:N,
ORDER
M
LISTS
N
PRODUCT
CUSTOMER
M
BOUGHT
N
PRODUCT
STUDENT
M
TAKES
N
COURSES
ORDER
M
LIST
N
PRODUCT
30
Many-to-Many Relationships
31
CUSTOMER
1:1
PLACES
0:M
ORDER
0:M
CONTAINS
1:M
Ordinality and Cardinality
PRODUCT
32
Normalization
The normalization process involves three types of
normal forms; first normal form, second normal form,
and third normal form.
STUDENT
Unnormalized STUDENT records
33
First normal form
•
STUDENT
1NF STUDENT records
This STUDENT record design can be written as:
STUDENT (STUDENT-NUMBER, STUDENT-NAME, TOTAL-CREDIT, GPA,
ADVISOR-NUMBER, ADVISOR-NAME, COURSE-NUMBER, COURSE-DECS,
GRADE)
34
• Second normal form
STUDENT
2NF STUDENT, COURSE,
and GRADE records
35
• Third normal form
3NF STUDENT, ADVISOR,COURSE,
and GRADE records
36
Our final 3NF design for all the entity records is:
STUDENT (STUDENT-NUMBER, STUDENT-NAME,
TOTAL-CREDIT, GPA, ADVISOR-NUMBER)
ADVISOR (ADVISOR-NUMBER, ADVISOR-NAME)
COURSE (COURSE-NUMBER, COURSE-DECS,
NUM-CREDIT)
GRADE (STUDENT-NUMBER, COURSE-NUMBER,
GRADE)
37
Building an Entity-Relationship Diagram (ERD)
• Steps in building ERDs
• Identify the entities
• Add attributes
• Identify relationship
Identify the entities
• Identify major categories of information
• Check the DFD for data stores, external entities,
and data flows
• Verify that there is more than one instance of the
38
Add attributes
• Identify attributes of the entity that are relevant to
the system under development
• Check the DFD for details on data stores,
and data flows
• Check the data requirements of the user
requirements
• Interview knowledgeable users
• Perform document analysis on existing forms and
reports
39
Identify relationships
• Start with an entity and identify all entities with
which it shares relationships
• Describe the relationship with the appropriate verb
phrase
• Determine the cardinality by discussing the business
rules with knowledgeable users
ERD Building Tips
• Data stores of the DFD should correspond to entities
• Only include entities with more that one instance of
information
40
SALES
1
SERVES
M CUSTOMER
1
PLACES
M
ORDER
M
LISTS
M
WAREHOUSE
STORES
N
N
PRODUCT
41
ADVISOR
1
ADVISES
M
STUDENT
1
RECEIVES
COURSE
1
GIVES
M
M
GRADE
The ERD for STUDENT, ADVISOR, COURSE, and GRADE
after normalization
42
name
category
name
stockprice
price
Product
buys
address
makes
Company
Person
employs
name
ssn
43
44
An Unnormalized Relation of ORDER
Figure 7-11
45
An Normalized Relation of ORDER
Figure 7-12
46
การเปลี่ยนแปลงความสัมพันธ์แบบ M:N มาเป็ น 1:M โดยใช้
Composite Entity สัญลักษณ์ของ Composite Entity คือ
สี่ เหลี่ยมผืนผ้าซ้อนด้วยสี่ เหลี่ยมรู ปข้าวหลามตัด
Studen
Studen
M
M
1
NM
Learning
M
Learning
M
1
Subject
Subject
47
Studen
Studen
M
1
M
Learning
M
Learning
รหัส
45001
ชื่ อ
ปัญญา
คณะ
วิทยาศาสตร์
45002
วิชา
วิศวกรรมศาสตร์
Regis_Detail(Weak)
รหัส
45001 01
45001 02
45002 01
45002 02
M
1
รหัสวิชา
01
02
Subject
Subject
ชื่ อวิชา
C Programming
Database
รหัสวิชา
เกรด
A
B+
C+
B
48
ใบสัง่ ซื้ อ
M
ใบสัง่ ซื้ อ
1
สั่งรายการ
M
รายการสั่ งซื้อ
M
N
สิ นค้า
1
สิ นค้า
ภาพแสดงการเปลี่ยนแปลงจาก M:N มาเป็ น 1:M
49
วันที่สงั่
เลขที่ใบสัง่
ใบสัง่ ซื้ อ
ชื่อสิ นค้า
วันที่ส่ง
1
รหัสสิ นค้า
M
รายการสั่ งซื้อ
ใบสั่งซื้ อ
เลขที่ใบสั่ง วันที่สั่ง
รายการสั่งซื้ อ
เลขที่ใบสั่ง รหัสสิ นค้า
M
1
ราคา
สิ นค้า
สิ นค้า
วันที่ส่ง
รหัสสิ นค้า ชื่อสิ นค้า
ราคา
จานวน
50
Database
In a typical file processing environment, each end
user department has its own information system, and
each information system has its own collection of files.
Two potential problems might occur with this
environment. First, data redundancy, in which data
common to two or more information systems is
duplicated in multiple files, is possible. Data redundancy
obviously results in using extra storage space.
The second problem with the typical file
processing environment concerns management decision
support. A top-level manager might need to relate
51
information from more than one end user department;
this requires accessing independent information systems,
which might be an awkward and inefficient process at
best.
Database technology presents a solution to these
problems. We define a database as a structure that can
store data relating to multiple entities, as well as
relationships among those entities.
• Database management systems
A database management system (DBMS) is a
software system used to create, access, and control the
database. The DBMS serves as a bridge, or interface,
52
Payroll
system
Project
Control
system
General
Ledger
system
Database
Management
system
Personnel
system
Budget
Analysis
system
Database
The database environment in which multiple information system access a
single, integrated database.
53
between the database and the application programs,
systems analysts, and end users of the database.
A DBMS includes a data definition language, a
data manipulation language, a query language, a data
dictionary, and utility services.
• Data definition language
A data definition language (DDL) is used to
describe the structure of the database. The complete
definition of the database, which includes descriptions of
all fields, records, and relationships, is called the schema.
You also use the DDL to define one or more subschema
for the database. A subschema is a view of the database
54
used by one or more programs or end users. A
subschema defines only those portions of the database
that a particular program or end user needs or is allowed
to access.
• Data manipulation language
A data manipulation language (DML) provides the
commands necessary for storing, retrieving, updating,
and deleting database records. DML commands can be
embedded in applications programs written in host
languages, which are procedural programming languages
such as COBOL, BASIC, and PL/1.
55
• Query language
A query language is a nonprocedural language
used to access a database.
• QBE (Query-By-Example)
• SQL (Structured Query Language)
• Data dictionary
The data dictionary in a DBMS is the central
storehouse of information about the database. The
schema and all subschemas are stored in the data
dictionary.
56
• Database models
The four basic models for database organization
are hierarchical, network, relational, and object-oriented.
The hierarchical model is the oldest of these models.
Next came the network model and then the relational
model. The object-oriented model is the newest of the
four. All new DBMS over the past few years have been
either relational or object-oriented DBMS.
• Relational database
In a relational database, data is organized in
two-dimensional tables called relations. Each row in a
table is called a tuple and each column is called an
57
attribute. You can think of a relation or table as a file,
tuples or rows as records, and attributes or columns as
fields.
Examples of relational database programs:
–Microsoft Access
–Microsoft® SQL Server™
–Oracle
–DB2
–FoxPro
58
• Object-oriented databases
In the last several years, a new approach to the
system development process has emerged: the objectoriented approach. In answer to this new approach to
system development, object-oriented DBMSs are now
available. Man of these current DBMSs are actually
hybrids that combine object-oriented features within the
framework of the relational model.
59
PERSON OBJECT
Number:
ID number
Name:
Name
DEPARTMENT OBJECT
Deptnum:
Department number
Deptname:
Department name
FACULTY:
FACULTY OBJECT; MV
(all the faculty members within a given department)
MAJOR:
MAJOR OBJECT; MV
(all the majors for a given department)
FACULTY OBJECT
PERSON:
PERSON OBJECT
STUDENT:
STUDENT OBJECT; MV
(all advisees of a given faculty member)
DEPARTMENT:
DEPARTMENT OBJECT; SUBJECT [Deptnum]
(the department to which the faculty member belongs)
STUDENT OBJECT
PERSON:
PERSON OBJECT
Totcred:
Total credits
GPA:
GPA
MAJOR:
MAJOR OBJECT; SUBSET [Majcode]
(the student’s major)
FACULTY:
FACULTY OBJECT; SUBSET [Number, Name]
(the name and number of the student’s faculty advisor)
MAJOR OBJECT
Majcode:
Major code
Majdesc:
Major description
STUDENT:
STUDENT OBJECT; MV
(all students with a given major)
DEPARTMENT:
DEPARTMENT; SUBSET [Deptnum]
(the department to which the major belong)
60
Advantages of the database approach
1. Economy of scale.
2. Sharing of data
3. Balancing conflicting requirement
4. Enforcement of standards
5. Controlled redundancy
6. Security
7. Increased programmer productivity
8. Data independence
61
Disadvantages of the database approach
1. Size.
2. Complexity.
3. Cost.
4. Additional hardware requirements
5. Higher impact of a failure
6. Performance compromises
62
Designing files and database
The following four analysis and design steps
constitute a systematic method for the creation of file and
database designs. To illustrate the steps, we will use as
an example a microcomputer information system for an
independent video rental store.
1. Create the initial ERD. Identify all the data
stores in your DFD as system entities, and create a rough
draft of a complete ERD for the system.
MEMBER
M
RENTS
N
VIDEO
63
MEMBER (MEMBER-NUMBER, NAME, ADDRESS,
CITY, STATE, ZIP, HOME-PHONE,
WORK-PHONE, CREDIT-CARD-CODE,
CREDIT-CARD-NUMBER, (VIDEO-ID,
TITLE, DATE-RENTED,
DATE-REURNED))
(VIDEO (VIDEO-ID, TITLE))
2. Assign all data elements to entities
3. Normalize entities and create the final ERD
4. Verify all data dictionary entities. Verify that
the data dictionary entries for all data stores, records, and
data elements are completely and correctly documented.
64
MEMBER
VIDEO
1
1
HAS
MAKES
M
RENTAL
M
MEMBER (MEMBER-NUMBER, NAME, ADDRESS, CITY, STATE, ZIP,
HOME-PHONE, WORK-PHONE, CREDIT-CARD-CODE,
CREDIT-CARD-NUMBER)
(VIDEO (VIDEO-ID, TITLE)
RENTAL (MEMBER-NUMBER, VIDEO-ID, DATE-RENTED, DATE-REURNED)
65
Calculating file sizes and volumes
One of the final steps in the physical design of
a file is estimating the file size. You need accurate
estimates of file sizes to verify that adequate
physical storage capacity is available for the system
files or to determine how much additional storage
capacity must be provided. You should plan for
reasonable growth when sizing a file.
• Sizing diskette files
• Sizing tape files
• Sizing Disk files
66
File and database control
We will discuss additional control measures for
protecting the data stores in your files and databases. Most
database management systems provide extensive control and
security features, including passwording, encryption,
subschemas, audit trails, and backup and recovery procedures.
Your responsibility as systems analyst is to ensure that these
features are used.
Limiting access to files and databases is the most
common measure for protecting stored data. Only those end
uses who furnish an appropriate access code should be allowed
to access a files or database. Different privileges can be
associated with different access codes, so some end uses can
67
be denied access to certain data, others can be limited to a
read-only access, still others might be all allowed to both read
and update the data, and only a very few end users are allowed
full privileges to read, write, update, and delete data. For
sensitive data, additional access codes can be established at the
record or field level. Stored data can also be encrypted, so the
data can be interpreted only by special decoding software;
data would be decrypted for legitimate users, but anyone who
manages to circumvent the system programs to access the file
or database would be unable to decipher the data.
68
69
70
Converting an E-R diagram to a logical structure
PROJECT
PROJECT-NO
START-DATE
BUDGET
MAKE
PROJECT-NO
ORDER_NO
ORDERS
ORDER_NO
ORDER-DATE
FOR
ORDER_NO
PART-NO
QTY-ORDERED
PARTS
PART-NO
COLOR
WEIGHT
PROJECT-NO
START-DATE
BUDGET
PROJECTS
PROJECT-NO
PROJECT-NO
ORDER-NO
ORDER-DATE
ORDERS
ORDER-NO
ORDER-NO
PART-NO
FOR
QTY-ORDERED
PART-NO
PART-NO
PARTS
COLOR
WEIGHT
71
PERSONS
WORK
PERSON-ID
NAME
ADDRESS
PERSON-ID
PROJECT-ID
TIME-SPENT
PROJECT-ID
PROJECTS START-DATE
BUDGET
To convert
an E-R diagram to
a set of relations
PERSONS
PERSON-ID
PX1
PX2
PZ5
NAME
Jackson
Maine
Oldham
ADDRESS
London
Liverpool
London
WORK
PERSON-ID PROJECT-ID TIME-SPENT
PX2
Proj3
30
PX1
Proj2
15
PZ5
Proj2
40
PX2
Proj5
30
PZ5
Proj3
75
PROJECTS
PROJECT-ID
Proj3
Proj2
Proj5
START-DATE BUDGET
1 March 86
50
15 February 86
30
1 November 85
60
72
SUPPLY
SUPLIER
PROJECT
PART
SUPLIER
SNAME
……
PROJECT
PROJNAME
……
PART
PARTNO
SUPPLY
SNAME PROJNAME PARTNO QUANTITY
……
73
74
A Logical Data Structure for Input Requirements
ORDER =
ORDER NUMBER
+ ORDER DATE
+ CUSTOMER NUMBER
+ CUSTOMER NAME
+ CUSTOMER SHIPPING ADDRESS = ADDRESS >
+ ( CUSTOMER BILLING ADDRESS = ADDRESS > )
+ 1 { PRODUCT NUMBER +
QUANTITY ORDERED } n
+ ( DEFAULT CREDIT CARD NUMBER )
ADDRESS
=
+
+
+
+
( POST OFFICE BOX NUMBER )
STREET ADDRESS
CITY
STATE
POSTAL ZONE
75
Software Design and Completing the
System design phase
In this chapter, we discuss the processing methods that
influence software design, the major processing and
support functions that we must accommodate in our
software design, and software design itself. We conclude
the chapter by describing the completion of the systems
design phase; the preparation of the system design
specification document and the presentations of the
systems design to management and to information
systems department personnel.
76
Processing methods
Each information system functions in a specific
environment or uses a specific platform. An
environment, or platform, is the combination of hardware
and system software that is used for developed
information system. One IS, for example, might function
only in a PC and DOS environment, whereas another IS
might require a DEC and UNIX platform. Different
processing methods are possible within each
environment.
77
• Online processing
An online processing system is an IS in which
transactions are processed when and where they occur
and which allows output directly to end users. Online
transaction processing and interactive processing are
other terms for online processing. The online processing
system then stores the transaction in the appropriate
database or online file. An online file is a file with direct
or indexed organization that allows random access.
• Batch processing
A batch processing system is an IS in which data is
collected and, at some later time, all the data that has
been gathered is processed as a group, or batch.
78
• Combined online and batch processing
Online and batch processing can be combined in a single
application.
• Online and batch processing advantages and
disadvantages
Online processing systems have two primary advantages
over batch processing systems. First, because data is
entered and validated as it occurs, the stored data is
available sooner in a more accurate form. The second
advantage follows from the first: the stored data is up-todate at all times.
79
• Centralized and distributed processing
• Wide Area Networks (WANs)
• Local Area Networks (LANs)
• Bus network
• Star network
• Ring network
• Client/server architecture
• Single-user and multi-user processing
• single program operating systems
• multitasking operating systems
• multiprogramming operating systems
80
Major processing functions
• Data input and validating
• Updating or file maintenance
• Sorting
• Reporting
Processing support functions
• Backup and recovery
Backup refers to making copies of data files so that if
data is lost or destroyed, a timely recovery can be made
and processing can continue. Backups should be made as
soon as possible, usually immediately after the master
file has been updated.
81
• File retention
File retention refers to the length of time a file needs
to be retained before the space it occupies can be used
for another purpose.
• Restart
When an error occurs while a program is executing,
the first step is to correct the error. If a disk master file is
destroyed due to physical damage to the disk, for
example, then the first step is to replace the disk and
recover the master file from the latest backup. After the
correction is made, the program must be restarted. Part of
your systems design must include specifications of how
82
programs should be restarted, depending on where the
error occurred.
• Start-up processing
Start-up processing is the special processing that
occurs in making the transition from the current
information system to the new information system. The
primary requirement is the creation of the new system
master files from existing data.
Software design
Software design consists of two distinct levels.
The first level of software design is the systems
analyst’s responsibility and consists of determining
83
which program are required and what each program will
do.
The second level of software design is the
programmer’s responsibility and consists of designing
exactly how each program will accomplish what it must
do. The first level is completed during the system design
phase, whereas the second level is done during the
system development phase.
• Programs required
During the systems design phase, you must partition the
information system’s software functions into programs.
Your starting point, however, is to review all the process
84
descriptions developed during the system analysis phase
and extract those processes that must be handled by
application software programs. The general guidelines
that will help you to determine what programs are
needed.
- One update program for each master and table file.
- One report program for each report.
- One or more menu programs for online processing
control
•Program documentation
After deciding which programs are required, what
documentation do you need for each program? You must
provide the following information for each program.
85
1. Program identification
2. Purpose of the program.
3. Files. Each of the files that is input, output, printed
by the program, and updated by the program must
be identified by name.
4. Processing requirements.
Object-oriented design
The object-oriented analysis phase, in contrast, utilizes a
bottom-up approach that places emphasis on data rather
than data flows. It is during the design phase of the
object-oriented approach that traditional tools, such as
data flow diagrams (DFDs) and entity-relationship
diagrams (ERDs), are most useful.
86
Object-oriented design (cont.)
A UML class (ER term : entity) is any “thing” in the
enterprise that is to be represented in our database. It
could be a physical “thing” or simply a fact about the
enterprise or an event that happens in the real world. The
object-oriented analysis phase, in contrast, utilizes a
Customer
cFirstName
cLastName
cPhone
cStreet
cZipCode
UML class name
attribute names
UML operations or methods
(not needed here)
87
System design completion
The preparation of the system design specifications and
the presentations to management and to information
systems personnel are the final activities that complete
the systems design phase.
• System design specification
The system design specification is document that
presents the complete systems design for the new
information system, along with detailed costs, staffing,
and scheduling for completing the next SDLC phase,
system development. The system design specification is
also called the technical design specification and the
detailed design specification
88
SYSTEM DESIGN SPECIFICATION
1 Management Summary
2. System Components Details
a. Program Design
b. Output Design
c. Input Design
d. File and Database Design
e. Support Processing Design
3. Environment Requirements
4. Implementation Environments
5. Time and Cost Estimates
6. Appendices (as needed)
89
• Approval
End users must review and approve all report and output
screen designs, menu and data entry screen design,
source documents, manual processing, and other parts of
the systems design with which they will eventually
interact. The review and approval process is a continuous
process during the systems design phase. When you
complete the design for a report or a group of reports that
will be used by end users, for example, you should meet
with him or her to review the designed material or
prototypes, adjust the design if necessary to satisfy the
end user’s requirements, and obtain written approval of
the design.
90
• Technical and management presentations
You will usually give two or three presentations at the
end of the system design phase. The first presentation,
which at times you do not need to give, is an internal
departmental presentation to the system analysts,
programmers, computer operators, and technical support
staff who will be involved in future project phases or
who will have to support the system when it becomes
operational.
91
You give the second presentation to the top
management of the information systems department and
the end users.
The final presentation is given to company
management.
Base on the presentation and all the data submitted,
management might reach one of three decisions:
proceed on to the system development phase, perform
additional work on the systems design phase, or
terminate the project.
92