Transcript Class #6

Introduction to Information
Systems Analysis
Database Design & Output Design
INFO 503
Glenn Booker
INFO 503
Lecture #6
1
Database Design
• Information systems Create, Read, Update,
and Delete data (CRUD)
• Data is stored in records which describe one
instance of an entity (person, order, etc.)
• Each file contains many similar records
• A database contains collections of
interrelated files
INFO 503
Lecture #6
2
Conventional Files Pros & Cons
• Conventional (non-relational or flat) files
are used in many legacy systems;
e.g. Cobol and FileMaker Pro use them
• Fast and easy to develop
• Generally used for one specific application
• Results in duplication of data across files,
causing maintenance and integrity risks
INFO 503
Lecture #6
3
Relational Database Pros & Cons
• Provides more flexible format for storage
• Allows multiple applications to access data
• Slightly slower than conventional files,
but improving
• Higher learning curve to overcome
• Manage using a database management
system (DBMS)
INFO 503
Lecture #6
4
Field Types
• A field is the smallest useful piece of data
• Four types:
–
–
–
–
Primary key - unique identifier
Alternate key - alternative to the primary key
Foreign key - links to a primary key elsewhere
Descriptive field – every attribute not a key!
• Every field should be used at least once
INFO 503
Lecture #6
5
Records
• A record is round piece of bumpy vinyl
which rotates on a turntable … oh, wait
a minute *blush*
• A record is a collection of fields in a
specific order or format
• Each record is generally fixed in length
– Some systems can compress unused fields
– COBOL can use variable length records
INFO 503
Lecture #6
6
Records
• Records are read in units called blocks
or pages
– A block is sometimes called a physical page
– The blocking factor describes how many
logical records are in each block
– The blocking factor may be set manually for
performance reasons
INFO 503
Lecture #6
7
Files and Tables
• A file contains all records of a
particular structure
• A table is a relational database’s file
• There are many many kinds of files (see
next slide)
• File organization and access may be tuned
for performance enhancement
INFO 503
Lecture #6
8
Types of Files
–
–
–
–
Master files contain permanent information
Transaction files describe business events
Document files contain copies of historic data
Archival files contain master and transaction
file records offline which are no longer needed
– Table look-up files contain static data for
reference calculations (tax tables, Zip codes)
– Audit files record changes to other files
INFO 503
Lecture #6
9
Data Architecture
• An organization’s data architecture includes
– The files, databases, and database technology
used to store their data, and
– The administrative structure used to manage the
data resources
• The architecture may include legacy
information systems, local department
databases, and shared data warehouses
INFO 503
Lecture #6
10
Database Management
• Database administrators (DBA’s) try to
plan, define, and structure this mess, so
people can use it
• Database management systems (DBMS)
are the tools used to create and manage
a database
– The heart of the DBMS is called the database
engine, which controls access to the data
INFO 503
Lecture #6
11
SQL
• Structured Query Language (SQL) is used for both
data definition (DDL) and manipulation (DML)
– DDL creates and allows views of the data structures
– DML is used to access and use data from those
structures (CRUD)
• SQL is used by nearly all relational databases
– Oracle, DB2, SQL Server, Sybase, Informix, Access,
FoxPro, Paradox, dBase, etc.
INFO 503
Lecture #6
12
Relational Database Tools
• A relational database models the data in
a schema
• Tables can embed programs
– Triggers are used to run programs based
on some event - e.g. updating records
in another table
– Stored procedures are called by an application e.g. validate new records before they’re stored
INFO 503
Lecture #6
13
Data Modeling
• A good data model should be:
– Simple, describing only one entity
– Non-redundant, except for foreign keys
– Flexible, for future improvement
• Normalization helps achieve all of these
INFO 503
Lecture #6
14
Flat File Design
• Design of conventional flat files is dictated
primarily by the output and input needs of a
particular application
• We won’t say any more, in order to
discourage such barbaric practices :)
INFO 503
Lecture #6
15
Database Design
• Detailed design of databases is heavily
dependent on the DBMS used to
implement them
• CASE tools can often generate SQL, based
on the database model, which will even
generate the tables and relationships needed
• Otherwise you have to do it manually
INFO 503
Lecture #6
16
Database Design
• Each entity is implemented as a table
– Primary and alternate (secondary) keys are
generally indexed, to make searches faster
– Foreign keys are marked as such
– Attributes become fields
• Supertypes and subtypes may not be
implemented as such by CASE tools; may
make them one table w/ stored procedures
INFO 503
Lecture #6
17
Database Design
• Keys and other fields need to be defined
in detail:
– Data type (from list of allowable types in this
particular DBMS)
– Field size (length or number of characters)
– NULL allowable (no entry)
• Can’t allow NULL for a primary key
– Domain and default values
INFO 503
Lecture #6
18
Integrity
• Integrity refers to knowing that the data
is trustworthy, correct and can be
predictably found
• Three kinds of integrity:
– Key integrity
– Domain integrity
– Referential integrity
INFO 503
Lecture #6
19
Key Integrity
• Key integrity can be achieved when every
table has a unique, non-NULL primary key
– Primary key must be non-NULL because
there must be an entry for every record
• If the DBMS does not directly support these
features, then other controls must be taken
to enforce them
INFO 503
Lecture #6
20
Domain Integrity
• Domain integrity is achieved by ensuring
every field is within its allowable domain
(legal values)
– e.g. don’t allow a 7-digit credit card number
• Note that this does not preclude
incorrect values, only those values
which are impossible
INFO 503
Lecture #6
21
Referential Integrity
• Referential integrity is violated when a
foreign key’s value does not match up to
a primary key record
• To prevent this, must consider side effects
of deleting any record which contains a
foreign key
– Might have to wait for all of a customer’s
invoices are deleted before deleting customer
INFO 503
Lecture #6
22
Referential Integrity
• Rules for deletion should exist for every
table to help maintain referential integrity
– No restriction; delete freely
– Delete:Cascade when you need to delete all
associated records from the foreign keys
– Delete:Restrict when you hold off deleting
records until foreign key records are deleted
– Delete:Set null when foreign keys are set null
INFO 503
Lecture #6
23
Role Names
• Some organizations prefer to have all
unique field names, to prevent confusion
• When naming foreign keys, each one can
get a role name to distinguish its role
differently from the primary key’s name
– Primary key product_number in the Products
table may become foreign key
ordered_product_number in the Orders table
INFO 503
Lecture #6
24
Database Distribution
• Four options for database distribution:
– Centralization; there is only one data server
– Horiz. Distrib.; tables or records (rows) are
assigned different locations, but hard to analyze
– Vert. Distrib.; attributes (columns) are assigned
different locations, also hard to analyze
– Replication; duplicate entire tables at different
locations, but more complex and expensive
INFO 503
Lecture #6
25
Database Capacity Planning
• The size of a database can be estimated by:
– Record size: In a table, add the field sizes to get
the size of one record (one character=1 byte)
– Table size: Multiple the record size by the
number of records expected in 3 yrs
– Database size: Repeat first two steps for all
tables and add them up
– Add a big factor for unexpected growth
INFO 503
Lecture #6
26
The Database Future
• Object-oriented databases have become
more common in the last few years
– Some purely object-oriented databases are
available, but rarely used
• Since many organizations are heavily
invested in relational databases (20+ years),
object-relational databases are a popular
hybrid transition technology (e.g. Oracle 9)
INFO 503
Lecture #6
27
Output Design
• An information system generally has both
internal and external outputs
– Internal outputs are used for management of
the system and for reference by the users; they
rarely leave the organization
– External outputs include any printed or
displayed information which is used outside
of the system and its organization
INFO 503
Lecture #6
28
Internal Outputs
• Detailed reports are those which use little
filtering to produce a comprehensive
statement, such as all inventory on hand, or
every transaction during some time period
– Some detailed reports are historical or
regulatory in nature
INFO 503
Lecture #6
29
Internal Outputs
• Summary reports provide a synopsis
(maybe graphic) of the information, often
for managers to look for trends or problems
• Exception reports describe when something
is wrong – outside of predefined limits
(low stock)
INFO 503
Lecture #6
30
External Outputs
• External outputs leave the system and its
immediate users, generally to go to a
vendor, customer, or external system
– Might include invoices, paychecks, tickets,
passes, bills, purchase orders, etc.
• Turnaround outputs are external outputs
which later become inputs for the same
system (invoices returned by customer)
INFO 503
Lecture #6
31
Output Media for Implementation
• Media used for outputs may include
–
–
–
–
–
–
Paper (preprinted or not)
Microfilm (?)
Magnetic media (e.g. CD-ROM, DVD, video)
Text files (e.g. append log entries)
Posted online (Internet or intranet)
Displayed on screen
• Client screen, projector, or POS terminal
INFO 503
Lecture #6
32
Output Media for Implementation
• Consider different options; not just how
you first think of the output being presented
– Automated phone call to a pager or cell phone
– E-mail messages
– Multimedia (sound, pictures, video, etc.)
• Flash or Shockwave presentations
– Hyperlinks, or even entire web pages
INFO 503
Lecture #6
33
Output Media
• Printed output is often tabular and/or zoned
– Tabular format looks like a table (rows and
columns of text data)
– Zoned output is closer to a GUI input, or
the information at the top of a spreadsheet;
collections of related data are grouped together
– Tabular and zoned output formats are
often used together
• On-screen output may be graphic (charts)
INFO 503
Lecture #6
34
Graphic Output
• Graphic output uses pictures to express
information and help look for trends, but
can’t replace narrative output to explain
what is being shown
– Need to be clear about the scope and source
of the data shown in a graph, so it is used
appropriately by its audience
– Might have a hyperlink for definitions of the
terms used
INFO 503
Lecture #6
35
Graphic Output
• For every graphic output, need to consider
– How often will the output be generated?
– Does the output cover one moment in time or
many repeated assessments?
– How many data points are to be displayed?
– What medium will be used to present and
distribute the output?
• A 32-bit color plot is really boring after it has been
re-copied in grayscale and faxed three times
INFO 503
Lecture #6
36
Graphic Output – Chart Types
• Bar charts use horizontal bars, primarily for
visual comparison rather than tracking
• Column charts use vertical bars to compare
different items or track data over time
• Pie charts are good for showing a limited
number of data points at one moment
INFO 503
Lecture #6
37
Graphic Output – Chart Types
• Line charts show trends over time for
one or more variables
• Scatter charts are used to compare two
parameters to each other; often a correlation
between them, or relationship, is sought
• Other types exist, but are used less often
– Donut, area, radar, and control charts
INFO 503
Lecture #6
38
Output Design Tools
• Printer spacing charts were used to design
text-based outputs
• Now CASE tools and most database
environments (Access, Crystal Reports,
Oracle Designer) allow easy drag-and-drop
layout of outputs
INFO 503
Lecture #6
39
Design Considerations
• Output should be easy to read and interpret
– Every output should have a title
– Every output should be date and time stamped
– Section headings should be used to identify
groups of data
– In forms, fields should be clearly labeled
– Columns in tabular outputs need
column headings
INFO 503
Lecture #6
40
Design Considerations
– Legends should be used to explain column
headings and field names
– Only required information should be shown
(hide irrelevant details)
– Output should be usable in its presented form
– Outputs should be balanced on the ‘page’
– Users need to navigate easily and freely, or exit
– Outputs shouldn’t have jargon, error messages
INFO 503
Lecture #6
41
Design Considerations
• Timing of output is very important; a
beautiful report a week late is worthless
• Distribution of outputs must be complete
enough to reach all relevant (affected) users
• Outputs must contain all information
needed by its user - even if its requirements
forgot something!
INFO 503
Lecture #6
42
Output Design
• There are four steps to designing outputs
1. Identify system outputs
2. Select physical output requirements
3. If needed, design preprinted external forms
4. Design, validate, and test the outputs
(which includes obtaining feedback and
refining the design)
INFO 503
Lecture #6
43
1. Identify System Outputs
• Outputs should have been defined in the
requirements for each design unit – in the
DFD, look for data which leaves the system
• Content for each output should be defined
in the data dictionary
• Consider the audience for each output, and
when it will be generated (periodically
and/or on an event-driven basis)
INFO 503
Lecture #6
44
2. Select Physical
Output Requirements
• Choose how the output will be presented
based on its type and purpose, and on the
feasibility limitations
• Pick the medium (paper, screen, kiosk, etc.)
– Choose the output format (8.5 x 11” paper,
800 x 600 pixel screen, etc.)
INFO 503
Lecture #6
45
2. Select Physical
Output Requirements
• How often is this output generated?
– Does its generation need to be scheduled?
– Review the frequency the output will be needed
for the output volume requirements (amount of
paper, number of forms)
– Is duplication needed? (Xerox, burn CDs)
• How will the output be distributed?
How is that distribution controlled?
INFO 503
Lecture #6
46
3. Design Preprinted Forms (opt.)
• Preprinted forms may need long lead time
– What information will be preprinted?
•
•
•
•
Will the form be mailed?
What size will the form be? Perforated?
How much volume of printing is expected?
What legends and instructions need to be on
the form? What colors will be needed?
INFO 503
Lecture #6
47
4. Design, Validate,
and Test the Outputs
• Key tabular report concerns
–
–
–
–
–
–
INFO 503
Page size, orientation, and headings
Report legend; column & row headings
Alignment and spacing of labels and headings
Data formatting, including field masks
Control breaks, make rows of data into sections
Clearly define the end of report
Lecture #6
48
4. Design, Validate,
and Test the Outputs
• Key screen (and web) report concerns
–
–
–
–
–
–
–
INFO 503
Size (screen resolution) and page size (kB)
Scrolling; can headings be frozen?
Navigation scroll bars, tabs, or buttons
Partitioning (zoning of data within a form)
Information Hiding; how do users get more info
Highlighting; how & when get user’s attention?
Printing; always provide an option to do so
Lecture #6
49