Chapter 12: Files and Databases


Files and Databases
Chapter 12
Organizing and Managing Digital Data
© The McGraw-Hill Companies, Inc., 2000
Overview
• Database and database
administrator
• Hierarchy
• File handling
• File management
systems
• DBMS
• Ethics
Databases
• Collection of related files
• Range in size from those on your PC to
terabytes of digital photographs of the
world on a large series of servers
– http://teraserver.microsoft.com
• Examples
– online services, virtual art museums, libraries
Database Administration
• Managing a database
• Database administrator (DBA)
– design, implementation, integration
– coordination with users
– system security
– backup and recovery
– performance monitoring
Hierarchy and Key Field
• Data storage hierarchy
– levels of data
• bits, bytes, fields, records, files, and databases
– definitions
• character, field, record, file, database
– key field
• unique data used to identify a record
• used for sorting
• often numerically generated
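As a rough illustration of the hierarchy above, the sketch below models a record as a group of fields and a file as a collection of records indexed by a key field. The `Student` record, its field names, and the sample values are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class Student:           # a record: a group of related fields
    student_id: int      # key field (hypothetical): unique for each record
    name: str            # a field, made up of characters
    major: str


# A file: a collection of related records, indexed here by the key field.
student_file = {}
for rec in [Student(1003, "Lee", "Biology"),
            Student(1001, "Ortiz", "History"),
            Student(1002, "Chen", "Physics")]:
    if rec.student_id in student_file:
        raise ValueError("key field must be unique")
    student_file[rec.student_id] = rec

# The key field identifies exactly one record...
found = student_file[1002]

# ...and gives a natural order for sorting.
sorted_ids = [r.student_id
              for r in sorted(student_file.values(), key=lambda r: r.student_id)]
```

A dictionary is used only because it enforces one record per key; a real file system or DBMS would store the records on disk.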
Basic Concepts
• Types of files
– program files
• software instructions
– data files
• data order and organization
should be logical and
consistent
Types of Data Files
• Master
– relatively permanent
records updated
periodically
– currently accurate
• Transaction
– temporary holding file
used for additions,
deletions, modifications
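The relationship between the two file types can be sketched as follows: a transaction file holds pending additions, deletions, and modifications, which are then applied to the master file in one pass. The function name `apply_transactions`, the tuple layout, and the account data are assumptions made for this example.

```python
def apply_transactions(master, transactions):
    """Return an updated copy of the master file after applying each transaction."""
    updated = dict(master)
    for action, key, data in transactions:
        if action == "add":
            updated[key] = data
        elif action == "modify":
            # Merge the changed fields into the existing record.
            updated[key] = {**updated[key], **data}
        elif action == "delete":
            del updated[key]
        else:
            raise ValueError(f"unknown action: {action}")
    return updated


# Master file: relatively permanent records, keyed by account number.
master_file = {
    101: {"name": "Avery", "balance": 50},
    102: {"name": "Brook", "balance": 75},
}

# Transaction file: temporary holding area for pending changes.
transaction_file = [
    ("modify", 101, {"balance": 20}),
    ("add", 103, {"name": "Casey", "balance": 0}),
    ("delete", 102, None),
]

new_master = apply_transactions(master_file, transaction_file)
```

The old master file is left untouched here, which mirrors the common practice of keeping the previous generation as a backup.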
Batch vs. Online
• Batch
– collect data, then process all at once
– advantage
• very efficient processing, checking for data validity
occurs at the originating batch site
• Online
– real-time processing
– example: an airline reservation system books each seat
immediately, which prevents duplicate reservations
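The difference between the two modes can be sketched with a toy reservation system. The class `ReservationSystem`, its method names, and the seat data are hypothetical; the point is only that online processing checks each request as it arrives, while batch processing collects requests and handles them in one run.

```python
class ReservationSystem:
    def __init__(self):
        self.seats = {}  # seat number -> passenger name

    def book_online(self, seat, passenger):
        """Online: the request is checked and applied immediately."""
        if seat in self.seats:
            return False  # duplicate reservation is rejected on the spot
        self.seats[seat] = passenger
        return True

    def book_batch(self, requests):
        """Batch: collected requests are processed together in one run.

        Conflicts are only discovered when the batch runs, so the
        rejected requests are returned for follow-up.
        """
        rejected = []
        for seat, passenger in requests:
            if not self.book_online(seat, passenger):
                rejected.append((seat, passenger))
        return rejected


system = ReservationSystem()
first = system.book_online("12A", "Kim")
second = system.book_online("12A", "Ray")   # same seat, refused immediately
rejected = system.book_batch([("14C", "Ana"), ("14C", "Bo")])
```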
Offline vs. Online
• Offline
– not directly accessible to the CPU, such as tapes
or disks that must be loaded first
• Online
– storage is direct and fast
– generally disk
File Organization
• Sequential access storage
– stores one record after another
– alphabetic or numeric
• Direct access storage
– can access the data using direct methods such
as addressing
Organizing Methods
• Sequential file organization
– records can be retrieved in the sequence that
they were stored
– useful when most of the records need to be
processed each time
– example: catalog mailing
Organizing Methods (continued)
• Direct file organization
– random file organization
– records stored in no particular sequence
– hashing algorithm converts the key field into a
number that locates the record in storage
– faster for finding a specific record
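A minimal sketch of direct file organization, assuming division-remainder hashing and linear probing (the slide does not name a specific algorithm). The constant `NUM_SLOTS` and the helper names are hypothetical. Two different keys can hash to the same address, so the sketch also shows one simple way to handle that collision.

```python
NUM_SLOTS = 7  # hypothetical file size, measured in record slots


def hash_address(key):
    """Division-remainder hashing: map a numeric key to a slot number."""
    return key % NUM_SLOTS


def store(slots, key, record):
    """Place a record at its hashed address, probing forward on a collision."""
    start = hash_address(key)
    for step in range(NUM_SLOTS):
        i = (start + step) % NUM_SLOTS
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, record)
            return i
    raise RuntimeError("file is full")


def fetch(slots, key):
    """Jump to the hashed address instead of scanning the whole file."""
    start = hash_address(key)
    for step in range(NUM_SLOTS):
        i = (start + step) % NUM_SLOTS
        if slots[i] is None:
            return None           # empty slot: the key was never stored
        if slots[i][0] == key:
            return slots[i][1]
    return None


slots = [None] * NUM_SLOTS
slot_a = store(slots, 1001, "Ortiz")  # 1001 % 7 == 0
slot_b = store(slots, 1008, "Chen")   # 1008 % 7 == 0 as well, so it probes to slot 1
```

Records end up in no particular key order, which is why this organization suits single-record lookups better than whole-file runs.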
Organizing Methods (continued)
• Indexed-sequential file organization
– or indexed file organization
– files stored in sequential order
– indexes records according to key field
– requires magnetic or optical disk
– slower overall than direct access
– bank has up-to-date record information, but
prints sequentially (monthly statements)
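The bank example can be sketched as records kept in key order plus an index on the key field, so the same file supports both a direct lookup and a sequential run. The account data and the function names `lookup` and `statement_run` are hypothetical, and an in-memory list stands in for the disk file.

```python
import bisect

# Records stored in key order (the sequential part).
records = [
    (1001, "Ortiz", 250.00),
    (1002, "Chen", 90.50),
    (1005, "Lee", 1200.00),
    (1009, "Kim", 35.25),
]

# The index: key fields in the same order as the stored records,
# so a position in the index is also the record's position in the file.
index = [key for key, _, _ in records]


def lookup(key):
    """Indexed use: binary-search the index, then read that one record."""
    pos = bisect.bisect_left(index, key)
    if pos < len(index) and index[pos] == key:
        return records[pos]
    return None


def statement_run():
    """Sequential use: visit every record in stored order."""
    return [f"{key}: {name}, balance {balance:.2f}"
            for key, name, balance in records]


one_account = lookup(1005)
statements = statement_run()
```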
File Management System
• Disadvantage of file
management systems
– data redundancy
– lack of integrity
– lack of program independence
Database Management Systems
• DBMS
• Controls the structure
of the database and
access to the data
Advantages of DBMS
• Reduced data redundancy
• Improved data integrity
• More program independence
• Increased user productivity
• Increased security
Disadvantages of DBMS
• Cost issues
• Data vulnerability issues
• Privacy issues
Database Organization
• Hierarchical
• Network
• Relational
• Object-oriented
Hierarchical
• Records arranged in related groups, like a tree
• Lower level record called a child
• Parent record at the top of the tree is called a
root record
• In a hierarchical database, a parent may have
more than one child, but a child has only one
parent (a one-to-many relationship)
• Simple and fast
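The one-to-many rule can be sketched as a small tree in which each node records a single parent. The `Node` class and the sample names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None     # a child has at most one parent
        self.children = []     # a parent may have many children

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError("a child may have only one parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Walk up to the root record; there is exactly one route."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("University")                     # root record
science = root.add_child(Node("Science"))
arts = root.add_child(Node("Arts"))
physics = science.add_child(Node("Physics"))
physics_path = physics.path()
```

Because every record has a single route from the root, lookups along that route are simple and fast, but a record cannot be shared between two parents.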
Network Database
• A type of hierarchical database, but children
can have more than one parent
• More flexible because relationships can be
established between different parents
• Limits to the number of links
• Retains some of the speed of access of a
hierarchical database
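A minimal sketch of the network model, assuming links are stored as explicit parent and child sets. The `link` helper and the department and course names are hypothetical. The only change from a strict tree is that a child may appear under more than one parent.

```python
from collections import defaultdict

parents = defaultdict(set)    # child -> set of its parents
children = defaultdict(set)   # parent -> set of its children


def link(parent, child):
    """Record one parent-child link; a child may be linked repeatedly."""
    parents[child].add(parent)
    children[parent].add(child)


link("Physics Dept", "Biophysics")
link("Biology Dept", "Biophysics")   # second parent for the same child
link("Physics Dept", "Optics")

biophysics_parents = sorted(parents["Biophysics"])
physics_courses = sorted(children["Physics Dept"])
```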
Relational Database
• Relates data through a key field
• More flexible
• Advantage
– user does not have to be aware of structure
– easily add, modify, delete records
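The idea of relating data through a key field can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical; SQLite is used only because it needs no setup, not because the chapter mentions it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item        TEXT
    );
    INSERT INTO customers VALUES (1, 'Ortiz'), (2, 'Chen');
    INSERT INTO orders VALUES (10, 1, 'modem'), (11, 2, 'monitor'), (12, 1, 'printer');
""")

# The two tables are related only through the shared key field customer_id.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# Adding, modifying, and deleting records takes one statement each.
conn.execute("INSERT INTO orders VALUES (13, 2, 'scanner')")
conn.execute("UPDATE orders SET item = 'laser printer' WHERE order_id = 12")
conn.execute("DELETE FROM orders WHERE order_id = 10")
remaining = conn.execute(
    "SELECT order_id, item FROM orders ORDER BY order_id").fetchall()
```

The query names the tables and the key field but says nothing about how the rows are physically stored, which is the sense in which the user need not be aware of the structure.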
Relational Database (continued)
• Disadvantage
– can be time consuming
Object-Oriented Database
• OODBMS
• Numeric, text, graphics, audio
• Important part of technology merge
• Uses
– medical information systems
– engineering information systems
– geographic databases
– training and education
DBMS Features
• Data dictionary
– also called encyclopedia and repository
– stores data definitions
• Utilities
– assist in maintaining databases by filtering
acceptable data for input, editing data, and
monitoring
Query Language
• Data manipulation language
• Used to make database
queries that do not require
command language
• Most popular is SQL (pronounced “sequel”),
or Structured Query Language
SQL
• Used in Oracle, Sybase, dBase, Paradox,
and Microsoft Access
• Some query tools accept questions in natural
(plain or spoken) English instead
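To give a feel for what an SQL query looks like, the sketch below runs two queries through Python's built-in `sqlite3` module. SQLite is not one of the products listed above; it is used here only because it ships with Python. The `employees` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Ortiz", "Sales", 42000),
     (2, "Chen", "Sales", 51000),
     (3, "Lee", "Support", 39000)])

# A query states what is wanted, not how to find it.
high_paid = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name",
    (40000,)).fetchall()

# Summaries are requested the same way.
by_dept = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees "
    "GROUP BY dept ORDER BY dept").fetchall()
```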
Report Generator
• Produces onscreen or
printed
reports
• User can
customize
appearance
Access Security
• Can be tailored for group
access or individual
access
• Physical security is
as critical as data
security
System Recovery
• Recovery types
– full and partial
– match backup techniques
• Techniques
– mirroring
– reprocessing
– rollforward
– rollback
Types of Recovery
• Mirroring
– frequent simultaneous copying of data to two or
more places
• Reprocessing
– goes back to a point of database activity where
the database was correct and reprocesses data to
bring it up to date
Types of Recovery (continued)
• Rollforward: forward recovery
– recreates current database using a previously
stored database
– uses after-image records with processing
information
• Rollback: backward recovery
– undoes unwanted changes using before-image
records, for example, if only half a transaction
was processed
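Both recovery directions can be sketched with a change log that stores a before-image and an after-image for each update. The log layout `(key, before, after)` and the function names are assumptions for this example; real DBMS logs carry much more information.

```python
def rollforward(backup, log):
    """Forward recovery: start from a stored copy and reapply after-images."""
    db = dict(backup)
    for key, _before, after in log:
        if after is None:
            db.pop(key, None)      # the change was a deletion
        else:
            db[key] = after
    return db


def rollback(current, log):
    """Backward recovery: restore before-images, newest change first."""
    db = dict(current)
    for key, before, _after in reversed(log):
        if before is None:
            db.pop(key, None)      # the record did not exist before
        else:
            db[key] = before
    return db


# Rollforward: rebuild the current database from last night's copy.
backup = {"A": 100, "B": 50}
committed_log = [("A", 100, 70), ("C", None, 30)]
recovered = rollforward(backup, committed_log)

# Rollback: a transfer from A to B failed after only the debit was applied.
damaged = {"A": 50, "B": 50, "C": 30}
partial_log = [("A", 70, 50)]
restored = rollback(damaged, partial_log)
```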
Mining, Warehouses, “Siftware”
• Data mining
– DM, or knowledge discovery
– sifts through large database to
uncover trends and predict future
trends
– helps in marketing, health, and
science
Data Warehousing
• Requires data preparation
– identification of all data sources
– fuse data and clean or scrub data to ensure
accuracy
– metadata shows the origins of data, the
transformations, and summary data
• Data warehouse
– combination of cleaned data and metadata
– often uses massively parallel processing (MPP)
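The preparation steps above can be sketched as a tiny pipeline: identify the sources, scrub each record, and attach metadata that records where the data came from and how it was transformed. The source names, the `scrub` function, and the `_meta` layout are hypothetical.

```python
def scrub(record):
    """Clean one raw record: trim whitespace, normalize case, coerce the amount."""
    return {
        "customer": record["customer"].strip().title(),
        "amount": round(float(record["amount"]), 2),
    }


# Step 1: identify the data sources.
sources = {
    "cash_register": [{"customer": " ortiz ", "amount": "19.99"}],
    "credit_card": [{"customer": "CHEN", "amount": "5"}],
}

# Step 2: fuse and scrub the data, keeping metadata with each row.
warehouse = []
for source_name, rows in sources.items():
    for row in rows:
        cleaned = scrub(row)
        cleaned["_meta"] = {
            "source": source_name,                        # origin of the data
            "transformations": ["strip", "title-case", "to-float"],
        }
        warehouse.append(cleaned)

# Summary data is part of the metadata as well.
summary = {
    "rows": len(warehouse),
    "total_amount": round(sum(r["amount"] for r in warehouse), 2),
}
```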
Siftware for Finding and
Analyzing
• Query-and-reporting tools
– Focus Reporter, Seagate Crystal Reports,
Esperant
– Specific questions to verify hypotheses
• Multidimensional-analysis tools
– MDA
– Essbase, Lightship
– data surfing to explore dimensions of subset
Siftware for Finding and
Analyzing (continued)
• Intelligent agents
– roam networks and perform complex tasks
– DataEngine, Data, Logic
– Help turn up unexpected relationships and
patterns
• Data mining
– combines facts from all parts of a business
– cash registers, shipping documents, credit-card
files
Ethics of Using Databases
• Misinformation explosion
– data is found, but little effort
is made to ensure that the
data is updated
– reliance on anecdotal
evidence
– causes inaccuracies that can
be harmful
Information Accuracy
• More facts, faster facts, but not
necessarily better facts
• Database is not necessarily
updated with current
information
• Computer sources not
necessarily accurate
Information Completeness
• Know the boundaries, as no information
service has it all
• Know all the variations of your keywords
• History is limited
– most databases go back only to 1980
– frequently assessment is unthinkingly extended
to years beyond 1980
Privacy Issues
• Right not to reveal
information about oneself
• Credit card, shopping
habits, harassment
• Fair Information
Practices
– U.S. Department of
Health, Education, and
Welfare
Privacy Enactment
• Privacy Act of 1974
– restricts how federal agencies and their
contractors handle personal records
– right to see and correct inaccurate data about
oneself
• Freedom of Information Act
– personal access to data gathered on self
• Computer Matching and Privacy Protection
Act
– prevents government from comparing some
records to other records of individuals
Finance Privacy
• Fair Credit Reporting Act of 1970
– right to access and challenge credit records
– if credit is denied, the report must be
provided free of charge
• Right to Financial Privacy Act of
1978
– restrictions on federal agencies
wanting to search records in banks
Health Privacy
• No federal laws protect medical records in
the United States
– except drug and alcohol abuse and psychiatric
care
• One strategy is to decline to fill out medical
histories or questionnaires unless there is a
clear need for them
• Can always ask for a copy of your medical
records
Employment Privacy
• Nongovernmental employers are the least
regulated by privacy legislation
• Employers may verify
– education
– employment
– credit
– driving record
– workers’ compensation claims
– criminal record, if any
Commerce Issues
• Marketing gathers data about age, buying
habits, favorite charities
• No prohibition of gathering data for one
reason and using it for another
– except Video Privacy Protection Act of 1988
• prevents giving out records without a court order or
individual’s consent
Communications Privacy
• Some constraints in acquiring
and disseminating information,
listening, and encryption use
• Some argue that you must be
willing to give up some privacy
for safety and security