LIS 384K.11
Database-Management
Principles and Applications
Introduction
R. E. Wyllys
Copyright © 2002 by R. E. Wyllys
Last revised 2002 Jan 28
GSLIS - The University of Texas at Austin
LIS 384K.11, Database-Management Principles and Applications
Course Objectives: To Develop an
• Understanding of the nature of database-management systems
(DBMSs), including their structure, design, and evaluation
• Understanding of the relationship between DBMSs and the
analysis of information systems in libraries and in business
• Understanding of the distinctions among flat-file databases
(DBs), network DBs, hierarchical DBs, relational DBs, and text-oriented DBs
• Understanding of the process of normalization of relational DBs
• Understanding of the role of the Structured Query Language
(SQL) standards in the current and future development of
DBMSs
• Understanding of management and social issues such as
database security and privacy
• Introductory level of skill in the use of a microcomputer
database-management system (Microsoft Access)
Examples of Databases
• What does the word "database" mean?
– Nowadays we usually think it means a computer-stored set
of information
– However, databases can exist in many forms. Examples:
• Electronic data: text, visual images, audio recordings,
numbers
• Sheets of paper in folders in a vertical file
• A book (think of it as a collection of sentences and
illustrations)
• Books in a collection (e.g., a library)
• Sets of 3"x5" cards containing notes
• Blueprints
• Maps
• Core samples from oil wells
• Blood samples in a medical laboratory
• DNA samples in a forensic laboratory
Examples of Databases
• What do these examples have in common?
– Sets of data and information composed of, and/or
represented by: bits; or alphanumeric symbols; or
lines and shapes in drawings, pictures, and maps;
or audio recordings; or video recordings; or realia
(i.e., actual substances)
– At least one means by which the sets of data and
information are organized in order to facilitate
access to individual desired sets
Examples of Databases
• Consider the provisions for access to individual
pieces of information in the following examples:
– Phone book. Contains a collection of several
independent (discrete) databases, each consisting of
names together with corresponding phone numbers:
• White-pages personal listings, arranged alphabetically by
surname and within surname by first names
• White-pages corporate listings, arranged alphabetically
• Blue-pages governmental listings: primary arrangement
alphabetical by type of government (city, county, state, federal),
secondary arrangement alphabetical by agency within type of
government, tertiary arrangement alphabetical by office within
agency
• Yellow-pages listings: primary arrangement by type of
business, secondary arrangement alphabetically by company
within type of business, plus various special groupings (e.g.,
restaurants by ethnic type)
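The white-pages arrangement (primary arrangement by surname, secondary by first names) amounts to sorting on a two-part key, which then allows a quick lookup by binary search. The following is a minimal Python sketch under that reading; the listings (including the second Doe) and the function name `look_up` are hypothetical illustrations.

```python
from bisect import bisect_left

# Hypothetical listings: (surname, first names, phone number).
listings = [
    ("Roe", "Richard Rodney", "512-987-6431"),
    ("Doe", "Jane Q.", "512-555-1234"),
    ("Doe", "Aaron", "512-555-0000"),
    ("Fulano", "Juan", "210-543-9876"),
]

# Primary arrangement by surname, secondary by first names.
white_pages = sorted(listings, key=lambda entry: (entry[0], entry[1]))
keys = [(surname, first) for surname, first, _ in white_pages]

def look_up(surname, first):
    """Binary search on the sorted (surname, first names) keys; None if absent."""
    i = bisect_left(keys, (surname, first))
    if i < len(keys) and keys[i] == (surname, first):
        return white_pages[i][2]
    return None

print(look_up("Doe", "Jane Q."))  # 512-555-1234
```

Because the entries are kept in order, each lookup needs only a logarithmic number of comparisons rather than a pass over every listing; this is the same advantage the alphabetical arrangement gives a human reader.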
Examples of Databases
• Provisions for access to individual pieces of
information, cont'd:
– Organizational membership directory, usually
consisting of names together with corresponding postal
addresses, telephone numbers, and email addresses
• Typically contains listings by surname and first names, plus
groupings by regions (e.g., states, countries) and by
membership in special interest groups (SIGs), arranged
alphabetically by name within regional groups and SIGs
– Dictionary, consisting of words with corresponding
definitions, and in some cases, lists of synonyms
and/or antonyms
• Primary collection is individual words arranged alphabetically
• May contain separate sections (e.g., geographical names,
biographical names, abbreviations, proofreaders' marks)
Examples of Databases
• Provisions for access to individual pieces of
information, cont'd:
– Thesaurus
• Primary arrangement by broad concepts (themes), with
subgroupings of sets of closely related words (often arranged by
part of speech, e.g., nouns, verbs), each set sharing a
subconcept of the primary concept; sometimes includes
antonyms of the primary concept and/or selected subconcepts
– Book (non-fiction)
• Table of contents
– Provides access to chapters (and sometimes to subchapters)
dealing with broad topics that are aspects of the overall subject(s) of
the book
• Index
– Organizes narrow concepts by names, terms, subterms, etc.
– Provides pointers from terms to relevant locations in text of
book
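A book index's "pointers from terms to locations" can be modeled as a mapping from each term to the pages on which it occurs. The sketch below is a simplification under a stated assumption: it indexes every word mechanically, whereas a real index is compiled by an indexer around concepts, names, and subterms. The page texts and the name `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical book text, one string per page (pages numbered from 1).
pages = [
    "Relational databases store data in tables.",
    "A primary key identifies each row of a table.",
    "Flat file databases repeat data; relational tables avoid repetition.",
]

def build_index(pages):
    """Map each lower-cased word to the sorted list of pages on which it occurs."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in text.lower().split():
            index[word.strip(".,;:")].add(page_number)
    return {term: sorted(locations) for term, locations in index.items()}

book_index = build_index(pages)
print(book_index["relational"])  # [1, 3]
print(book_index["table"])       # [2]
```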
Examples of Databases
• The foregoing examples illustrate some ways
of organizing information in DBs, whether
computerized or non-computerized: viz.,
– An intrinsic index provides information organized by
and with the entry or record (e.g., a Rolodex card); or
– A separate index can point to the location of the
information (e.g., a book index, or a library catalog); or
– Records (i.e., basic packages of information) can
contain retrieval tags (access tags, labels, etc.) that
identify them and that can be searched for (e.g.,
labelled folders in a file); or
– (Worst Case) Records can be sought via exhaustive
search (by humans or computer programs)
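The contrast between the worst case (exhaustive search) and a separate index can be sketched in a few lines of Python. This is an illustrative assumption, not a description of any particular system: records are dictionaries, and the index maps a surname to record positions, much as a library catalog points to shelf locations. All names and data are hypothetical.

```python
# Hypothetical records: each dict is one basic package of information.
records = [
    {"ssn": "123-45-6789", "surname": "Doe", "phone": "512-555-1234"},
    {"ssn": "987-65-4321", "surname": "Fulano", "phone": "210-543-9876"},
    {"ssn": "567-89-0123", "surname": "Roe", "phone": "512-987-6431"},
]

def exhaustive_search(records, surname):
    """Worst case: read every record; return the matches and how many were read."""
    examined, matches = 0, []
    for record in records:
        examined += 1
        if record["surname"] == surname:
            matches.append(record)
    return matches, examined

def build_surname_index(records):
    """A separate index: surname -> positions of the records bearing it."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record["surname"], []).append(position)
    return index

def indexed_search(records, index, surname):
    """Follow the index's pointers directly to the matching records."""
    return [records[p] for p in index.get(surname, [])]

surname_index = build_surname_index(records)
matches, examined = exhaustive_search(records, "Roe")
print(examined)                                                  # 3
print(indexed_search(records, surname_index, "Roe") == matches)  # True
```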
Computerized Databases
• Consist of
– Bits, organized into bytes, which in turn are
organized into sequences or strings of bytes
– Fields: sets of bytes that represent information
– Records: sets of fields that are associated by
sharing relevance to some entity
– Files: sets of records sharing relevance to a
particular type of entity
• Databases typically consist of one or more
sets of related files
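The layering of bytes into fields, fields into records, and records into a file can be made concrete with Python's `struct` module. This is a minimal sketch assuming a hypothetical fixed-length layout (11-, 10-, and 12-byte text fields); real DBMSs use their own, more elaborate storage formats.

```python
import struct

# Hypothetical fixed-length layout: three text fields of 11, 10 and 12 bytes.
RECORD_FORMAT = "11s10s12s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 33 bytes per record

def encode(ssn, surname, phone):
    """Pack three fields (sets of bytes) into one fixed-length record."""
    return struct.pack(RECORD_FORMAT, ssn.encode("ascii"),
                       surname.encode("ascii"), phone.encode("ascii"))

def decode(record_bytes):
    """Unpack a record into its fields, dropping the zero-byte padding."""
    fields = struct.unpack(RECORD_FORMAT, record_bytes)
    return tuple(f.rstrip(b"\x00").decode("ascii") for f in fields)

# A "file": a sequence of records about the same type of entity (people).
person_file = b"".join([
    encode("123-45-6789", "Doe", "512-555-1234"),
    encode("987-65-4321", "Fulano", "210-543-9876"),
])

second = person_file[RECORD_SIZE:2 * RECORD_SIZE]
print(decode(second))  # ('987-65-4321', 'Fulano', '210-543-9876')
```

Fixed-length records make it possible to find record n by arithmetic alone (offset n × 33), at the cost of wasted padding bytes for short values.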
Computerized Databases
• Of special interest are Relational
Databases (RDBs) and programs that
manage them, known as Relational
Database Management Systems
(RDBMSs).
– Note: The word "relational" is often omitted
nowadays, since most well-known DBMSs (e.g.,
IBM DB2, Informix, MS Access, MS SQL Server,
Oracle, Sybase) are RDBMSs.
Introduction to RDBs
• In discussing relational databases, we
use synonymously the words
– File, table, relation
– Record and row
– Field, column, attribute
• Note: Discussions of RDB theory tend to prefer the more
formal terms, such as relation and attribute
Introduction to RDBs
• Definition:
– A relational database is a set of one or more
tables that together embody information about a
set of related concepts and entities.
– If (as is usually the case) a relational database
has more than one table, the tables are connected
(related) in the following way:
• It is possible to move from any one table in the RDB to
any other table in the RDB via a chain of columns (i.e.,
fields, attributes) shared in pairwise fashion by
successive tables.
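The connectivity condition in this definition can be checked mechanically. Below is a minimal Python sketch, assuming each table is described only by the set of its column names and that two tables "share" a column when they use the same column name. The function name `tables_are_connected` and the library tables are hypothetical.

```python
from collections import deque

def tables_are_connected(tables):
    """True if every table can be reached from every other table through a chain
    of tables in which each successive pair shares at least one column.

    `tables` maps a table name to the set of its column names."""
    names = list(tables)
    if not names:
        return False
    reached = {names[0]}
    queue = deque([names[0]])
    while queue:
        current = queue.popleft()
        for other in names:
            # Sharing a column is symmetric, so reaching every table from the
            # first one shows that any table can reach any other.
            if other not in reached and tables[current] & tables[other]:
                reached.add(other)
                queue.append(other)
    return len(reached) == len(names)

library = {
    "book": {"isbn", "title"},
    "book_author": {"isbn", "author_id"},
    "author": {"author_id", "author_name"},
}
print(tables_are_connected(library))  # True
library["zip_codes"] = {"zip", "city"}  # a "stray" table sharing no column
print(tables_are_connected(library))  # False
```

This tests only the connectivity condition; it says nothing about the other requirements on relational tables, such as primary keys and the absence of duplicate rows.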
Introduction to RDBs
• The picture below shows 3 tables, with a total of
12 attributes (i.e., 12 distinct columns). The top
and middle tables share Attribute 3; the middle and
bottom tables share Attribute 7.
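The picture's exact columns are not recoverable, so the following SQLite sketch assumes a 4/5/5 split of attributes a1 to a12 that keeps only what the text states: the top and middle tables share a3, and the middle and bottom tables share a7. Table names and values are hypothetical. The query moves from the top table to the bottom table through the middle one, following that chain of shared columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout; only the sharing of a3 and a7 comes from the description.
conn.executescript("""
    CREATE TABLE top_table    (a1, a2, a3, a4);
    CREATE TABLE middle_table (a3, a5, a6, a7, a8);
    CREATE TABLE bottom_table (a7, a9, a10, a11, a12);
    INSERT INTO top_table    VALUES ('t1', 't2', 'k3', 't4');
    INSERT INTO middle_table VALUES ('k3', 'm5', 'm6', 'k7', 'm8');
    INSERT INTO bottom_table VALUES ('k7', 'b9', 'b10', 'b11', 'b12');
""")

# a3 links top to middle; a7 links middle to bottom.
rows = conn.execute("""
    SELECT top_table.a1, bottom_table.a12
    FROM top_table
    JOIN middle_table ON top_table.a3 = middle_table.a3
    JOIN bottom_table ON middle_table.a7 = bottom_table.a7
""").fetchall()
print(rows)  # [('t1', 'b12')]

# 14 column slots in total, but only 12 distinct attributes.
distinct = set()
for table in ("top_table", "middle_table", "bottom_table"):
    distinct |= {column[1] for column in conn.execute(f"PRAGMA table_info({table})")}
print(len(distinct))  # 12
```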
Introduction to RDBs
• Definition:
– A database application is a combination of
• A relational database-management system
(RDBMS)
• A relational database (RDB)
• Associated menus, data-entry forms, and report
forms
• Documentation (e.g., manuals) for the users.
Introduction to RDBs
• An application is a package designed to
facilitate a particular real-world function (or a
set of related functions): e.g., looking up
books in a library catalog, or handling a sales
transaction in a store.
– Note: An application may include more than one
RDB, and/or it may include a "stray" table or two,
so long as such additions serve the basic function
and make the whole package more convenient for
humans to use.
Various Types of DBs
• Types of Databases
– Flat file (spreadsheet)
– Hierarchical
– Network
– Relational
– Text-oriented
Flat File DBs
• Flat file DBs are like the DBs you can
construct in a spreadsheet, i.e., all the
information in the DB is in one file
consisting of one array of rows and
columns.
SSN          Surname   First Name(s)    Telephone Number
123-45-6789  Doe       Jane Q.          512-555-1234
987-65-4321  Fulano    Juan             210-543-9876
567-89-0123  Roe       Richard Rodney   512-987-6431
Flat File DBs
• Flat file databases (spreadsheet style)
– Advantages
• Simple
• Suitable for small numbers of records with few
attributes
Flat File DBs
• Flat file databases (spreadsheet style)
– Disadvantages
• Likely to include repetitions of data
• Multi-valued attributes (e.g., multiple authors, multiple phone
numbers) require repetitions of accompanying data
• Changes in data are difficult to implement
• Deletion and insertion anomalies are common
• Often lead to too much information in one table
SSN          Surname   First Name(s)    Telephone Number
123-45-6789  Doe       Jane Q.          512-555-1234
987-65-4321  Fulano    Juan             210-543-9876
987-65-4321  Fulano    Juan             512-234-5678
567-89-0123  Roe       Richard Rodney   512-987-6431
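Two of the listed disadvantages, difficult changes and deletion anomalies, can be reproduced directly on the flat file above. In this Python sketch the surname change to "Fulano-Garcia" is a hypothetical event, and the careless single-row edit is an assumption chosen to show what goes wrong when repeated data is not updated everywhere.

```python
# The flat file above as a list of rows: (SSN, surname, first names, phone).
flat_file = [
    ("123-45-6789", "Doe", "Jane Q.", "512-555-1234"),
    ("987-65-4321", "Fulano", "Juan", "210-543-9876"),
    ("987-65-4321", "Fulano", "Juan", "512-234-5678"),
    ("567-89-0123", "Roe", "Richard Rodney", "512-987-6431"),
]

# Hypothetical surname change, applied (carelessly) to the first matching row only.
for i, row in enumerate(flat_file):
    if row[0] == "987-65-4321":
        flat_file[i] = (row[0], "Fulano-Garcia", row[2], row[3])
        break

# Update anomaly: the same person now has two different surnames.
surnames = {row[1] for row in flat_file if row[0] == "987-65-4321"}
print(sorted(surnames))  # ['Fulano', 'Fulano-Garcia']

# Deletion anomaly: removing Roe's only phone number removes every trace of Roe.
without_roe_phone = [row for row in flat_file if row[3] != "512-987-6431"]
print(any(row[0] == "567-89-0123" for row in without_roe_phone))  # False
```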
Hierarchical DBs
– Hierarchical databases
• Based on a classification scheme (a taxonomy)
• The first databases were designed for banking,
and hierarchical databases were appropriate for
that purpose
• Typically require custom programming
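The hierarchical idea (every record has exactly one parent, and records are reached by descending from the root) can be sketched with nested Python dictionaries. This is only an illustration, not how any particular hierarchical DBMS stores its data; the bank, customers, and balances are hypothetical.

```python
# Hypothetical bank hierarchy: branch -> customer -> account -> balance.
# Every record has exactly one parent, as in a taxonomy.
bank = {
    "Austin branch": {
        "Jane Q. Doe": {"checking": 1200, "savings": 5000},
        "Juan Fulano": {"checking": 300},
    },
    "San Antonio branch": {
        "Richard Rodney Roe": {"savings": 750},
    },
}

def balance(tree, branch, customer, account):
    """Reach a record by descending the hierarchy along a fixed path from the root."""
    return tree[branch][customer][account]

def total_savings(tree):
    """A question that cuts across the hierarchy needs its own traversal code."""
    return sum(accounts.get("savings", 0)
               for customers in tree.values()
               for accounts in customers.values())

print(balance(bank, "Austin branch", "Juan Fulano", "checking"))  # 300
print(total_savings(bank))  # 5750
```

Questions that follow the hierarchy are easy, but each new kind of question that cuts across it needs another hand-written traversal, which is one reason such systems typically require custom programming.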
Network DBs
– Network databases
• Can be extremely complex and difficult to
manage
• The World Wide Web is a very large example of a
network database
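Unlike a hierarchy, a network lets a record have many parents and lets links form cycles, much as Web pages link to one another. The Python sketch below assumes a small hypothetical set of pages and links, and it follows links breadth-first while remembering visited pages so that cycles do not cause endless looping.

```python
from collections import deque

# Hypothetical pages and their outgoing links; cycles are allowed.
links = {
    "home": ["catalog", "hours"],
    "catalog": ["record-1", "record-2"],
    "hours": ["home"],
    "record-1": ["record-2"],
    "record-2": ["catalog"],
}

def reachable(start):
    """Breadth-first traversal that follows links and never revisits a page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Each page may have several "parents" (pages linking to it).
parents = {page: [p for p, targets in links.items() if page in targets] for page in links}
print(sorted(parents["catalog"]))     # ['home', 'record-2']
print(sorted(reachable("record-1")))  # ['catalog', 'record-1', 'record-2']
```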
Text-Oriented DBs
• Text-oriented DBs are, as their name suggests,
DBs that have special features for handling text:
e.g., abilities
– To search for specified strings of characters
• With or without matching the cases of the characters
• While using wildcards, i.e., symbols that will match any one
character or any sequence of characters
• With or without automatic inclusion of word variants, e.g.,
plurals, "ing" verb endings
– To search on pairs, triples, etc., of words and phrases,
using
• Boolean logic
• Proximity logic (e.g., both words must be in same sentence, or
in same paragraph, or in same section, or within n words of
each other)
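Several of these search abilities can be approximated with Python's `re` module. This is a minimal sketch under stated assumptions: the wildcard syntax ("?" for one character, "*" for any run of characters within a word) is hypothetical, wildcards stand in for automatic inclusion of word variants, and proximity is measured in word positions.

```python
import re

text = ("Relational databases store records in tables. "
        "A flat file stores all records in one table.")

def wildcard_to_regex(pattern):
    """Hypothetical syntax: '?' matches one word character, '*' any run of them."""
    return re.escape(pattern).replace(r"\?", r"\w").replace(r"\*", r"\w*")

def find_words(pattern, text, match_case=False):
    """Return the whole words in text that match the wildcard pattern."""
    flags = 0 if match_case else re.IGNORECASE
    return re.findall(r"\b" + wildcard_to_regex(pattern) + r"\b", text, flags)

def contains(pattern, text):
    return bool(find_words(pattern, text))

def within_n_words(word_a, word_b, text, n):
    """Proximity logic: True if the two words occur within n words of each other."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

print(find_words("table*", text))                        # ['tables', 'table']
print(find_words("relational", text, match_case=True))  # []
# Boolean logic: "flat" AND NOT "hierarch*"
print(contains("flat", text) and not contains("hierarch*", text))  # True
print(within_n_words("flat", "table", text, 3))      # False
print(within_n_words("records", "tables", text, 2))  # True
```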
Text-Oriented DBs
• Examples of abilities of text-oriented DBs, cont'd
– To rank search results by weights assigned to the terms
used in the search
– To maintain thesauri of near synonyms and to allow
searches by near synonyms of original query terms
– To maintain, for selected words or phrases, indexes of
their locations in files
• Commercial text-oriented DBs exist (e.g., LexisNexis and Dialog), running on large computer
systems.
• The only text-oriented DBMSs for microcomputers
that I know of are askSam, DB/Textworks, Isys,
and STAR.
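The three further abilities (weighted ranking, near-synonym thesauri, and location indexes) can be combined in one small Python sketch. The documents, term weights, thesaurus entries, and function names are all hypothetical; commercial systems use far more refined weighting schemes. Each document earns a query term's weight once if it contains that term or any of its near synonyms.

```python
import re
from collections import defaultdict

# Hypothetical documents, term weights, and thesaurus of near synonyms.
documents = {
    "doc1": "Relational tables avoid repetition of data.",
    "doc2": "A flat file repeats data in one table.",
    "doc3": "Catalog records describe books.",
}
weights = {"repetition": 3.0, "table": 2.0}
thesaurus = {"repetition": ["redundancy", "repeats"]}

def tokenize(text):
    return re.findall(r"\w+", text.lower())

# Location index: word -> list of (document, word position) pairs.
positions = defaultdict(list)
for doc_id, text in documents.items():
    for i, word in enumerate(tokenize(text)):
        positions[word].append((doc_id, i))

def ranked_search(query_terms):
    """Rank documents by the summed weights of the query terms they match."""
    scores = defaultdict(float)
    for term in query_terms:
        matched = set()
        for variant in [term] + thesaurus.get(term, []):
            matched.update(doc_id for doc_id, _ in positions.get(variant, []))
        for doc_id in matched:
            scores[doc_id] += weights.get(term, 1.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(positions["data"])                       # [('doc1', 5), ('doc2', 4)]
print(ranked_search(["repetition", "table"]))  # [('doc2', 5.0), ('doc1', 3.0)]
```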
Advantages of RDBs
• Advantages of relational databases
– Cut down on needless repetition of information
– Ensure greater accuracy
– Facilitate updating and deletion of information
– Design avoids problems that occur with flat
files, e.g., insertion and deletion anomalies
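The flat file's problems disappear when the same data are split into two related tables, one row per person and one row per phone number, sharing the SSN column. The following SQLite sketch assumes hypothetical table and column names, and it repeats the hypothetical surname change from the flat-file discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        ssn TEXT PRIMARY KEY,
        surname TEXT,
        first_names TEXT
    );
    -- REFERENCES documents the link; SQLite enforces it only
    -- when PRAGMA foreign_keys = ON.
    CREATE TABLE phone (
        ssn TEXT REFERENCES person(ssn),
        number TEXT,
        PRIMARY KEY (ssn, number)
    );
    INSERT INTO person VALUES
        ('123-45-6789', 'Doe', 'Jane Q.'),
        ('987-65-4321', 'Fulano', 'Juan'),
        ('567-89-0123', 'Roe', 'Richard Rodney');
    INSERT INTO phone VALUES
        ('123-45-6789', '512-555-1234'),
        ('987-65-4321', '210-543-9876'),
        ('987-65-4321', '512-234-5678'),
        ('567-89-0123', '512-987-6431');
""")

# The surname is stored once, so one UPDATE changes it everywhere.
conn.execute("UPDATE person SET surname = 'Fulano-Garcia' WHERE ssn = '987-65-4321'")
# Deleting Roe's phone number leaves the record for Roe intact.
conn.execute("DELETE FROM phone WHERE number = '512-987-6431'")

rows = conn.execute("""
    SELECT person.surname, phone.number
    FROM person JOIN phone ON person.ssn = phone.ssn
    WHERE person.ssn = '987-65-4321'
    ORDER BY phone.number
""").fetchall()
print(rows)  # [('Fulano-Garcia', '210-543-9876'), ('Fulano-Garcia', '512-234-5678')]
print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 3
```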
Relational Databases
• In an RDB, the information content of a
table does not depend on either
– The order of the rows; or
– The order of the columns
• In other words, the rows and columns of a
table can be rearranged at will without
affecting the table's information content
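One way to see why rearranging rows and columns leaves the content unchanged is to model a table as a set of rows (so row order is irrelevant), with each row being a set of (column name, value) pairs (so column order is irrelevant). This Python sketch is an illustrative model of that idea; the function name `as_relation` is hypothetical.

```python
def as_relation(column_names, rows):
    """Model a table as a set of rows, each row a frozenset of (column, value) pairs."""
    return {frozenset(zip(column_names, row)) for row in rows}

t1 = as_relation(["ssn", "surname"],
                 [("123-45-6789", "Doe"), ("987-65-4321", "Fulano")])
# The same content with both the rows and the columns rearranged:
t2 = as_relation(["surname", "ssn"],
                 [("Fulano", "987-65-4321"), ("Doe", "123-45-6789")])
print(t1 == t2)  # True
```

Because the model is a set, it also cannot hold two identical rows, which anticipates the rule that a table must have no duplicate rows.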
Relational Databases
• In an RDB, each table
– Must have a primary key (unique identifier)
– Must have no duplicate rows
• A primary key is
– A data attribute (column), or a combination of
attributes, that uniquely identifies each record in the
table.
– A simple key consists of a single attribute
– A composite key consists of two or more attributes
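Simple and composite keys can be tried out in SQLite, which rejects any row whose key duplicates an existing one. In this Python sketch the table names and the helper `try_insert` are hypothetical; the SSNs and phone numbers come from the flat-file example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simple key: one attribute identifies each record.
    CREATE TABLE person (ssn TEXT PRIMARY KEY, surname TEXT);
    -- Composite key: only the pair (ssn, number) must be unique.
    CREATE TABLE phone (ssn TEXT, number TEXT, PRIMARY KEY (ssn, number));
""")

conn.execute("INSERT INTO person VALUES ('987-65-4321', 'Fulano')")
conn.execute("INSERT INTO phone VALUES ('987-65-4321', '210-543-9876')")
# Same ssn with a different number: a new pair, so it is accepted.
conn.execute("INSERT INTO phone VALUES ('987-65-4321', '512-234-5678')")

def try_insert(sql):
    """Return True if the row was accepted, False if the key constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO person VALUES ('987-65-4321', 'Roe')"))          # False
print(try_insert("INSERT INTO phone VALUES ('987-65-4321', '512-234-5678')"))  # False
```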
Relational Databases
• Primary Key
– Provides a unique way to identify each record
– Can be obvious from the structure of the table.
If there is no easy natural choice, you can add
a column containing a unique identifier.
– May consist of the entire record (especially
with two-column tables, which occur often in
the development of RDBs)
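Both situations can be sketched in SQLite. The first table adds a surrogate identifier column because names are not unique; the second is a two-column linking table whose entire record is the primary key. The table names and the ISBN placeholder are hypothetical. One SQLite-specific detail: a column declared `INTEGER PRIMARY KEY` is filled in automatically when no value is supplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Author names are not unique, so a surrogate identifier column is added.
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);
    -- Two-column table: the entire record serves as the primary key.
    CREATE TABLE book_author (
        isbn TEXT,
        author_id INTEGER,
        PRIMARY KEY (isbn, author_id)
    );
""")

first_id = conn.execute("INSERT INTO author (name) VALUES ('Jane Q. Doe')").lastrowid
second_id = conn.execute("INSERT INTO author (name) VALUES ('Jane Q. Doe')").lastrowid
print(first_id, second_id)  # 1 2 (two different people who share a name)

# Hypothetical ISBN placeholder; two co-authors of one book are two distinct rows.
conn.execute("INSERT INTO book_author VALUES ('isbn-0001', ?)", (first_id,))
conn.execute("INSERT INTO book_author VALUES ('isbn-0001', ?)", (second_id,))
try:
    conn.execute("INSERT INTO book_author VALUES ('isbn-0001', ?)", (first_id,))
except sqlite3.IntegrityError:
    print("duplicate row rejected")
```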
Computerized Databases: They can help to save you from this kind of work!