What is a “database”? - Regis University: Academic Web Server for

Download Report

Transcript What is a “database”? - Regis University: Academic Web Server for

File and
Databases
CS208
File Organization

The three principal file organizations are
 Sequential
 Direct
 Indexed Sequential
Sequential File Organization


Records physically stored one after another in
order determined by key field
Advantages:



Very efficient when many or all records in a file
need to be accessed
Cost of tape and drives very low
Disadvantages:


Major drawback is very slow access time for a
particular record.
Must rewrite all records following record insertion.
Direct File Organization

Records stored at a specific address,
determined by their key field


Advantages:


A mathematical technique called hashing converts
the key field value into a corresponding address
Record can be quickly accessed by going directly
to its address
Disadvantages:


Must be done using random access storage
(disk/optical) which have higher cost than
sequential
Can only use one key
Indexed Sequential
File Organization

Compromise between direct and sequential
methods



Advantages:



Records stored sequentially
Index created that records the address of each
record
Good compromise between previous two methods
Can have multiple index tables to use multiple
keys
Disadvantages:

Slower than direct-access
What is a “database”?


A “database” is just a collection of related
data.
Databases can exist in many forms.
Examples:






Sheets of paper in folders in a file cabinet
A book (think of it as a collection of sentences and
illustrations)
Books in a collection (e.g., a library)
Sets of 3"x5" cards containing notes
Maps and other geographic information systems
Blood samples in a medical laboratory
Common Elements

Sets of data and information composed
of, and/or represented by:






bits
alphanumeric symbols
lines and shapes in drawings, pictures, and maps
audio and video recordings
actual substances
A means by which the sets of data and
information are organized in order to
facilitate access to individual desired sets
Examples of
Access Methods

Phone book - Collection of several independent
databases, each consisting of names and
corresponding phone numbers:

Blue-pages governmental listings:



primary arrangement alphabetical by type of
government (city, county, state, federal),
secondary arrangement alphabetical by agency
within type of government,
tertiary arrangement alphabetical by office
within agency
Examples of
Access Methods (continued)


White-pages personal listings:

arranged alphabetically by surname

within surname by first names
Yellow-pages listings:



primary arrangement by type of business,
secondary arrangement alphabetically by
company within type of business,
plus various special groupings (e.g., restaurants
by ethnic type)
Flat File Databases (DBs)

Flat-file DBs are like the DBs you can
construct in a single spreadsheet page


All the information in the DB is in one file
consisting of one array of rows and
columns.
For example:
SSN
123-45-6789
987-65-4321
567-89-0123
Surname
Doe
Smith
Johnson
First Name(s)
John X.
Martin
Billy Bob
Telephone Number
303-555-1234
720-111-2222
303-444-5555
Flat File
Advantages/Disadvantages

Advantages



Simple
Good for few records with few fields
Disadvantages




Unnecessary duplication of data or data
redundancy
Inconsistent, incomplete or inaccurate data,
lacking data integrity
Changes in data are difficult to implement
Separate and isolated data with limited data
sharing
Data Heirarchcy in
Computerized Databases

Character - single letter, number or
special character

Field - a set of related characters

Record - collection of related fields

File - collection of related records

Database - collection of related files
Types of Databases


Individual - often on a PC used by one
person
Company or shared - usually on a
mainframe and managed by a database
administrator

Example: Common operational databases
contain information about company
operations
Types of Databases


Distributed databases - have data stored
in multiple locations, but the data is
accessible through communications networks
Proprietary databases - created by an
organization, and stored information is
offered to others for a fee

Examples include Dialog Information Services and
Dow Jones Interactive Publishing
Database Terminology
Logical data view –
How humans see things
Physical data view –
How things are stored in a computer
Database Management System (DBMS) –
Allows a user to deal with data in logical
terms, without having to understand the
computer's physical view.
DataBase Management
System (DBMS)
DBMS – provides storage, retrieval, analysis,
sorting, and printing of information in a database.
DBMS is:
• A collection of program independent, interrelated
data
• A set of programs to access the data
• Information about a particular enterprise
• An environment that is both convenient and
efficient to use.
Current DBMS Systems



Mainframe database vendors:
 Oracle
 IBM DB2
 Microsoft (SQL Server)
 Sybase
 Informix
Desktop:
 MS Access
 Borland Paradox
Some free database systems (Unix) :
 Postgres
 MySQL, mSQL
 Predator
DBMS Uses
The processing power of a DBMS allows it to:
 Sort
 Match
 Link
 Aggregate
 Skip fields
 Calculate
 Arrange
Database Administrator (DBA)


Coordinates all the activities of the
database system.
Must have a good understanding of the
enterprise’s information resources and
needs.
Database Administrator (DBA)

Database administrator's duties include:
 Database definition
 Database modification
 Granting user authority to access the
database for security and privacy
 Specifying integrity constraints
 Acting as liaison with users
 Monitoring DB performance and
responding to changes in requirements
 Decides strategy for backup and recovery
Types of DBMS Users

End-User:




Non-specialist accessing data via a query language
Naïve user accessing data via a special-purpose
interface
Performs data retrieval and update (extend/modify)
Applications Programmer:


Writes programs that use the DB by embedding
queries to the DB in a HLL
Develops interfaces for the naïve user
DBMS Organization

The four principal DBMS organizations are
Hierarchical
 Network
 Relational
 Object-Oriented

Hierarchical Databases

Viewed as branches of an upside-down tree



Each item is subordinate to its parent item
Only one parent per item
Any element (node) in the database is linked
only to the elements directly above it and
directly below it.



If parent node is deleted, all the child nodes
are as well
New parent node must be created before
adding a new child node
No direct relationships between child nodes
Hierarchical Databases



Limited by rigid structure
Typically require custom programming
Example:


Original computer-based databases were designed for
banking.
Hierarchical databases were appropriate for such
purposes, e.g.:
 individual accounts can be grouped by family or
business;
 sets of accounts, grouped by branch;
 accounts in different branches, grouped by city;
 accounts in different cities, grouped by state.
Hierarchical Model
Department
DeptNum
Courses
CourseNum
Name
Students
Name
Section
IDNum
Name
Course
Professors
IDNum
Name
- Must always begin at top to
search for data
- One parent per child, no other
relationships allowed
Network Databases




Permits links among all components (i.e.
elements can be linked to other elements
anywhere in the database, not just those
directly above and below)
The interconnected design allows for access
via multiple pathways
Can be extremely difficult to manage
The World-Wide Web is a very large example
of a network database.
Network Model
Department
DeptNum
Students
IDNum
Name
Professors
Courses
CourseNum
Name
Name
Section
IDNum
Name
- Added data paths across the tree
(instead of just up and down)
- Reduced time required to access data,
but increased overhead space requirements
Course
Relational Databases


A relational database is a set of one or more
tables that together embody information
about a set of related concepts and entities.
The tables are connected (related) via fields
within the table that are shared by a pair of
tables.
Relational Model
Department
DeptNum
…
…
…
Professors
Name
…
…
…
IDNum
…
…
…
Courses
Num
…
…
…
Name
…
…
…
DeptNum
…
…
…
Course
…
…
…
Students
ProfID
…
…
…
Section
…
…
…
IDNum
…
…
…
Name
…
…
…
Course
…
…
…
Based on tables of objects (the data),
rather than specific paths (ways to access the data)
Relational Database Rules



Each row is unique (distinct)
Each column name is unique within a table
It is permissible to have the same name for a
column in two different tables in the same
database.
 To distinguish between them we use a
qualified name:
Pet.Name vs Family.Name
Relational Databases


In a Relational DB, the information
content of a table does not depend on
either

The order of the rows; or

The order of the columns
In other words, the rows and columns of
a table can be rearranged at will without
affecting the table's information content
Relational DB Terminology
Fields
(Columns, Attributes)
Num
CS208
MT360
CS320
Name
CS Fundamentals
Calculus
C Programming
Credits
3
4
3
Records
(Rows, Tuples)
Courses
File (Table) = Relation
Rows (Tuples) = Records
Columns (Attributes) = Fields
Relationships between
objects are defined by
common attributes
Primary Keys

In a Relational DB, each table



Must have a primary key (unique identifier)
Must have no duplicate rows
A primary key is



A data attribute (column), or a combination of
attributes, that uniquely identifies each record
in the table.
A simple key consists of a single attribute
A composite key consists of two or more
attributes
Primary Keys (continued)

Primary Key



Provides unique way to identify each record
Can be obvious from the structure of the
table. If there is no easy natural choice, you
can add a column containing a unique
identifier.
May consist of the entire record (especially
with two-column tables, which occur often in
the development of RDBs)
Other Types of Keys

Secondary Key



A column that is used to aid in the retrieval of
information from a table.
A secondary key is not required to have unique
values in each of its rows,
Foreign Key

A column used to retrieve information from one
table (i.e., is a secondary key) that is also the
primary key in another table. Foreign keys are
a major tool in Relational DBs.
Relational Database Design

Design Goals:



Avoid redundant data
Ensure that relationships among
attributes are represented
Facilitate checking updates for violation of
database integrity constraints
Creating a Student Database










Student ID
First Name
Middle Initial
Last Name
Home Address
School Address
Street Address
City
State
Zip






Home Phone
Work Phone
Cell Phone
Course ID
Course Name
Course Instructor
Database Normalization
Optimize the Tables by:



Storing each piece of data once and only
once (i.e. Eliminate redundant data).
Ensuring data dependencies make sense
(only storing related data within a
particular table).
Maintaining data integrity.
First Normal Form



Eliminate repeating groups in individual
tables.
Create a separate table for each set of
related data.
Identify each set of related data with a
primary key.
1st Normal Student Database
Student
Student ID
First Name
Middle Initial
Last Name
Phone
Student ID
Phone Type
Phone Number
StudentCourse
Student ID
Course ID
Course Name
Course Instructor
Address
Student ID
Address Type
Street Address
City
State
Zip
Second Normal Form

Create separate tables for sets of values that
apply to multiple records


i.e. Remove partial data dependencies.
Relate these tables using a foreign key.
2nd Normal Student Database
Student
Student ID
First Name
Middle Initial
Last Name
Phone
Student ID
Phone Type
Phone Number
StudentCourse
Student ID
Course ID
Course Name
Course Instructor
Address
Student ID
Address Type
Street Address
Zip
ZipCode
Zip
City
State
Third Normal Form

Eliminate any fields that do not depend
on the key.
3rd Normal Student Database
Student
Student ID
First Name
Middle Initial
Last Name
StudentCourse
Student ID
Course ID
Phone
Student ID
Phone Type
Phone Number
Course
Course ID
Course Name
Course Instructor
Address
Student ID
Address Type
Street Address
Zip
ZipCode
Zip
City
State
Relational Design Summary

Data is stored in records, inside of tables

Primary keys uniquely identify a record

Foreign keys link data in one table to the
primary key in another table

Designs should maintain data integrity

Normalization concepts should be used
Dis/Advantages of
Relational Datebases

Disadvantages


Require more overhead
Advantages




Cut down on needless repetition of information
Ensure more accuracy
Facilitate updating and deletion of information.
Design avoids errors that occur when
adding/deleting information from flat files
Query Languages
Query languages allow non-programmers to
question the DBMS
Structured Query Language (SQL) –
the only standard structured query language
Structured Query Language
(SQL)





Pronounced either "S, Q, L" or "sequel"
Widely used standard set of commands and
syntax for doing things with Relational DBMSs
Used especially for query and retrieval
Includes commands for defining Relational DBs,
conducting transactions, storing data, etc.
Each Relational DBMS also has additional features
unique to it, because SQL does not handle all the
practical details involved in using a Relational DB
Select


The select clause directs the DBMS to choose
a subset of fields from one or more tables.
For example,
SELECT last_name, first_name
chooses all last names and first names and
will place them in this order in the result
set.
From clause



The from clause directs the query toward one or
more tables.
 from Student (directs the query to use the
student table)
Used with the SELECT clause:
SELECT last_name, first_name
FROM Student
chooses the last and first name of all rows in
Student table.
Note: As long as the column names within the tables
used in the from clause are unique, qualified names are
not required.
Where clause

A where clause specifies a selection criterion
that we will use to limit our choice of records.

A where clause follows the name of the table.

Format:
SELECT field1,…, fieldN
FROM tablename
WHERE (boolean expression)
Boolean Expressions


Boolean expressions used in where clauses
typically involve a field compared to another
field or a field compared to a value.
For example:
WHERE GPA > 3.50
WHERE last_name > “M”
WHERE amount_owed >= amount_paid
Comparison Operators
=
is equal to
>
is greater than
<
is less than
>=
is greater than or equal to
<=
is less than or equal to
not
is not equal to
Simple SQL Query
Textbooks
Title
Price
Category
Publisher
Java Intro
109.99
Computers
Prentice Hall
Calculus
129.99
Math
Kaufman
Advanced C
115.99
Computers
Tech Inc
Philosophy
83.99
Lib Arts
Prentice Hall
SELECT *
FROM Textbooks
WHERE category=‘Computers’
“selection”
Title
Price
Category
Publisher
Java Intro
109.99
Computers
Prentice Hall
Advanced C
115.99
Computers
Tech Inc
Simple SQL Query
Textbooks
Title
Price
Category
Publisher
Java Intro
109.99
Computers
Prentice Hall
Calculus
129.99
Math
Kaufman
Advanced C
115.99
Computers
Tech Inc
Philosophy
83.99
Lib Arts
Prentice Hall
SELECT Title, Price, Publisher
FROM Textbooks
WHERE Price < 110
“selection” and
“projection”
Title
Price
Publisher
Java Intro
109.99
Prentice Hall
Philosophy
83.99
Prentice Hall
Joins


You can use more than one table in a query.
The join clause tells the DBMS that you are
using two tables.

You must also specify the fields on which
you are linking the files, using the ON
clause.
Example
SELECT Textbooks.Title, Price, Required
FROM Textbooks, CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = ‘Computers’

The dot (.) notation specifies the table then
the field within the table to reconcile the
ambiguity with the two names.
Join SQL Query
Textbooks
Title
Price
Category
Publisher
Java Intro
109.99
Computers
Prentice Hall
CourseTexts
Calculus
129.99
Math
Kaufman
Title
Course
Required
Adv C
115.99
Computers
Tech Inc
Java Intro
CS434
Yes
Philosophy
83.99
Lib Arts
Prentice Hall
Adv C
CS422
No
SELECT Textbooks.Title, Price, Required
FROM Textbooks, CourseTexts
ON Textbooks.Title = CourseTexts.Title
WHERE Category = ‘Computers’
Title
Price
Required
Java Intro
109.99
Yes
Adv C
115.99
No
Database Concerns
 Privacy/Unauthorized Access to Data
 Data is easier to gather and exploit using a
computer, so precautions must be taken to
guard the data.
 Accuracy/Completeness
 Owner of database must ensure accuracy
of the data.
 Users must take data with grain of salt must verify