A gentle Introduction to Computer Programming

Download Report

Transcript A gentle Introduction to Computer Programming

Databases
INTD 388 – Spring 2008
by A.Guercio and E. Walker
Definitions
• A database is a collection of related
data
• A database can be of any size and of
varying complexity
• A database management system
(DBMS) is a collection of programs
that enables users to define, create
and maintain a database
Database
Nomenclature
Data
database,
Data Warehouse:
Repository: A
A central
database
frequently
that can
provide
used as anlarge,
information
storage
Data
Mart:
searchable
database
authenticated
with access
facility,
withAresearchers
minimal
analysis
system
according
to the
to
of organized
an institution's
information.
orall
querying
functionality.
user's
has
that
focus on
It
is populated
of
datadata
from
a variety
Used
toneeds.
mirrorIt
existing
database
specific
to a particular
of
sources.
fornon-compatible
concurrent
access
to data.
research project
Data Management
Repurposing: When data
are archived, available
data can be used for
different purposes.
Ex. Linear sequences
captured to discover new
genes
are commonly
Use: Many,
including simulation,
repurposed
to
support
3D
statistical analysis,
visualization,
visualization
of Of
protein
communication.
course private data
structures
(i.e. not releasable) should be
Data Life
Cycle
protected at this stage.
Data Dictionary: Defines data
format standards.
Ex. Date of Birth: MM/DD/YY
is tranformed to the standard
Data of Birth: DD/MM/YY
Creation & Acquisition:
collection and insertion
of data in the system
Archiving: Making
data available for
future use.
Issues: Space
Requirements,
Completeness of
Data, Longevity,
Security, Archival
Process, …
Data
Life
Cycle
Database Architecture
Centralized:
Concentrates all
organizational
activity in one
location.
Disadvantages:
When the central
system crashes all
the data are not
available.
Distributed: Supports a
variety of hardware and
software and uses physically
disparate storage media,
each computationally
independent
Disadvantages: We need to
develop interfaces between
databases (n DB require 2n-1
interfaces)
What is a Database
System?
• Computerized System for the
management of records.
– Example: Oracle, Sybase, SQL Server,
DB2…
• The user of the system can perform
various operations:
•
•
•
•
•
add a new file to the database;
insert new data in an existing file;
search for data;
update data;
remove data; remove file.
Advantages of Database
Systems
•
•
•
•
•
•
•
Redundancy Control
Data Consistency and Integrity
Data Sharing and Integration
Security
Improved Maintenance
Concurrency (without data loss)
Backup & Recovery services
Disadvantages…
•
•
•
•
•
•
Complexity
Size
Cost (Hardware & Software)
Performance
Conversion
Risk of failure
Example
Bin
Wine
Producer
Year
Bottles
Ready
2
Chardonnay Buena Vista
88
1
91
3
Chardonnay Louis Martini 89
5
90
21
Fume Blanc
Robt.
Mondavi
88
4
91
22
Fume Blanc
Mirassou
87
2
90
72
Gamay
Robt.
Mondavi
85
2
90
Search
SELECT WINE, BIN, PRODUCER
FROM CELLAR WHERE
READY=91;
Result
Wine
Bin
Producer
Chardonnay
2
Buena Vista
Fume Blanc
21
Robt. Mondavi
Insertion
INSERT INTO CELLAR
VALUES (S3, ‘PINOT NOIR’,
‘SAINTSBURY’, 87, 1, 93);
Update
UPDATE CELLAR
SET BOTTLES=4
WHERE BIN=3;
Deletion
DELETE
FROM CELLAR
WHERE BIN=2;
Data Models
The Relational Data Model
• The relational model was introduced by E.F. Codd in
1970
• The model is based on the mathematical structure
known as the relation
– Informally, a relation resembles a table
– More precisely, a relation is a set of n-tuples r = {t1, t2, …,
tm} (i.e. a set of rows) and there is no order in the rows or
in the tuples.Therefore in order to distinguish one row
from another you need a unique key
• Ex. of a customer record:
S.S.
First_Name
Last_Name
Address
– S.S. is the unique key that distinguishes each
customer record
Parts of a Relational
Database Schema
Table
(relation)
Column*
(attribute)
name
address
age
Row
(tuple)
* Values of an attribute are chosen from its domain,
e.g. the domain of “age” is integers > 0
Relational Model Constraints
• No two tuples are equal, thus some set of
attributes must be unique, they are the key
– If there are multiple keys in a schema, they are
called candidate keys
– One of the candidate keys may be designated as a
primary key and be used to identify tuples in the
relation
– A foreign key is an attribute in a relation which is
the primary key of another relation
Ex.
Customer(S.S., First_Name, Last_Name, Address)
Checking Account(Account_Number, S.S., Balance)
Relational Model Constraints
• The entity integrity constraint states that
no primary key value can be null
– We don’t store descriptions of entities we can’t
identify
• Ex. We do not allow to insert a customer without
inserting the S.S.
• The referential integrity constraint is used
to maintain consistency among the tuples of
two relations
• Ex. We do not insert the checking account of a customer
for whom there is no data in the Customer table
• The integrity constraints may be displayed
diagrammatically by drawing an arrow from the relation
with the foreign key to the relation with the primary
key
What is good in relational DB?
• Advantage: We have data organized
efficiently and in a way that saves
space.
– Ex. We do need to repeat all the
information about the customer who has
500 accounts in our bank!
– Flat files with their independent tables
would not allow that.
Flat File (One Table)
Name
Class
Location
Minnie
Mouse
English 101 BC
H. Higgins
Minnie
Mouse
Minnie
Mouse
Psych 240
Lucy van Pelt
Accounting Hins. 209
110
Scrooge
McD
Goofy
Psych 240
Lucy van Pelt
Goofy
English 101 BC
Gerst. 120
Gerst. 120
Professor
H. Higgins
Relational Database Has
Multiple Tables
Schedule
Class
Enrollment
Prof
Name
Class
English 101 BC
H. Higgins
Minnie
Mouse
English 101
Psych 240
Lucy van
Pelt
Minnie
Mouse
Psych 240
Scrooge
McD
Minnie
Mouse
Accounting
110
Goofy
Psych 240
Goofy
English 101
Location
Gerst.
120
Accounting Hins.
110
209
SQL A Relational DB Language
• SQL (Structured Query Language) was
developed at IBM as part of an experimental
relational system called System R
• Along with QUEL (based on the relational
calculus), SQL is one of the earliest DB
languages
• SQL includes statements to define,
manipulate and interrogate the DB
• Ex: create table Student (
Name char[20],
GradYear integer,
)
Basic SQL Select
statement
• SELECT attribute
• FROM table(s)
• WHERE conditions
• Examples
– Select Class from Enrollment where Name = ‘Minnie
Mouse’
– Select Class from Schedule where Location = ‘BC’
More Complex Queries
• Multiple Tables joined together
• More conditions in where, joined by and, or, not
– Select Name, Class, Location
from Schedule, Enrollment
where Name = Goofy and
Schedule.Class = Enrollment.Class
– Select Prof, Name
from Schedule, Enrollment
where Prof = ‘Scrooge McD’ and
Schedule.Class = Enrollment.Class
More WHERE conditions
SELECT staffNO, fName, lName, salary
FROM Staff
WHERE…
…salary BETWEEN 20,000 and 30,000
…position IN {‘Manager’, ‘Supervisor’}
…fName LIKE {‘Al%n’} Allen or Alan or Allison …
…lName LIKE {‘_ _ _ _’} all 4-letter names
…mName IS NOT NULL has a middle name
Sorting the Results
• List all rooms, sorted by price (most expensive
first)
SELECT hotelName, roomNo, roomType, price
FROM Hotel, Room
WHERE Hotel.hotelNo = Room.hotelNo
ORDER BY price DESC;
• Multiple sort keys can be specified, e.g.
…ORDER BY lName, fName ASC;
Databases in
Bioinformatics
• NCBI
–
–
–
–
GenBank (sequences)
PubMed (Literature)
Human Genome
Etc!