Transcript database

INTRO TO DATABASES
Part I
IS 340
BY
CHANDRA S. AMARAVADI
1
IN THIS PRESENTATION..
Data vs information
Database organization
file, record, attribute/field
DBMS activities and functions
DBMS components
Importance of databases
2
DATA VS INFORMATION
3
DEFINING DATA AND INFORMATION
Data: Data consists of raw facts e.g. 298-2034
Information: Data presented in a context e.g.
someone’s phone#
Some more examples:
a) IS340
a) IS340 is a core course for BBA.
b) Joan Smith
b) Joan Smith is an arts & sciences student
c) Score was 94-72
c) Score in the Celtics-Bulls game was 94-72.
4
DATA IS NOT
USUALLY ISOLATED
Example information about a person
Name:
DOB:
Profession:
Address:
Steve Nash
2-4-1965
Basket ball player
415 Walnut street.
These describe a person
more. They are called ????.
5
ANOTHER EXAMPLE
Example information about a credit card transaction:
Date:
Time:
Merchant:
Card type:
Issuer:
12/5/08
4:48 pm
WIU Union
Master Card
First Bank
These describe a transaction
more.
6
ONE ATTRIBUTE HAS A UNIQUE
VALUE
One of these is always unique:
Acct#:
4555950
Date:
12/5/08
Time:
4:48 pm
Merchant: WIU Union
Card type: Master Card
Issuer:
First Bank
The unique attribute is
called ???
7
CHARACTERISTICS OF DATA/
INFORMATION
WHAT CAN WE SAY ABOUT THE TWO
TYPES OF DATA?
Transaction#:
Date:
Time:
Merchant:
Card type:
Issuer:
55643
12/5/08
4:08 pm
WIU Union
Master Card
First Bank
Transaction#:
Date:
Time:
Merchant:
Card type:
Issuer:
55644
12/5/08
4:20 pm
Vitales
Master Card
Midwest Bank
8
CHARACTERISTICS OF DATA/
INFORMATION…
Transaction
ID
Transaction
Date
Transaction
Time
Merchant
55643
12/05/08
4:08 pm
WIU Union
55644
12/05/08
4:20 pm
Vitales
Data is typically organized into ??
Transaction#:
Date:
Time:
Merchant:
Card type:
Issuer:
55643
12/5/08
4:08 pm
WIU Union
Master Card
First Bank
9
HISTORICAL APPROACH TO DATA
55643
55644
55644
55646
12/05/08
12/05/08
12/05/08
01/07/09
4:08pm
4:20pm
4:20pm
8:05am
WIU Union
Vitales
Vitales
Pizza Hut
55643
55644
55644
55646
$25.60
$38.00
$63.00
$55.00
Transaction
ID
Transaction
Date
Transaction
Time
Merchant
55643
12/05/08
4:08 pm
WIU Union
55644
12/05/08
4:20 pm
Vitales
MasterCard
Visa
Visa
American Express
This is called ___________ __________.
10
FILE PROCESSING PROBLEMS
What problems did the file processing approach
result in?
11
DATABASE ORGANIZATION
12
DEFINITIONS


Database - Organized collection of information
DBMS - SW program to manage and provide
controlled access to the data
13
DBMS ORGANIZATION
Database – A group of related files
File/table – A group of related records
Record – a grouping of related fields
Field – a grouping of characters (one row)
Schema -- This is the logical view
of the database (tables and fields)
Primary key – An attribute whose values are
unique within a file
Database
Files/tables
Records
Fields
14
DBMS ORGANIZATION
Structure/schema
Primary key
Attributes/field names
PROD#
DESCR.
PRICE
QUANTITY
Record
IR888E
Refrigerator
$1,800
20
Record
TS3233
Television
$67
32
Attr. values
15
DBMS ACTIVITIES
16
DBMS ACTIVITIES & FUNCTIONS
Activities with DBMS
Define structure / schema
Enter data
modify data
query data
get reports
17
DBMS ACTIVITIES & FUNCTIONS
Define structure/schema
Field Name
Data type
Description
Length
Decimals
Prod#
Numeric
Unique prod
code
6
0
Descr
Text
Short prod
description
25
0
Price
Currency
Product price
6
2
18
DBMS ACTIVITIES & FUNCTIONS
Enter data -- Create a data entry form
Heading
PRODUCT DATA ENTRY
Label
Product #:
Description:
Data entry
form
Field
19
DBMS ACTIVITIES & FUNCTIONS
Modify data
 Add
 Delete
Records/field values
 Change
20
DBMS ACTIVITIES & FUNCTIONS
Query data
list products costing more than $2,000
Prod#
Descr.
Price
> 2,000
Quantity
Query by example form
21
RH
PRODUCT LISTING
Title
PH
Column
Headings
PRODUCT # DESCR. PRICE
Detail
Product #
Descr.
Price
Fields
PF
Average Price
RF
Footer
A report specification
22
GETTING DATA OUT..
PRODUCT LISTING
PROD#
M100
M150
DESCRIPTION
Chair
Table
Average Price
PRICE
$ 50.00
$200.00
$153.00
A generated report
23

Activities with DBMS
◦
◦
◦
◦
◦
Define structure / schema
Enter data
modify data
query data
get reports
 DBMS
components
24
Major Components of DBMS
D B M S Kernel
Data
Defn.
SQL
Prog.
Language
Interface
Data
Dictionary
Screen/
Report
Gen.
Appln.
Gen.
D B M S Kernel
Export/Import
DBMS COMPONENTS..
Data definition – the facility through which schema is defined.
(how new tables are created).
SQL interface – the facility through which SQL commands are typed in.
Programming language interface – the facility which processes SQL
commands embedded in application program. Also known
as the host language interface.
Data dictionary – the facility that records details about the schema,
reports, data entry forms etc.
Screen & reports- the facility through which data entry screens and
reports are created.
Appln. Generation- the facility through which applications are created.
Export/Import -- the facility through which files can be imported/exported
in different DBMS formats.
DBMS Kernel -- the actual programs which interact with the O/S and
carry out data I/O.
USAGE OF DATA/INFORMATION
Suppose we have detailed information on
each and every transaction in a store, what
can we do with that?
What if a customer wants to return a shirt
purchased in the store?
What if a manager wants to know what
products were sold on a particular day?
27
IMPORTANCE OF DBMS’s
Databases are used:
 To store and record information e.g. bal, price, grades etc.
 To retrieve information e.g. check#432 cashed?
 To report information e.g. daily sales
 To answer queries e.g. how many shoes were sold?
Advanced uses:
 To analyze trends
 Identify sales prospects
28
INTRO TO DATABASES - II
IS 340
BY
CHANDRA S. AMARAVADI
29
DBMS EVOLUTION
30
EVOLUTION OF DBMS
File processing
systems
Hierarchical
systems
1 March 2002 Greece
2 July 2003 Italy
3 ……………..
Photo1
Greece
Person
Photo2
Italy
Relational
systems
PhotoId
31
EVOLUTION OF DBMS..
FILE PROCESSING SYSTEMS:
A system where data was managed by the program
in the form of flat files.



Data in the form of “flat files” (ascii files)
Each program had its own specific files
Data was duplicated across files.
32
EVOLUTION OF DBMS..
PROBLEMS CAUSED BY FILE PROCESSING



Uncontrolled redundancy
Poor data quality (a.k.a.?)
Lack of data sharing
33
HIERARCHICAL MODEL..

Structure is hierarchical (not flat)
There are one or more root segments
segments are linked hierarchically

Difficult to write queries but very efficient (fast)!


34
HIERARCHICAL MODEL
(D) DEPARTMENT
NAME
(A) ADMIN-STAFF
ASNAME
JOB_TITLE
TOTAL_ENROLL
NO_OF_PHD
(F) FACULTY
FNAME
RANK
DEGREE
What do you call this?
35
RELATIONAL MODEL
Data organized into tables
Data retrieved by using SQL, 4th GL
 joins tables by equating values of cross reference keys.
ACCT
ACCT#
NAME
DT. OPENED
8895
8896
8897
Smith
Farley
Gomez
4/16/90
4/22/91
1/10/00
BALANCE
$35,000
$300
$2,000
36
THE DATABASE
DEVELOPMENT APPROACH
37
Entity classes
File 1
prod
File 2
cust.
empl.
Organization
Data base
Data model
38
THE DATABASE CONCEPT..
 Organizations need to collect and manage
information.
 Typically this need can be summarized in
terms of entity classes and their relationships
 Known as a (conceptual) data model/ER
model
 ER model is converted to database model
 Implemented
39
Prog. 1
Prog. 2
 Integrated
conceptualization
 Convert it to design
 Go through development
process
DBMS
 Different
Database
users/programs can
share the information easily
 Can access info via SQL
 Can get reports easily
40
GETTING INFORMATION
FROM FILE PROCESSING
CROSS REFERENCE
REPORT
CUSTOMER ACCTS.
LOANS
ACCT. DATA
LOAN DATA
41
GETTING INFORMATION
FROM A DATABASE
ACCT
ACCT#
8895
8896
8897
NAME
Smith
Farley
Gomez
DT. OPENED
4/16/90
4/22/91
1/10/00
BALANCE
$35,000
$300
$2,000
How can
We combine
these two
Tables?
LOAN
LID
AMOUNT
9978
9978
9992
$6,000
$5,000
$1,000
INT. RATE
BALANCE
6.0%
7.1%
8.5%
$2,440
$5,000
$ 400
42
DATA MODELS
43
DATA MODELS -- BASIC CONCEPTS
Data models: representations of relationships among entity
classes
Entity
– Individual example of person, place or thing.
Entity Class – Collection of related entities (eclass).
Attributes – Properties of entity classes about which we
would like to collect information.
Primary key -- An attribute whose values are unique within
an entity class (pkey).
44
DISCUSSION
Classify the following as entity, entity class or attribute
 San Francisco
 Tires
 Customer
 Book
 ISBN#
 Neon
 Peoplesoft
 Cust. balance
 Microsoft
 Microsoft employees
 Date of Birth
 Invoice
45
DATA MODELS…
Cardinality: Number of entities participating in a relationship
A
A
A
1:1
1:M
M:N
B
For each value of A, one
and only one value of B
and vice versa.
B
For each value of A,
multiple values of B,
but for each B only one A.
B
For each value of A,
many values of B and vice
versa.
46
DISCUSSION
Identify the types of the following relationships
company
-- president
instructor -- students
plane -- pilots
city -- convention centers
department -- employees
team -- players
company -- city
books – authors
city -- subdivisions
product -- warranty
47
AN EXAMPLE ER CHART
Cust#
Name
ENTITY CLASS
CUSTOMER
ATTRIBUTE
Places
ORDERS
Ord#
Ord dt
RELATIONSHIP
Are for
PRODUCTS
48
DISCUSSION
Draw Data Models (ER models) corresponding to the following
Situations:
Students take courses from professors
Clients can book one or more properties with a rental agency. Each
property can be booked by more than one client (for different dates).
Trains arrive and depart from platforms in train stations,
corresponding to the cities that they stop in. A city can have many
Stations.
Airlines operate a number of flights which arrive and depart from
Gates. There could be several flights from a gate, but each flight
is assigned only one gate. Each airline operates a gate or a set of
Gates which are leased to them by the airport.
49
DISCUSSION
Design a database for the following situation: Bellsouth, a
telephone company needs a database to maintain records of
its automobiles, repairs, mechanics and special equipment needed
to repair cars. Mechanics are assigned skill codes based on the
type of repairs they can carry out. For e.g. Class III mechanics
can perform reborings, engine overhauls and transmission repairs.
The repair shops are located in 300 cities and towns throughout
the state. Mechanics can be assigned to only one repair shop at a
time. The application requires knowing what equipment is (for
e.g. diagnostic equipment, hydraulic machinery) is at what location
and also the repairs carried out on each vehicle. For each vehicle
the application requires knowing its identification number,
mileage and repair history.
50
DISTRIBUTED AND
MULTI-MEDIA DATABASES,
DATA WAREHOUSES
51
CENTRALIZED DATABASES
Centralized databases
 The database is in one physical location.
 All applications regardless of whether the clients
are located in Seoul or Detroit need to access data
from that physical location.
 What are the limitations?
52
DISTRIBUTED DATABASES
The data is physically stored in multiple geographical locations
 Replicated -- copies of the database maintained in
multiple sites
 Partitioned -- database is physically divided into
chunks
53
REPLICATED DATABASES..
54
PARTITIONED DATABASES..
Emp Address
Emp ID
Emp Name
11049
11051
11054
Cathy
Richard
Hugh
200 Meadowbrook
13 E. Willow
1400 E. Washington
…….
……..
…………………..
55
PARTITIONED DATABASES
 The database is divided into chunks.
 What is the logical way of doing this?
 Different chunks are placed in different locations.
 All the locations are connected.
 What are the advantages and limitations?
The database is in multiple physical locations
56
MULTI-MEDIA DATABASES
A database that is able to handle multiple data types:
 Conventional information
 Images
 Audio & video
Capable of storing, retrieving and updating this type of
information
Emp ID
Emp Name Emp. Picture Emp Address
Emp.jpg
57
MULTI-MEDIA DATABASES..
APPLICATIONS
 Travel bureaus
 Hospitals
 Manufacturing
 Training
58
MULTI-MEDIA DATABASES..
ISSUES
 Storage space
 Retrieval
 Modification
59
DATA WAREHOUSES
 DATA WAREHOUSE:
Historical data organized
for analysis and decision making.
 DATA MART: smaller version of data warehouse,
specialized by functional area e.g. marketing
 ON-LINE ANALYTICAL PROCESSING
(OLAP): organizes data into a “cube” for decision
analysis.
60
DATA WAREHOUSING & OLAP
OPERATIONAL,
HISTORICAL DATA
INTERNAL
DATA
SOURCES
DATA WAREHOUSE
EXTRACT,
TRANSFORM
HISTORICAL
INFORMATION
EXTERNAL
DATA
SOURCES
61
OLAP
DATA MINING
DISCUSSION QUESTIONS











Are Databases and DBMSs one and the same?
Does a record consist of files?
What is the smallest unit of data in the database?
Does data integrity refer to data quality?
What does the detail line of a report consist of?
Is an “employee” an entity class or an attribute?
Is the data dictionary useful while creating the database?
Why do we use SQL?
Is it necessary to use SQL to produce database reports?
If we have a number of diskettes in alphabetical
sequence,
is this an example of a file organization? If so, what type?
Would multi-national organizations prefer centralized
databases?
62
THAT’S ALL
FOLKS!
63
THAT’S ALL
FOLKS!
64
THAT’S ALL
FOLKS!
65