Database - La Salle University

Download Report

Transcript Database - La Salle University

Database
More database concepts and vocabulary
1
CSC 240 (Blum)
2
Database Categorizations I
 How
many users can use the database at
a given time?


If one: It is a single-user database.
If more than one: It is a multi-user database.
 If
used by only several users in one
department: It is a workgroup database.
 If used by many departments across the
entire company: It is an enterprise database.
CSC 240 (Blum)
3
Database Categorizations II
 How
many computers are used for the
database? (Note that this is distinct from
how many computers do the users
employ to access the database.)


If one: It is a centralized database.
If more than one: It is a distributed
database.
 There
are various scenarios for handling
distributed databases, such as do the
computers have distinct data or copies of the
same data, etc.
CSC 240 (Blum)
4
Database Categorizations III
 How


up-to-date is the data in the database?
If the data is the database is used for the normal
daily operations, for example, what items are in
stock, what has been purchased in the present
quarter, etc.: It is a production database or
transaction database.
If the data is more historical, for example, sales
over the last ten years, which one will look back on
to see trends, etc.: It is a data warehouse
database.
CSC 240 (Blum)
5
Different Points of View
 One
might think that integrating various
department’s data might
 Confuse
the naïve user
 Provide the sophisticated user with too much
information (e.g. all the salary info, social
security numbers, etc.)
 Views,
however, can be used to provide a
user with only a subset of the database
 Simplifying
life for the naïve user
 Providing a level of security by limiting what the
sophisticated user can access
CSC 240 (Blum)
6
Database Design and
Modeling




The database approach was a big step forward
from the file-based approach.
There have been various steps within the
database approach. One of the most important is
the development of the Relational Model.
Models are simplified abstractions of real-world
events or conditions.
Good models yield good database designs that
are the basis for good applications.
CSC 240 (Blum)
7
Relational Model




First developed by E. F. Codd.
The main ingredient of the Relational Database
Management System is the “table” structure.
A table is a matrix, consisting of a series of rows
and columns.
One goal of the Relational model is to integrate a
lot of data, maintain the relationships within that
data, and yet minimize the amount of
redundancy.
CSC 240 (Blum)
8
E. F. Codd
Our Hero
CSC 240 (Blum)
9
Relational Model
Advantages
 Structural Independence
 Can change the database’s structure (e.g.
add new fields) without impairing the
DBMS’s ability to access and manipulate the
data.
 Improved conceptual simplicity
 Facilitates implementation, management and
use
 Good design is crucial
 Ad hoc query capability
 Can answer questions that come up after
CSC 240 (Blum)
the database was designed.
10
Basic Structure: Entities
 Entity



A person, place or thing about which data are to be
collected and stored.
Represented by a rectangle in the Entity-Relationship
(ER) diagram.
Each entity is described by a set of attributes describing
a particular characteristic of the entity.
 Relationship

An association among data. Most relationships
describe the associations between two entities.
CSC 240 (Blum)
11
E. F. Codd’s Rules
 Rule
1: The Information Rule. All data should be
presented to the user in table form.
 Rule
2: Guaranteed Access Rule. All data should
be accessible without ambiguity. This can be
accomplished through a combination of the
table name, primary key, and column name.
CSC 240 (Blum)
12
E. F. Codd’s Rules

Rule 3: Systematic Treatment of Null Values. A field
should be allowed to remain empty. This involves
the support of a null value, which is distinct from
an empty string or a number with a value of zero.
Of course, this can't apply to primary keys. In
addition, most database implementations support
the concept of a non- null field constraint that
prevents null values in a specific table column.
CSC 240 (Blum)
13
E. F. Codd’s Rules
 Rule
4: Dynamic On-Line Catalog Based
on the Relational Model. A relational
database must provide access to its
structure through the same tools that are
used to access the data. This is usually
accomplished by storing the structure
definition within special system tables.
CSC 240 (Blum)
14
E. F. Codd’s Rules
 Rule
5: Comprehensive Data
Sublanguage Rule. The database must
support at least one clearly defined
language that includes functionality for
data definition, data manipulation, data
integrity, and database transaction
control. All commercial relational
databases use forms of the standard SQL
(Structured Query Language) as their
supported comprehensive language.
CSC 240 (Blum)
15
E. F. Codd’s Rules
 Rule
6: View Updating Rule. Data can be
presented to the user in different logical
combinations, called views. Each view
should support the same full range of
data manipulation that direct-access to
a table has available. In practice,
providing update and delete access to
logical views is difficult and is not fully
supported by any current database.
CSC 240 (Blum)
16
E. F. Codd’s Rules
 Rule
7: High-level Insert, Update, and Delete.
Data can be retrieved from a relational
database in sets constructed of data from
multiple rows and/or multiple tables. This rule
states that insert, update, and delete
operations should be supported for any
retrievable set rather than just for a single row
in a single table.
CSC 240 (Blum)
17
E. F. Codd’s Rules
 Rule
8: Physical Data Independence. The
user is isolated from the physical method
of storing and retrieving information from
the database. Changes can be made to
the underlying architecture (hardware,
disk storage methods) without affecting
how the user accesses it.
CSC 240 (Blum)
18
E. F. Codd’s Rules
 Rule
9: Logical Data Independence. How
a user views data should not change
when the logical structure (tables
structure) of the database changes. This
rule is particularly difficult to satisfy. Most
databases rely on strong ties between the
user view of the data and the actual
structure of the underlying tables.
CSC 240 (Blum)
19
E. F. Codd’s Rules

Rule 10: Integrity Independence. The database
language (like SQL) should support constraints on
user input that maintain database integrity. This rule is
not fully implemented by most major vendors. At a
minimum, all databases do preserve two constraints
through SQL. No component of a primary key can
have a null value. (see rule 3). If a foreign key is
defined in one table, any value in it must exist as a
primary key in another table.
CSC 240 (Blum)
20
E. F. Codd’s Rules
 Rule
11: Distribution Independence. A user
should be totally unaware of whether or
not the database is distributed (whether
parts of the database exist in multiple
locations).
 Rule 12: Nonsubversion Rule. There should
be no way to modify the database
structure other than through the multiple
row database language (like SQL). Most
databases today support administrative
tools that allow some direct manipulation
of the datastructure.
CSC 240 (Blum)
21
Field Versus Data
A
possible confusion to the beginning
database student is that between a
field/attribute and data that might be
entered into that field.
 For example, a confused designer of a sports
database might list figure skating or alpine
skiing as event fields. Rather event fields
should be something like EventCategory and
EventName. Then skating is an example of
an EventCategory and Women’s Figure
Skating is an example of an EventName.
CSC 240 (Blum)
22
Entity Type/Occurrence
 Type


versus occurrence
The entity type/occurrence distinction is
similar to the class/object distinct in objectoriented programming.
An entity type is a template for an entity
occurrence.
 Dog
is an entity type (class), whereas Lassie is
an entity occurrence (object).
CSC 240 (Blum)
23
Entity Type/Occurrence
 The
entity type (like a class) is a more
abstract gathering of associated data.

E.g. Customer is an entity type that
gathered together properties such as
FirstName, LastName, etc.
 The
entity occurrence (like an object) has
specific values

E.g. John Smith is an entity occurrence with
FirstName of John, lastName of Smith, etc.
CSC 240 (Blum)
24
When thinking of an entity type, think of a
table under Structure tab.
CSC 240 (Blum)
25
When thinking of an entity occurrence, think about
the Browse tab
CSC 240 (Blum)
26
Entities: Strong and Weak


Entities are sometimes categorized as strong and
weak.
A strong entity has an independent existence,
whereas the weak entity depends on some other
entity. The strong to weak relationship among
entities can be called
 Parent – child
 Owner – dependent
 Dominant – subordinate
CSC 240 (Blum)
27
Strong-weak entity sport example
Event
Strong
Parent
Owner
Dominant
Trial
Weak
Child
Dependent
Subordinate
CSC 240 (Blum)
28
Relationship Type/Occurrence

A relationship type is some association between
entity types. Like the entity type, the relationship
type is an abstract template.


E.g. “Places” is a relationship type between the
Customer and Order entity types (customer places
order).
A relationship occurrence is a specific association
between specific entity occurrences.

E.g. John Smith placed Order ORD0004 is a relationship
occurrence.
CSC 240 (Blum)
29
When thinking of a relationship
type, think of “Relationship view”
CSC 240 (Blum)
30
When thinking of a relationship
occurrence, think of choosing a
foreign key from a drop-down list
CSC 240 (Blum)
31
Express Relationships as Verbs


Relationships are generally expressed as
verbs. For example,
 Athlete comes from Country
 Athlete competes in Event
The relationship can be represented by a
line between the two rectangles
representing the participating entities. The
verb is written on the line. In the Chen
model, the verb is placed in a diamond.
CSC 240 (Blum)
32
Chen modeled relationship
Event
Athlete
AthleteID
AthleteFName
…
Competes in
EventID
EventCategory
…
CSC 240 (Blum)
33
Basic Structure: Relationships
 Relationships
are said to have a
multiplicity.
 Relationships are categorized by how
many things are related to how many
things.
 1:M (one-to-many)
 M:N (many-to-many)
 1:1 (one-to-one)
CSC 240 (Blum)
34
Relationship Examples
 One-to-many

A country will be represented by many athletes,
but each athlete represents only one country.
 Many-to-many

An athlete may compete in many events, and
an event has many athletes competing in it.
 One-to-one

Each country has one athlete serve as flag
bearer in the opening ceremony.
CSC 240 (Blum)
35
Degree of a Relationship
 Relationships
are said to have a degree
(the number of entity types involved).



Binary: involves two entities
Ternary: involves three entities
Quaternary: involves four entities
 Even
if not using the Chen model, ternary
and higher degree relationships are
represented using a diamond.
CSC 240 (Blum)
36
Ternary Relationship Example
StockHolder
No arrows in
relationships
with degree
higher than 2.
Buy/Sell
Stock
StockBroker
A StockHolder buys or
sells a stock through a
StockBroker.
CSC 240 (Blum)
37
Quaternary Relationship Example
Seller
Buyer
purchases
House
Agent
CSC 240 (Blum)
38
Recursive Relationship
 If
an entity (type) has a relationship with
itself, that relationship is called recursive.
 The entity occurrences in the relationship
may be distinct.
 Since the subject and object are of the
same entity type, the “roles” the
occurrences are playing in the
relationship may be added to the
diagram.
CSC 240 (Blum)
39
Recursive Relationship
A
typical example here is if one employee
serves as the supervisor of another
employee.

While the relationship involves different
entity occurrences (i.e. two different
employees), it involves only one entity type.
Thus as far as entity type goes, the
Employee entity has a relationship with
itself.
CSC 240 (Blum)
40
Role names are used to clarify
situations with multiple
relationships
Teacher
Teaches 
Faculty
Advisor
Student
Advises 
Student
Advisee
CSC 240 (Blum)
41
Attributes
 Attributes

are the properties of an entity.
E.g. the attributes of a Customer are
FirstName, LastName, etc.
 Relationships


can also have properties.
E.g. a stock is bought or sold on a particular
date (at a particular price).
(One may consider introducing a new
entity called a transaction.)
CSC 240 (Blum)
42
Attribute Domain
 An
attribute’s domain is the set of values
that a property is allowed to take on.
 For example,




The quantity of items ordered is  0
The price paid is  0
Gender would be ‘M’ or ‘F’ (or perhaps NULL)
Phone numbers consist of numbers only. One
can also specify the number of digits or a range
thereof.
CSC 240 (Blum)
43
Self-documenting attribute names


When it comes time to implement the database and
one is turning attributes into the corresponding fields,
resist the temptation to use abbreviated field names.
If you use descriptive field names, your
implementation will be self-documenting – in that
many people will know what you mean simply by the
name you have used.
 Some designers suggest that the first part of a field
name refer to the table/entity it belongs to. This
can be especially useful with common fields like IDs
and names.
CSC 240 (Blum)
44
Attributes: Simple or
Composite
A
property that takes on a value that cannot be
broken down into pieces is called simple, (aka
“atomic”)


E.g. quantity, price, gender
A property that can be broken into constituent
properties is called composite.


E.g. address  street, city, state, zipcode
name  firstName, lastName
CSC 240 (Blum)
45
Attributes: Single-valued or Multivalued
 Single-valued:
a property that takes
on only one value at a time for a given
entity occurrence.

E.g. dateOfBirth, you only have one
 Multi-valued:
a property that can take
on more than one value at the same
time for a given entity occurrence


E.g. phoneNumber (home and cell
number)
E.g. beneficiary
CSC 240 (Blum)
46
Attributes: Derived
 If
a property can be determined from
other properties, it is said to be derived.



E.g. age or AgeCategory (20-29, 30-39,
etc.) can be derived from dateOfBirth
E.g. taxBracket can be derived from
grossIncome and deductions
E.g. city might be derivable from zipcode
CSC 240 (Blum)
47
Keys
A
candidate key is a minimal set of
properties that uniquely determine each
entity occurrence (record, row, tuple).
 A candidate key may be composite, i.e.
consisting of more than one property.
 Minimal above means that if one property
is removed from the set, the set no longer
uniquely determines an occurrence.
CSC 240 (Blum)
48
Keys (Cont.)
 There

can be more than one candidate key.
In the school’s database (banner), a person can
be identified by his or her
socialSecNumber
 idNumber
 pidm

 The
primary key is the candidate key that is
selected to identify the entity occurrence
internally (within the database).
 A candidate not chosen to be the primary is
sometimes called an alternate key.
CSC 240 (Blum)
49
References
 Database
Systems Rob and Coronel
 Database Systems, Connolly and Begg
 SQL for Dummies, Taylor
 http://www.metacard.com/wp1a.html

http://www.oracle.com/glossary/index.html?axx.html
 http://www.itworld.com/nl/db_mgr/05072001/
 Concepts
Adamski
of Database Management, Pratt and
CSC 240 (Blum)