Introduction to Database

Download Report

Transcript Introduction to Database

Introduction to Database
1
Data versus Information
• When people distinguish between data and
information,
– Data is simply a set of individual numbers or facts.
• E.g. John Smith got a 87 on Test 2 in CSC 240.
– Information arises from integration – putting the
individual facts together to see the larger picture or by
asking a question not answerable by just one piece of
data.
• John Smith scored more than 10 points over the class average
on all of the tests in CSC 240.
• Jane Jones got the highest score on Test 2 is CSC 240.
2
Database
• The purpose of a database is to bring together a set
of data so that it is centralized, organized, easy to
extract information from, and so forth.
• The way the facts are organized and how they
relate to each other is known as the database
design.
• The software application that facilitates the storing
of the data and the accessing of the information is
known as a database management system.
3
What do we want?
• Desired Features of our database
– Storage:
• We want to store data efficiently, have it centralized
(or at least seemingly centralized).
– Retrieval:
• We want to have the data at our fingertips when we
want it.
– Querying:
• We want to ask various questions about the data
(and get answers in a timely manner).
• (These desires are to some extent in
conflict.)
4
Database Modeling
• In database design, one tries to organize the
data in a way that reflects the situation from
which the data comes.
• One approach is known as entityrelationship modeling or ER modeling.
• The design is often represented pictorially
in an ER Diagram.
5
Design  Implementation
• After a database is designed, one uses a
database management system to implement
(actually make) the database.
• In database implementation, one encounters
terms like table, record, field, keys, and so
forth.
6
Entities
• One identifies units of information that clearly
belong together. These are known as the entities.
• An entity is somewhat similar to an object in
programming, it collects data that belongs together
in some immediate way.
• Entities also separate the data into distinct units.
• Database entities often reflect real objects/entities
(persons, buildings, courses, etc.)
7
Fields
• The lower-level pieces of data gathered together to
form an entity are known as fields or attributes or
properties.
– The Person entity might consist of fields like
FirstName, LastName, JobType, SocSecNum, etc.
– Fields are analogous to properties of an object.
• Fields have a type (Text, Number, Yes/No, Memo,
Date/Time, etc.) which indicate how the
information is to be stored and interpreted.
8
Relationship
• The various entities may be distinct, but
they are not completely disconnected.
– E.g. a Customer places an order
• An association between two entities is
known as a relationship.
9
ER Diagram
• One can visualize the entities and their
relationship using an Entity-Relationship
(ER) diagram.
– The entities are represented by rectangles.
– The relationships are represented by arrows
between the rectangles.
• The arrow may include a verb to capture the nature
of the relationship (as well as other notations).
10
ER Diagram Example
Customer
CustomerID
CustomerFirstName
…
Places
Order
OrderID
ShippingCost
…
Is part of
Item
ItemID
ItemDescription
…
11
DBMS
• Before databases a programmer had to worry
about
– where files are to be found,
– the order in which the fields occur,
– the length of the fields and/or the delimiter used,
• Such features of the file-based approach are called
program-data dependence.
• These are now taken care of by the database
management system (DBMS).
12
Database vs. DBMS
• The generalized routines for reading,
parsing, searching, sorting etc. are in the
DBMS.
• But information specific to a particular case
(number of fields, their type, size and so on)
is still required. This data is placed together
with the “actual” data in the database.
13
Meta-data
• This data about the data is known as metadata.
– Meta: a prefix meaning: after, along with or
beyond
• The meta-data describes the actual data, and
so databases are sometimes called selfdescribing.
• Related terms include: data dictionary,
system catalog and schema.
14
Meta-data: Open a database file (Access in this
example) in Notepad
Some
actual
data
One can see there’s more to this file than just customer data.
15
Database/DBMS Distinction
Database
Raw-data and meta-data
DBMS
User
Application
Users and applications interact with a database only through
the DBMS.
16
Pros of Database Approach
• Control of data
redundancy
• Data consistency
• More info from same data
• Sharing of data
• Improved data integrity
• Improved security
• Enforcement of standards
• Economy of scale
• Balancing of conflicting
requirements
• Improved accessibility and
responsiveness
• Improved maintenance
through data independence
• Increased concurrency
• Improved backup and
recovery services
17
Client-Server
• The client-server model is a way for transactions
to take place.
– The transaction is viewed as a service.
– The client requests the service.
– The server provides the service.
• For example, to query a networked database
– A client would request the network server(s) to connect
it to the database server
– The database server queries the database
– The result is passed from database server to network
server to client.
• The client-server terminology can be applied to
both software and hardware.
18
SQL
• SQL (Structured Query Language),
pronounced S-Q-L or See-Quel, has become
the standard language for relational
databases.
• SQL is part third generation and part fourth
generation.
19
What’s it made of?
• SQL has 3 components:
– Data Definition Language (DDL)
• The part that allows you to establish the structure of
the database
– Data Manipulation Language (DML)
• The part that allows you to enter data, update data
and ask questions of the data (queries)
– Data Control Language (DCL)
• The part that allows you to add security features
(e.g. user authentication), concurrency (multi-user)
features, recovery features, etc.
20
Express Relationships as Verbs
• Relationships are generally expressed as verbs.
For example,
– Athlete comes from Country
– Athlete competes in Event
• The relationship can be represented by a line
between the two rectangles representing the
participating entities. The verb is written on the
line. In the Chen model, the verb is placed in a
diamond.
21
Basic Structure: Relationships
• Relationships are said to have a multiplicity.
• Relationships are categorized by how many things are
related to how many things.
– 1:M (one-to-many)
– M:N (many-to-many)
– 1:1 (one-to-one)
22
Relationship Examples
• One-to-many
– A country will be represented by many athletes, but
each athlete represents only one country.
• Many-to-many
– An athlete may compete in many events, and an event
has many athletes competing in it.
• One-to-one
– Each country has one athlete serve as flag bearer in the
opening ceremony.
23
Degree of a Relationship
• Relationships are said to have a degree (the
number of entity types involved).
– Binary: involves two entities
– Ternary: involves three entities
– Quaternary: involves four entities
• Even if not using the Chen model, ternary
and higher degree relationships are
represented using a diamond.
24
Ternary Relationship Example
StockHolder
No arrows in
relationships
with degree
higher than 2.
Buy/Sell
Stock
StockBroker
A StockHolder buys or
sells a stock through a
StockBroker.
25
Quaternary Relationship Example
Producer
Actor
Make a movie
Director
Writer
26
Recursive Relationship
• If an entity (type) has a relationship with
itself, that relationship is called recursive.
• The entity occurrences in the relationship
may be distinct.
• Since the subject and object are of the same
entity type, the “roles” the occurrences are
playing in the relationship may be added to
the diagram.
27
Recursive Relationship
• A typical example here is if one employee
serves as the supervisor of another
employee.
– While the relationship involves different entity
occurrences (i.e. two different employees), it
involves only one entity type. Thus as far as
entity type goes, the Employee entity has a
relationship with itself.
28
Role names are used to clarify situations
with multiple relationships
Teacher
Teaches 
Faculty
Advisor
Student
Advises 
Student
Advisee
29
Attributes
• Attributes are the properties of an entity.
– E.g. the attributes of a Customer are FirstName,
LastName, etc.
• Relationships can also have properties.
– E.g. a stock is bought or sold on a particular
date (at a particular price).
– (One may consider introducing a new entity
called a transaction.)
30
Attribute Domain
• An attribute’s domain is the set of values that a
property is allowed to take on.
• For example,
–
–
–
–
The quantity of items ordered is  0
The price paid is  0
Gender would be ‘M’ or ‘F’ (or perhaps NULL)
Phone numbers consist of numbers only. One can also
specify the number of digits or a range thereof.
31
Self-documenting attribute names
• When it comes time to implement the database
and one is turning attributes into the
corresponding fields, resist the temptation to use
abbreviated field names.
• If you use descriptive field names, your
implementation will be self-documenting – in that
many people will know what you mean simply by
the name you have used.
– Some designers suggest that the first part of a field
name refer to the table/entity it belongs to. This can be
especially useful with common fields like IDs and
names.
32
Attributes: Simple or Composite
• A property that takes on a value that cannot
be broken down into pieces is called simple,
(aka “atomic”)
– E.g. quantity, price, gender
• A property that can be broken into
constituent properties is called composite.
– E.g. address  street, city, state, zipcode
– name  firstName, lastName
33
Attributes: Single-valued or Multi-valued
• Single-valued: a property that takes on only one
value at a time for a given entity occurrence.
– E.g. dateOfBirth, you only have one
• Multi-valued: a property that can take on more
than one value at the same time for a given entity
occurrence
– E.g. phoneNumber (home and cell number)
– E.g. beneficiary
34
Attributes: Derived
• If a property can be determined from other
properties, it is said to be derived.
– E.g. age or AgeCategory (20-29, 30-39, etc.)
can be derived from dateOfBirth
– E.g. taxBracket can be derived from
grossIncome and deductions
– E.g. city might be derivable from zipcode
35
Keys
• A candidate key is a minimal set of
properties that uniquely determine each
entity occurrence (record, row, tuple).
• A candidate key may be composite, i.e.
consisting of more than one property.
• Minimal above means that if one property is
removed from the set, the set no longer
uniquely determines an occurrence.
36
Keys (Cont.)
• There can be more than one candidate key.
– In the school’s database (banner), a person can be
identified by his or her
• socialSecNumber
• idNumber
• pidm
• The primary key is the candidate key that is
selected to identify the entity occurrence
internally (within the database).
• A candidate not chosen to be the primary is
sometimes called an alternate key.
37
Relational Model: Ingredients
• The main components of the Relational Model are
tables (a two-dimensional array).
• Tables are a realization of the mathematical
concept of a relation.
• Tables are reminiscent of the files used in a filebased approach.
• Table  Relation  File
• The table is logical and the data does not
necessarily take this form physically.
• A table has a name.
38
Table  Relation  File
39
Relational Model: Ingredients (Cont.)
• A table collects together associated data.
• A table is thought of in terms of rows and columns.
• The data in a single column is all of the same type, i.e.
all the same property.
– E.g. all of the people’s last names.
• The column (a.k.a. field) has a name and a type (e.g.
text, number, etc.).
• A table is distinct from a similar looking mathematical
object, the matrix, in that the order of the columns
does not matter.
• Column  Field  Attribute  Property
40
Column  Field  Attribute  Property
41
Relational Model: Ingredients (Cont.)
• The row (a.k.a. a record) collects together
the various properties that belong to a
particular object.
– E.g. a person’s first name, last name, date of
birth, etc.
• Again a table is distinct from a matrix, in
that the order of the rows does not matter.
• Row  Record  Tuple
42
Row  Record  Tuple
43
More Relational Model Vocabulary
• In addition to having a type, a field has a domain,
the set of values that the particular property is
allowed to have.
– E.g. a number must fall between 0 and 100.
– E.g. some text (string) must have two letters followed
by four numbers.
– E.g. a person’s gender must be M or F.
• Ensuring that a value falls within the domain is
called applying the domain constraint.
44
Input masks and Validation Rules are ways
to impose domain constraints in Access
45
Validation Rule example
46
Since data was entered before the
validation rule was introduced. The
old data must be tested for
consistency.
47
More Relational Model Vocabulary (Cont.)
• The number of fields (tuples) in a table is known
as its degree.
–
–
–
–
Unary relations (1-tuples)
Binary relations (2-tuples)
Ternary relations (3-tuples)
N-ary relations (N-tuples)
• The number of records in a table is called its
cardinality.
• The degree is a property of the schema, while the
cardinality is a property of the instance.
48
cardinality
Degree and Cardinality
degree
49
Keys
• A fields or set of fields that can be used to uniquely
identify all of the rows in a table is known as a key.
• A key should not have any extraneous fields.
– E.g. if SocSecNum uniquely identifies a person, then you
don’t need SocSecNum and LastName.
• A table may have more than one field or set of fields
that serve this purpose, they are called collectively
the candidate keys.
• One key is chosen from the candidate keys to be the
primary key.
50
Keys (Cont.)
• When choosing a primary key, make sure that it must
be unique, as opposed to simply happening to be
unique for the instance you have or have in mind.
• Because of redundancy issues, it should not contain
too many fields or fields that might change.
• Be mindful of privacy issues, SocSecNum can be a
bad choice.
• For the reasons above, one often introduces an ID
field to serve as a primary field.
51
Purpose of Keys
• Keys are used to
– Uniquely identify a record as in a query
– Sort the data
– Establish relationships
• When one table’s key is found in another
table for the purpose of establishing a
relationship, it is known as a foreign key.
52
References
•
•
•
•
Database Systems Rob and Coronel
Database Systems, Connolly and Begg
SQL for Dummies, Taylor
http://www.metacard.com/wp1a.html
• http://www.oracle.com/glossary/index.html?axx.html
• http://www.itworld.com/nl/db_mgr/05072001/
• Concepts of Database Management, Pratt and
Adamski
• http://www.whatis.com
• http://www.webopedia.com
53