DataBase Design

Download Report

Transcript DataBase Design

DataBases
DataBase Design
Data Modeling
DataBase course notes 7
1
Data Model
• computer store model of real world
• basic concept of data modeling, in which a
representation of your database will be
produced that shows objects involved in
database design and how they interrelate
– entities
– attributes
– Relationships
– descriptive information
DataBase course notes 7
2
Data Model
• useful to be able to effectively
–communicate the database design
–visualize the database design
DataBase course notes 7
3
Modeling languages
• Integration Definition for Information
Modeling (IDEF1X)
• Information Engineering
– other major methodology
– pretty much just as good
• Chen Entity Relationship Model (ERD)
methodology
• diagram capabilities built into SQL Server
Management Studio
DataBase course notes 7
4
IDEF1X
• meet the following requirements
– support development of data models
– both easy to learn and robust
– based on a publicly available standard
– well tested and proven
– suitable for automation
DataBase course notes 7
5
IDEF1X implemented in popular
design tools
• AllFusion ERwin Data Modeler
– http://www.ca.com/us/products/product
• Toad Data Modeler
– http://www.quest.com/toad-data-modeler/
• ER/Studio
– http://www.embarcadero.com/products/erstudio
• Visible Analyst DB Engineer
– http://www.visible.com/Products/Analyst/
• Visio Enterprise Edition
– http://www.microsoft.com/office/visio
DataBase course notes 7
6
Entities
• entities (which, as discussed previously,
are loosely synonymous with tables) are
modeled as rectangular boxes
– as they are in most data modeling
methodologies
• two types of entities can be modeled
– identifier-independent / identifier-dependent
– usually referred to independent / dependent
DataBase course notes 7
7
Entities: dependent/independent
• difference has to do with how primary key
of the entity is structured
• independent entity is so named because it
has no primary key dependencies on any
other entity
• there are no foreign key columns from
other entities in the primary key
• without the existence of the parent, the
child would make no sense
DataBase course notes 7
8
Entities: dependent/independent
• for example, consider an invoice that has one or more
line items
• primary key of invoice entity might be invoiceNumber
• reasonable choice for primary key of lines items would
be invoiceNumber and then lineNumber
– primary key contains invoiceNumber, it would be dependent
upon invoice entity
• invoiceStatus entity and it was related to invoice, it would
be independent, as an invoice’s existence is not really
predicated on the existence of a status (even if a value
for invoiceStatus to invoice relationship is required)
DataBase course notes 7
9
Entities: dependent/independent
• entity independent is drawn with square
corners
• entity dependent is drawn with rounded-off
corners
DataBase course notes 7
10
Entity
DataBase course notes 7
11
Object names – singular / plural
• different “camps” each with different ideas about how to
name objects
• central issue is plural or singular
• IDEF1X standard says to use singular names
• entity name itself refers to an instance of what is being
modeled
• table name should name set of rows
•
either way more correct? Don’t know – really just pick one and stick with it
• most important thing is to be consistent
• a bad set of naming standards is better than no
standards at all
DataBase course notes 7
12
Entity names
• should never be plural
– name should refer to an instance of the object being
modeled rather than the collection
– allows you to easily use name in sentence
• uncomfortable to say that you have an “automobiles row” - you have
an “automobile row”
• name given should directly correspond to
the essence of what the entity is modeling
– keep it succinct (unlike this explanation!)
DataBase course notes 7
13
Object names – many names
• names frequently need to be made up of several words
• acceptable to include spaces, underscores, and other
characters when multiple words are necessary
• for example, entity that stores a person’s addresses
might be named Person Address, Person_Address or
• PersonAddress
– known as Pascal Case
• personAddress
– known as camel Case
• there really is no “correct” way, just guidelines that keep
everything uniform
DataBase course notes 7
14
Object names - abbreviations
• regardless of any style choices you make, no
abbreviations should generally be used in the logical
naming of entities
• abbreviations lower value of names as documentation
and tend to cause confusion
• abbreviations may be necessary in implemented model
because of some naming standard that is forced
• make sure that you have a standard in place to ensure
names use the same abbreviation every time
• don’t have to worry about different people using
Description, Descry, Desc, Descrip, and Descriptn all for
same attribute on different entities
DataBase course notes 7
15
Object names – Hungarian notation
• novice database designers elect to use
Hungarian notation and include prefixes (or
suffixes) in names - for example, tblEmployee or
tblCustomer
• generally considered very bad practice, because
names in relational databases are almost always
used in an obvious context
• Hungarian notation is a good idea when writing
functional code (like in Visual Basic or C#), since objects
don’t always have a very strict contextual meaning that
can be seen immediately upon usage
DataBase course notes 7
16
Attribute
• all attributes in entity must be uniquely
named within it
• represented by list of names inside of
entity rectangle
DataBase course notes 7
17
Attribute
DataBase course notes 7
18
Attribute names
• IDEF1X standard is that names are unique
within a model
• tends to produce names that include table
name followed by attribute name - result in
unwieldy, long names
• any decent naming standard is
acceptable, as long as it is followed
DataBase course notes 7
19
Attribute
•
•
•
•
Primary keys
Alternate keys
Foreign keys
Domains
DataBase course notes 7
20
Primary Key
• denoted by placing attributes above
horizontal line through entity rectangle
DataBase course notes 7
21
Primary Key
DataBase course notes 7
22
choice of primary key
• personal preference, in early logical
modeling phase, generally do not like to
spend time choosing final primary key
attribute(s)
– to avoid worrying too much about what key is
going to be
• tend to add simple surrogate primary key
to migrate to other entities to help me see
when there is any ownership
DataBase course notes 7
23
Alternate Key
DataBase course notes 7
24
• position number notation could be tacked
onto the name of key to denote position of
attribute in key
• in logical model, technically the order of
attributes in key should not be considered
• when key is implemented, order of
columns will become interesting for
performance reasons, as DBMS
implements uniqueness with an index
DataBase course notes 7
25
Foreign Key
DataBase course notes 7
26
• diagram doesn’t show what entity key is
migrated from
• not completely unreasonable that the same
attribute name might migrate from two separate
entities
• reasons to add key named <entityName>id as
identifier for entities - name of entity identifiable
– know where the original source of attribute is
– see attribute migrated from entity to entity without any
additional documentation
DataBase course notes 7
27
Domain
• domain - set of valid values for an attribute
• in IDEF1X, you can define named,
reusable specifications known as domains
• for example:
– String: character string
– SocialSecurityNumber: character value with
format of ###-##-####
– PositiveInteger: integer value, domain 0 to
max(integer)
DataBase course notes 7
28
Domain
• good practice to build domains
• for any attributes that get used regularly
• domains that are base templates for infrequently
used attributes
• form of inheritance in the data type definitions,
Subclasses can then be defined of domains that
inherit settings from base domain
• domains can optionally be shown to the
right of the attribute name in the entity
DataBase course notes 7
29
Domain
DataBase course notes 7
30
Relationships
• Child Entity whose instances can be related to zero or
one instance of the other entity (parent entity)
• Parent Entity whose instances can be related to a
number of instances of another entity (child entity)
• Relationship association between two entities or
between instances of the same entity
• denoted by line drawn between two entities, with
solid circle at one end to indicate where the
attribute is migrated to
DataBase course notes 7
31
Relationships
DataBase course notes 7
32
Relationships
• Identifying primary key of one table is migrated to
primary key of another - child will be dependent entity
• Nonidentifying primary key of one table is migrated to
nonprimary key attributes of another - child will be an
independent entity as long as no identifying relationships
exist
• Optional identifying when nonidentifying relationship
does not require child value
• Recursive when table is related to itself
• Subtype or categorization, which is a one-to-one
relationship used to let one entity extend another
DataBase course notes 7
33
• cardinality of relationship how many of the
parent relate to how many of the child
• role names changing name of a key in a
relationship
• verb phrases name of relationship
DataBase course notes 7
34
Identifying Relationships
DataBase course notes 7
35
Non Identifying Relationships
DataBase course notes 7
36
child table’s FK allow nulls?
• mandatory if value is required
• optional if value of migrated key can be
null
• optional case signified by open diamond at
opposite end of dashed line from black
circle
DataBase course notes 7
37
Optional
DataBase course notes 7
38
Role Names
• alternative name you can give an attribute when
it is used as foreign key
• to clarify usage of migrated key
• either parent entity is very generic and more
specific name is needed
• or same entity has multiple relationships,
attribute names must be unique, assigning
different names for child foreign key references
is often necessary
DataBase course notes 7
39
Role Names
DataBase course notes 7
40
Relationship Cardinality
•
•
•
•
•
One-to-zero or more
One-to-one or more (at least one)
One-to-zero or one (no more than one)
One-to-some fixed range
One-to-exactly N
DataBase course notes 7
41
One-to-zero or more
DataBase course notes 7
42
One-to-one or more
(at least one)
DataBase course notes 7
43
One-to-zero or one
(no more than one)
DataBase course notes 7
44
One-to-some fixed range
(in this case, between 4 and 8)
DataBase course notes 7
45
One-to-exactly N
(in this case, 5)
DataBase course notes 7
46
DataBase course notes 7
47
•
•
•
•
•
guardian and student in school database
zero-or-one to one-or-more relationship
for guardian instance to exist, student must exist
student need not have guardian
did not limit number of guardians
DataBase course notes 7
48
DataBase course notes 7
49
• club that has members with certain
positions that they should or could fill
• one-to-many
• member can take as many positions as
are possible
• allows unlimited positions for the member
DataBase course notes 7
50
DataBase course notes 7
51
• club that has members with certain
positions that they should or could fill
• one-to-one
• member can serve in no position or one
position, but no more
DataBase course notes 7
52
DataBase course notes 7
53
• club that has members with certain
positions that they should or could fill
• one-to-zero, one, or two
• member can serve in zero, one, or two
positions
DataBase course notes 7
54
Recursive Relationship
• difficult, important,
• also known as self-join, hierarchical, selfreferencing, or self-relationship
• modeled by drawing a nonidentifying
relationship not to different entity, but to same
entity
• migrated key of relationship is given a role name
– generally use naming convention of adding “parent” to
the front of attribute name
DataBase course notes 7
55
Recursive Relationships
DataBase course notes 7
56
Subtype
• also referred to as categorization
relationships
• special type of one-to-zero-or-one
relationship
• used to indicate whether one entity is a
specific type of a generic entity
• entities are drawn with rounded corners,
signifying that they are indeed dependent on the
generic entity
DataBase course notes 7
57
Subtype
DataBase course notes 7
58
• Generic entity contains all attributes
common to all of subtyped entities
• Discriminator attribute acts as switch to
determine entity where additional, more
specific information is stored
• Specific entity place where specific
information is stored, based
ondiscriminator
DataBase course notes 7
59
Subtype
DataBase course notes 7
60
• represent each video’s price, title, actors,
length, and possibly description of the
content in Video entity
• based on format, which is the
discriminator, store information that is
specific to VHS or DVD in its own separate
entity (e.g., special features and menus for
DVDs, long or slow play for VHS tapes, etc.)
DataBase course notes 7
61
2 other options to represent
• all up – only video entity
– always will be empty columns
• all down – only VHS and DVD entities
– difficult to search based on common column
DataBase course notes 7
62
Complete
Incomplete
DataBase course notes 7
63
Many-to-Many Relationship
DataBase course notes 7
64
Verb Phrases
Relationship Names
• relationships are given names, called verb phrases
• to make relationship between parent and child entity a
readable sentence and to incorporate entity names and
relationship cardinality
• name usually expressed from parent to child
• but it can be expressed in other direction, or even in both
directions
• verb phrase is located on model somewhere
close to the line that forms the relationship
DataBase course notes 7
65
Verb Phrases
Relationship Names
DataBase course notes 7
66
Alternative Modeling
Methodologies
• Descriptive Information
• Information Engineering (IE)
• other main methodology, commonly referred to
as the crow’s feet method
• Chen Entity Relationship Model (ERD)
– methodology used mostly in academic texts
• Visio
– MS tool to design a database
• Management Studio database diagrams
DataBase course notes 7
67
Descriptive Information
DataBase course notes 7
68
Entities
Entity
Attribute
Description
Persons That Can Be Contacted to Do Business
With
Contact
ContactId
Surrogate key
ContactTypeId
Primary key reference for a contactType, classifies
the
type of contact
Name
The full name of a contact
Domain of Different Contact Types
ContactType
ContactTypeId
Surrogate key
Name
The name that the contact type will be uniquely
known as
Description
The description of exactly how the contact should
be used as
DataBase course notes 7
69
Relationships
Parent Entity
Name Phrase
Child Entity
Name Definition
ContactType
Classifies
Contact
Contact type
classification
DataBase course notes 7
70
Information Engineering
• biggest difference is in how this method
denotes relationship cardinalities
• using a crow’s foot instead of a dot, and
lines and dashes instead of diamonds and
some letters
DataBase course notes 7
71
• line with labels for parent and child
– M:N where M stand for minimum cardinality
and N stand for maximum cardinality
– M or N could stand for many
DataBase course notes 7
72
Chen Entity Relation Diagram
• tool to create Chen diagrams does not
exist implemented in a mainstream
database design tool
DataBase course notes 7
73
Chen Entity Relation Diagram
DataBase course notes 7
74
MS Visio
• not a terrible choice for modeling
DataBase course notes 7
75
Management Studio Database
Diagrams
DataBase course notes 7
76
display entities in several formats
DataBase course notes 7
77
Best practices
• Pick a model language and understand it
• Pick a naming convention and be
consistent
DataBase course notes 7
78
CONCEPTUAL DATA
MODELING
DataBase course notes 7
79
Conceptual Data Modeling
•
•
•
•
understanding requirements
entities and relationships
attributes and domains
business rules that can be enforced in
database
• processes that require use of database
DataBase course notes 7
80
Understanding Requirements
•
•
•
•
documenting the process
looking for requirements
identifying objects
following up/reviewing with the client
DataBase course notes 7
81
Documenting Requirements
• try to maintain set of documents that will share system
design and specification information
• verbal change requests, sign-offs on all specifications,
such as functional, technical, testing, …
• keep members of your team up to date fully informed
• set project’s scope early on, keep it in mind at all times
– prevent project from getting too big or diverse to be achievable
within a reasonable period of time and be within the budget
DataBase course notes 7
82
Requirements Gathering
• client interviews
• prototypes and existing systems
• various other types of documentation
DataBase course notes 7
83
Questions to Be Answered
•
•
•
•
•
•
•
What Data Is Needed?
How Will the Data Be Used?
What Rules Govern the Use of the Data?
Who Will Use the Data?
What Do You Want to See on Reports?
Where Is the Data Now?
Will the Data Need to Be Integrated with Other
Systems?
• How Much Is This Data Worth?
DataBase course notes 7
84
Identifying Objects & Processes
•
•
•
•
•
identifying entities
identifying relationships between entities
identifying attributes and domains
identifying business rules
identifying fundamental processes
DataBase course notes 7
85
Identifying Entities
• People
– . . . system to manage its patients . . .
• Places
– . . . manages a couple of dental offices . . .
• Objects
– . . . with its supplies, such as sample toothpastes,
toothbrushes, and floss, as well as dental supplies . . .
• Ideas
– . . . and then invoice the patient’s insurance, if he or she has
insurance (otherwise the patient pays) . . .
DataBase course notes 7
86
Identifying Entities
• Documents
– . . . and then invoice the patient’s insurance, if he or she has
insurance (otherwise the patient pays) . . .
• Groups
– . . . patient should be able to be associated with other patients in
a family for insurance and appointment purposes . . .
• Events
– . . . for each appointment, the client needs to have everything
documented that went on . . .
• Records and Journals
– . . . client wants to be able to keep up records of all patients’
appointments without having to maintain lots of files . . .
DataBase course notes 7
87
Relationships between entities
• find how each of the entities will work with one
another to solve the client’s needs
• start first with one-to-many type of relationships
and then cover many-to-many
• consider elementary relationships that aren’t
directly mentioned in your requirements
• be careful not to make too many inferences
DataBase course notes 7
88
model after adding entities and
relationships
DataBase course notes 7
89
Identifying attributes & domains
• find attributes by noting adjectives that are
used to describe an entity you have
previously found
DataBase course notes 7
90
Identifying attributes & domains
• identifiers
– used to identify a single instance of an entity
• descriptive information
– used to describe, such as color, amounts, …
• locators
– locate what entity is modeling, such as a mailing
address, position on computer screen
• values
– things that quantify, such as monetary amounts,
counts, dates, …
DataBase course notes 7
91
Identifiers
• people
– Social Security numbers (US), full names (not always
perfect identifier), other IDs (such as customer
numbers, employee numbers, and so on)
• transactional documents (invoices, bills, …)
– usually have some sort of externally created number
assigned for tracking
• books
– ISBN numbers (titles definitely aren’t unique, not even
always by author)
DataBase course notes 7
92
Identifiers
• products
– product numbers for a particular manufacturer
(product names aren’t unique)
• companies that clients deal with
– commonly assigned a customer/client number for
tracking
• buildings
– complete address including ZIP/postal code
• mail
– addressee’s name and address and date it was sent
DataBase course notes 7
93
Final model for dental office
DataBase course notes 7
94
•
•
•
•
identifying business rules and processes
Identifying obvious additional data needs
review with the client
repeat until the customer agrees with your
model
• repeat until the customer agrees with your
model
DataBase course notes 7
95
NORMALIZATION PROCESS
DataBase course notes 7
96
IMPLEMENTING BASE TABLE
STRUCTURES
DataBase course notes 7
97
Implementing table structures
• Review the logical design
• Transform the logical model into an
implementation model
• Implement the design
DataBase course notes 7
98
logical model of movie-rental
store database
DataBase course notes 7
99
example tables
DataBase course notes 7
100
Reviewing logical design
• once start implementing DBMS objects,
it’s going to be much harder to change
things
• programmer thinks so many tables are too
“complex” to work with
• data integrity is almost always most
important thing for any corporate
application
DataBase course notes 7
101
Transforming the design
• choosing names
• dealing with subtypes
– can implement subtypes in multiple tables or
single tables
• determining tree implementation
– can implement in the same table
• choosing key implementation
DataBase course notes 7
102
Transforming the design
• determining domain implementation
– decide data types, nullability, choosing
between using a domain table or column with
constraint
• setting up schemas
– groups of tables for usage, as well as security
• reviewing “final” implementation model
DataBase course notes 7
103
Choosing names
• regular identifiers
– name must not be a DBMS reserved word,
name cannot contain spaces
• delimited identifiers
– should have either square brackets or double
quotes around the name
• though double quotes are allowed only when the
SET QUOTED_IDENTIFIER option is set
DataBase course notes 7
104
column naming
• table name should rarely be included in
column name.
– surrogate key such as PersonId
– columns that are naturally named with entity
name in them, such as PersonNumber
• usually, name should end in a “class” word
that distinguishes function of the column
– name, code, id, time, date
DataBase course notes 7
105
Dealing with Subtypes
• rolling up subtypes
• leaving as separate tables
• rolling down generic types
DataBase course notes 7
106
Subtypes
DataBase course notes 7
107
Subtypes
• to create new media rental, you have to
create rows in at least two tables
– MovieRentalPackage and VideoTape or Dvd,
and another in DvdSoundFormat for a DVD)
• to see list of all items available for rental,
write moderately complex query
– joins MovieRentalPackage with VideoTape
and union this with another join between
MovieRentalPackage and Dvd
DataBase course notes 7
108
Rolling up subtypes
DataBase course notes 7
109
Rolling up subtypes
• when MediaFormat is 'VideoTape',
DvdRegion value doesn’t apply and must
be set to NULL
• formats won’t be applicable to 'VideoTape'
but apply to 'Dvd‘
• structure becomes less and less selfdocumenting, requiring more code to keep
everything straight, requiring constraints
DataBase course notes 7
110
Determining Tree
Implementation
DataBase course notes 7
111
Alternative Tree Implementation
DataBase course notes 7
112
Choosing Key Implementation
• primary key on existing columns
– in many cases, table will have an obvious,
easy-to-use primary key
• primary key on a new, surrogate value
– every table has a single-column primary key
– primary key index will be small
– joins between tables will be easier to code
• don’t forget the alternate keys
DataBase course notes 7
113
Determining Domain
Implementation
•
•
•
•
•
nullability
NOT NULL, allow NULL
data type
CHECK constraint
DEFAULT
DataBase course notes 7
114
Implement as Column or Table?
• for example, in the demonstration table
Movie, a column Genre has a domain of
Genre
• CHECK constraint IN ('Comedy', 'Drama',
'Family', …)
• rarely use automatically generated value
for primary key of a domain table
DataBase course notes 7
115
Genre as domain table
DataBase course notes 7
116
Choosing the data type
• precise numeric data
– bit, tinyint, smallint, int, bigint, decimal
• approximate numeric data
– float, real
• date and time
– date, time, datetime, datetimeoffset
• binary data
– binary, varbinary, varbinary(max)
• character (or string) data
– char, varchar, varchar(max), nchar, nvarchar, nvarchar(max)
DataBase course notes 7
117
Other data types
•
•
•
•
•
sql_variant
stores any datatype
rowversion (timestamp is a synonym)
changes on every modification
uniqueidentifier
– stores a GUID value
• XML
• geometry and geography
• heirarchyId
DataBase course notes 7
118
Deprecated or bad choice types
• image
– replace with varbinary(max)
• text or ntext
– replace with varchar(max) and nvarchar(max)
• money
– store monetary data in decimal data types
DataBase course notes 7
119
Setting Up Schemas
DataBase course notes 7
120
Reviewing “Final”
Implementation Model
DataBase course notes 7
121
IMPLEMENTING THE DESIGN
DataBase course notes 7
122
• CREATE TABLE
[<database>.][<schema>.]<tablename>
• (
• <column specification>
• )
DataBase course notes 7
123
• <columnName> <datatype> [<NULL
specification>] [IDENTITY
[(seed,increment)]
• --or
• <columnName> AS <computed
definition>)
DataBase course notes 7
124
• CREATE TABLE Inventory.MovieFormat (
• MovieFormatId int NOT NULL
• CONSTRAINT PKInventory_MovieFormat
PRIMARY KEY CLUSTERED,
• Name varchar(20) NOT NULL
• )
DataBase course notes 7
125
• [CONSTRAINT constraintname] UNIQUE
[CLUSTERED | NONCLUSTERED]
• ALTER TABLE Inventory.Genre
• ADD CONSTRAINT
AKInventory_Genre_Name UNIQUE
NONCLUSTERED (Name)
DataBase course notes 7
126
• ALTER TABLE <tableName>
– ADD [ CONSTRAINT <DefaultName> ]
– DEFAULT <constantExpression>
– FOR <columnName>
DataBase course notes 7
127
• [CONSTRAINT <constraintName>]
• FOREIGN KEY REFERENCES
<referenceTable> (<referenceColumns>)
• [ON DELETE <NO ACTION | CASCADE |
SET NULL | SET DEFAULT> ]
• [ON UPDATE <NO ACTION | CASCADE |
SET NULL | SET DEFAULT> ]
DataBase course notes 7
128
• CASCADE
– ON DELETE, delete child rows; ON UPDATE, change
child rows to match updated values
• SET NULL
– ON DELETE or UPDATE of key values, set child row
values to NULL, even if the value is updated to a valid
value
• SET DEFAULT
– ON DELETE or UPDATE, set child values to a default
value, much like SET NULL, but useful for columns
that cannot be set to NULL
DataBase course notes 7
129
Extra-Database Relationships
• collations and sorting
• collation sequence shows how data is
sorted when needed and how data is
compared
DataBase course notes 7
130
User-defined data types
• CREATE TYPE <typeName>
• FROM <intrinsic type> --any type that can
be used as a column of a table, with
precision and scale or length,
• [NULL | NOT NULL]
• CREATE RULE
– could be bound to columns
• CREATE DEFAULT
– could be bound to columns
DataBase course notes 7
131
Documenting Database
DataBase course notes 7
132