Transcript document

Chapter - 2
Basics of Sound Structure
Author: Graeme C. Simsion and Graham C. Witt
Copyright: ©2005 by Elsevier Inc. All rights reserved.
1
Drug Expenditure Survey
Form
Hospital
Number:
Hospital
Categor y:
Operation
Name:
Surgeon
Number:
Drug Cod e
H17
P
Hospital
Name:
St VincentÕs
Contact at
Hospital:
Operation
Code:
Operation
Number:
48
Fred Fleming
Procedur e
Heart Transp lant
7A
Group:
Transpla nt
Surgeon
Total Drug
S15 Specialty:
Cardiology
Cost:
$75.50
Full Name Manufacturer
Method
Cost of
Numbe r
of Drug
of Admin. Dose ($) of Doses
MAX 150mg
Maxicillin
ABC Pharmaceuticals
ORAL
$3.50
15
MIN 500mg
Minicillin
Silver Bullet Drug Co.
IV
$1.00
20
MIN 250mg
Minicillin
Silver Bullet Drug Co.
ORAL
$0.30
10
Copyright: ©2005 by Elsevier Inc. All rights reserved.
2
Notes (1)
Each hospital has a unique hospital number to
distinguish it from other hospitals (in some
cases two hospitals had the same name) and
were prefixed “H” (for “hospital”).
Operation numbers were assigned sequentially
by each hospital.
Hospitals fell into three categories: “T” for
“teaching”, “P” for “public”, and “V” for
“private”. All teaching hospitals were public
(“T” implied “P”).
Copyright: ©2005 by Elsevier Inc. All rights reserved.
3
Notes (2)
The operation code was a standard international
code for the named operation. Procedure group
was a broader classification.
The surgeon number was allocated by individual
hospitals to allow surgeons to retain a degree of
anonymity. The prefix “S” stood for “surgeon”. Only
a single surgeon number was recorded for each
operation.
Total drug cost was the total cost of all drug doses
for the operation. The bottom of the form recorded
the individual antibiotic drugs used in the
operation. A drug code was made up of a short
name for the drug plus the size of the dose.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
4
Automated Survey Database:
Initial Drug Expenditure Model
OPERATION (Hospital Number, Operation Number, Hospital
Name, Hospital Category, Contact Person, Operation Name,
Operation Code, Procedure Group, Surgeon Number,
Surgeon Specialty, Total Drug Cost,
Drug Code 1, Drug Name 1, Manufacturer 1, Method of
Administration 1, Dose Cost 1, Number of Doses 1,
Drug Code 2, Drug Name 2, Manufacturer 2, Method of
Administration 2, Dose Cost 2, Number of Doses 2,
Drug Code 3, Drug Name 3, Manufacturer 3, Method of
Administration 3, Dose Cost 3, Number of Doses 3,
Drug Code 4, Drug Name 4, Manufacturer 4, Method of
Administration 4, Dose Cost 4, Number of Doses 4)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
5
Notes about the database
• We want to avoid redundant data (avoids
errors)
• The database is based on tables consisting of
rows and columns
• Each cell in the table can only contain one
fact
• No order can be assumed in the table
• We need to identify each row in the table (so
that we can get at each row later)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
6
One Fact per Column (1)
• The Drug Code column holds both a short
name for the drug and a dosage size, two
distinct facts.
– Eg. “MAX 150mg”
• The dosage size (150mg) in turn consists of a
numeric size (150) and a unit of measure
(mg).
– The three facts should be recorded in separate
columns.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
7
One Fact per Column (2)
• A more subtle example of a multi-fact column
is the Hospital Category.
– Public or private (first fact) and
– Does the hospital provides teaching (second fact).
• We should establish two columns, Hospital
Type and Teaching Status, to capture these
distinct ideas.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
8
Hidden Data
• Have we lost any data in the translation to
tabular form? We cannot assume that data
are stored in the order entered (there is no
assumed order). Sequence numbers help
here.
• If the hospitals used red forms for emergency
operations and blue forms for elective
surgery, we would need to add a column to
record the category if it was of interest to the
database users.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
9
Derivable Data
• To achieve the objective of non-redundancy we
should remove any data that can be derived
from other data in the table
– The Total Drug Cost = Total of Dose Costs multiplied
by the Numbers of Doses.
– Assuming we need the total, we remove Total Drug
Cost noting in supporting documentation how it can
be derived.
– Prefixing Hospital Numbers with “H” and Surgeon
Numbers with “S” adds no information to the
database (because column names provide context).
– Include information (such as “H” and “S”) in cases
where the context is lost.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
10
Why did the form include
derivable data?
• Occasionally, there may is a regulatory
requirement to hold derivable data.
• In some cases, derived data is included
unknowingly.
• Most often, it is added with the intention of
improving performance.
– But, performance is not our concern at the logical
modeling stage.
– Specifying redundant data in the physical model is
one option if the physical database performance is
not adequate.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
11
Determining the Primary Key
• We cannot automatically identify each row in the table.
• The primary key uniquely identifies one row of the table and is a
combination of columns that achieves this.
– A combination of Hospital Number and Operation Number will do
the job.
– If we nominate a particular hospital number and operation
number, there will be at most one row with that particular
combination of values.
• The purpose of the primary key is exactly is to enable us to refer
unambiguously to a specific row of a table.
– We can check this with the business experts by asking: “Could
there ever be more than one form with the same combination of
Hospital Number and Operation Number?”
Copyright: ©2005 by Elsevier Inc. All rights reserved.
12
Where are we now?
• OPERATION (Hospital Number, Operation Number, Hospital Name,
Hospital Type, Teaching Status, Contact Person, Operation Name,
Operation Code, Procedure Group, Surgeon Number, Surgeon
Specialty,
Drug Short Name 1, Drug Name 1, Manufacturer 1, Size of Dose 1,
Unit of Measure 1, Method of Administration 1, Dose Cost 1,
Number of Doses 1,
Drug Short Name 2, Drug Name 2, Manufacturer 2, Size of Dose 2,
Unit of Measure 2, Method of Administration 2, Dose Cost 2,
Number of Doses 2,
Drug Short Name 3, Drug Name 3, Manufacturer 3, Size of Dose 3,
Unit of Measure 3, Method of Administration 3, Dose Cost 3,
Number of Doses 3,
Drug Short Name 4, Drug Name 4, Manufacturer 4, Size of Dose 4,
Unit of Measure 4, Method of Administration 4, Dose Cost 4,
Number of Doses 4)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
13
Major problem: Drug
administration
• Max. 4 drugs per operation - is this a
problem?
– Why four? Can more the four be administered?
What ‘rule’ led to this? Could this change?
– Paper-based Form allowed for extra drugs to be
attached. How do we do this in the database?
– We cannot use this data easily in statistics: how
much money is spend on Ampicillin annually? We
would need complex program logic or another
table.
• We’ve handled drugs in this example using
what is called a ‘repeating group’. These are
generally bad.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
14
Recognizing Repeating
Groups
• OPERATION (Hospital Number, Operation
Number, Hospital Name, Hospital Category,
Teaching Status, Contact Person, Operation
Name, Operation Code, Procedure Group,
Surgeon Number, Surgeon Specialty,
{Drug Short Name, Drug Name,
Manufacturer, Size of Dose, Unit of Measure,
Method of Administration, Dose Cost, Number
of Doses})
Copyright: ©2005 by Elsevier Inc. All rights reserved.
15
Are there any others?
• Seek information about other data on
the form
– Can more than one surgeon be involved in
an operation?
If so, the data items describing surgeons
(Surgeon Number and Surgeon Specialty)
would become another repeating group.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
16
Removing Repeating Groups
• Split the original table into multiple tables (one
for the basic data and one for each repeating
group) as follows:
– Remove each separate set of repeating group
columns to new tables so that each occurrence of the
group becomes a row in its new table.
– Include the key of the original table in each new table
– If the sequence of occurrences within a repeating
group has business significance, introduce a
“Sequence” column
– Name each new table.
– Identify and underline the primary key of each new
table
Copyright: ©2005 by Elsevier Inc. All rights reserved.
17
The Database after Removing
Repeating Groups
• OPERATION (Hospital Number, Operation
Number, Hospital Name, Hospital Type,
Teaching Status, Contact Person, Operation
Name, Operation Code, Procedure Group,
Surgeon Number, Surgeon Specialty)
• DRUG ADMINISTRATION (Hospital Number,
Operation Number, Drug Short Name, Size of
Dose, Unit of Measure, Method of
Administration, Dose Cost, Number of Doses,
Drug Name, Manufacturer)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
18
Determining the Primary Key
of the New Table
• “What is the minimum combination of columns needed to
uniquely identify one row (i.e., one specific administration of a
drug)?”
• To identify the individual administration we need the Drug Short
Name, the Size of Dose, Unit of Measure, and Method of
Administration—a six-column primary key.
• To verify we ask: “Can the same drug be administered in
different dosages for the one operation?” (yes) and “Can the
same drug and dose be administered using different methods
for the one operation?” (yes, again).
• The primary key of the Operation table is in the Drug
Administration table so we know which operation
each drug administration applies to.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
19
More Problems Of
Redundancy
• Every row operation in hospital number 17 will contain the facts
that the hospital’s name is St. Vincent’s, that Fred Fleming is the
contact person, that its teaching status is T, and that its type is P.
• Oour criterion of non-redundancy is not being met. This has
consequences:
– Changing any fact about a hospital will involve updating every
operation for that hospital (update anomaly)
– If we were to delete the last operation for a hospital, we would also
be deleting the basic details of that hospital (deletion anomaly)
• Fix by removing Hospital (and other examples such as this) to a
separate table. Keep the key of the new table as a link (foreign
key) to the Operation table.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
20
Next Version of our Database
• OPERATION (Hospital Number, Operation Number,
Operation Name, Operation Code, Procedure Group,
Surgeon Number, Surgeon Specialty)
• HOSPITAL (Hospital Number, Hospital Name,
Hospital Type, Teaching Status, Contact Person)
• DRUG ADMINISTRATION (Hospital Number,
Operation Number, Drug Short Name, Size of Dose,
Unit of Measure, Method of Administration, Dose
Cost, Number of Doses, Drug Name, Manufacturer)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
21
Other Similar Cases
• Surgeons are given numbers in each hospital but
data will be lost if the last operation in the database
that Surgeon S11 has performed is deleted we lose
information about S11.
• Operation Name and Procedure group will also be
lost
• There seems to be rich data about specific drusg that
also needs to be sorted out
– Drug Short Name clearly gives Drug Name and
Manufacturer
– It is not clear about costings of drugs administration. We
need the help of our client
Copyright: ©2005 by Elsevier Inc. All rights reserved.
22
Asking Our Client
• Modeler: What determines Dose Cost?
• Client: It depends on the drug itself and the size of the dose.
• Modeler: So any two doses of the same drug and same size would
always cost the same?
• Client : Assuming, of course, they were administered by the same
method; injections cost more than pills.
• Modeler: But wouldn’t cost vary from hospital to hospital (and operation
to operation)?
• Client : Strictly speaking, that’s true, but it’s not what we’re interested in.
We want to be able to compare prescribing practices, not how good
each hospital is at negotiating discounts. So we use a standardized cost.
• Modeler: So maybe we could call this column “Standard Dose Cost”
rather than “Dose Cost”. By the way, where does the standard cost
come from?
Copyright: ©2005 by Elsevier Inc. All rights reserved.
23
The Role of Business Rules
• Note that if the business rules were different, the
results might well be different. For example,
consider the rule “We use a standardized cost.”
If this did not apply, the Dose Cost would also be
determined by Hospital Number as well as the
other data items identified.
• Could it be that a surgeon has more than one
speciality (in which case we would have a
repeating group called Surgeon Speciality) We
assume this is not the case here.
Copyright: ©2005 by Elsevier Inc. All rights reserved.
24
The Final Database Model
• OPERATION (Hospital Number, Operation Number, Operation
Code, Surgeon Number)
• SURGEON (Hospital Number, Surgeon Number, Surgeon
Specialty)
• OPERATION TYPE (Operation Code, Operation Name, Procedure
Group)
• STANDARD DRUG DOSAGE (Drug Short Name, Size of Dose,
Unit of Measure, Method of Administration, Standard Dose Cost)
• DRUG (Drug Short Name, Drug Name, Manufacturer)
• HOSPITAL (Hospital Number, Hospital Name, Hospital Type,
Teaching Status, Contact Person)
• DRUG ADMINISTRATION (Hospital Number, Operation Number,
Drug Short Name, Size of Dose, Unit of Measure, Method of
Administration, Number of Doses)
Copyright: ©2005 by Elsevier Inc. All rights reserved.
25