11a_dbimpl - University of Central Florida

Download Report

Transcript 11a_dbimpl - University of Central Florida

IMS 4212: Database Implementation
Physical Database Implementation—Topics
•
•
•
•
Reasons for concern
Data Volume Analysis
Data Usage Analysis
Index Design
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
1
IMS 4212: Database Implementation
Reasons for Concern
• Write the SQL query to calculate your GPA
COURSE
SECTION
DeptCode
CourseNo
Name
CreditHrs
LabHrs
SectionID
DeptCode <AK>
CourseNo <AK>
SecNo
<AK>
Term
<AK>
Year
<AK>
Room <FK1>
Days
Time
InstructorID <FK2>
Has
ENROLLMENT
Has
SectionID <FK1>
StudentID <FK2>
Grade
<FK3>
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
STUDENT
Has
GRADE
Grade
GradePts
Has
:
StudentID
LastName
FirstName
:
2
IMS 4212: Database Implementation
Reasons for Concern (cont.)
• Normalization spreads data around the database
– Optimizes storage space needed (usually)
– Protects against anomalies
• Normalization may make retrieval inefficient
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
3
IMS 4212: Database Implementation
Reasons for Concern (cont.)
• Each retrieval efficiency enhancement introduces some
level of inefficiency with respect to a fully normalized
database
• We don’t introduce retrieval enhancements with wild
abandon
– Analyze the nature of the problem
– Know capabilities of DBMS & Operating System
– Select carefully from available options
• Must understand tradeoffs (advantages and
disadvantages) of each retrieval enhancement
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
4
IMS 4212: Database Implementation
How Computers Work
• Data flows through different parts of the computer as
application instructions are executed
Memory
CPU
ALU
Unused
Instruction
Data
Application Data
Data
Value 1
Application
Value 2
Operating
System
Results
Table Pages
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
5
IMS 4212: Database Implementation
SQL Server Data Storage
• Data in tables is stored on pages and there are eight
pages per extent.
RowID1 Data for Row 1
RowID2 Data for Row 2
• When more
RowID3 Data for Row 3
RowID4 Data for Row 4
space is needed
RowID5 Data for Row 5
an entire extent
RowID6 Data for Row 6
RowID7 Data for Row 7
is added to the
RowID8 Data for Row 8
database
RowID9 Data for Row 9
RowID10 Data for Row 10
• Each row (record) in the database
RowID11 Data for Row 11
RowID12 Data for Row 12
is physically stored on a page
RowID13 Data for Row 13
and in an extent
• Each row has a RowID that identifies it and it’s
location in the page
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
6
IMS 4212: Database Implementation
SQL Server Data Storage (cont.)
• Without a clustered index
(covered soon) rows are
added to pages in the order
of insertion.
• When pages are full rows
are added to the next page in the extent.
• When extents are full new extents are created
• Tables keep track of the
sequence of extents that
contain their contents
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
7
IMS 4212: Database Implementation
Data Retrieval
• By default, queries of tables require that each page be
loaded into memory in sequence and each row
examined to see if it meets the query conditions
This is a full table scan
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
8
IMS 4212: Database Implementation
Data Retrieval (cont.)
• The Page is the basic unit of IO
– Entire page is moved from physical storage to RAM for
evaluation
• In a pure table scan (the default method of retrieval)
each record is examined to see if it matches the
WHERE clause conditions (if any)
– Test value and column value moved to CPU for testing
– Records where condition is TRUE are added to result
set
• Pages are cached and the cached copy will be read if
available and needed
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
9
IMS 4212: Database Implementation
Data Retrieval (cont.)
• In SQL Server page sizes are fixed at 8 KB
– (Entire extent is 64 KB)
– Some DBMS have different sizes
– Some DBMS allow tuning on a table by table basis
– 8 KB is also the maximum record size
• Number of Records on a page depends on record size
– Sum of data sizes of each column
• IO time for a pure scan increases with
– Number of records
– Record size
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
10
IMS 4212: Database Implementation
Data Retrieval Costs
• Two levels of costs associated with data retrieval
– Most Important: IO moving page from disk storage to
RAM
– Less Important: CPU effort to evaluate records
– In default mode records cannot be evaluated until they
have been moved into RAM
• We also care about physical storage space
– Less important as a performance issue
• We also care about costs of reorganizing data as it is
added to the DB or updated (later)
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
11
IMS 4212: Database Implementation
Data Retrieval Costs (cont.)
• ALL Retrieval Enhancement mechanisms must be
evaluated on the dimensions from the previous slide
• None of the enhancements come without cost
• Decisions affected by use of the data, not just pure
database characteristics
– Understanding organizational tasks and priorities key
– Requires balance between technical and organizational
knowledge
– MIS graduates ideally positioned to participate in this
analysis
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
12
IMS 4212: Database Implementation
Data Retrieval Costs (cont.)
• Degree of the cost changes with many factors
– Table sizes
– Access mechanisms (paths—more later)
– Nature of query
– Number of tables needed in query
– Nature of the enhancement approach
• Remember that our DB design goal of minimizing
storage space and redundancy (normalization) spread
data around the database
– More tables containing transaction logic
– More complicated queries
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
13
IMS 4212: Database Implementation
Data Volume Analysis
• We don’t have retrieval problems with small tables
• Need to know how big a table will get over the life of
the system to understand the potential magnitude of
the problem
• Q: How many records are expected in the
ENROLLMENT table?
• Document in the data
dictionary
– Estimate of number of
records expected
– How estimate was computed
COURSE
SECTION
DeptCode
CourseNo
Name
CreditHrs
LabHrs
SectionID
DeptCode <AK>
CourseNo <AK>
SecNo
<AK>
Term
<AK>
Year
<AK>
Room <FK1>
Days
Time
InstructorID <FK2>
Has
ENROLLMENT
Has
SectionID <FK1>
StudentID <FK2>
Grade
<FK3>
GRADE
Grade
GradePts
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
STUDENT
Has
:
StudentID
LastName
FirstName
:
14
Has
IMS 4212: Database Implementation
Data Volume Analysis (cont.)
• Estimating DV
– Absolute count: We know there are 12 possible grades
that can be contained in the GRADE table
– Estimate: We think that we will have 32,000 students
next year (use your statistics!)
– Derived: Each enrolled student takes an average of four
sections per semester and there are an average of 35
students per section
– Historical trends: Enrollment is growing at 2% per year
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
15
IMS 4212: Database Implementation
Data Volume Analysis (cont.)
• Don’t forget historical data!
– Are graduated or withdrawn student records retained in
the STUDENT and ENROLLMENT tables?
– How long will they be kept?
– What is the potential size of the ENROLLMENT table
if records are never discarded?
• Precise entity definitions are critical in DVA
• Document where or how you came up with volume
estimates
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
16
IMS 4212: Database Implementation
Data Usage Analysis
• DUA is concerned with three factors
– How frequently are tables accessed?
– How urgent are the table accesses?
– What is the access path into the table?
• Usually means what fields are being compared in a
WHERE clause
• Fields in a JOIN expression
• Goal is to find the high frequency, important retrievals
and to put enhancements on the path used by the
retrieval
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
17
IMS 4212: Database Implementation
Data Usage Analysis (cont.)
• Many frequency and urgency estimates will come from
an analysis of the organization’s business practices and
needs
– What is max time a customer can be allowed to wait for
a response?
– How many sales take place a day?
– Can this transaction take place in batch overnight?
• How many sales are made per hour? Do we expect it to
grow?
• Consider electronic credit card clearing from retail
stores
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
18
IMS 4212: Database Implementation
Data Usage Analysis (cont.)
• The access path is the fields being searched to find
appropriate records in a transaction
• What is the path taken through the sample ERD to:
– Calculate your GPA?
– Determine if you have met a course prerequisite?
• Don’t forget checks of operational business rules made
in conjunction with a transaction
– What if we had a business rule that said only students
with a 3.0 GPA could take ISM 4212?
– How about checking prerequisites?
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
19