IMS 6217: Database Performance Part 1

Download Report

Transcript IMS 6217: Database Performance Part 1

IMS 6217: Database Performance Part 1
Database Performance Part 1—Topics
• Storing Data (Review)
• Measuring Problems
• Partitioning
– Horizontal
– Vertical
• Denormalization
• Hardware Improvements
– Faster !!
– RAID
– Parallel Processing
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
1
IMS 6217: Database Performance Part 1
Default SQL Server Data Storage
• Data in tables is stored on pages and there are eight
pages per extent.
RowID1 Data for Row 1
RowID2 Data for Row 2
• When more
RowID3 Data for Row 3
RowID4 Data for Row 4
space is needed
RowID5 Data for Row 5
an entire extent
RowID6 Data for Row 6
RowID7 Data for Row 7
is added to the
RowID8 Data for Row 8
database
RowID9 Data for Row 9
RowID10 Data for Row 10
• Each row (record) in the database
RowID11 Data for Row 11
RowID12 Data for Row 12
is physically stored on a page
RowID13 Data for Row 13
and in an extent
• Each row has a RowID and PageOffset that identifies it
and it’s location in the page
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
2
IMS 6217: Database Performance Part 1
Data Retrieval (cont.)
• The Page is the basic unit of IO
– Entire page is moved from physical storage to RAM for
evaluation
• In a pure table scan (the default method of retrieval)
each record is examined to see if it matches the
WHERE clause conditions (if any)
– Test value and column value moved to CPU for testing
– Records where condition is TRUE are added to result
set
• Pages are cached and the cached copy will be read if
available and needed
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
3
IMS 6217: Database Performance Part 1
Measuring Problems
• Many DBMS provide performance management &
measurement tools
– Individual process costs
– Monitoring and statistical reporting
• Use tools to look for potential or actual performance
bottlenecks
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
4
IMS 6217: Database Performance Part 1
Measuring Query Performance
• Four tools for measuring query performance
– Estimated Execution Plan
• Run before query is executed to estimate allocation
of query effort
– Include Actual Execution Plan
• Reports actual allocation of query effort
– Include Client Statistics
– Database Tuning Advisor
– SQL Server Profiler
• Look for similar capabilities in other DBMS
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
5
IMS 6217: Database Performance Part 1
Execution Plans
SELECT CompanyName, ProductName
FROM Suppliers INNER JOIN Products
ON Suppliers.SupplierID =
Products.SupplierID
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
6
IMS 6217: Database Performance Part 1
Execution Plans (cont.)
• Plans can be incredibly complex based on query
structure
• Look for high cost parts of
the plan with explanations
– Large tables
– Forgotten index
• Use high cost elements to focus remedial measures
• Remember that costs are relative to this query only
– A high cost part of a low cost query probably does not
need attention
– Study as table sizes grow!!
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
7
IMS 6217: Database Performance Part 1
Include Client Statistics
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
8
IMS 6217: Database Performance Part 1
Database Tuning Advisor
• DTA is a tool that monitors execution of all or selected
events on all or selected databases/tables
• Produces recommendations for changes in structure
– Indexes
• Clustered
• Nonclustered
– Indexed views
– Partitions
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
9
IMS 6217: Database Performance Part 1
SQL Server Profiler
• SQL Server Profiler logs specified activities
• Allows post execution analysis looking for high demand
actions
• Can generate a script that can be replayed on a test
server for 'real time' monitoring
– Events leading up to a failure
– Events leading up to slow performance
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
10
IMS 6217: Database Performance Part 1
Remedial Actions
• Indexing (Previous Lecture)
• Table Partitioning
– Horizontal
– Vertical
• Denormalizing
• Hardware Improvements
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
11
IMS 6217: Database Performance Part 1
Partitioning
• Partitioning entities divides one table into many
– Horizontal partitioning
• Each table has all fields from the original table
• Each table has a subset of records
– Vertical partitioning
• Each table has the PK of the original table
• Each table has all records
• Each table has a subset of fields
– May partition both vertically and horizontally
• Very powerful technique with historical data
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
12
IMS 6217: Database Performance Part 1
Partitioning (cont.)
COURSE
SECTION
DeptCode
CourseNo
Name
CreditHrs
LabHrs
SectionID
DeptCode <AK>
CourseNo <AK>
SecNo
<AK>
Term
<AK>
Year
<AK>
Room <FK1>
Days
Time
InstructorID <FK2>
Has
ENROLLMENT
Has
SectionID <FK1>
StudentID <FK2>
Grade
<FK3>
Has
STUDENT
Has
• Horizontal Partitioning
– How many records in the
STUDENT table?
– How many of them are currently enrolled?
– How frequently do we need to access both current and
former students in the same query or operation?
– It may make sense to partition tables based on a
historical context
• Active records
• Archived records
GRADE
Grade
GradePts
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
:
StudentID
LastName
FirstName
:
13
IMS 6217: Database Performance Part 1
Horizontal Partitioning—Historical Context
• Significant business data is not frequently used for
current operations
• But legal, accounting, and research needs argue against
deleting it
• Consider moving historical data to
– Separate tables in the same DB
– Separate DB (on different servers)
– Archived storage (requires reload for use)
• Trading off ease of access to old data with improved
performance with current data
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
14
IMS 6217: Database Performance Part 1
Horizontal Partitioning
• May also partition to distribute workload across
servers
– Geographically distributed DB
– Arbitrary distribution based on key value
• Reduces table sizes (rows) on any server
• Any partitioned tables can be reassembled with a
UNION query
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
15
IMS 6217: Database Performance Part 1
Horizontal Partitioning—Union Query
• Assume two Student tables with same columns:
– CurrentStudents
– HistoricalStudents
• Write the SQL to create mailing labels to send mailouts
to all students
• Can you arrange the results to be sorted by zip code
order for bulk rate mail processing?
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
16
IMS 6217: Database Performance Part 1
Vertical Partitioning
• How many fields does the Student entity have?
– Librarian, Registrar, Athletic Department, and Health
Center may all need a different subset of fields from the
STUDENT entity
– Can you think of high frequency events that do not
require the student's address?
– It may make sense to create separate tables containing
the necessary attributes for specialized views of the data
– Common PK creates 1:1 cardinality between all tables
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
17
IMS 6217: Database Performance Part 1
Vertical Partitioning (cont.)
STUDENT
E_MAIL
STUDENT
HEALTHREC
STUDENT
• Vertical partitioning
reduces record size so
more records per page read
• Whole logical record can be
assembled using SQL when needed
• We are actually backing into a
supertype/subtype relationship
– Except that each table has the same number
of records
• Increases maintenance and some retrieval costs
• Horizontal and Vertical partitioning can be combined
StudentID
E_Mail
StudentID
LastName
FirstName
MiddleInitial
StudentID
BloodType
Height
Weight
STUDENT
ADDRESS
StudentID
HomeStreet
HomeCity
HomeState
HomeCountry
HomeZip
LocalStreet
LocalCity
LocalState
LocalCountry
LocalZip
LocalPhone
E_Mail
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
18
IMS 6217: Database Performance Part 1
Denormalization
• Denormalizing is the process of reshuffling attributes
and sometimes entities to create entities that violate the
rules of normalization
• We are trading off (again) storage efficiency and
anomaly avoidance for better retrieval efficiency
• Denormalizing includes:
– Storing derived attributes explicitly
– Allowing transitive dependencies (violating second,
third, or Boyce-Codd normal form)
– Merging entities in 1:1 relationships
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
19
IMS 6217: Database Performance Part 1
Denormalization—Derived Attributes
STUDENT
StudentID
LastName
FirstName
TotCrHr
TotGP
• Storing derived attributes is a common means
of improving processing efficiency
• What transactions require access to GPA?
• How many tables/row examinations are avoided by
storing total grade points and total credit hours with
the STUDENT entity?
• How much impact does storing this data have on
Student row size?
• Explicitly storing derived attributes gives rise to new
operational business rules to enforce accuracy
– How many events change the derived attributes?
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
20
IMS 6217: Database Performance Part 1
Denormalization (cont.)
• 1:1 Relationships
– It may be possible to collapse data from one entity in a
1:1 relationship into the other.
OFFICE
– Usually the pervasive entity survives
Building
Room
Window
– Alternately, both entities may be retained
Length
Width
but the data from one may be copied into
LastPaint
FacultyID <FK>
the other to avoid a table look-up
LastName
PROFESSOR
FacultyID
LastName
FirstName
Phone
Dept. <FK>
OFFICE
Has
Building
Room
Window
Length
Width
LastPaint
FacultyID <FK>
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
FirstName
Phone
Dept
21
IMS 6217: Database Performance Part 1
Denormalizing (cont.)
• 1:M Relationships
– You may consider moving or duplicating attributes
from the “one” side of a 1:M relationship into the
“many” side
– This will result in considerable data duplication
– Considerations
• There should be many records on the “one” side
• Frequent access should be directly into the “many”
side
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
22
IMS 6217: Database Performance Part 1
Denormalization (cont.)
• 1:M Relationships (cont.)
ENROLLMENT
SectionID <FK1>
StudentID <FK2>
Grade
<FK3>
LastName
FirstName
STUDENT
Has
StudentID
LastName
FirstName
:
• Similar technique may be used by collapsing or copying
attributes into the associative entity between two
entities in a M:M relationship
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
23
IMS 6217: Database Performance Part 1
Denormalization (cont.)
• The goal of denormalizing is to avoid accessing a
(large) table for high frequency critical transactions
• Denormalizing usually requires additional business
rules to guarantee that data remains accurate in the
face of updates
• Trade offs also include increasing the record length
(reducing rows per page) for one or more tables
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
24
IMS 6217: Database Performance Part 1
Hardware Improvements
• Faster Hardware
– CPU, Disk drives, network, bus
• Splitting DB onto different disk devices
• RAID storage
• Parallel Processing
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
25
IMS 6217: Database Performance Part 1
Faster Hardware
• Any transaction's run time is constrained by one or
more characteristics of the system's hardware
• Increasing the speed of the constraining devices will
improve throughput
• Increasing speed of unconstrained devices will have
little effect
• Increasing capacity of some components will improve
performance
– Improving cache memory will increase the likelihood
that a desired page is cached in memory
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
26
IMS 6217: Database Performance Part 1
RAID Storage Devices
• RAID is for Redundant Array of Inexpensive Disks
– Multiple disks appear as a single logical drive to the
computer
– May be implemented in hardware or software (OS)
• Various RAID levels provide for different levels of
performance and redundancy
• Some RAID levels enable the rebuilding of entire lost
physical drives through parity storage
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
27
IMS 6217: Database Performance Part 1
*
RAID Storage Devices—Raid 3
• Records are striped across
multiple physical devices
– Part of each record is laid down
across multiple physical drives
– Much faster Read/Write time since disk rotation needed
to read whole record/block is much shorter
– However only one request can be serviced concurrently
– Not commonly used in practice
• A single parity disk allows reconstruction of data on
damaged drives
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
28
IMS 6217: Database Performance Part 1
*
RAID Storage Devices—Raid 4
• Blocks are stored independently
on the drives
– Block A1 can be serviced just
by Drive 0
– Simultaneous requests for Blocks B2 or D3 can also be
serviced
• A single parity drive enables recovery of lost data
• Write operations may be slower—simultaneous write
operations to Drives 0-2 must wait on the parity
calculation and writing on Drive 3
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
29
IMS 6217: Database Performance Part 1
*
RAID Storage Devices—Raid 5
• Similar to Raid 4 except that
parity storage is distributed
across multiple drives
– Rotating allocation
– Lessens the chance that writes on two drives will wait
on parity updates on a single parity drive
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
30
IMS 6217: Database Performance Part 1
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
31
P2
P3
One Processor
• More and more computers support parallel processing
(multiple CPUs on the same computer)
1
...
• Some tasks can be split among multiple
2
...
processors
3
...
• In an SQL SELECT query the usual
4
...
method requires the RDBMS to scan
5
...
each record to determine if it matches
6
...
the WHERE clause or JOIN criteria
7
...
8
...
• In parallel processing part of the
9
...
whole table is passed to each processor
• Availability depends on hardware, OS, and RDBMS
• May have licensing implications
P1
Parallel Processing