Physical Database Design

Download Report

Transcript Physical Database Design

IT 20303
• The Relational DBMS
• Section 06
Relational Database Theory
• Physical Database Design
Relational Database Theory
• Physical Database Design
– Goals
• Improve performance
–By minimizing disk I/O
• Improving management of the
data
–By grouping tables that can be
managed as a group
Relational Database Theory
• Physical design decisions are based
on:
– Use of the data (volume, frequency)
– Features supported by the specific
RDBMS
– Disk storage configuration
Relational Database Theory
• DBA initially sets up the physical
database
– Tunes physical parameters on a
ongoing basis
• As usage patterns change
• As new hardware/software options
become available
Relational Database Theory
• Steps in Physical Design Process
– Determine which tables can be managed
as a group
• Many RDBMSs support the concept of
a Container (Oracle Tablespace, db
space, Access uses the .mdb)
–A collection of tables, and indexes
Relational Database Theory
– Develop a plan for allocating tables to
disk devices
• Consider parallel disk controllers
• Group tables together that are
frequently joined
• Distribute heavily accessed table to
different disk devices
–To avoid excessive head movement
on one disk
Relational Database Theory
– Build indexes on table columns,
based on frequency of use
– Restructure tables if necessary
• Fragment large tables into
multiple smaller ones
• De-normalize tables if appropriate
Relational Database Theory
Tablespace
• Example of a
Container
Table 1
Table N
Table 2
OS File
Relational Database Theory
• Managing a collection of Tables, Indexes
– Purpose of container concept
• Relate tables, indexes to physical disk
files
• Aid in the management of the
database
–Example: A tablespace can be
taken offline, backed up, and
restored while the remainder of the
database is online
Relational Database Theory
– Support clustering data from related
tables in the same file
• So that related data is read with
the same I/O request
Relational Database Theory
• How the RDBMS processes a user
request
– RDBMS parses, validates, and
optimizes the SQL request
– Determines disk file in which the
table is written
• Specific to each RDBMS & OS
Relational Database Theory
– Initiates I/O request to operating
system, if necessary
• I/O is requested if file is not
currently in buffers
– Processes execution plan using data
in its buffer
Relational Database Theory
• Indexes
– Index is a separate structure (table)
• Points into the data table
• Built on one or more columns in
the data table
Relational Database Theory
• Comments on Indexing
– An index can be built on any column or
combination of columns
– An index can be unique or non-unique
– An index on the primary key is called the
primary index
– Most RDBMSs use an internal row id as
the pointer to the row
– Use of the index is transparent to the user
Relational Database Theory
• Use of an index
– Provides access to a row based on
data value(s)
– Avoids duplicates – only way
– Supports sequential processing on
the indexed field
– Improves performance
Relational Database Theory
• Use of an index improves performance on
Retrieval
– Processing an index is more efficient than
processing a table – for reads
• Index is usually small, relative to the
table
–Can be held entirely in memory
• The smaller the index value, the more
entries per block the more likely the
index will be in memory
Relational Database Theory
• Most RDBMSs use a type of B-Tree
Index
– B-tree indexes were designed for
efficient search of a sorted list
– Algorithms exist for managing and
maintaining B-trees
Relational Database Theory
• B-trees were introduced by Bayer
(1972) and McCreight.
– They are a special m-ary balanced
tree used in databases because
their structure allows records to be
inserted, deleted, and retrieved with
guaranteed worst-case performance
Relational Database Theory
• B-Tree
Relational Database Theory
• Use of index degrades performance on
Updates
– Inserting a row is the source of
much disk I/O (overhead)
• Every index on the table must be
searched and updated also
Relational Database Theory
• Frequently inserting rows leads to
index block overflow
– Causes much disk I/O as overflow
condition is processed
Relational Database Theory
• Techniques for managing volatile
tables (many interests, deletes)
– Partially fill index blocks when
creating the index
– Periodically restructure (Drop,
Create) the indexes
Relational Database Theory
• Indexing: Strengths and Weaknesses
– Strengths
• Improves performance on retrieval of
data
• Can be built or dropped at any time
• Usage is transparent to the user
– Weaknesses
• Degrades update performance
Relational Database Theory
• De-normalization
– De-normalization means combining two
(or more) tables
• Usually done when tables are
frequently joined
– De-normalization (joining two tables)
depends on usage
• Depends on how applications and
users access the data
Relational Database Theory
• De-normalization is done to improve
performance
– Tailors data structures for one
specific application’s use
– Improves performance of one type
of access at expense of others
Relational Database Theory
• De-normalization Trade-Offs
Normalization
De-normalization
Eliminates update anomalies
Improves performance for
specific application(s)
Minimizes data redundancy
Supports simpler logic
Provides applicationindependent database design
Encourages sharing of data
Relational Database Theory
• When to De-Normalize
– This is EVIL, Do Not Do…
– When does de-normalization have
minimal impact?
• Data is accessed primarily on a
read-only basis
• Data is accessed primarily by one
application
Relational Database Theory
• When to de-normalize
– After database design is done and
tables are normalized to 3NF
– After clustering related tables in the
same logical container
– After considering trade-offs and
usage of data
Relational Database Theory
• Alternatives to de-normalization
– Physical placement of data
• Use of container
• Can improve performance without
impacting logical design
– Selective hardware upgrades
• More main memory, expanded
storage, cache storage devices
Relational Database Theory
• Fragmentation – Better alternative to denormalization
– Means breaking one table into two (or
more) tables
• Usually done when one table is very
large
• Or groups of user almost exclusively
access a subset of data in a table
Relational Database Theory
• Fragmentation can be based on
selection or projection
– Must be able to reconstruct the
original table – by union or join
– Primary key column(s) must be
included in all vertical fragments
• Disadvantage is that the DBA must be
aware of all the fragmented tables
Relational Database Theory
• Physical Design Review
Relational Database Theory
• Physical Database Design
– Goals
• Improve performance
–By minimizing disk I/O
• Improving management of the
data
–By grouping tables that can be
managed as a group
Relational Database Theory
• Indexing: Strengths and Weaknesses
– Strengths
• Improves performance on retrieval of
data
• Can be built or dropped at any time
• Usage is transparent to the user
– Weaknesses
• Degrades update performance
Relational Database Theory
• Questions?