The Kimball Lifecycle - Doctor

Download Report

Transcript The Kimball Lifecycle - Doctor

Physical Database
Design
By Dr. Gabriel
The Process in a Nutshell
• setting up the database environment
• setting up appropriate security
• preliminary performance tuning strategies,
from indexing to partitioning and
aggregations.
• If appropriate, OLAP databases are also
designed during this process.
Physical database design
• Develop naming and database standards
• Develop physical database model
• Physical storage
– SAN vs direct storage
• Fault tolerance vs performance
• Create staging and star schema
(dimension and fact) tables
• Creating indexes
Physical database design
• Creating indexes
– B-tree indexes
• Clustered index
– Phone book
• Nonclustered index
– Book index
• For high cardinality columns
– Ex: order number, customer number, etc.
• Each page in an index B-tree is called an index node
• The top node of the B-tree is called the root node
• The bottom level of nodes in the index is called the leaf nodes.
• Any index levels between the root and the leaf nodes are collectively known
as intermediate levels.
• In a clustered index
– the leaf nodes contain the data pages of the underlying table.
• In a nonclustered index
– the leaf layer is made up of index pages instead of data pages.
• The root and leaf nodes contain index pages holding index rows.
• Each index row contains a key value and a pointer to either an intermediate
level page in the B-tree, or a row in the leaf level of the index.
• The pages in each level of the index are linked in a doubly-linked list.
Physical database designCreating Indexes (Cont)
Physical database designCreating Indexes (Cont)
– Bitmapped indexes
• Good for low cardinality columns
– Gender, Yes/No fields, etc.
• Is a string of bits for each possible value
– Each bit string has 1 bit for each row
– Bit is set to 1 if the row has the value the bit string
represents
» For gender column, 3 bit strings represent “M”, “F”,
and “?”
– Much smaller than B-tree
– Usually built on a single column
Physical database designCreating Indexes (Cont)
• For dimension tables
– Unique on surrogate keys
– Bitmapped indexes on the most frequently
queried dimension attributes or B-tree indexes
• For fact tables
– Unique on PK
• For date-constrained DW, use date field as first
one in the key
– If simultaneous use of multiple indexes on a
table in a query is supported, create a single
column index on each fact table key
Physical database designCreating Indexes (Cont)
• Drop indexes if load adds more than 1020% to the size of the table
• Recompute table statistics after loading,
and, perhaps, rebuild the indexes
– Auto recompute statistics option: Is it a good
idea?
Physical database design
• Creating partitions, if necessary
– Increases performance
– Eases maintenance
Aggregations
• Not too much, not too little
• Monitor queries and design aggregations to
match query patterns
• Generally, creating report-specific summary
tables is inefficient and impractical
• Need to be maintained
– Usually by ETL processes
– May be automatically maintained by using
materialized/indexed views or OLAP aggregations
• Simple “group by” queries will do the job
OLAP
• Major vendors built-in OLAP capabilities
into the DB engine
• Support data analysis
• Provides aggregation management
• Supports multiple dimensions
• May not be needed if DB provides
aggregate management and navigation
and BI tools support complex analysis
directly against the db
OLAP
• Market is not mature yet
– Products functionality and scalability varies
dramatically in contrast to relational database
products that have more similarity than
differences
– OLAP databases use languages that are not
universal
• Ex, multi-dimensional expressions (MDX)
– Not universal language like SQL
– Only OLAP that implement XML/A (XML for analysis)
standard support MDX
MS SQL Server-related
Physical Database
Considerations
Materialized/Indexed Views
• An indexed view is a view that has been
materialized.
– it has been computed and stored.
– You index a view by creating a unique
clustered index on it.
– dramatically improve the performance of
some types of queries.
– best for queries that aggregate many rows.
– not well-suited for underlying data sets that
are frequently updated.
Materialized/Indexed Views
CREATE TABLE wide_tbl(a int PRIMARY KEY, b int, ..., z
int)
CREATE VIEW v_abc WITH SCHEMABINDING AS
SELECT a, b, c
FROM wide_tbl
WHERE a BETWEEN 0 AND 1000
CREATE UNIQUE CLUSTERED INDEX i_abc ON
v_abc(a)
Partitioning
• The steps for creating a partitioned table
or index include the following:
– 1. Create a partition function to specify how a
table or index that uses the function can be
partitioned.
– 2. Create a partition scheme to specify the
placement of the partitions of a partition
function on filegroups.
– 3. Create a table or index using the partition
scheme.
Creating a Partition Function
• A partition function specifies how the table
or index is partitioned.
• The function maps the domain into a set of
partitions.
• To create a partition function, you specify the
number of partitions, the partitioning column,
and the range of partition column values for
each partition.
• Note that when you specify the partitioning
column, you can only specify one.
Creating a Partition Function
CREATE PARTITION FUNCTION
pf_exppeople (bigint)
AS RANGE FOR VALUES
(111111317320474, 111111621356435,
111111778050047, 111111981946587,
151670034507413, 252440002902502,
301111111111111, 401111111111111,
501111111111111 );
Partition Scheme
• A partition scheme maps the partitions
produced by a partition function to a set of
filegroups that you define.
• When you create a partition scheme, you
define the filegroups where the table
partitions are mapped, based on the
parameters of the partition function.
• You must specify enough filegroups to
hold the number of partitions.
Partition Scheme
Create PARTITION SCHEME ps_exppeople
AS PARTITION pf_exppeople
TO (fg_exp1, fg_exp2,
fg_exp3,fg_exp4,fg_exp5,fg_exp6,fg_exp7
,fg_exp8,fg_exp9,fg_exp10);
Create a table using the
partition scheme
CREATE TABLE person_data(
id bigint NOT NULL,
lname varchar(32) NULL,
fname varchar(32) NULL,
…,)
ON ps_exppeople(id)
Primary/Foreign Key Constraints
• Create Table Statement – General Format
CREATE TABLE table_name
(column1_name datatype default_clause null/not null
column_constraints,
column2_name datatype default_clause null/not null
column_constraints,
…..,
table_constraints);
• Primary Key Constraint
CONSTRAINT constraint_name PRIMARY KEY (Column names)
•Foreign Key Constraint
CONSTRAINT constraint_name FOREIGN KEY (FK column names)
REFERENCES parent_table_name (parent column names)
Primary/Foreign Key Constraints
• Example: Create table statement for the Shipments table
CREATE TABLE Shipments
( SNO CHAR(5) NOT NULL,
PNO CHAR(5) NOT NULL,
JNO CHAR(5) NOT NULL,
Shipmentdate DATE NOT NULL,
QTY INTEGER NOT NULL,
CONSTRAINT ShipmentPK PRIMARY KEY (SNO,PNO,JNO, Shipmentdate),
CONSTRAINT SupplierFK FOREIGN KEY (SNO) REFERENCES Suppliers (SNO)
CONSTRAINT PartFK FOREIGN KEY (PNO) REFERENCES Parts (PNO)
CONSTRAINT JobFK FOREIGN KEY (JNO) REFERENCES Jobs (JNO)
)
Primary/Foreign Key Constraints
• Alter Table Statement
• Add Constraints
– General Format:
ALTER TABLE table_name
ADD CONSTRAINT constraint_definition
• Drop Constraints
– General Format:
ALTER TABLE table_name
DROP CONSTRAINT constraint_name
Creating indexes
• create index statement
create [clustered /nonclustered] index index_name
on table_name (column_name)
Questions ?