Transcript LES06

Storage Considerations for the
Physical Model
Database Sizing
• Sizing influences capacity planning
and systems environment management.
• Sizing is required for:
– The database
– Other storage areas
– Indexes
• Sizing is not a science.
• Techniques vary.
• DWM suggests when to perform
sizing.
Estimating the Database Size
1. Estimate the size of each row in the fact table.
2. Determine the grain of each dimension and
estimate the number of entries in the finest level.
3. Multiply the number of rows of all dimensions
and multiply the result by the fact table row size.
4. Determine whether the fact table is sparse or
dense and estimate the reduction or increase in
size.
Validating Database Size Assumptions
After you estimate the size of the database, you can
validate your assumptions by doing the following:
• Extract sample files
• Load data into the database
• Compute exact expected row lengths
• Add overhead for indexing, rollback and temporary
tablespaces, aggregates, views, and a file system
staging area for flat files
Example: Estimating the Database Size
Description
Estimation
Estimate the size 52 bytes (assumed for this example)
of one row of the
fact table
Estimate the
Channel
3 channels
entries in the
Customer
63 ship_to_locations
lowest level
Product
36 items
within each
History
48 months
dimension
Multiply the # of
entries for each (3 x 63 x 36 x 48) x 52 = 16,982,784 bytes
dimension and
multiply the
result by the fact
table row size
Sparsity is low,
16,982,784 * .10 = 1,698,278
adjust by 10%
16,982,784 - 1,698,278 =
15,284,506 bytes
Estimated
15.3 MB
database size
Applying the Test Load Sampling
• Analyze statistically significant data samples
• Use test loads for different periods
• Reflect day-to-day operations
• Include seasonal data and worst-case scenarios:
– Calculate the number of transactions
– Use the average sales price approach
• Consider indexes and summaries
Test Load Sampling
Using the Server
• Load a sample of data.
• Query to determine the number of rows per block.
• Estimate based on the number of rows.
Query
3000+ rows
Indexes
ORDER_LINE
Unique index
Primary key
Nonunique index
Foreign key
Order_PK
Order_Line_Num
Item_FK
Units
PRODUCT
Unique index
Primary key
Database
Index
data blocks
Table
data blocks
Item_PK
Package
Item_Desc
Item_Source
Family_PK
Class_PK
Indexing Types
• B*tree index
• Bitmap index
Indexing Types
• B*tree index: small number of distinct values in a
particular column
• Bitmap index: used for star query transformations
Indexing Strategies
?
?
?
Columns, tables, and index types
size
size
size
=
=
=
‘SMALL’0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1
‘MED’ 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0
‘LARGE’0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0
color =
color =
color =
‘BLUE’ 0 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0
‘RED’ 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1
‘GREEN’1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0
Result: color = blue and size = medium or large
B*tree Index
KING
KING
KING
MILLER
TURNER
TURNER
WARD
MILLER
SCOTT
SMITH
BLAKE
BLAKE
JAMES
KING
MARGIN
JAMES
JONES
BLAKE
CLARK
FORD
BLAKE-ROWID
CLARK-ROWID
FORD-ROWID
ADAMS
ALLEN
Bitmap Indexes
• Store values as 1s and 0s
• Are used instead of B*tree indexes when:
– Tables are large
– Columns have low cardinality
– Multiple columns are constrained in the same query
Bitmap Index Example
CUSTOMER table
CUSTOMER_NBR MARITAL_STATUS
101
102
103
104
105
106
single
married
married
divorced
single
married
REGION
GENDER
INCOME_LEVEL
east
central
west
west
central
central
male
female
female
male
female
female
bracket_1
bracket_4
bracket_2
bracket_4
bracket_2
bracket_3
CREATE BITMAP INDEX REGION_IDX ON CUSTOMER(REGION);
Sample bitmap index on the REGION column
REGION='east'
1
0
0
0
0
0
REGION = 'central' REGION = 'west'
0
0
1
0
0
1
0
1
1
0
1
0
Bitmap Index Example
SELECT COUNT(*) FROM CUSTOMER
WHERE MARITAL_STATUS = ‘married’ AND
REGION IN (‘central’, ‘west’);
MARITAL_STATUS
= 'married'
0
1
1
0
0
1
AND
REGION =
'central'
0
1
0
0
1
1
OR
0
0
1
1
0
0
Query
Result
REGION =
'west'
=
0
1
1 AND
0
0
1
0
1
1
1
1
1
0
1
= 1
0
0
1
col 2
col 3
col 5
Partitioning Tables and Indexes
Large tables and indexes can be partitioned into
smaller, more manageable pieces.
Tablespace
Segment
Extent
Blocks
Structure
Advantages of Partitioning
Consider using partitioning for:
• Very large databases (VLDBs)
• Reduction of down time for scheduled
maintenance and reloading
• Reduction of down time for data failure
• Decision support systems (DSS) performance
• I/O performance
• Disk striping
• Partition transparency
Partitioning
• Partition table and index data by:
– Time
– Sales
– Geography
– Organization
– Line of business
• Partition by time