Practical Space Management in Data Warehouse

Download Report

Transcript Practical Space Management in Data Warehouse

Practical Space Management in
Data Warehouse Environments
Hamid Minoui
Database Specialists, Inc.
www.dbspecialists.com
[email protected]
1
Objectives
 To point out data warehouse space
management issues
 Suggest resolutions
 Recommend space management
methodologies
 Provide proactive prevention strategies
 Cover both Oracle 9i and Oracle 10g
space management features
2
Characteristics of a
Data Warehouse
 The data:
–
–
–
–
–
–
–
Large amount of data loads and ETL operations
Very large size (Terabytes)
Change in structure of source data
Contains lots of historical data
Data massaging and aggregations
Multiple sources of data
Dynamic nature of data
3
Characteristics of a
Data Warehouse (continued)
 Maintenance activities:
–
–
–
–
–
–
Space management
Table re-organizations
Index rebuilds
Partition maintenance
Refresh maintenance on materialized views
Job and scheduling management
4
Characteristics of a
Data Warehouse (continued)
 Typical issues:
–
–
–
–
–
Data integrity issues
Data security issues
Space issues
Query performance issues
Duplicate rows
5
Characteristics of a
Data Warehouse (continued)
 Database features frequently used:
–
–
–
–
–
–
–
Materialized views (MV)
Bitmap indexes and bitmap-join indexes
Index organized tables (IOT)
Parallel execution
Table and index partitioning
Table and index compression
Load utilities and facilities
6
Other Characteristics








Star schemas, snow flakes or 3rd Normal Form
Have dimensions and hierarchy
Frequent need to collect statistics
Use of bulk and parallel loads
Variety in the generated queries
Dynamic nature of queries
Divided into areas (staging, ODS, and target area)
Often associated with smaller data marts
7
Performance Tuning
and Resolutions





Frequent query tuning
Star transformation
De-normalization
Pre-aggregations via materialized views
B*Tree, IOT, function based, bitmap, bitmapjoin indexes
 Use of database resource management
8
Why is Space a Coveted
Resource in a Data Warehouse?








Lots of disk space is consumed
Stores all enterprise data
Segments are mostly large
Many indexes
Years of historical data kept online
Many versions of the same data
Duplicated and de-normalized data
Various levels and dimensions of data
(monthly, weekly, daily)
9
Why is Space a Coveted
Resource in a Data Warehouse?
 Enough reserve space needed:
–
–
–
–
–
–
–
For daily/weekly/monthly growth
Recall offline old data when needed
Data correction
Materialized views and their growth
Emergency needs
Data files and tablespace growth
Temporary tablespaces
10
Reacting to Space Issues
 Down sides:
– Often, not enough time to react
– Delay in the load
– Wasted resources to reload
 Up sides:
– Loads are usually scheduled
– Once data is loaded, most of it won’t change
11
Issues with Database Backups
in a Data Warehouse




Too many files to backup every night
Backup takes a long time to complete
System resources busy during backup
Possible licensing issues with third-party
backup software
 Restoring and recovery after a failure can take
a long time
12
A Typical Backup Strategy
 Make non-current table spaces READONLY
every month
 Perform a special backup of READONLY
tablespaces
 Exclude the READONLY table spaces from
regular hot backups
 Never backup temporary tablespaces
Caveat: You must wait until all transactions
are committed
13
Avoiding Unnecessary
Redo Log Generation
 Create some tables and all indexes with
NOLOGGING for any segment that can be regenerated without doing a database recovery:
• SQL*Loader with direct path load
• CREATE TABLE AS SELECT from external or transient
tables
• INSERT using +append hint
• Use global temporary tables
insert /* +append */ into transiant_table
select * from source_table ;
create table transient_table
as select * from source_table ;
14
Speeding Up Bulk Load
Operations
 Before the load:
– Make all non-unique indexes unusable
– Disable the primary and unique constraints if the
source data is trusted
– Disable all triggers on the table
– Set the session to skip unusable indexes
15
Speeding Up Bulk Load
Operations
 Implement the load:
– Use append and parallel hints with insert
– Commit the transaction
 After the load:
– Rebuild indexes
– Enable triggers and constraints
16
Space Issues in
Data Warehouses
 Permanent tablespaces (data, indexes)
 Temporary tablespaces (temp segments)
 UNDO segments and tablespace
17
Space Issues with
Permanent Tablespaces
 Caused by:
–
–
–
–
–
Poor extent sizing
Setting maxextents
PCT_INCREASE > 0
Small data files (tablespaces)
User quota on tablespace
18
Space Issues with
Temporary Tablespace
 Caused by:
– Not enough space for the sort segments
– Other temp segments such as global temporary
tables
– Multiple users sharing the same temporary space
– Multiple queries with sort requirements running at
any time
19
Space Issues with
Temporary Tablespace
 Partially resolved by:
– Oracle 9i - Dynamic PGA memory allocation
• PGA_AGGREGATE_TARGET=<integer value>
• WORKAREA_SIZE_POLICY=AUTO
– Oracle 10g - Tablespace Group assignment
20
Space Issues Associated with
Undo Segments
 Long running queries causing ORA-1555
(snapshot too old)
 Small UNDO tablespace
 Small rollback segments
21
Database Block Size
(DB_BLOCK_SIZE)
 Should seriously be considered
 An important decision with new data
warehouse projects
 Inappropriate value can be disastrous and
detrimental
 Small value can:
– Impact I/O efficiency for majority of queries
– Negatively influence overall database performance
22
Appropriate DB_BLOCK_SIZE
Value
 Multiple of the OS block size
 As large as your I/O subsystem can handle in
a single read
 As large as supported by Oracle
 Best benefit from larger block size if:
– Database is configured on raw devices, or
– Direct I/O is available to you.
23
Benefits of Larger
DB_BLOCK_SIZE Value
 Efficiency with index scan
– A larger block size reduces the number of reads
required to probe an index and scan a range of
values from its leaf blocks
 Less memory requirement for buffer cache
– Fewer buffers needed for index branch blocks
 Better compression ratio for tables, indexes
 Improvement in block density
– Amount of space used by fixed portion
of bock header is reduced
24
Benefits of Larger
DB_BLOCK_SIZE Value
 Blocks can accommodate longer rows; less
chance for row chaining
 Less occurrence of ORA-1555
– Increase in size of the transaction table in undo
segments header blocks
 Fewer writes required for data loads
– Because of the reduced block level overhead, less
redo logs are generated when blocks are modified
sequentially
25
Disks, I/O and Database Files
Configuration
 A poorly configured I/O subsystem can badly
impact I/O performance
 Poor I/O performance can impair a data
warehouse
 Configure disk and distribute data for read and
write efficiency
 Use raw I/O if possible, otherwise use direct I/O
 Make use of asynchronous I/O, parallel read and
parallel writes
26
Disks, I/O and Database Files
Configuration
 Stripe and Mirror or Mirror and Stripe the disks
– RAID-1+0 or RAID-0+1
 Evenly spread your data and Stripe And Mirror
Everything (SAME) on many disks
 Reserve room on file systems for auto
extendable files
27
Managing the UNDO Segments
 Manual undo (rollback segments)
management
– Pre Oracle 9i practices
– Too many manual interventions by DBA
28
Managing UNDO (continued)
 Automatic Undo Management (AUM)
– Much better – Highly recommended
– Allows controlling retention of committed
transactions undo information
(UNDO_RETENTION)
– Better monitoring statistics
– Infrequent occurrence of ORA-1555
– SMON periodically manage space and shrinks
undo segments
29
UNDO_RETENTION
Parameter Setting
 Set to a value equal to the time used by the
longest running query
 Undo is ‘expired’ when retention time is
reached
 Expired undo will be de-allocated if needed by
new transactions
 Unexpired undo are re-used if space is
needed (undo reuse)
 Default value is 300 seconds
30
Undo Reuse and Undo Stealing
 Undo Reuse:
Unexpired undo of the same segment will be
reused
 Undo Stealing:
Unexpired undo of another segment is used
 Undo reuse is more common. Occurs when
– UNDO tablespace is too small, or
– UNDO_RETENTION value is too large
31
Monitoring the UNDO
Segments Statistics
 Statistics are gathered in V$UNDOSTAT every
10 minutes
 Helps sizing UNDO tablespaces and tune
UNDO_RETENTION
 Statistics are retained for 7 days
32
V$UNDOSTAT
BEGIN_TIME
Beginning time for this interval
END_TIME
Ending time for this interval
UNDOTSN
Tablespace ID of the last active undo within
the interval
UNDOBLKS
Number of consumed undo blocks within the
period
MAXQUERYLEN
The longest length of time (in seconds) a
query took to complete within this period
TXNCOUNT
Total number of transactions executed with the
period
33
V$UNDOSTAT (continued)
UNXPSTEALCNT
Number of attempts to obtain undo space by
stealing unexpired extents from other undo
segments
UNXPBLKRELCNT
Number of unexpired blocks released from undo
segments to be used by other transactions
SSOLDERRNT
Number of times ORA-1555 occurred with the
period
NOSPACERRCNT
Number of times space was unavailable in the
undo tablespace when requested and failed
34
Tuning UNDO_RETENTION
 Oracle 9i:
– Manually adjust to the time taken by the longest
query
SELECT MAX (MAXQUERYLEN) FROM V$UNDOSTAT;
 Oracle 10g:
– Automatically tracked and tuned by RDBMS
35
The UNDO Tablespace
 Created at DB creation or with CREATE
UNDO TABLESPACE
 Use V$UNDOSTAT for sizing and monitoring
 Space issues if UNDO_RETENTION is too
large
 Use AUTOEXTEND
 RETENTION_GUARANTEE clause
 Sizing formula:
Undo Segment Space Required (MB) =
(undo_retention * undo_blcks/secs * DB_BLOCK_Size)/1024
36
Database Fragmentation Issues
 Best to reduce or eliminate fragmentation to
avoid wastage and improve performance
– Tablespace level (or file level) fragmentation
– Segment level fragmentation
– Block level fragmentation
37
Tablespace Level Fragmentation
 Bubble Fragmentation
– Free block of space not large enough for another
extent
 Honeycomb Fragmentation
– Free un-coalesced space next to each other but
considered separate
38
Segment Level Fragmentation
 Space allocated to segment is not fully utilized
(wasted)
– Space above the high water mark (unused blocks)
– Free segment blocks below the high water mark
39
Block Level Fragmentation
 Blocks are not empty but there is space within
a block that is not used
 Caused by:
– Setting of PCTFREE and PCTUSED
– Deletions
– Row migrations
40
Tablespace Planning
 Use locally managed tablespaces (LMTs) with
UNIFORM size extents
– 64K bitmaps on file header are used to manage
extents
– Improves performance and significantly reduces
overhead associated with updating dictionary
tables (recursive SQL)
– No need to use ST enqueue
– No more tablespace fragmentation
41
Tablespace Planning
 Use Automatic Segment Space Management
(ASSM)
– Set at the tablespace level
– Tablespace must be locally managed
– Uses bitmap instead of freelist to manage space
within segments
42
Benefits of ASSM
 No more need for FREELISTS, FREELIST
GROUPS and PCTUSED
 Reduces segment level and block level
fragmentations
 Reduces the number of buffer free waits
 Adds efficiency to space usage
 Provides better use of space within the blocks
43
LMT Considerations
 The bitmap is 64K
– Make the size of each file a multiple of UNIFORM
extent+64K
 Storage parameters
– Avoid setting them
– If already defined on segments reorganize, or
rebuild with storage parameters matching
tablespace
44
Multiple Tablespace Size Models
 SAFE (methodology)
 Group segments according to size (3 groups)
 Use 3 tablespace model having different
UNIFORM extents
 Assign each group to one of the size model
 Develop a naming convention
Segment Size
Extent Size
Size Model
< 128 M
128 KB
Small
>= 128 M & < 4 GB
4 MB
Medium
>= 4 GB
128 MB
Large
45
Tablespaces for Different
Types of Segments
 Separate indexes and tables
– Better manageability
– Different type of usage
– Reduces wastage (indexes are rebuilt often in data
warehouses)
46
Adjust Settings of PCTFREE and
PCTUSED Parameters
 Avoid using default values
 Set according to usage
 Most of the times PCTFREE=0 and
PCTFREE=99 should be enough
 If ASSM, no need for PCTUSED
 More compact data in blocks reduces waste
and improves I/O
47
Use Index Organized Tables
(IOTs)




When most of the columns are indexed
When associated tables are used
Columns are pre-sorted
Makes better use of space and improve
performance
 Good for certain data warehouse tables
48
Table Compression
– Introduced in Oracle 9i R2
– Improves read only operations and factors out
repetitive values within a block
– Replaces duplicate values in a block with a
reference to a symbol table in the block
– Very low CPU overhead to reconstruct the block
– Significantly fewer blocks, leading to better I/O
– Very flexible (not all blocks are compressed)
– Associated with bulk load operations
49
Table Compression
 To compress a table use:
ALTER TABLE t1 MOVE compress;
 To compress a table partition use:
ALTER TABLE T1 MOVE PARTTION P1 compress;
 Alternative way CTAS compress:
CREATE TABLE T1 compress
AS SELECT * FROM T1_UNCOMPRESSED;
 Table or partition not available (locked) during
move
 Use DBMS_REDEFINITION for online move
50
To Get the Best Results

To achieve the best compression ratio:
1. Analyze the table to get column statistics
SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, AVG_COL_LEN
FROM DBA_TAB_COLUMNS ;
2. Identify best candidate columns for sorting as
columns with
•
•
•
Lowest number of distinct values (low NUM_DISTINCT)
Least amount of null values (low NUM_NULLS)
Longest average length (high AVG_COL_LEN)
3. CTAS compress and use order by
candidate_column
51
Table Compression Limitations




Can not be used on LOB field
Can not be used for IOTs
Can not compress tables with bitmap indexes
With Oracle 9i, cannot drop or add columns to
compressed tables
52
Index Key Compression
 Introduced in Oracle 8i
 Compression of leading index columns
 Indexes are grouped into a suffix and prefix
entry
– Suffix entry made out of unique pieces
– Prefix entry consist of the grouping piece
 Can offer significant space savings and better
I/O performance
53
Index Key Compression Example
 Current year’s Car Inventory table, index
CAR_IND
indexes columns are: Type, Color, Model
 Before compression:
<SUV><Black><Rock Climber>
<Sedan><Blue><Charisma>
<SUV><Black><Jungle Cruiser>
<Sedan><Blue><Fantasy>
<SUV><Black><Mountaineer>
<Sedan><Blue><Starlet>
….
…
54
Index Key Compression Example
(continued)
 ALTER INDEX CAR_IND compress 3;
 After compression:
<SUV><Black>
<Rock Climber> <Jungle Cruiser> <Mountaineer>
<Sedan><Blue> <Charisma><Fantasy><Starlet>
….
55
Index Key Compression






Partitioned indexes cannot be compressed
Bitmap indexes cannot be compressed
Can be defined on IOT
Slight CPU overhead during index scan
Consumes much less space
Increases I/O throughput and buffer cache
efficiency
 Ideal for data warehouses
56
Identifying Keys to Compress
1. Validate or analyze the index
VALIDATE INDEX INDX1;
2. Query the index_stats view
SELECT NAME, OPT_CMPR_COUNT, OPT_CMPR_PCTSAVE
FROM index_stats;
3. Examine output
NAME
------INDX1
OPT_CMPR_COUNT
-------------2
OPT_CMPR_PCTSAVE
---------------------57
57
De-Allocating Unused Space
 Segment Level:
– Blocks above the segment high water mark
(unused blocks)
– Space below the segment high water mark (free
blocks)
 Tablespace Level
– Free space within tablespace
– Data file level
– Unallocated space above the highest allocated
extent (file high water mark)
58
Identify Segment Space Usage
 DBMS_SPACE.UNUSED_SPACE
– Information about amount of unused space in
segment and position of high water mark
 DBMS_SPACE.FREE_BLOCKS
– Information about the number of blocks on the
freelist groups
 DBMS_SPACE.SPACE_USAGE
– Information about the space usage of blocks under
the high water mark
59
De-Allocate Segment Free Space
 Unused blocksALTER [TABLE | INDEX | CLUSTER] segment_name
DEALLOCATE UNUSED [KEEP nK ]
– De-allocates only space above segment high water
mark, retaining space specified by KEEP
 Other Unused space–Pre Oracle 10g – Reorganize table, rebuild index
•
Table move, export/import, DBMS_REDEFINITION
interface)
–Oracle 10g – Online segment shrink
60
Two-Phase Online Segment
Shrink
 ALTER TABLE table SHRINK SPACE;
– Phase 1:
• A series of DELETE and INSERT statements applied to
move data to the beginning of the segment
• DML-compatible changes are held on rows and blocks
– Phase 2:
• High water mark adjusted to the appropriate location.
• Exclusive lock is held
• Unused blocks (above high water mark) are de-allocated
61
One-Phase Online Segment
Shrink
 ALTER TABLE table SHRINK SPACE
COMPACT;
 With COMPACT keyword only the first phase
is executed.
 To implement phase 2, issue it without
COMPACT keyword at a later time
62
One-Phase Online Segment
Shrink (continued)
 Restrictions
– Row movement must be enabled
– Triggers based on ROWID of table must be
disabled
 In data warehouses, locking might not be a
problem on some tables
63
De-allocating Space at the
Tablespace Level
 Caused by tablespace fragmentation
– Index rebuilds, table moves, partition move, etc.
– Not having UNIFORM size extents
64
De-allocating Space at the
Data File Level
 File size larger than the last block used in the
file
 Size over-estimated
 Auto extended
65
Shrinking Data Files
The statement:
ALTER DATABASE DATAFILE ‘file_name’ resize n (K | M);
– Attempts to size the data file to exactly n K (or M)
– It is safe. It will fail with ORA-03297, if there are
blocks of data beyond the requested resize value
ORA-03297: File contains nnn blocks of data beyond
requested resize value.
66
Steps to Shrink Data Files to High
Water Mark Position
1) Create a temporary table preferably a GTT
CREATE global temporary table SPACE_ADMIN_GTT
ON COMMIT PRESERVE ROWS
AS
SELECT FILE_NAME, TABLESPACE_NAME, BYTES, BYTES, BYTES
FROM DBA_DATAFILES WHERE 1=0;
2) Create another table with name of
tablespace to shrink
CREATE GLOBAL TEMPORAY TABLE SHRINKING_TBS_GTT
ON COMMIT PRESERVE ROWS
AS
SELECT TABLESPACE_NAME FROM DBA_TABLESPACES
WHERE TABLESPACE_NAME in (‘TBS1’,’TBS2’,’TBS3’);
COMMIT;
67
Steps to Shrink Data Files to High
Water Mark Position (continued)
3) Get DB_BLOCK_SIZE
column value new_val blksize
select value from v$parameter
where name = 'db_block_size'
;
68
Steps to Shrink Data Files to High
Water Mark Position (continued)
4) Calculate the file’s high water mark and save
INSERT INTO SPACE_ADMIN_GTT
SELECT file_name, tablespace_name,
ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) smallest,
ceil( blocks*&&blksize/1024/1024) currsize,
ceil( blocks*&&blksize/1024/1024) ceil( (nvl(hwm,1)*&&blksize)/1024/1024 ) savings
FROM DBA_DATA_FILES a,
( SELECT file_id, max(block_id+blocks-1) hwm
FROM DBA_EXTENTS
GROUP BY file_id ) b
WHERE a.file_id = b.file_id(+)
AND a.tablespace_name IN (SELECT tablespace_name FROM
SHRINKING_TBS_GTT)
;
COMMIT;
69
Steps to Shrink Data Files to High
Water Mark Position (continued)
5) Generate ALTER DATABASE commands
column cmd format a95 word_wrapped
set trimspool on
SPOOL c:\TMP\dbf_resize.sql
SELECT 'alter database datafile '''||file_name||''' resize ' ||
smallest || 'm;' cmd
FROM SPACE_ADMIN_GTT
WHERE savings >= 5
;
SPOOL OFF
70
Automatically Resolving Space
Issues
 Oracle 9i Feature called RESUMABLE
SPACE ALLOCATION
 Allows an active session to be suspended if a
space issue is encountered
 The session resumes automatically when
– Space issue is fixed
– A timeout period (default: 2 hours) is reached
 Beneficial for data warehouse environments
71
Steps for Resumable Space
Allocation
1. DBA grants RESUMABLE privilege to user
2. User makes session resumable with
ALTER SESSION ENABLE RESUMABLE ;
3. If session encounters space problem, it is
suspended
72
Steps for Resumable Space
Allocation
4. If AFTER SUPSPEND TRIGGER exists, it
gets executed
5. If trigger does not exit (or disabled) or if the
trigger does not fix the space problem,
session remains suspended
6. Session resumes when space problem is
fixed or timeout value is reached
73
Other Helpful Space-Related
Features
 Oracle-Managed Datafiles (OMF)
 DBA_ADVISOR family of views
 Oracle10g Workload Repository (AWR) and
segment advisor
 Oracle 10g Grid Control for monitoring
74
Conclusion
 Oracle is consistent in offering new space
management related features in every release
 Should be used by DBAs for best practices
 They enhance performance, reduce waste,
improve availability, reduce frequency of
failures, and provide better monitoring
 Data warehouse operations that rely heavily
on space and I/O performance benefit the
most from these features
75
Contact Information
Hamid Minoui
Database Specialists, Inc.
388 Market Street, Suite 400
San Francisco, CA 94111
Tel: 415/344-0500
Email: [email protected]
Web: www.dbspecialists.com
76