What Are Statistics?
Download
Report
Transcript What Are Statistics?
Gathering Statistics
Copyright © 2005, Oracle. All rights reserved.
Objectives
After completing this lesson, you should be able to do
the following:
• Use the DBMS_STATS package
•
•
•
8-2
Identify table, index, and column statistics
Identify predicate selectivity calculations
Create histograms for columns with skewed data
Copyright © 2005, Oracle. All rights reserved.
What Are Statistics?
Information used by cost-based optimization:
• To estimate the selectivity of predicates
• To estimate the cost of each execution plan
• To estimate the number of table rows
• To estimate CPU and I/O costs
8-3
Copyright © 2005, Oracle. All rights reserved.
Types of Statistics
•
Object statistics
– Table statistics
– Column statistics
– Index statistics
•
System statistics
– I/O performance and utilization
– CPU performance and utilization
8-4
Copyright © 2005, Oracle. All rights reserved.
Gathering Statistics
•
•
•
8-5
Estimation based on random data sampling
Exact computation
User-defined statistics collection methods
Copyright © 2005, Oracle. All rights reserved.
DBMS_STATS Package
You can use the DBMS_STATS package to:
•
•
•
•
Generate and manage statistics for cost-based
optimization
Gather, modify, view, export, import, and delete
statistics
Identify or name the statistics gathered
Gather statistics on:
– Indexes, tables, columns, and partitions
– All schema objects in a schema or database
•
•
8-6
Gather statistics either serially or in parallel
Gather statistics needed for cost-based
optimization
Copyright © 2005, Oracle. All rights reserved.
Methods for Gathering Statistics
Procedure
8-7
Description
GATHER_INDEX_STATS
Collects index statistics
GATHER_TABLE_STATS
Collects table, column, and
index statistics
GATHER_SCHEMA_STATS
Collects statistics for all objects
in a schema
GATHER_SYSTEM_STATS
Collects CPU and I/O statistics
for the system
GATHER_DATABASE_STATS
Collects statistics for all objects
in a database
Copyright © 2005, Oracle. All rights reserved.
Gathering Automated Statistics
•
•
You can automatically gather statistics or create
lists of tables that have stale or no statistics.
To automatically gather statistics, run either of the
following:
– DBMS_STATS.GATHER_SCHEMA_STATS
– DBMS_STATS.GATHER_DATABASE_STATS
8-8
Copyright © 2005, Oracle. All rights reserved.
Monitoring Automated Statistics Gathering
•
Bring tables of a specific schema or database into
monitoring mode:
– DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING
– DBMS_STATS.ALTER_DATABASE_TAB_MONITORING
•
•
•
•
8-9
Use the MONITORING keyword in the CREATE and
ALTER TABLE statement syntax.
Monitoring tracks the approximate number of DML
since statistics were last gathered.
This data is used to identify stale statistics.
Automated statistics gathering is enabled using
DBMS_STATS.GATHER_TABLE_STATS with the
GATHER STALE option.
Copyright © 2005, Oracle. All rights reserved.
Generating Table Statistics
dbms_stats.gather_table_stats
('sh'
-- schema
,'customers'
-- table
, null
-- partition
, 20
-- sample size(%)
, false
-- block sample?
,'for all columns' -- column spec
, 4
-- degree of //
,'default'
-- granularity
, true
-- cascade to indexes
);
8-11
Copyright © 2005, Oracle. All rights reserved.
Gathering Statistics: Example
BEGIN
dbms_stats.create_stat_table
('SH', 'STATS');
dbms_stats.gather_table_stats
('SH', 'CUSTOMERS'
,stattab => 'STATS');
END;
BEGIN
dbms_stats.delete_table_stats
('SH', 'CUSTOMERS');
END;
8-13
Copyright © 2005, Oracle. All rights reserved.
Generating System Statistics
•
•
Applications can gather statistics during the day.
Statistic gathering ends after 720 minutes and is
stored in the MYSTATS table:
BEGIN
dbms_stats.gather_system_stats(
gathering_mode => 'interval',
interval => 720,
stattab => 'mystats',
statid => 'oltp');
END;
/
8-14
Copyright © 2005, Oracle. All rights reserved.
Index Statistics
Index statistics can be collected with the following
commands:
• DBMS_STATS
• CREATE INDEX
• ALTER INDEX…REBUILD
8-16
Copyright © 2005, Oracle. All rights reserved.
Verifying Table Statistics
SQL> SELECT table_name, num_rows, blocks,
2
avg_row_len,
3
to_char(last_analyzed, 'mm/dd/yyyy
4
hh24:mi:ss') analyzed
5 FROM user_tables
6 WHERE table_name ='EMPLOYEES';
8-17
Copyright © 2005, Oracle. All rights reserved.
Verifying Index Statistics
SQL>
2
3
4
5
6
7
8
8-18
SELECT index_name "name", num_rows,
distinct_keys "distinct",
leaf_blocks, clustering_factor
"cf", blevel "level",
avg_leaf_blocks_per_key "avg_leaf"
FROM user_indexes
WHERE table_owner = 'SH'
ORDER BY index_name;
Copyright © 2005, Oracle. All rights reserved.
Verifying Column Statistics
SQL>
2
3
4
5
8-20
SELECT column_name, num_distinct,
num_nulls, num_buckets, density
FROM user_tab_col_statistics
WHERE table_name ='SALES'
ORDER BY column_name;
Copyright © 2005, Oracle. All rights reserved.
Maintaining Statistics
DECLARE
num_rows number;
num_blocks number;
avg_row_len number;
BEGIN
dbms_stats.get_table_stats('oe','orders',null
,
'savestats','test1',
num_rows,num_blocks,avg_row_len);
-- print the values
dbms_output.put_line('num_rows='||num_rows||'
,num_blocks='||num_blocks||
',avg_row_len='||avg_row_len);
END;
/
8-22
Copyright © 2005, Oracle. All rights reserved.
Dynamic Sampling
Dynamic sampling should be used when:
• A better plan could be found
• The cost of collecting the statistics is minimal
compared to the execution time
• The query is executed many times
8-23
Copyright © 2005, Oracle. All rights reserved.
Enabling Dynamic Sampling
•
Dynamic sampling is not performed when
OPTIMIZER_DYNAMIC_SAMPLING = 0
•
Dynamic sampling is performed when:
– OPTIMIZER_DYNAMIC_SAMPLING = 1
– The query accesses more than one table.
– There are no table statistics and no index
– A full table scan is selected by optimizer
•
More aggressive dynamic sampling is performed
when:
– Value is set higher
– Upper limit is 10
– OPTIMIZER_DYNAMIC_SAMPLING > 1
8-24
Copyright © 2005, Oracle. All rights reserved.
Predicate Selectivity
•
Unique or primary key = constant (This is a
single-row predicate.)
•
Nonunique index = constant
selectivity = 1/distinct_keys
•
Bounded or unbounded range scan
selectivity = (high-low+1)/(max-min+1)
– high: upper bound (or max)
– low: lower bound (or min)
– max,min: column statistics
8-25
Copyright © 2005, Oracle. All rights reserved.
Bind Variables and Predicate Selectivity
•
•
•
Equality condition: No difference
Bounded or unbounded range scan: Assumes
built-in default selectivity
If high performance is an issue, consider using
dynamic SQL instead of bind variables:
– DBMS_SQL package
– EXECUTE IMMEDIATE
•
•
8-26
Dynamic SQL implies less shared SQL.
Optimizer performs bind peeking.
Copyright © 2005, Oracle. All rights reserved.
Histograms
A histogram does the following:
• Partitions the values in the column into bands
• Reorders column values in a band to fall within the
same range
• Provides improved selectivity estimates in the
presence of data skew
• Enables optimal execution plans with nonuniform
data distributions
8-27
Copyright © 2005, Oracle. All rights reserved.
Histograms and Selectivity
•
•
•
Popular and nonpopular values
Range scan selectivity is determined by the
number of buckets spanned.
Equality search selectivity is determined by:
– Number of buckets (popular values)
– DENSITY (nonpopular values)
8-29
Copyright © 2005, Oracle. All rights reserved.
Creating Histograms
•
•
Use DBMS_STATS to create histograms.
Generate statistics for the PRODUCTS table and the
PROD_LIST_PRICE column (maximum 50 buckets).
SQL> execute dbms_stats.gather_table_stats
('sh','products',
method_opt => 'for columns size 50
prod_list_price'
);
8-30
Copyright © 2005, Oracle. All rights reserved.
Histogram Tips
•
•
•
•
•
•
8-31
Do not use histograms unless they substantially
improve performance.
Histograms allocate additional storage.
Create histograms on columns that are used
frequently in WHERE clauses of queries.
Create histograms on columns that have a highly
skewed data distribution.
Histograms, like all other optimizer statistics, are
static.
Recompute the histogram when the data
distribution of a column changes frequently.
Copyright © 2005, Oracle. All rights reserved.
When to Use Histograms
•
•
8-32
Histograms can affect performance and should be
used with caution.
Use histograms for a column when the data is
highly skewed.
Copyright © 2005, Oracle. All rights reserved.
Choosing a Sample Size
•
•
•
8-33
If the data is uniformly distributed, 5% of the rows
suffice.
Choose a higher sample size if the number of
distinct values is more than 10% of the number of
rows.
Use AUTO_SAMPLE_SIZE.
Copyright © 2005, Oracle. All rights reserved.
Choosing the Number of Buckets
•
•
•
•
8-34
Start with the default size of 75 buckets.
Experiment with different sizes to obtain optimum
results.
Set the number of buckets to be greater than the
number of frequently occurring distinct values.
Set method_opt.
Copyright © 2005, Oracle. All rights reserved.
Viewing Histogram Statistics
•
Histogram information: USER/ALL_HISTOGRAMS
SQL>
2
3
4
•
8-35
SELECT endpoint_number, endpoint_value
FROM user_tab_histograms
WHERE table_name = 'PRODUCTS'
AND column_name = 'PROD_LIST_PRICE';
The number of buckets in each histogram:
USER/ALL_TAB_COL_STATISTICS
Copyright © 2005, Oracle. All rights reserved.
Summary
In this lesson, you should have learned how to:
• Use the DBMS_STATS package
•
•
•
Copy statistics between databases
Determine selectivity for predicates with and
without bind variables
Use histograms
– Popular values
– Nonpopular values
– Table, column, index, and cluster statistics
collected
8-37
Copyright © 2005, Oracle. All rights reserved.
Practice 8: Overview
This practice covers the following topics:
• Generating table statistics
• Creating a histogram
• Calculating selectivity
• Querying data dictionary information
• Deleting table statistics
8-38
Copyright © 2005, Oracle. All rights reserved.