Transcript Chapter 11
Database Systems: Design,
Implementation, and
Management
Eighth Edition
Chapter 11
Database Performance Tuning and
Query Optimization
Objectives
• In this chapter, you will learn:
– Basic database performance-tuning concepts
– How a DBMS processes SQL queries
– About the importance of indexes in query
processing
– About the types of decisions the query optimizer
has to make
– Some common practices used to write efficient
SQL code
– How to formulate queries and tune the DBMS for
optimal performance
– Performance tuning in SQL Server 2005
Database Systems, 8th Edition
11.1 Database Performance-Tuning Concepts
• Goal of database performance is to execute
queries as fast as possible
• Database performance tuning
– Set of activities and procedures designed to
reduce response time of database system
• All factors must operate at optimum level with
minimal bottlenecks
• Good database performance starts with
good database design
Performance Tuning: Client and Server
• Client side
– Generate SQL query that returns correct answer
in least amount of time
• Using minimum amount of resources at server
– SQL performance tuning
• Server side
– DBMS environment configured to respond to
clients’ requests as fast as possible
• Optimum use of existing resources
– DBMS performance tuning
DBMS Architecture
• All data in database are stored in data files
• Data files
– Automatically expand in predefined increments
known as extents
– Grouped in file groups or table spaces
• Table space or file group:
– Logical grouping of several data files that store
data with similar characteristics
Basic DBMS architecture
DBMS Architecture (continued)
• Data cache or buffer cache: shared, reserved
memory area
– Stores most recently accessed data blocks in RAM
• SQL cache or procedure cache: stores most
recently executed SQL statements
– Also PL/SQL procedures, including triggers and
functions
• DBMS retrieves data from permanent storage and
places it in RAM
DBMS Architecture (continued)
• Input/output request: low-level data access
operation to/from computer devices, such as
memory, hard disks, video, and printers
• Working with data in the data cache is faster than
working with data in data files
– DBMS does not wait for hard disk to retrieve data
• Majority of performance-tuning activities focus on
minimizing I/O operations
• Typical DBMS processes:
– Listener, User, Scheduler, Lock manager, Optimizer
Database Statistics
• Measurements about database objects and
available resources
– Tables, Indexes, Number of processors used,
Processor speed, Temporary space available
• Used by the DBMS to make critical decisions about
improving query processing efficiency
• Can be gathered manually by DBA or automatically
by DBMS
– UPDATE STATISTICS table_name [index_name]
– Auto-Update and Auto-Create Statistics option
• Database Properties -> Auto Update Statistics
• Database Properties -> Auto Create Statistics
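As a rough illustration of gathering statistics manually, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained). SQLite's ANALYZE command plays the role of UPDATE STATISTICS here and stores its measurements in the sqlite_stat1 table. The CUSTOMER rows and the index name CUS_STATE_NDX are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (assumption, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_STATE TEXT)")
conn.execute("CREATE INDEX CUS_STATE_NDX ON CUSTOMER (CUS_STATE)")
conn.executemany(
    "INSERT INTO CUSTOMER (CUS_STATE) VALUES (?)",
    [("FL",), ("FL",), ("TN",), ("GA",), ("TN",), ("FL",)],
)

# Gather statistics manually (SQLite's rough counterpart of UPDATE STATISTICS).
conn.execute("ANALYZE")

# Each row holds a table name, an index name, and a summary string
# describing the number of rows and how selective the index is.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for row in stats:
    print(row)
```

The optimizer reads these stored measurements when it compares candidate access plans, which is why stale statistics can lead to poor plan choices.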
Ch08: dbcc show_statistics (customer, PK__CUSTOMER__24927208 )
Ch08: dbcc show_statistics (customer, CUS_UI1)
Supplement: SQL Server 2005
11.2 Query Processing
• DBMS processes queries in three phases
– Parsing
• DBMS parses the query and chooses the most
efficient access/execution plan
– Execution
• DBMS executes the query using chosen
execution plan
– Fetching
• DBMS fetches the data and sends the result back
to the client
Query Processing
SQL Parsing Phase
• Break down query into smaller units
• Transform original SQL query into slightly
different version of original SQL code
– Fully equivalent
• Optimized query results are always the same as
original query
– More efficient
• Optimized query will almost always execute faster
than original query
SQL Parsing Phase (continued)
• Query optimizer analyzes SQL query and finds
most efficient way to access data
– Validated for syntax compliance
– Validated against data dictionary
• Tables, column names are correct
• User has proper access rights
– Analyzed and decomposed into more atomic
components
– Optimized through transforming into a fully equivalent
but more efficient SQL query
– Prepared for execution by determining the execution or
access plan
SQL Parsing Phase (continued)
• Access plans are DBMS-specific
– Translate client’s SQL query into series of
complex I/O operations
– Required to read the data from the physical data
files and generate result set
• DBMS checks if access plan already exists for
query in SQL cache
• If so, DBMS reuses the access plan to save time
• If not, optimizer evaluates various plans
– Chosen plan placed in SQL cache
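The check-then-reuse behavior described above can be sketched in a few lines of Python. This is a toy model, not how any particular DBMS implements its SQL cache: the names plan_cache, optimize, and get_access_plan are hypothetical, and the cache is keyed on the exact SQL text for simplicity.

```python
plan_cache = {}      # SQL text -> access plan (hypothetical in-memory SQL cache)
optimizer_runs = 0   # counts how often the (expensive) optimizer is invoked


def optimize(sql):
    """Stand-in for the optimizer: pretend to evaluate plans and pick one."""
    global optimizer_runs
    optimizer_runs += 1
    return f"PLAN FOR <{sql}>"


def get_access_plan(sql):
    """Reuse a cached plan if one exists; otherwise optimize and cache it."""
    if sql not in plan_cache:
        plan_cache[sql] = optimize(sql)
    return plan_cache[sql]


query = "SELECT CUS_NAME FROM CUSTOMER WHERE CUS_STATE = 'FL'"
first = get_access_plan(query)   # cache miss: the optimizer runs
second = get_access_plan(query)  # cache hit: the stored plan is reused
print(optimizer_runs)
```

Submitting the same statement twice triggers only one optimization, which is the time saving the SQL cache is meant to provide.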
SQL Execution and Fetching Phase
• All I/O operations indicated in access plan are
executed
– Locks acquired
– Data retrieved and placed in data cache
– Transaction management commands processed
• Rows of resulting query result set are returned
to client
• DBMS may use temporary table space to store
temporary data
– The server may send only the first 100 rows of 9000
rows
Query Processing Bottlenecks
• Delay introduced in the processing of an I/O
operation that slows the system
– CPU
– RAM
– Hard disk
– Network
– Application code
After entering the SQL statement, do not run the query yet;
click the Display Estimated Execution Plan button on the toolbar:
11.3 Indexes and Query Optimization
• Indexes
– Crucial in speeding up data access
– Facilitate searching, sorting, and using
aggregate functions as well as join operations
– Ordered set of values that contains index key
and pointers
• More efficient to use index to access table than
to scan all rows in table sequentially
Indexes and Query Optimization
• Data sparsity: number of different values a
column could possibly have
• Indexes implemented using: (textbook p. 453)
– Hash indexes
– B-tree indexes: most common index type. Used in
tables in which column values repeat a small number
of times. The leaves contain pointers to records. It is
self-balanced.
– Bitmap indexes: use a bit array (0s and 1s) to record
whether each row holds a given value
• DBMSs determine best type of index to use
– Ex: CUST_LNAME with B-tree and REGION_CODE
with Bitmap indexes
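To make the bitmap idea concrete, here is a minimal Python sketch. It assumes a tiny made-up table in which the row id equals the row's position, and the helper name build_bitmap_index is hypothetical. Real DBMSs store compressed bit arrays, so this shows the principle only.

```python
# Made-up rows: (row id, REGION_CODE). Row ids are assumed to be 0..n-1.
rows = [(0, "NE"), (1, "SW"), (2, "NE"), (3, "SE"), (4, "SW"), (5, "NE")]


def build_bitmap_index(rows):
    """One list of 0/1 bits per distinct value; bit i is 1 if row i has the value."""
    index = {}
    for row_id, value in rows:
        bits = index.setdefault(value, [0] * len(rows))
        bits[row_id] = 1
    return index


bitmap = build_bitmap_index(rows)
print(bitmap["NE"])

# Rows in region NE or SW: OR the two bit arrays, then collect the 1 positions.
ne_or_sw = [a | b for a, b in zip(bitmap["NE"], bitmap["SW"])]
matching_rows = [i for i, bit in enumerate(ne_or_sw) if bit]
print(matching_rows)
```

Because a column such as REGION_CODE has only a few distinct values, only a few bit arrays are needed, which is why bitmap indexes suit low-sparsity columns.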
B-tree and bitmap index representation
SELECT CUS_NAME
FROM CUSTOMER
WHERE CUS_STATE = 'FL'
Requires only 5 accesses to STATE_INDEX,
5 accesses to CUSTOMER
Index Representation for the
CUSTOMER table
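The effect of STATE_INDEX on the query above can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module rather than SQL Server (an assumption made so it runs anywhere), with made-up customer rows; EXPLAIN QUERY PLAN is SQLite's way of showing the chosen access plan, and the helper name plan is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUS_CODE INTEGER PRIMARY KEY, CUS_NAME TEXT, CUS_STATE TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMER (CUS_NAME, CUS_STATE) VALUES (?, ?)",
    [("Ramas", "FL"), ("Dunne", "TN"), ("Smith", "FL"), ("Olowski", "GA")],
)

query = "SELECT CUS_NAME FROM CUSTOMER WHERE CUS_STATE = 'FL'"


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # no index yet: every row of CUSTOMER is scanned
conn.execute("CREATE INDEX STATE_INDEX ON CUSTOMER (CUS_STATE)")
after = plan(query)   # with STATE_INDEX: only matching index entries are read

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second reports a search using STATE_INDEX.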
11.4 Optimizer Choices
• Rule-based optimizer
– Preset rules and points
– Rules assign a fixed cost to each operation
• Cost-based optimizer
– Algorithms based on statistics about objects
being accessed
– Adds up processing cost, I/O costs, resource
costs to derive total cost
Example
SELECT P_CODE, P_DESCRIPT, P_PRICE, V_NAME, V_STATE
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE=V.V_CODE
AND V.V_STATE = 'FL';
• With the following database statistics:
– The PRODUCT table has 7000 rows
– The VENDOR table has 300 rows
– 10 vendors come from Florida
– 1000 products come from vendors in Florida
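With these statistics, two candidate access plans can be compared by counting I/O operations. The Python sketch below uses a deliberately simple cost model, which is an assumption for illustration: reading or scanning n rows costs n I/Os, and every product matches exactly one vendor. A real cost-based optimizer also weighs processing and resource costs.

```python
product_rows = 7000
vendor_rows = 300
fl_vendors = 10
fl_products = 1000  # rows both plans finally return

# Plan A: Cartesian product first, then the join condition, then V_STATE = 'FL'.
a1 = product_rows + vendor_rows         # read both tables
a1_result = product_rows * vendor_rows  # rows in the Cartesian product
a2 = a1_result                          # scan it to apply P.V_CODE = V.V_CODE
a2_result = product_rows                # assumed: one vendor per product
a3 = a2_result                          # scan joined rows to apply V_STATE = 'FL'
plan_a_cost = a1 + a2 + a3

# Plan B: restrict VENDOR to Florida first, then product and join condition.
b1 = vendor_rows                        # scan VENDOR for V_STATE = 'FL'
b2 = product_rows + fl_vendors          # read PRODUCT and the 10 Florida vendors
b2_result = product_rows * fl_vendors   # rows in the smaller Cartesian product
b3 = b2_result                          # scan it to apply P.V_CODE = V.V_CODE
plan_b_cost = b1 + b2 + b3

print(plan_a_cost, plan_b_cost)
```

Under this model, applying the Florida restriction early shrinks the intermediate result from 2,100,000 rows to 70,000 rows, so Plan B needs far fewer I/O operations for the same answer.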
Example
• Assume the PRODUCT table has the index
PQOH_NDX on the P_QOH attribute. The query
SELECT MIN(P_QOH) FROM PRODUCT
could be resolved by reading only the first entry in
the PQOH_NDX index
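A quick way to see this is to ask a DBMS for its plan. The sketch below again uses Python's built-in sqlite3 module as a stand-in (an assumption, not the SQL Server behavior from the slides), with made-up product rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_QOH INTEGER)")
conn.execute("CREATE INDEX PQOH_NDX ON PRODUCT (P_QOH)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)",
    [("A1", 23), ("B2", 8), ("C3", 15), ("D4", 42)],
)

sql = "SELECT MIN(P_QOH) FROM PRODUCT"
min_qoh = conn.execute(sql).fetchone()[0]

# The plan text should mention PQOH_NDX: the index is ordered, so the
# smallest value sits at one end and no table scan is needed.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
print(min_qoh, plan_steps)
```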
Using Hints to Affect Optimizer Choices
• Optimizer might not choose best plan
• Makes decisions based on existing statistics
– Statistics may be old
– Might make less efficient decisions
• Optimizer hints: special instructions for the
optimizer embedded in the SQL command text
Oracle version
SQL Server Query Hints Example
select o.customerid,companyname
from orders as o inner MERGE join customers as c
on o.customerid = c.customerid
select o.customerid,companyname
from orders as o inner HASH join customers as c
on o.customerid = c.customerid
select o.customerid,companyname
from orders as o inner LOOP join customers as c
on o.customerid = c.customerid
select city, count(*)
from customers
group by city
OPTION (HASH GROUP)
For MS SQL Server syntax, see:
http://msdn.microsoft.com/en-us/library/ms187713.aspx
11.5 SQL Performance Tuning
• Evaluated from client perspective
– Most current relational DBMSs perform
automatic query optimization at the server end
– Most SQL performance optimization techniques
are DBMS-specific
• Rarely portable
• Majority of performance problems related to
poorly written SQL code
• Carefully written query usually outperforms a
poorly written query
Index Selectivity
• Indexes are used when:
– Indexed column appears by itself in search
criteria of WHERE or HAVING clause
– Indexed column appears by itself in GROUP BY
or ORDER BY clause
– MAX or MIN function is applied to indexed
column
– Data sparsity on indexed column is high
• Index selectivity is a measure of how likely an
index is to be used in query processing
Index Selectivity (continued)
• General guidelines for indexes:
– Create indexes for each attribute in WHERE,
HAVING, ORDER BY, or GROUP BY clause
– Do not use in small tables or tables with low sparsity
– Declare primary and foreign keys so optimizer can
use indexes in join operations
– Declare indexes on join columns other than PK/FK
Conditional Expressions
• Normally expressed within WHERE or HAVING
clauses of SQL statement
• Restricts output of query to only rows matching
conditional criteria
• Common practices for efficient SQL:
– Use simple columns or literals in conditionals
• Avoid functions
– Numeric field comparisons are faster than
character, date, and NULL comparisons
– Equality comparisons are faster than inequality
comparisons
• LIKE comparisons are the slowest
– Transform conditional expressions to use literals
– Write equality conditions first
– AND: Use condition most likely to be false first
– OR: Use condition most likely to be true first
– Avoid NOT
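The advice to transform conditional expressions to use literals can be checked against a real optimizer. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption), an empty made-up PRODUCT table, and a hypothetical index name P_PRICE_NDX. The two conditions are logically the same, but only the second compares the bare column with a literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT, P_DESCRIPT TEXT, P_PRICE REAL)")
conn.execute("CREATE INDEX P_PRICE_NDX ON PRODUCT (P_PRICE)")


def first_step(sql):
    """Return the text of the first plan step SQLite reports for a query."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]


# Expression on the column: the index on P_PRICE cannot drive the lookup.
slow = first_step("SELECT * FROM PRODUCT WHERE P_PRICE - 10 = 7")
# Same condition rewritten so the column stands alone against a literal.
fast = first_step("SELECT * FROM PRODUCT WHERE P_PRICE = 17")

print(slow)
print(fast)
```

The first form is reported as a scan of PRODUCT, while the rewritten form is reported as a search using P_PRICE_NDX. The same reasoning applies to wrapping an indexed column in a function.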
11.6 Query Formulation
• Identify what columns and computations are
required (p.459)
– Expressions
– Aggregate functions
– Granularity of raw data required
• Identify source tables
• Determine how to join tables
• Determine what selection criteria are needed
– Simple comparison? IN? Nested comparison?
HAVING?
• Determine in what order to display output
11.7 DBMS Performance Tuning
• Includes managing DBMS processes in primary
memory and structures in physical storage
• DBMS performance tuning at server end
focuses on setting parameters used for:
– Data cache: large enough
– SQL cache: same query may be submitted by many
users
– Sort cache
– Optimizer mode
• Cost-based or Rule-based
DBMS Performance Tuning
• Some general recommendations for creation of
databases:
– Use RAID (Redundant Array of Independent Disks) to
provide balance between performance and fault
tolerance
– Minimize disk contention
• Use at least the following table spaces: system table,
user table, index table, temporary table, rollback segment
table
– Put high-usage tables in their own table spaces
– Assign separate data files in separate storage volumes
for indexes, system, high-usage tables
• Index operations will not conflict with data and data
dictionary, can use different disk block size
DBMS Performance Tuning
• Some general recommendations for creation of
databases: (continued)
– Take advantage of table storage organizations in
database
• An index-organized table stores the end user
table and the index table in consecutive locations
on permanent storage
– Partition tables based on usage
– Use denormalized tables where appropriate
– Store computed and aggregate attributes in
tables
Common RAID Configurations
11.8 Query Optimization Example
• Example illustrates how query optimizer works
• Based on QOVENDOR and QOPRODUCT
tables
• Uses Oracle SQL*Plus (Skip)
See the following SQL Server handout
Check the differences in query plan:
1. Before UPDATE STATISTICS QOVENDOR
2. After UPDATE STATISTICS QOVENDOR
3. CREATE INDEX QOV_NDX1 on QOVENDOR (V_AREACODE)
UPDATE STATISTICS QOVENDOR
4. CREATE INDEX QOV_NDX2 on QOVENDOR (V_NAME)
UPDATE STATISTICS QOVENDOR
Check the differences in query plan:
1. Before UPDATE STATISTICS QOPRODUCT
2. After UPDATE STATISTICS QOPRODUCT
3. CREATE INDEX QOP_NDX2 ON QOPRODUCT(P_PRICE)
UPDATE STATISTICS QOPRODUCT
Summary
• Database performance tuning
– Refers to activities to ensure query is processed
in minimum amount of time
• SQL performance tuning
– Refers to activities on client side to generate
SQL code
• Returns correct answer in least amount of time
• Uses minimum amount of resources at server end
• DBMS architecture represented by processes
and structures used to manage a database
Summary (continued)
• Database statistics refers to measurements
gathered by the DBMS
– Describe snapshot of database objects’
characteristics
• DBMS processes queries in three phases:
parsing, execution, and fetching
• Indexes are crucial in speeding up
data access
Summary (continued)
• During query optimization, DBMS chooses:
– Indexes to use, how to perform join operations,
table to use first, etc.
• Hints change optimizer mode for current SQL
statement
• SQL performance tuning deals with writing
queries that make good use of statistics
• Query formulation deals with translating
business questions into specific SQL code
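Chapter 11 Supplement
SQL Server 2005 Query Optimizer
Using Indexes to Speed Up Queries
(See Chapter 13 of the lab textbook)
Chapter Outline
13-1 Introduction to Indexes
13-2 Clustered and Non-clustered Indexes
13-3 Unique and Composite Indexes
13-4 Indexes Created Automatically by the System
13-5 Notes on Creating Indexes
13-6 Creating Indexes with the SQL Server Management Studio
Management Tool
13-8 Handling Indexes with SQL Syntax
13-9 Viewing a Query's Execution Plan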
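13-1 Introduction to Indexes
When looking up a record in a database, comparing every record
one by one is as inefficient as searching for one book in a pile
of scattered books. Using an index keeps the records arranged in
order, which improves query efficiency.
Although an index speeds up searching, not every column in a
table needs one. Once an index exists, each insert, update, or
delete must not only write the change to the table but also cost
the server time to update the index, and the index takes up
storage space as well. Indexes are therefore usually created only
on columns that are frequently searched (for example, columns
frequently used in the WHERE clause).
Index structure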
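13-2 Clustered and Non-clustered Indexes
Indexes come in two kinds: clustered and non-clustered. When a
clustered index is set, the data itself is also stored in the
order of that index. For example, suppose the records in a table
are as follows:
After the "ID" column is set as the clustered index, the contents
of the table are automatically arranged by ID value:
If a record with ID 1144 is then added, it is inserted between
1023 and 1264. Conversely, in a table with no clustered index, a
new record is placed after the other records; that is, records
are arranged in the order in which they were entered.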
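13-3 Unique and Composite Indexes
Both clustered and non-clustered indexes can be further
classified by the following two criteria:
Whether the index values are unique:
If the index values are set to be unique (no duplicates), the index is
called a unique index, which is similar in nature to a table's primary key.
Whether only a single column is indexed:
If two or more columns are combined to form the index, it is called a
composite index. If a composite index is also a unique index, the
combined value of the columns cannot repeat, but each individual
column may still contain duplicates.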
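The rule for a unique composite index can be tried out directly. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption); the ENROLL table, its columns, and the index name ENROLL_UI1 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENROLL (STU_ID INTEGER, CLASS_CODE TEXT)")
# Unique composite index: the (STU_ID, CLASS_CODE) pair must be unique.
conn.execute("CREATE UNIQUE INDEX ENROLL_UI1 ON ENROLL (STU_ID, CLASS_CODE)")

conn.execute("INSERT INTO ENROLL VALUES (1, 'DB101')")
conn.execute("INSERT INTO ENROLL VALUES (1, 'OS201')")  # STU_ID repeats: allowed
conn.execute("INSERT INTO ENROLL VALUES (2, 'DB101')")  # CLASS_CODE repeats: allowed

try:
    conn.execute("INSERT INTO ENROLL VALUES (1, 'DB101')")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ENROLL").fetchone()[0]
print(duplicate_rejected, row_count)
```

Each column on its own holds repeated values, yet the fourth insert is rejected because the combination already exists.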
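13-4 Indexes Created Automatically by the System
In SQL Server, indexes do not always have to be created by the
database designer. If a Primary key or UNIQUE constraint is set
when a table is created, SQL Server automatically builds the
index for us.
UNIQUE columns
Primary key columns
UNIQUE columns
When a table has a column set as UNIQUE, SQL Server automatically
uses that column to create a non-clustered unique index, to
guarantee the uniqueness of the column.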
To view the distribution statistics of an index:
dbcc show_statistics(QOPRODUCT,PK__QOPRODUCT__07F6335A) ;
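File group or partition scheme name:
Specifies which file group the index is stored in. When a
database is created (CREATE DATABASE), it has a default file
group called PRIMARY; users can also define new file groups
of their own.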
Primary key columns
When a table has a Primary key set, SQL Server creates a
clustered index on the Primary key column.
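13-5 Notes on Creating Indexes
Creating indexes on a table can improve the efficiency of
searching for data, but there are some restrictions to keep in
mind. SQL Server places the following restrictions on indexes:
A table can have only one clustered index, because the table
arranges its records according to the clustered index. When necessary,
several columns can be combined to form the clustered index.
An index can include at most 16 columns, and columns of type
ntext, text, and image cannot be used in an index.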
13-6 Creating Indexes with the SQL Server Management
Studio Management Tool
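13-8 Handling Indexes with SQL Syntax
In this section we use SQL syntax to create, rebuild, and
drop indexes.
Syntax for creating an index
Dropping an index
Altering or rebuilding an index (skip)
Full syntax for creating an index (for reference)
Dropping an index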
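Indexes that cannot be dropped
Some special indexes in a table cannot be dropped with the
syntax above: the indexes that SQL Server generates automatically
when a Primary key or UNIQUE constraint is set on the table.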
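SQLite enforces a comparable rule, which makes it easy to demonstrate with Python's built-in sqlite3 module. This is a stand-in for SQL Server, so the index naming and the error message are SQLite's own, and the EMPLOYEE table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_CODE TEXT UNIQUE)")

# Look up the index the system created for the UNIQUE constraint.
auto_index = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'EMPLOYEE'"
).fetchone()[0]

try:
    conn.execute(f'DROP INDEX "{auto_index}"')
    drop_failed = False
except sqlite3.OperationalError as exc:
    # The DBMS refuses: the index belongs to the constraint.
    drop_failed = True
    print(exc)
```

In both systems the way to get rid of such an index is to remove the constraint that owns it, not to drop the index directly.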
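13-9 Viewing a Query's Execution Plan
To check whether an index is actually used when a query runs
after the index has been created, enter the SQL statement and
click the Display Estimated Execution Plan button on the toolbar
to see how the query will be carried out. We use the name index
created on the Employee table in Section 13-6 as an example. The
author enters the following statement:
After entering the statement, do not run the query yet; click the
Display Estimated Execution Plan button on the toolbar:
If we delete all the indexes on the Employee table and then
run the query above again: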
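Using Indexes to Speed Up Queries
We demonstrate with the example from the lecture textbook.
Set the query options statistics time and statistics io as follows,
then compare the efficiency of the SQL queries on p. 444 and p. 448
after adding indexes on V_NAME and P_PRICE: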