Session: C05
DB2 9.7 Performance Update
presented by Serge Rielau
IBM Toronto Lab
Tuesday October 6th •11:00
Platform: DB2 for Linux, Unix, Windows
Agenda
• Basics
• Benchmarks
• DB2 9.7 Performance Improvements
– Compression
– Index Compression
– Temp Table Compression
– XML Compression
– Range Partitioning with local indexes
– Scan Sharing
– XML in DPF
– Statement Concentrator
– Currently Committed
– LOB Inlining
• Summary
© 2009 IBM Corporation
Basics – Platform / OS
• The basic fundamentals have not changed
• You still want/need a balanced configuration (I/O, Memory, CPU)
• We recommend 4GB-8GB RAM / core
• 6-20 disks per core where feasible
• Use recommended generally available 64-bit OS
• Applies to Linux, Windows, AIX, Solaris, HP-UX
• e.g. AIX 5.3 TL09, AIX 6.1 TL03, SLES10 SP2, RHEL 5.3 etc
• All performance measurements/assumptions are with a 64-bit DB2 server
• Clients can be 32-bit or 64-bit or mixed
• Even LOCAL clients
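The per-core rules of thumb on the platform slide (4GB-8GB RAM and 6-20 disks per core) can be turned into a quick sizing check. The following is only a sketch: the function name `balanced_config` and the 16-core example are illustrative assumptions, not part of the presentation.

```python
def balanced_config(cores):
    """Return the RAM and disk ranges suggested by the per-core rules of thumb."""
    ram_gb = (4 * cores, 8 * cores)   # 4GB-8GB RAM per core
    disks = (6 * cores, 20 * cores)   # 6-20 disks per core, where feasible
    return {"ram_gb": ram_gb, "disks": disks}


# Hypothetical 16-core server used purely as an example.
print(balanced_config(16))
```

The result is a range rather than a single number, which matches the "where feasible" wording on the slide.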
Basics - Storage
• Disk spindles still matter
• With sophisticated storage subsystems and storage virtualization it just requires more sleuthing than ever to find them
• Drives keep getting bigger, 146GB-300GB now the norm
• Be leery of Storage Administrators that tell you
• “Don’t worry, it doesn’t matter”
• “The cache will take care of it”
• Make the Storage Administrator your best friend!
• Take them out for lunch/dinner, whatever it takes!
Benchmarks
DB2 is the performance leader
TPoX
World Record Performance With TPC-C
[Chart: tpmC by DB2 release on a single machine, single operating system, single database; higher is better]
• DB2 8.2 on 64-way POWER5 (64x 1.9GHz POWER5, 2 TB RAM, 6400 disks): 3,210,540 tpmC
• DB2 9.1 on 64-way POWER5+ (64x 2.3GHz POWER5+, 2 TB RAM, 6400 disks): 4,033,378 tpmC
• DB2 9.5 on 64-way POWER6 (64x 5GHz POWER6, 4 TB RAM, 10,900 disks): 6,085,166 tpmC
TPC Benchmark, TPC-C, tpmC, are trademarks of the Transaction Processing Performance Council.
• DB2 8.2 on IBM System p5 595 (64 core POWER5 1.9GHz): 3,210,540 tpmC @ $5.07/tpmC available: May 14, 2005
• DB2 9.1 on IBM System p5 595 (64 core POWER5+ 2.3GHz): 4,033,378 tpmC @ $2.97/tpmC available: January 22, 2007
• DB2 9.5 on IBM POWER 595 (64 core POWER6 5.0GHz): 6,085,166 tpmC @ $2.81/tpmC available: December 10, 2008
Results current as of September 6, 2009. Check http://www.tpc.org for latest results
World Record TPC-C Performance on x64 with RedHat Linux
[Chart: tpmC on a single machine, single operating system, single database; higher is better]
• DB2 9.5 (IBM x3950 M2, Intel Xeon 7460, RHEL 5.2): 1,200,632 tpmC
• SQL Server 2005 (Intel Xeon 7350, Win2003): 841,809 tpmC
TPC Benchmark, TPC-C, tpmC, are trademarks of the Transaction Processing Performance Council.
• DB2 9.5 on IBM System x3950 M2 (8 Processor 48 core Intel Xeon 7460 2.66GHz): 1,200,632 tpmC @ $1.99/tpmC available: December 10, 2008
• SQL Server 2005 on HP DL580G5G4 (8 Processor 32 core Intel Xeon 7350 2.93GHz): 841,809 tpmC @ $3.46/tpmC available: April 1, 2008
Results current as of September 6, 2009. Check http://www.tpc.org for latest results
World record 10 TB TPC-H result on IBM Balanced Warehouse E7100
IBM System p6 570 & DB2 9.5 create top 10TB TPC-H performance
• Significant proof-point for the IBM Balanced Warehouse E7100
• DB2 Warehouse 9.5 takes DB2 performance on AIX to new levels
• 65% faster than Oracle 11g best result
• Loaded 10TB data @ 6 TB / hour (incl. data load, index creation, runstats)
[Chart: QphH; higher is better]
• IBM p6 570/DB2 9.5: 343551
• HP Integrity Superdome-DC Itanium/Oracle 11g: 208457
• Sun Fire 25K/Oracle 10g: 108099
TPC Benchmark, TPC-H, QphH, are trademarks of the Transaction Processing Performance Council.
• DB2 Warehouse 9.5 on IBM System p6 570 (128 core p6 4.7GHz): 343551 QphH@10000GB, 32.89 USD per QphH@10000GB, available: April 15, 2008
• Oracle 10g Enterprise Ed R2 w/ Partitioning on HP Integrity Superdome-DC Itanium 2 (128 core Intel Dual Core Itanium 2 9140 1.6 GHz): 208457 QphH@10000GB, 27.97 USD per QphH@10000GB, available: September 10, 2008
• Oracle 10g Enterprise Ed R2 w/ Partitioning on Sun Fire E25K (144 core Sun UltraSparc IV+ - 1500 MHz): 108099 QphH@10000GB, 53.80 USD per QphH@10000GB, available: January 23, 2006
Results current as of September 6, 2009. Check http://www.tpc.org for latest results
World record SAP 3-tier SD Benchmark
Top SAP SD 3-tier Results by DBMS Vendor
• This benchmark represents a 3 tier SAP R/3 environment in which the database resides on its own server where database performance is the critical factor
[Chart: SD Users; higher is better]
• DB2 8.2 on 32way p5 595: 168300
• Oracle 10g on 64way HP Integrity: 100000
• SQL Server on 64-way HP Integrity: 93000
Results current as of September 6, 2009. Check http://www.sap.com/benchmark for latest results
TPoX
Transaction Processing over XML Data
Open Source Benchmark: http://tpox.sourceforge.net/
Online Brokerage scenario based on standardized FIXML schema
• FIXML: financial industry XML Schema for security trading
• CustAcc: modeled after a real banking system that uses XML
• Security: information similar to investment web sites
[Diagram: 1:n relationships among Customer, Account, Holding, Order and Security; schemas CustAcc.xsd, FIXML (41 XSD files) and Security.xsd]
TPoX 2.0 Benchmark
• Scale Factor “M”, 1 TB raw data
• 50M CustAcc XML docs; 2-23 KB
• 500M Order XML docs; 1-2 KB
• 20,833 Securities XML docs; 2-9 KB
• 3 Tables + XML Indexes
XML Transaction Processing with DB2 on IBM JS43 Blade
• TPoX 2.0
• 1TB raw, 1.4 TB database w/index, 604 GB compressed database
• DB2 compression ratio at 58%
• Mixed workload (70% Queries, 30% Insert/Update/Delete) achieves
• 4,107 XML tx/sec (246,420 tx/min, 14.7M tx/hr)
• 300 concurrent connections
• Avg. response time 0.07 sec
• Insert-only workload inserts
• 5,119 Customer docs/sec, avg. size 6.6KB (18.2M doc/hour, 120 GB/hour)
• 13,945 Orders docs/sec, avg. size 1.5KB (50M doc/hour, 75 GB/hour)
[Chart: TPoX Mixed Workload, TPoX Transactions Per Second: DB2 V9.5 3987, DB2 V9.7 4107]
[Chart: Document Injection Rate, Documents Per Second: DB2 V9.5 12810, DB2 V9.7 13945]
XML Transaction Processing with DB2
DB2 9.7 on Intel® Xeon® Processor X5570 delivers
• Outstanding out-of-the-box performance TPoX benchmark 1.3 results
• Default db2 registry, no db or dbm configuration changes
• Excellent performance scalability of 1.78x from Intel® Xeon® Processor 5400 Series to Intel® Xeon® processor 5500 Series (2-socket, quad-core)
• Performance per Watt improves by 1.52x
[Chart: TPoX TPM with DB2 9.7: Intel® Xeon® Processor 5400 Series 172,993; Intel® Xeon® Processor 5500 Series 308,384]
[Chart: TPoX TPM / Watt with DB2 9.7: Intel® Xeon® Processor 5400 Series 481.45; Intel® Xeon® Processor 5500 Series 733.27]
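The 1.78x scalability and 1.52x performance-per-Watt figures quoted for the Xeon 5400 to 5500 comparison follow directly from the chart values on that slide. A minimal sketch of the arithmetic (the helper name `speedup` is an illustrative assumption):

```python
def speedup(new, old):
    """Ratio of the newer result to the older one."""
    return new / old


tpm_ratio = speedup(308_384, 172_993)       # TPoX TPM, 5500 vs. 5400 Series
tpm_per_watt_ratio = speedup(733.27, 481.45)  # TPoX TPM / Watt, 5500 vs. 5400 Series

print(f"TPM: {tpm_ratio:.2f}x, TPM/Watt: {tpm_per_watt_ratio:.2f}x")
```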
Performance Improvements
• DB2 9.7 has tremendous new capabilities that can
substantially improve performance
• When you think about the new features …
• “It depends”
• We don’t know everything (yet)
• Your mileage will vary
• Please provide feedback!
Upgrading to DB2 9.7
• You can directly upgrade to DB2 9.7 from
• DB2 8.2, DB2 9.1, DB2 9.5
• You can expect overall performance to be similar to better (0%-15%
improvement) without exploiting new features
• Your individual “mileage” will vary by
• Platform
• Workload
• CPU utilization
• Upgraded databases retain their basic configuration
characteristics.
• New databases have new default behavior
• E.g. monitoring, currently committed
DB2 Threaded Architecture
[Diagram: process/thread organization at instance level and database level, with components marked as per-instance, per-application or per-database, and idle, pooled agents or subagents]
• Client: UDB Client Library, connected through TCPIP (remote clients) or Shared Memory & Semaphores (local clients)
• UDB Server: single, multi-threaded process db2sysc
• Listeners: db2tcpcm, db2ipccm
• Agents: idle agent pool (db2agent (idle)), coordinator agents (db2agent), active subagents (db2agntp)
• Logging subsystem: db2loggr, db2loggw, log buffer, log disks
• Buffer pool(s), with prefetchers (db2pfchr), page cleaners (db2pclnr) and data disks
• Deadlock detector: db2dlock
Performance Advantages of the Threaded Architecture
• Context switching between threads is generally faster than between
processes
• No need to switch address space
• Less cache “pollution”
• Operating system threads require less context than processes
• Share address space, context information (such as uid, file handle
table, etc)
• Memory savings
• Significantly fewer system file descriptors used
• All threads in a process can share the same file descriptors
• No need to have each agent maintain its own file descriptor table
From the existing DB2 9 Deep Compression …
“With DB2 9, we’re seeing compression rates up to 83% on the Data
Warehouse. The projected cost savings are more than $2 million initially
with ongoing savings of $500,000 a year.” - Michael Henson
“We achieved a 43 per cent saving in total storage requirements when using DB2 with
Deep Compression for its SAP NetWeaver BI application, when compared with the former
Oracle database, The total size of the database shrank from 8TB to 4.5TB, and
response times were improved by 15 per cent. Some batch applications and change
runs were reduced by a factor of ten when using IBM DB2.” - Markus Dellermann
• Reduce storage costs
• Improve performance
• Easy to implement
[Chart: DB2 9 vs. Other; bar labels: 1.5 Times Better, 3.3 Times Better, 2.0 Times Better, 8.7 Times Better]
Index Compression
What is Index Compression?
• The ability to decrease the storage requirements from indexes through compression.
• By default, if the table is compressed the indexes created for the table will also be compressed.
• including the XML indexes
• Index compression can be explicitly enabled/disabled when creating or altering an index.
Why do we need Index Compression?
• Index compression reduces disk cost and TCO (total cost of ownership)
• Index compression can improve runtime performance of queries that are I/O bound.
When does Index Compression work best?
• Indexes for tables declared in large RID DMS tablespaces (default since DB2 9).
• Indexes that have low key cardinality & high cluster ratio.
Index Compression
How does Index Compression Work?
• DB2 will consider multiple compression algorithms to attain maximum index space savings through index compression.
Index Page (pre DB2 9.7)
Page Header
Fixed Slot Directory (maximum size reserved)
Index Key            RID List
AAAB, 1, CCC         1055, 1056
AAAB, 1, CCD         3011, 3025, 3026, 3027, 3029, 3033, 3035, 3036, 3037
BBBZ, 1, ZZZ         3009, 3012, 3013, 3015, 3016, 3017, 3109
BBBZ, 1, ZZCCAAAE    6008, 6009, 6010, 6011
Index Compression
Variable Slot Directory
• In 9.7, a slot directory is dynamically adjusted in order to fit as many keys into an index page as possible.
Index Page (DB2 9.7)
Page Header
Variable Slot Directory (saved space from variable slot directory)
Index Key            RID List
AAAB, 1, CCC         1055, 1056
AAAB, 1, CCD         3011, 3025, 3026, 3027, 3029, 3033, 3035, 3036, 3037
BBBZ, 1, ZZZ         3009, 3012, 3013, 3015, 3016, 3017, 3109
BBBZ, 1, ZZCCAAAE    6008, 6009, 6010, 6011
Index Compression
RID List Compression
• Instead of saving the full version of a RID, we can save some space by storing the delta between two RIDs.
• RID List compression is enabled when there are 3 or more RIDs in an index page.
Index Page (DB2 9.7)
Page Header
Variable Slot Directory (saved space from variable slot directory)
Index Key            Compressed RID (first RID, then RID deltas)
AAAB, 1, CCC         1055, 1
AAAB, 1, CCD         3011, 14, 1, 1, 2, 4, 2, 1, 1
BBBZ, 1, ZZZ         3009, 3, 1, 2, 1, 1, 92
BBBZ, 1, ZZCCAAAE    6008, 1, 1, 1
(space saved from RID list)
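The RID deltas shown on the RID List Compression slide can be reproduced with a few lines of code. This is a minimal sketch of delta encoding in general, not DB2's on-page format: the function names are illustrative, and how the deltas are packed into bytes is not described in the slides.

```python
from itertools import accumulate


def compress_rids(rids):
    """Keep the first RID, then store each later RID as its delta from the previous one."""
    if not rids:
        return []
    return [rids[0]] + [later - earlier for earlier, later in zip(rids, rids[1:])]


def decompress_rids(encoded):
    """Rebuild the original RID list by taking running sums of the deltas."""
    return list(accumulate(encoded))


rid_list = [3011, 3025, 3026, 3027, 3029, 3033, 3035, 3036, 3037]
encoded = compress_rids(rid_list)
print(encoded)  # first RID followed by deltas, as in the slide's second key
assert decompress_rids(encoded) == rid_list
```

The deltas are much smaller numbers than the RIDs themselves, which is where the space saving would come from once they are stored compactly.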
Index Compression
Prefix Compression
• Instead of saving all key values, we can save some space by storing a common prefix and suffix records.
• During index creation or insertion, DB2 will compare the new key with adjacent index keys and find the longest common prefixes between them.
Index Page (DB2 9.7)
Page Header
Variable Slot Directory (saved space from variable slot directory)
Common Prefix    Suffix Records (compressed key: compressed RID)
AAAB, 1, CC      C: 1055, 1
                 D: 3011, 14, 1, 1, 2, 4, 2, 1, 1
BBBZ, 1, ZZ      Z: 3009, 3, 1, 2, 1, 1, 92
                 CCAAAE: 6008, 1, 1, 1
(space saved from RID list and prefix compression)
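The split into a common prefix and suffix records on the Prefix Compression slide can be illustrated with the slide's own keys. The sketch below treats each index key as a plain string and compares two adjacent keys; the function names are illustrative assumptions, and the real comparison works on DB2's internal key format rather than on text.

```python
def common_prefix(a, b):
    """Longest leading run of characters shared by two keys."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return a[:length]


def prefix_compress(key, neighbour):
    """Return the shared prefix and the two suffix records that remain."""
    prefix = common_prefix(key, neighbour)
    return prefix, key[len(prefix):], neighbour[len(prefix):]


print(prefix_compress("AAAB, 1, CCC", "AAAB, 1, CCD"))
print(prefix_compress("BBBZ, 1, ZZZ", "BBBZ, 1, ZZCCAAAE"))
```

Only the suffix records need to be stored per key once the prefix has been stored a single time.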
Index Compression
Results in a Nutshell
• Index compression uses idle CPU cycles and idle cycles spent waiting for I/O to compress & decompress index data.
• When we are not CPU bound, we are able to achieve better performance in all inserts, deletes and updates.
[Chart: Estimated Index Compression Savings, Complex Query Database Warehouse Tested; Percentage Compressed (Indexes), higher is better]
• Warehouse #7: 57%
• Warehouse #6: 55%
• Warehouse #5: 50%
• Warehouse #4: 31%
• Warehouse #3: 24%
• Warehouse #2: 20%
• Warehouse #1: 16%
• Average 36%
[Chart: Simple Index Compression Tests, Machine Utilization (user, system, idle, iowait) for Insert, Update, Select and Delete, Base vs. Ixcomp]
[Chart: Simple Index Compression Tests - Elapsed Time in Seconds for Simple Insert, Simple Update, Simple Delete and Simple Select, without vs. with index compression; lower is better; bar labels: Runs 18% Faster, Runs 16% Faster, Runs 19% Faster, Runs As fast]
Temp Table Compression
What is Temp Table Compression?
• The ability to decrease storage requirements by compressing temp table data
• Temp tables created as a result of the following operations are compressed by default:
• Temps from Sorts
• Created Global Temp Tables
• Declared Global Temp Tables
• Table queues (TQ)
Why do we need Temp Table Compression on relational databases?
• Temp table spaces can account for up to 1/3 of the overall tablespace storage in some database environments.
• Temp compression reduces disk cost and TCO (total cost of ownership)
Temp Table Compression
How does Temp Table Compression Work?
• It extends the existing row-level compression mechanism that currently applies to permanent tables, into temp tables.
String of data across a row:
Canada|Ontario|Toronto|Matthew
Canada|Ontario|Toronto|Mark
USA|Illinois|Chicago|Luke
USA|Illinois|Chicago|John
Lempel-Ziv Algorithm: create dictionary from sample data
0x12f0 – CanadaOntarioToronto …
0xe57a – Matthew …
0xff0a – Mark …
0x15ab – USAIllinoisChicago …
0xdb0a – Luke …
0x544d – John …
Saved data (compressed):
0x12f0,0xe57a
0x12f0,0xff0a
0x15ab,0xdb0a
0x15ab,0x544d
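The substitution step on the dictionary slide, where repeated byte strings in a row are replaced by short dictionary symbols, can be sketched as follows. This is a toy under stated assumptions: the dictionary is written by hand using the slide's symbols instead of being built by a Lempel-Ziv pass over sample data, the `|` separators are kept inside the patterns, and rows that contain data missing from the dictionary raise an error rather than being stored as literals.

```python
# Hand-written stand-in for a dictionary built from sample data (assumption).
DICTIONARY = {
    "Canada|Ontario|Toronto|": 0x12F0,
    "Matthew": 0xE57A,
    "Mark": 0xFF0A,
    "USA|Illinois|Chicago|": 0x15AB,
    "Luke": 0xDB0A,
    "John": 0x544D,
}


def compress_row(row, dictionary=DICTIONARY):
    """Replace each dictionary pattern found in the row with its symbol."""
    patterns = sorted(dictionary, key=len, reverse=True)  # prefer the longest match
    symbols, pos = [], 0
    while pos < len(row):
        for pattern in patterns:
            if row.startswith(pattern, pos):
                symbols.append(dictionary[pattern])
                pos += len(pattern)
                break
        else:
            raise ValueError(f"no dictionary entry matches at offset {pos}")
    return symbols


def decompress_row(symbols, dictionary=DICTIONARY):
    """Expand symbols back into the original row text."""
    reverse = {symbol: pattern for pattern, symbol in dictionary.items()}
    return "".join(reverse[symbol] for symbol in symbols)


for row in ["Canada|Ontario|Toronto|Matthew", "USA|Illinois|Chicago|John"]:
    packed = compress_row(row)
    print([hex(symbol) for symbol in packed])
    assert decompress_row(packed) == row
```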
Temp Table Compression
Results in a Nutshell
• For affected temp compression enabled complex queries, an average of 35% temp tablespace space savings was observed. For the 100GB warehouse database setup, this sums up to over 28GB of saved temp space.
[Chart: Query Workload CPU Analysis for Temp Compression (user, sys, idle, iowait), Baseline vs. Index Compression; annotation: Effective CPU Usage]
[Chart: Space Savings for Complex Warehouse Queries with Temp Compression; Size (Gigabytes), lower is better: without temp comp 78.3, with temp comp 50.2; saves 35% space]
[Chart: Elapsed Time for Complex Warehouse Queries with Temp Compression; Minutes, lower is better: without temp comp 183.98, with temp comp 175.56; 5% faster]
XML Data Compression
What is XML Data Compression?
• The ability to decrease the storage requirements of XML data through compression.
• XML Compression extends row compression support to the XML documents.
• If row compression is enabled for the table, the XML data will be also compressed. If row compression is not enabled, the XML data will not be compressed either.
Why do we need XML Data Compression?
• Compressing XML data can improve storage efficiency and runtime performance of queries that are I/O bound.
• XML compression reduces disk cost and TCO (total cost of ownership) for databases with XML data
XML Data Compression
How does XML Data Compression Work?
• Small XML documents (< 32k) can be inlined with any relational data in the row and the entire row is compressed.
• Larger XML documents that reside in a data area separate from relational data can also be compressed. By default, DB2 places XML data in the XDA to handle documents up to 2GB in size.
• XML compression relies on a dictionary separate from the one used for row compression.
[Diagram: uncompressed data (relational data, < 32KB XML data, 32KB – 2GB XML data) and compressed data (compressed relational data with inlined < 32KB XML data using Dictionary #1; compressed 32KB – 2GB XML data using Dictionary #2)]
XML Data Compression
Results in a Nutshell
• Significantly improved query performance for I/O-bound workloads.
• Achieved 30% faster maintenance operations such as RUNSTATS, index creation, and import.
• Average compression savings of ⅔ across 7 different XML customer databases and about ¾ space savings for 3 of those 7 databases.
[Chart: XML Compression Savings, XML Database Tested; Percentage Compressed, higher is better]
• XML DB Test #7: 77%
• XML DB Test #6: 77%
• XML DB Test #5: 74%
• XML DB Test #4: 63%
• XML DB Test #3: 63%
• XML DB Test #2: 43%
• XML DB Test #1: 61%
• Average 67%
[Chart: Average Elapsed Time for SQLXML and Xquery Queries over an XML and Relational Data database using XDA Compression; Time (sec), lower is better: without XML compression 31.1, with XML compression 19.7; 37% faster]
Range Partitioning with Local Indexes
What does Range Partitioning
with Local Indexes mean?
• A partitioned index is an index
which is divided up across
multiple storage objects, one
per data partition, and is
partitioned in the same
manner as the table data
• Local Indexes can be created
using the PARTITIONED
keyword when creating an
index on a partitioned table
(Note: MDC block indexes are
partitioned by default)
Why do we need Range
Partitioning with local
Indexes?
• Improved ATTACH and DETACH
partition operations
• More efficient access plans
• More efficient REORGs.
When does Range Partitioning
with Local Indexes work best?
• When frequent roll-in and roll-out of data are performed
• When one tablespace is defined per range.
Range Partitioning with Local Indexes
Results in a Nutshell
• Partition maintenance with ATTACH:
• 20x speedup compared to DB2 9.5 global index because of reduced index maintenance.
• 3000x less log space used than with DB2 9.5 global indexes.
• Asynchronous index maintenance on DETACH is eliminated.
• Local indexes occupy fewer disk pages than 9.5 global indexes.
• 25% space savings is typical.
• 12% query speedup over global indexes for index queries – fewer page reads.
[Chart: Total Time and Log Space required to ATTACH 1.2 million rows; Attach/Set Integrity time (sec) and Log Space used (MB, logarithmic axis); lower is better; configurations: V9.5 Global Indexes, Cobra Local Indexes built during ATTACH, Cobra Local Indexes built before ATTACH, No Indexes Baseline; log space labels shown: 651.84, 0.21, 0.05, 0.03]
[Chart: Index size comparison: Leaf page count; lower is better: global index on RP table 18,409 index leaf pages, local index on RP table 13,476 index leaf pages; 25% space savings]
Scan Sharing
What is Scan Sharing?
• It is the ability of one scan to exploit
the work done by another scan
• This feature targets heavy scans
such as table scans or MDC block
index scans of large tables.
• Scan Sharing is enabled by default
on DB2 9.7
Why do we need Scan Sharing?
• Improved concurrency
• Faster query response times
• Increased throughput
When does Scan Sharing work
best?
• Scan Sharing works best on
workloads that involve several
clients running similar queries
(simple or complex), which involve
the same heavy scanning
mechanism (table scans or MDC
block index scans).
Scan Sharing
How does Scan Sharing work?
• When applying scan sharing, scans may start somewhere other than the usual beginning, to take advantage of pages that are already in the buffer pool from scans that are already running.
• When a sharing scan reaches the end of file, it will start over at the beginning and finish when it reaches the point that it started.
• Eligibility for scan sharing and for wrapping are determined automatically in the SQL compiler.
• In DB2 9.7, scan sharing is supported for table scans and block index scans.
[Diagram: Unshared Scan: scans A and B each read pages 1-8 on their own, re-reading pages and causing extra I/O. Shared Scan: scan A starts at page 1, scan B joins partway through, the two share the scan up to page 8, and scan B then wraps around to read pages 1-3]
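The wrap-around behaviour described for a sharing scan (start where an existing scan currently is, run to the end of the file, then start over and stop at the starting point) amounts to a rotated page order. A minimal sketch, assuming pages are numbered from 1 as in the slide's diagram; the function name is illustrative and the real start position is chosen by DB2 at run time:

```python
def shared_scan_order(num_pages, start_page):
    """Page visit order for a scan that joins at start_page and wraps around."""
    if not 1 <= start_page <= num_pages:
        raise ValueError("start_page must be between 1 and num_pages")
    to_end = list(range(start_page, num_pages + 1))  # shared portion up to end of file
    wrapped = list(range(1, start_page))             # pages picked up after wrapping
    return to_end + wrapped


# Scan B joins an 8-page scan at page 4, then wraps to read pages 1-3.
print(shared_scan_order(8, 4))
```

Every page is still visited exactly once; only the order changes, which is why eligibility depends on the query not needing rows in physical order.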
Scan Sharing
• MDC Block Index Scan Sharing shows 47% average query improvement gain.
• The fastest query shows up to 56% runtime gain with scan sharing.
• 100 concurrent table scans now run 14 times faster with scan sharing!
[Chart: Block Index Scan Test: Q1 and Q6 Interleaved, queries staggered every 10 sec, No Scan Sharing vs. Scan Sharing; time axis 0-600; Q1: CPU intensive, Q6: IO intensive; runs 47% faster; lower is better]
[Chart: Scan Sharing Tests on Table Scan; Seconds, average of running 100 instances of Q1, lower is better: No Scan Sharing 1,284.6, Scan Sharing 90.3; runs 14x faster]
Scan Sharing
Results in a Nutshell
• When running 8 concurrent streams of complex queries in parallel on a 10GB database warehouse, a 15% increase in throughput is attained when using scan sharing.
[Chart: Throughput for a 10GB Warehouse Database: 8 Parallel Streams; higher is better: Scan Sharing OFF 339.59, Scan Sharing ON 391.72; 15% throughput improved]
XML Scalability on Infosphere Warehouse (a.k.a DPF)
What does it mean?
• Tables containing XML column definitions can now be stored and distributed on any partition.
• XML data processing is optimized based on their partitions.
Why do we need XML in database partitioned environments?
• As customers adopt the XML datatype in their warehouses, XML data needs to scale just as relational data
• XML data also achieves the same benefit from performance improvements attained from the parallelization in DPF environments.
XML Scalability on Infosphere Warehouse (a.k.a DPF)
Results in a Nutshell
• Table results show the elapsed time performance speedup of complex queries from a 4 partition setup to an 8 partition setup. Queries tested have a similar star-schema balance for relational and XML.
• Each query run in 2 or 3 equivalent variants:
• Completely relational (“rel”)
• Completely XML (“xml”)
• XML extraction/predicates with relational joins (“xmlrel”) (join queries only)
• Queries/updates/deletes scale as well as relational ones.
• Average XML query-speedup is 96% of relational
[Chart: Simple query: Elapsed time speedup from 4 to 8 partitions (elapsed time 4P / 8P) for rel, xml and xmlrel; queries: count with index; count, no index; grouped agg; update; colo join; noncolo join]
[Chart: Complex query: Elapsed time speedup from 4 to 8 partitions (elapsed time 4P / 8P) for rel, xml and xmlrel; query numbers 1-10]
Statement Concentrator
What is the statement concentrator?
• It is a technology that allows dynamic SQL statements that are identical, except for the values of their literals, to share the same access plan.
• The statement concentrator is disabled by default, and can be enabled either through the database configuration parameter (STMT_CONC) or from the prepare attribute
Why do we need the statement concentrator?
• This feature is aimed at OLTP workloads where simple statements are repeatedly generated with different literal values. In these workloads, the cost of recompiling the statements many times adds a significant overhead.
• Statement concentrator avoids this compilation overhead by allowing the compiled statement to be reused, regardless of the values of the literals.
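The idea behind the statement concentrator, that statements differing only in their literals can share one compiled plan, can be sketched with a cache keyed on the statement text after literals are replaced by markers. This is an illustration only: the regular expression, the `?` marker, and the `concentrate` and `prepare` names are assumptions, and DB2 does this inside its SQL compiler rather than with text substitution.

```python
import re

# Matches single-quoted string literals (with '' escapes) and standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def concentrate(sql):
    """Replace every literal with a marker so equivalent statements look identical."""
    return _LITERAL.sub("?", sql)


def prepare(sql, cache):
    """Return a cached 'plan' for the concentrated statement, compiling only on a miss."""
    key = concentrate(sql)
    if key not in cache:
        cache[key] = f"plan for: {key}"  # stand-in for the expensive compilation
    return cache[key]


plans = {}
prepare("SELECT name FROM emp WHERE id = 10", plans)
prepare("SELECT name FROM emp WHERE id = 42", plans)
prepare("SELECT name FROM emp WHERE city = 'Toronto'", plans)
print(len(plans), "compilations for 3 statements")
```

The trade-off is that the shared plan cannot be tailored to a particular literal value, which fits the OLTP focus described on the slide.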
Statement Concentrator
Results in a Nutshell
• The statement concentrator allows prepare time to run up to 25x faster for a single user and 19x faster for 20 users.
• The statement concentrator improved throughput by 35% in a typical OLTP workload using 25 users
[Chart: Effect of the Statement Concentrator on Prepare times for 20,000 statements using 20 users; Prepare Time (sec), lower is better: Concentrator off 436, Concentrator on 23; 19x reduction in prepare time]
[Chart: Effect of the Statement Concentrator for an OLTP workload; Throughput, higher is better: Concentrator Off 133, Concentrator On 180; 35% throughput improved]
Currently Committed
What is Currently Committed?
• Currently Committed semantics have been introduced in DB2 9.7 to improve concurrency: under Cursor Stability (CS) isolation, readers no longer wait for writers to release row locks.
• The readers are given the last committed version of data, that is, the version prior to the start of a write operation.
• Currently Committed is controlled with the CUR_COMMIT database configuration parameter
Why do we need the Currently Committed feature?
• Customers running high throughput database applications cannot tolerate waiting on locks during transaction processing and require non-blocking behavior for read transactions.
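The reader-side behaviour described for Currently Committed (a reader gets the last committed version instead of waiting on a writer's row lock) can be shown with a toy row object. Everything here is a simplified assumption for illustration: the `Row` class, the `WouldWait` exception and the in-memory "pending" value say nothing about how DB2 actually locates the committed version.

```python
class WouldWait(Exception):
    """Raised where a reader would have to wait for a writer's row lock."""


class Row:
    def __init__(self, value):
        self.committed = value  # last committed version
        self.pending = None     # uncommitted change held by an in-flight writer
        self.locked = False

    def write(self, value):
        self.pending = value
        self.locked = True

    def commit(self):
        self.committed, self.pending, self.locked = self.pending, None, False

    def read(self, currently_committed=True):
        if self.locked and not currently_committed:
            raise WouldWait("reader blocks until the writer commits or rolls back")
        return self.committed   # version prior to the start of the write


row = Row(100)
row.write(150)        # writer has not committed yet
print(row.read())     # reader sees the last committed value without waiting
row.commit()
print(row.read())     # after commit the new value is visible
```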
Currently Committed
Results in a Nutshell
• By enabling currently committed, we use CPU that was previously idle (18%), leading to an increase of over 28% in throughput.
• With currently committed enabled, we see reduced LOCK WAIT time by nearly 20%.
• We observe expected increases in LSN GAP cleaners and increased logging.
[Chart: CPU Analysis on Currently Committed (user, system, idle, iowait), CC Disabled vs. CC Enabled; annotation: Effective CPU usage]
[Chart: Throughput of OLTP Workload using Currently Committed; Transactions per second, higher is better: Currently Commit Disabled 981.25, Currently Commit Enabled 1,260.89; allows 28% more throughput]
LOB Inlining
What is LOB INLINING?
• LOB inlining allows customers to store LOB data within a formatted data row in a data page instead of creating a separate LOB object.
• Once the LOB data is inlined into the base table row, LOB data is then eligible to be compressed.
Why do we need the LOB Inlining feature?
• Performance will increase for queries that access inlined LOB data as no additional I/O is required to fetch the LOB data.
• LOBs are prime candidates for compression given their size and the type of data they represent. By inlining LOBs, this data is then eligible for compression, allowing further space and I/O savings from this feature.
LOB Inlining
Results in a Nutshell
• INSERT and SELECT operations are the ones with more benefit. The smaller the LOB the bigger the benefit of the inlining
• For UPDATE operations the larger the LOB the better the improvements
• We can expect the inlined LOBs will have the same performance as a varchar(N+4)
[Chart: Inlined LOB vs. Non-Inlined LOB; % Improvement in Insert, Select and Update Performance by Size of LOB (8k, 16k, 32k); higher is better; bar labels: 75%, 75%, 64%, 70%, 55%, 65%, 30%, 22%, 7%]
Summary of Key DB2 9.7 Performance Features
• Compression for indexes, temp tablespaces and XML data results in space savings and better performance
• Range Partitioning with local indexes results in space savings and better performance including increased concurrency for certain operations like REORG and set integrity. It also makes roll-in and roll-out of data more efficient.
• Scan Sharing improves workloads that have multiple heavy scans in the same table.
• XML Scalability allows customers to exploit the same benefits in data warehouses as exist for relational data
• Statement Concentrator improves the performance of queries that use literals by reducing their prepare times
• Currently Committed increases throughput and reduces the contention on locks
• LOB Inlining allows this type of data to be eligible for compression
A glimpse at the Future
• Preparing for new workloads
• Combined OLTP and Analytics
• Preparing for new operating environments
• Virtualization
• Cloud
• Power-aware
• Preparing for new hardware
• SSD flash storage
• IBM POWER7
• Intel Nehalem EX
Conclusion
• DB2 is the performance leader
• New features in DB2 9.7 that further boost performance
• For BOTH the OLTP and Data warehouse areas
• Performance is a critical and integral part of DB2!
• Maintaining excellent performance
• On current hardware
• Over the course of DB2 maintenance
• Preparing for future hardware/OS technology
Appendix – Mandatory SAP publication data
Required SAP Information
• For more information regarding these results and SAP benchmarks, visit www.sap.com/benchmark.
• These benchmarks fully comply with the SAP Benchmark Council regulations and have been audited and certified by SAP AG
SAP 3-tier SD Benchmark:
168,300 SD benchmark users. SAP R/3 4.7. 3-tier with database server: IBM eServer p5 Model 595, 32-way SMP, POWER5 1.9 GHz, 32 KB(D) + 64 KB(I) L1 cache per
processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 v8.2.2, AIX 5.3 (cert # 2005021)
100,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Model SD64A, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1 cache, 256 KB L2
cache, 9 MB L3 cache. Oracle 10g, HP-UX11i (cert # 2004068)
93,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Superdome 64P Server, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1 cache, 256
KB L2 cache, 9 MB L3 cache . SQL Server 2005, Windows 2003 (cert # 2005045)
SAP 3-tier BW Benchmark:
311,004 throughput./hour query navigation steps.. SAP BW 3.5. Cluster of 32 servers, each with IBM x346 Model 884041U, 1 processor/ 1 core/ 2 threads, Intel XEON
3.6 GHz, L1 Execution Trace Cache, 2 MB L2 cache, 2 GB main memory. DB2 8.2.3 SLES 9. (cert # 2005043)
SAP TRBK Benchmark:
15,519,000. Day processing no. of postings to bank accounts/hour. SAP Deposit Management 4.0. IBM System p570, 4 core, POWER6, 64GB RAM. DB2 9 on AIX 5.3.
(cert # 2007050)
10,012,000 Day processing no. of postings to bank accounts/hour. SAP Account Management 3.0. Sun Fire E6900, 16 core, UltraSPARC1V, 56GB RAM, Oracle 10g on
Solaris 10, (cert # 2006018)
8,279,000 Day processing no. of postings to bank accounts/hour/ SAP Account Management 3.0. HP rx8620, 16 core, HP mx2 DC,64 GB RAM, SQL Server on
Windows Server (cert # 2005052)
SAP 2-tier SD Benchmark:
39,100 SD benchmark users, SAP ECC 6.0. Sun SPARC Enterprise Server M9000, 64 processors / 256 cores / 512 threads, SPARC64 VII, 2.52 GHz, 64 KB(D) + 64 KB(I)
L1 cache per core, 6 MB L2 cache per processor, 1024 GB main memory, Oracle 10g on Solaris 10. (cert # 2008-042-1)
35,400 SD benchmark users, SAP ECC 6.0. IBM Power 595, 32 processors / 64 cores / 128 threads, POWER6 5.0 GHz, 128 KB L1 cache and 4 MB L2 cache per core,
32 MB L3 cache per processor, 512 GB main memory. DB2 9.5, AIX 6.1. (Cert# 2008019).
30,000 SD benchmark users. SAP ECC 6.0. HP Integrity SD64B , 64 processors/128 cores/256 threads, Dual-Core Intel Itanium 2 9050 1.6 GHz, 32 KB(I) + 32 KB(D) L1
cache, 2 MB(I) + 512 KB(D) L2 cache, 24 MB L3 cache, 512 GB main memory. Oracle 10g on HP-UX 11iV3. (cert # 2006089)
23,456 SD benchmark users. SAP ECC 5.0. Central server: IBM System p5 Model 595, 64-way SMP, POWER5+ 2.3GHz, 32 KB(D) + 64 KB(I) L1 cache per processor,
1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 9, AIX 5.3 (cert # 2006045)
20,000 SD benchmark users. SAP ECC 4.7. IBM eServer p5 Model 595, 64-way SMP, POWER5, 1.9 GHz, 32 KB(D) + 64 KB(I) L1 cache per processor, 1.92 MB L2 cache
and 36 MB L3 cache per 2 processors, 512 GB main memory. (cert # 2004062)
These benchmarks fully comply with SAP Benchmark Council's issued benchmark regulations and have been audited and certified by SAP. For more information,
see http://www.sap.com/benchmark
Session C05
DB2 9.7 Performance Update
Serge Rielau
IBM Toronto Lab
[email protected]