DB2 for UNIX, Windows, OS/2
Scaling DB2 / NT Databases
For Explosive Growth

Christi Fortier, MSAS, ICSE
VP Merrill Lynch, PCT/QST

Sonia Ang
Certified Senior Consulting IT Specialist - Data Management, IBM

Technical Thread: UNIX, Windows, OS/2
Session: A4
Tuesday, May 15, 2001, 8:30am - 10:00am EST
Acknowledgements

• David Yeger, ML FVP Private Client/Quality Service Technologies
• June Langley, ML VP Electronic Messaging, and her team
• Toronto IBM Labs: pilot betas, workarounds, strategy support, scaling methodologies, migrations
• Microsoft Redmond Labs: pilot betas, N-Tier strategy support (esp. MTS, DTC, TLS)
• Intel (Vikram Saletore, Parviz Peiravi): web notes & slides on VI Architecture
• Emulex Giganet: web notes on CLAN fabrics
• Brocade: web notes on SAN switches
• Winter's Consulting Group: VLDB survey results, http://www.wintercorp.com/findings.html
Agenda

• VLDB History
• ML/DB2 EEE 1st Proof of Concept
  - Methods of Scaling
• 16 Node ML Joint Venture Cluster
• Reuse of technology to resolve Production Explosive Growth
  - Migration issues, DB2 5.2 EE to 7.1 EEE
  - Complex data workarounds
• What we learned
• Questions & Answers
VLDB History
Presenter: Christi Fortier, VP Merrill Lynch
Where the Industry was in 1998
Winter's 1998 VLDB Survey: in 1998, is there a correlation between system architecture and database size?

• Databases < 250GB: 11% SMP, 7% MPP
• Databases > 500GB: 24% MPP, 14% SMP
• SMP and MPP systems are represented in all size ranges, while NUMA and cluster systems fall into the two largest categories

[Chart: "The Survey of the World's Largest Databases" (N=75), system counts by architecture and workload (SMP, MPP, NUMA, and cluster; transaction processing vs. decision support) across size bands <250GB, 250GB-499GB, 500GB-1TB, and >1TB]

• VLDB issues: 27% scalability & performance management; 20% workload issues (capacity planning & system performance)
Merrill Lynch Database Growth Forecast (Private Client Focus)

• Our mainframe databases are growing linearly
• Our NT databases are growing exponentially
• Our relational databases are catching up with Winter Corp's 2000 VLDB survey

[Chart: "Merrill Lynch PCT Large Database Growth Forecast", size in GB (0-4000) for MSSQL/NT, DB2/Unix, DB2/OS390, and DB2/NT: current size, expected growth by EOY, and expected growth over 2 years, against the Winter's Consulting VLDB survey]
Merrill Lynch PCT VLDB Architecture Forecast

• Clusters are gaining popularity for databases that are over 50GB with large growth expectations (23% vs. the VLDB survey's 10%)
• UNI database growth is linear

[Chart: "Merrill Lynch PCT Database Architecture Trends Forecast", current, EOY, and 2-year architecture mix by system architecture (cluster, UNI, SMP), against the Winters Corp 2000 VLDB Survey]
DB2 Universal Database EEE Today

• Rich object and query relational power
• MPP and SMP parallelism
• Parallelism and scalability: everything operates in parallel
  - Queries, transactions
  - Insert/update/deletes
  - Triggers, RI, check constraints
  - Utilities: backup/restore; load, RI, check constraints; create index; reorg; AST computation
• Provides scaling to larger databases and workloads on single SMP, cluster of SMP, and MPP systems

DB2 UDB EEE Terabyte Customers for Business Intelligence

Score card as of 01/99: 102 terabyte customers; largest 50 TB; average disk 4.4 TB.
By industry: Insurance 8, Telecom 20, Banking 28, Healthcare 9, Gov't 6, Retail 29, Other 2.
DB2 EEE INCREDIBLE SCALABILITY FOR WINDOWS NT

TPC-C benchmarks:
• DB2 UDB EEE V7.1
• 32 x 4 Netfinity servers, Windows 2000
  - 700 MHz Pentium III
• 21 billion records
  - 3.2 TB raw data, 116 TB of disk space
  - 1.2 TB log space
  - 0.6 TB index
• Hundreds of thousands of users!
  - 96 client machines, 2-way Pentium 600 MHz
  - Each simulating approx. 3,840 TPC-C users
• Record-holding TPC-H at 1TB

NT exploitation:
• Choice of interconnect
  - Token Ring, Ethernet (TCP/IP)
  - Giganet, Tandem ServerNet (VI)
  - IBM "Redhawk" switch (future)
• Supports Microsoft Cluster Server for high availability
  - MSCS pairs make the entire system highly available
• Integration with NT
  - Windows NT native thread model
  - NTFS & raw containers
  - Windows NT domain security
  - Microsoft Transaction Server
  - 4GB memory tuning
ML Justifications for Clustering Our Databases

Partitioning & clustering a database can be an extremely worthwhile effort. Database clustering brings to fruition such goals as:
• Achieving load windows with room to spare
• Attaining huge performance gains in diverse mixtures of workloads (OLAP, OLTP, ...)
• Load balancing the database
• Minimizing user downtime
• Scaling databases as needed
• Faster DB maintenance (reorg, runstats, ...)
• HA opportunities
• Component-based architectures are CHEAPER!
Challenges of Component-Based Architecture

• Network latency
  - Emulex Giganet CLAN VI removes latency
  - Total port-to-port latency of 5μs
  - 2.5GB/s non-blocking bandwidth (full, all ports)
• Systems management
  - Using NetIQ for monitoring
  - SMS for software distribution
  - Database utility suite
• Development tools
  - Visual Basic, Visual Studio, MTS, DTC, ADO
• Developer, DBA and system administrator culture comfort zone; teamwork was essential
ML/DB2 EEE 1st Proof of Concept
Presenters: Christi Fortier, VP Merrill Lynch; Sonia Ang, Data Management, IBM
First Cluster Proof of Concept 12/1998
DB2 NT Cluster Configuration: DB2 EEE 5.2 fp8, stable with MTS COM components

[Diagram: quad-Xeon database servers SCOTTY, OBRIEN, RIKER and WESLEY plus NT RRA server LAFORGE (Dell 6300, 440 MHz Xeon processors, 1 MB L2 cache); DASD on Dell PowerVault 630 and MTI Gladiator 6300 arrays over fibre loops; Giganet SAN, Ethernet LAN]

• 1st fibre RAID implementation throughout (100 MB/sec)
• Giganet: 1st VI-compliant fast Ethernet interconnect
  - GNN1000 NIC: 2.5GB/s bi-directional, 1.25GB/s full duplex
  - GNX5000 switch: 20GB/s full-duplex aggregate
• VI reduces the latency associated with critical message-passing operations
Geographic Key Extract for Partitioning (before VI)

Table | Size  | Rows
R1    | 685MB | 1.4 mil
P1    | 3.8GB | 16.8 mil
P2    | 5GB   | 44.1 mil
...   | 133 tables, 500GB raw input in all

Flow: raw input feeds a generated subpartition extract script; the script extracts, converts & adds the subpartition (partitioned? <default> / nsub); the converted output (with and without subpartition) loads the analysis database, with sub-process comparisons (timestamps, initialization parms).
Encouraging Co-Located Joins

Partitioned tables:
• A portion of the table image resides on each database node (TI-n = table image partition on node n)
• The portion is controlled by the partitioning key in concert with the partitioning map
• Joins with partition keys in predicates are co-located

Replicated tables:
• Base table created on one node, but replicated to all other database partitions
• Allows for co-location of joins instead of broadcasting the smaller table to other nodes
• CREATE SUMMARY TABLE statement with the REPLICATED keyword (see the sketch below)

[Diagram: parallel nodes n0-n3, each holding table image partitions TI-0 through TI-3 via the partitioning map; replicated versions of the primary table image on every node enable the co-located join]
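A minimal DB2 UDB EEE-style sketch of both techniques; table and column names are illustrative, not the ML schema:

    -- Hash-partition the large table on the join column: rows with equal
    -- keys land on the same database partition, so equi-joins on acct_id
    -- between tables partitioned this way are co-located.
    CREATE TABLE positions (
        acct_id   INTEGER NOT NULL,
        trade_dt  DATE,
        amount    DECIMAL(15,2)
    ) PARTITIONING KEY (acct_id) USING HASHING;

    -- Replicate a small lookup table to every partition so joins against
    -- it resolve locally instead of being broadcast.
    CREATE SUMMARY TABLE branch_rep AS (SELECT * FROM branch)
        DATA INITIALLY DEFERRED REFRESH DEFERRED REPLICATED;
    REFRESH TABLE branch_rep;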
DB2 & Virtual Interface Protocol Benchmark TPS

Architecture: 4 web -> 4 app -> DB, 4 DB nodes. OLTP broadcast join (P1 with P2 with single-node R1):

Users     | VI on at DB | VI off / TCP on at DB
100 users | 289 TPS     | 258 TPS
200 users | 299 TPS     | 277 TPS

Reorg, runstats: 20% improvement with VI.

With VI, the performance impacts of broadcasts are minimized:
• 12 to 40%+ for queries without the geographic partition key, with restricted broadcasts
• 40% with complicated unrestricted queries
The more nodes and the more users, the greater the VI benefits.
Pilot LOAD TIMES & CPU USE
DB2 EEE 4-way Load, 4 GB Table

Load times per logical node:

Node | Rows Loaded | Total Load Time (min:sec)
0    | 10,639,872  | 10:53
1    | 10,729,480  | 12:00
2    | 10,726,954  | 12:18
3    | 8,147,163   | 08:04

DB2 config: Fastparse; UTILHEAP 51200; CPU & disk parallelism = 4; dedicated temporary.

Processor usage (average % processor time): processor 0: 96.202; 1: 96.444; 2: 96.354; 3: 96.458; average across all: 96.365.
DB2 EEE Load, 5GB Table
Pilot DISK USE

Logical Disk | Transfers/Sec | Avg Bytes/Transfer | Avg Queue Length | Avg Bytes/Sec | Avg % Disk Time
F            | 74.901        | 52,953             | 0.634            | 4,107,972     | 63.387
G            | 87.604        | 47,312             | 0.735            | 4,187,794     | 70.161
H            | 96.937        | 48,796             | 0.838            | 4,740,160     | 81.005
I            | 93.490        | 47,703             | 0.825            | 4,459,905     | 78.552
All          | 352.932       | 196,764            | 0.758            | 17,495,831    | 73.28
Methods of Scaling
Presenter: Sonia Ang, Data Management, IBM
Manual Split Nodes: Original Logical Architecture

DB2 node split:
• Configuration files and nodes stay local: no network traffic
• Re-map physical tablespace containers

Manual Split Node Strategy: Avoiding Network Burden (no Redistribute)

DB2 node split:
• Copy of configuration files
• Containers physically moved or remapped via SAN to the new server
DB2 EEE Database Structure

Path: <DB2 install drive>\$DB2INSTANCE\NODEnnnn, one NODE directory per database partition:

<DB2 install drive>\db2mpp
    NODE0000        (database partition 1)
        SQL00001    (first database: database objects)
        SQL00002    (second database: database objects)
        sqldbdir
    ...
    NODE000N        (database partition n)
        SQL00001    (database objects)
        SQL00002
        sqldbdir
Manual Split Node: Steps

DB2 node split (sketched below in db2nodes.cfg terms):
• Start: all four logical nodes on Scotty, node 0 (G), node 1 (H), node 2 (I), node 3 (F)
• Step 1: node 3 (F) moves to Obrien; Scotty keeps nodes 0 (G), 1 (H), and 2 (I)
• Step 2: one node per server, node 0 (G) on Scotty, node 1 (H) on Riker, node 2 (I) on Obrien, node 3 (F) on Wesley
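In db2nodes.cfg terms this is a sketch only: the exact field layout differs between UNIX and NT, and the logical-port values here are assumed. The split amounts to re-pointing each logical node's hostname while the SAN re-maps its containers.

Before (all four logical nodes on Scotty):

    0 scotty 0
    1 scotty 1
    2 scotty 2
    3 scotty 3

After step 2 (one node per physical server):

    0 scotty 0
    1 riker  0
    2 obrien 0
    3 wesley 0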
Backup Recovery Method of Scaling

• Backups of nodes copied to new target servers across the network
• New logical nodes created
• New containers assigned to the nodes
• Database restored to each new target node
• Nodes distributed to additional servers
• Containers mapped to new tablespaces
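Restoring into a server with a different disk layout is a redirected restore; a hedged CLP sketch, with the database name, backup path, timestamp, tablespace ID, and container path all illustrative:

    db2 "RESTORE DATABASE proddb FROM E:\backups TAKEN AT 20010515083000 INTO proddb REDIRECT"
    db2 "SET TABLESPACE CONTAINERS FOR 2 USING (PATH 'F:\db2\ts_data')"
    db2 "RESTORE DATABASE proddb CONTINUE"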
Pilot LOAD TIMES to 4 Physical Servers
DB2 EEE, 5GB Table

Load times per logical node (now one node per server):

Node | Rows Loaded | Total Load Time (min:sec)
0    | 10,639,872  | 5:50
1    | 10,729,480  | 6:41
2    | 10,726,954  | 6:29
3    | 8,147,163   | 6:04
Manual Split Nodes in Lieu of Redistribute

PROS:
- Limited network impact: only logs and configuration files are migrated
- Scale out to 4 physical servers in 1.5 hours
- Best suited for databases on SANs
- High availability / failover support
- Node/database independence

CONS:
- IBM does not officially support it
- You must be VERY careful

NOTE: IBM recommends different methods.
16 Node ML Joint Venture Cluster
Presenter: Sonia Ang, Data Management, IBM
4
Way Cyberbricks
WEB
Servers, Transaction
Servers & Database Servers
connected via Giganet CLAN
2TB
Fiber Channel
Storage Area Network
Intel Developer’s Conference 9/1/1999 DB2 EEE 5.2 fp11
Logical Design (Front End)

• Browser-neutral DHTML
• ASPs control the access using VBScript
• WEB server components generate DHTML from the XML using style sheets
• VB/ADO components under COM+ handle the business logic and generate XML
• SQL in the VB modules is sent to the DBMS via OLE DB
• Thread pooling via MTS
Logical Connectivity: Default Node

[Diagram: Web Requestors 1-2 -> WLBS -> Web Servers 1-4 -> Database Nodes 0-3; logical connectivity to the database default node (Node 0)]
Logical Connectivity: Split MTS, Default Database Node (DB2 fp11)

[Diagram: Web Requestors 1-2 -> WLBS -> Web Servers 1-4 -> Application Servers 1-4 -> Database Nodes 0-3; interconnect shown with VI off and VI on]

Achieved 283 TPS @ 100 users.
Logical Connectivity: Split MTS, Targeted Node

[Diagram: Web Requestors 1-2 -> WLBS -> Web Servers 1-4 -> Application Servers 1-4 -> Database Nodes 0-3, each application server targeting a specific database node]

300 TPS. Node-intelligent APIs: sqlugtpi (get table partitioning information) and sqlugrpn (get the row's partition number and node).
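The targeting works because a query whose predicate fixes the partitioning key can be routed to a single partition. A sketch, assuming the hypothetical positions table partitioned on acct_id:

    -- Directed to one node: DB2 hashes the key value through the
    -- partitioning map and sends the request to that partition only.
    SELECT position_id, amount FROM positions WHERE acct_id = 1004872;

    -- No partitioning key in the predicate: the request goes to all nodes.
    SELECT position_id, amount FROM positions WHERE trade_dt = CURRENT DATE;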
16 Node Cluster OLTP Performance

• 320 TPS on Windows NT4
  - 400 concurrent users
  - Average response time 400ms
• Improved to 342 TPS
  - Using VBScript casting
  - DB2 5.2 FP13
  - MTS-to-COM+ port
• Load time testing
  - Parallel concurrent loading
  - 45MM rows (positions) in 6 minutes
16 Node vs. Pilot Cluster

• Using 8-port Brocade switches instead of 16-port, but more of them
• Using a larger Giganet SAN fabric with failover
• SAN support of several terabytes
• DB2 UDB Extended Enterprise Edition v5.2 fixpack 11
• Performance tuning of ALL tiers!
Performance Insights

• Queries include the partitioning key for a targeted node
• DB2 v5.2 fp11
• Data and temp on SAN
• LOGs on a separate controller, RAID 10
• NT temp away from LOGs
• Memory configuration parameters
• Transactional configuration parameters
Memory Related Config Parms

Config         | Value   | Justification
NUMDB          | 2       | Freed up a lot of memory (internal calculations)
BUFFPAGE       | 64000   | Dedicate memory to buffers; bufferpool NPAGES = -1
SORTHEAP       | 2000    | Reducing sortheap size allowed us to reduce SHEAPTHRES, which freed up memory for connections
SHEAPTHRES     | 200,000 | (see SORTHEAP)
MAXAGENTS      | 800     | Increasing maxagents allowed 60% more connections
MAXAPPLS       | 400     | Allocates memory for a maximum of maxappls connections
NUM_POOLAGENTS | 800     | Kept in sync with maxagents
NUM_INITAGENTS | n/a     | Not effective in v5.2
AVG_APPLS      | 160     | Optimizer uses this parameter to tune its connection-related params; avails buffer pool to query plans at run time
FCM_NUM_RQB    | 4096    | Resolved request buffer errors
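As a db2 -tvf script sketch (database name illustrative; SHEAPTHRES and NUM_POOLAGENTS are the catalogued spellings of the slide's sheapthresh and num_pool_agents, and the first group are instance-level parameters):

    UPDATE DBM CFG USING NUMDB 2 MAXAGENTS 800 NUM_POOLAGENTS 800 SHEAPTHRES 200000 FCM_NUM_RQB 4096;
    UPDATE DB CFG FOR proddb USING BUFFPAGE 64000 SORTHEAP 2000 MAXAPPLS 400 AVG_APPLS 160;
    -- An NPAGES of -1 lets BUFFPAGE govern the default buffer pool:
    ALTER BUFFERPOOL IBMDEFAULTBP SIZE -1;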
Transaction Performance Related

Config             | Value | Justification
INTRA_PARALLEL     | OFF (ON for DB maintenance) | The good of the many is more important than the good of the few. 16 nodes: 300% more DB connects, 0 errors (DFT_DEGREE not used since it is off)
Prefetch on tables | 256   | Speeds performance when the optimizer determines a process is sequential (tablespace scans, sorts); should be specified as a multiple of extentsize. Match extentsize with the disk I/O block!
EXTENTSIZE         | 32    | (see above)
ACTIVATE DATABASE  | n/a   | Enhanced performance of first-time runs
LOGFILSIZ          | 4096  | Reduced I/O from db2diag loggings
DFT_QUERYOPT       | 2     | Simple query optimization
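Prefetch and extent size are tablespace attributes; a hedged DDL sketch with illustrative names and container size:

    -- EXTENTSIZE matched to the disk I/O block; PREFETCHSIZE a multiple
    -- of it so sequential scans read ahead effectively.
    CREATE TABLESPACE ts_data
        MANAGED BY DATABASE USING (FILE 'F:\db2\ts_data.dat' 512000)
        EXTENTSIZE 32 PREFETCHSIZE 256;

    -- Keeping the database explicitly activated avoids the first-connect
    -- startup cost called out above:
    -- db2 ACTIVATE DATABASE proddb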
Most Effective Component Tuning

Component | Tuning | Impact
ADO       | Read-only cursor | +25 TPS
IIS       | Buffering on, stateless sessions | +4 TPS
MTS       | DB2 DSNs set to MTS (connection pooling) & create instance | +8 TPS
TLS       | Thread local storage object from Microsoft (application is instantiated at the web tier & threads are kept alive, avoiding DB reconnects) | +10 TPS
ADO & DB2 | Isolation level adXactReadUncommitted, aka DB2 uncommitted read; turn off commits | +5 TPS
Reuse of Technology to Resolve Production Explosive Growth
Presenter: Christi Fortier, VP Merrill Lynch
CAR℗ Compliance Archive Retrieval System

Overview:
• Component-based infrastructure for archiving and retrieving objects
• Compliance with SEC and ML regulatory and legal requirements
• Stores and maintains objects created or received for 7 years on Write Once/Read Many (WORM) media
• Database holds metadata and the index into archived objects
• Insert intensive: initially 80/20%, now 60/40%
• MEMORY INTENSIVE in EE; spread across multiple physical devices for optimal performance
• Explosive growth
• 1st pilot 12/1997 was the 1st UDB EE beta

CAR℗ is a patent-pending product of Merrill Lynch.
CAR℗ Cluster Architectural Diagram

Database architecture:
• DB2 UDB EEE v7.1 FP2
• DBMS & logs on D: (root limits 4G)
• VI turned on between database nodes
• 1 catalog node on server 1; 6 data nodes spread over servers 2, 3, 4 for fast scaling out
• Data, index and temp on SAN
• Each node has 50G usable RAID 5

CAR℗ is a patent-pending product of Merrill Lynch.
CAR℗ Architecture Specifics

Database servers:
• Dell PowerEdge 6450: 700 MHz P3 Xeon, 4 CPUs, 2MB cache
• 2GB RAM on each server
• 4 DB servers scaled out from 2 DB servers, with room for more
• Local (D: drive): PERC drivers, RAID 5, four 18G drives (50GB usable)
• SAN: 2 LUNs per server, RAID 5, four 18G drives (50GB usable)
• Hot-swappable; 2 fibre throughout; active/active

Diagrams by Albert Bauer, ML.
CAR℗ Explosive Growth

• Mailboxes sent to the system impact database growth exponentially
• Response time demands are impacted by the type of mailboxes opened into the system
• Database clusters will allow us to scale out with this growth

[Charts: "CAR(TM) Exponential Growth", database size in GB (0 to 5,000.00) from current through 3 mailboxes, open mailboxes, all mailboxes, and Rubicon; "Peak TPS" (0 to 1500) from 1 mailbox through 3 mailboxes, open mailboxes, and Rubicon]

CAR℗ is a patent-pending product of Merrill Lynch.
Server Scaling

Cluster architectures support fast, low-risk, low-cost scaling opportunities:
• 2 database servers -> 4 (current) -> 7
• 1 Exchange server -> 2 (current) -> ?
• 1 web server -> 2 (current) -> ?

SANs support HA & fast disk expansion:
• Need disk? Just plug fibre channel cables into the SAN switch, adjust via GUI, and map to the server
• SAN switches daisy-chain with nominal latency
• Back up disk through the SAN without hitting your server
• Disk array ported to 2 switches for remote switch failover
Complexities of CAR℗

• BOM model
• Binary fields abound
• Triggers keep the timezone differences in sync on insert (see the sketch below)
• Referential integrity between huge tables
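A minimal sketch of the kind of insert trigger described above; the table, columns, and offset arithmetic are hypothetical, not the CAR℗ schema:

    CREATE TRIGGER trg_local_ts
    NO CASCADE BEFORE INSERT ON archive_item
    REFERENCING NEW AS n
    FOR EACH ROW MODE DB2SQL
    -- derive the local timestamp from the stored GMT value and the
    -- row's timezone offset in hours
    SET n.local_ts = n.gmt_ts + n.tz_offset HOURS;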
Query Complexities

TblID | TOTL_GB | ROWS        | RI
t1    | 1.31    | 265,260     | n
t2    | 11.72   | 4,458,373   | c,t
t3    | 0.01    | 4           | c
t4    | 2.35    | 6,426,220   | n
t5    | 1.31    | 502,737     | c,f(t1)
t6    | 1.31    | 30,508      | n
t7    | 15.63   | 117,973,538 | f(t2,t3)
t8    | 0.17    | 185         | c
t9    | 0.17    | 202         | n
t10   | 0.17    | 2,528       | f(t2)
t11   | 0.17    | 63          | n
t12   | 0.01    | 12          | n
t13   | 0.17    | 540         | f(t11,t12)
t14   | 0.17    | 57          | n
t15   | 0.03    | 0           | n
t16   | 0.11    | 0           | n
t17   | 0.01    | 0           | n
t18   | 0.01    | 0           | n
t19   | 0.01    | 0           | n
t20   | 0.01    | 0           | n

CAR℗ is a patent-pending product of Merrill Lynch.
CAR℗ Migration to EEE and Binary Issues

• Diverse binary data disallows delimited ASCII loads
• IXF format is not allowed in EEE autoload
• Positional loads are a problem (varchars, nulls, wide rows)
• Staging loads are the only feasible way (a sketch follows). Limitations:
  - Referential integrity must be turned off
    · Performance issue
    · Check pending & backup issue
  - Not logged initially
  - Unit of work is at script level even if the SQL completes: DB2 rolls everything back if the postscript errors
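A hedged sketch of one staged move, with illustrative table names; NOT LOGGED INITIALLY must have been declared when the target table was created, and V7 used SET CONSTRAINTS where later releases say SET INTEGRITY:

    UPDATE COMMAND OPTIONS USING c OFF;                -- one unit of work
    SET CONSTRAINTS FOR target_tbl OFF;                -- RI off (check pending)
    ALTER TABLE target_tbl ACTIVATE NOT LOGGED INITIALLY;
    INSERT INTO target_tbl SELECT * FROM staging_tbl;  -- any error rolls it all back
    COMMIT;
    SET CONSTRAINTS FOR target_tbl IMMEDIATE CHECKED;  -- leave check pending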
CAR℗ Database Migration
DB2 v5.2 fp11 EE to DB2 v7.1 fp2 EEE: Phase 1

• Copy online backup and logs to local disk
• Install the DB2 production release (5.2 EE) on the 1st server
• Redirected restore to the 1st server (adjust containers)
• Install 7.1 EEE on server 1
• Adjust the DB2INSTANCE, SVCENAME

CAR℗ is a patent-pending product of Merrill Lynch.
CAR℗ Database Migration
DB2 v5.2 fp11 EE to DB2 v7.1 fp2 EEE: Phase 2

• Install 7.1 EEE on the other servers (services in sync); DB2 MIGRATE DATABASE
• Server 1, node 0: db2start ADDNODE without tablespaces, with the COMPUTER parm!!!! (db2ncrt will be fixed in DB2 V8.1) (sketched below)
• Alter the temporary tablespace, adding containers on the other nodes (must do to get connectivity)
• Adjust configuration across all nodes

CAR℗ is a patent-pending product of Merrill Lynch.
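Approximately, in V7 commands (syntax should be checked against the V7 Command Reference; the host, port, computer name, and tablespace path here are assumptions):

    REM Add a logical node on a new server without tablespace containers:
    db2start NODENUM 1 ADDNODE HOSTNAME riker PORT 0 COMPUTER RIKER WITHOUT TABLESPACES

    REM Then give the temporary tablespace a container on the new node,
    REM or connections involving it will fail:
    db2 "ALTER TABLESPACE tempspace1 ADD ('D:\db2temp') ON NODE (1)"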
Performance Nuances

• DB2 7.1 fp2
  - Larger buffer page sizes
  - OLE DB
• 16-node tuning reused in base tuning
  - Queries include the partitioning key for a targeted node
  - Data and temp on SAN
  - NT temp away from LOGs
• CAR℗ involves OLCP transactions with far fewer connections
• LOGs on a separate controller, RAID 5 (we need space & HA)
• Memory configuration parameters
• Transactional configuration parameters
Memory Related Config Parms (vs. the 16-node cluster: < lower, > higher, ** new)

Config              | Value                       | vs. 16-node | Justification
NUMDB               | 2                           | =  | Freed up a lot of memory (internal calculations)
BUFFPAGE            | 35000                       | <  | Dedicate memory to buffers; bufferpool NPAGES = -1; 16K pagesize bufferpool for the > 2GB table
SORTHEAP            | 256 (create index, reorg)   | <  | Reducing sortheap size allowed us to reduce SHEAPTHRES, which freed up memory for connections
SHEAPTHRES          | 51200                       | <  | (see SORTHEAP)
MAXAGENTS           | 250 or 100                  | <  | Allocates memory for a maximum of maxappls connections; for this app, connectivity for large numbers of users is not as critical
MAXAPPLS            | 200 or 100                  | <  | (see MAXAGENTS)
NUM_INITAGENTS      | 25                          | >  |
AVG_APPLS           | 30                          | <  | Optimizer uses this parameter to tune its connection-related params; avails buffer pool to query plans at run time
LOCKLIST            | 1600                        | ** | Amount of space for the lock list; insert-intensive database
Transaction Performance Related (1 of 2)

Config               | Value      | Justification
INTRA_PARALLEL       | OFF (ON for DB maintenance) | The good of the many is more important than the good of the few. 16 nodes: 300% more DB connects, 0 errors (DFT_DEGREE not used since it is off)
** Tempspace (DMS)   | 12 -> 21GB | Avoids physical I/O, reserves space
* Pkgcache           | 1600       | Amount of memory used for caching packages and dynamic SQL statements
* CATALOGCACHE_SZ    | 32 or 256  | Internal catalog cache (from dbheap) used by the SQL compiler to hold the packed descriptors for commonly referenced objects such as tables and constraints
* DBHEAP             | 6000       | Useful for open cursors
Transaction Performance Related (2 of 2)

Config                | Value | Justification
** Prefetch on tables | 256   | Speeds performance when the optimizer determines a process is sequential (tablespace scans, sorts); should be specified as a multiple of extentsize
Extentsize            | 256/n | (see above)
NUM_IOSERVERS         | 50    | Number of disks the tablespace is using (usable array disks x nodes)
NUM_IOCLEANERS        | 30    | Number of CPUs + 2
APPLHEAPSZ            | 128   | Global work area for each agent
NUM_POOLAGENTS        | 250   | Kept in sync with maxagents
Complex Query Tuning

When dealing with VLDBs, scaling out is not always the only solution. It is critical to consider all facets of tuning:
• Ensure all predicates are covered in indexes (see the sketch below)
• Split up complex statements (i.e., CASE)
• The width of an index impacts its use
• The order of the two largest tables may not have an impact
• Run DB maintenance regularly (runstats, reorgs, ...)
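For the first and last points, a generic sketch; the schema, index, and column names are hypothetical, not the CAR℗ ones:

    -- Cover the query's predicate columns (mind the width caveat above):
    CREATE INDEX ix_t7_cover ON db2mpp.t7 (part_key, item_ts, status);

    -- Refresh statistics so the optimizer can pick the new index, and
    -- recluster the table along the hot index:
    RUNSTATS ON TABLE db2mpp.t7 WITH DISTRIBUTION AND DETAILED INDEXES ALL;
    REORG TABLE db2mpp.t7 INDEX ix_t7_cover;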
CAR℗ Comparative Timings

Transaction               | Non-clustered | Clustered    | Delta
Count, 9GB table (wide)   | 4 min 21 sec  | 1 min 51 sec | 2 min 30 sec
Count, 12GB table (narrow)| 34 min 7 sec  | 2 min 1 sec  | 32 min 6 sec
Runstats on 12GB table    | 37 min 54 sec | 6 min 44 sec | 31 min 10 sec

CAR℗ is a patent-pending product of Merrill Lynch.
CAR℗ Comparative Timings (2 of 2)

Transaction                    | T7 Cardinality | Rows Returned | Non-clustered | Clustered | Delta
Complex queries, 2 days of data| 5,191          | 3,352         | 25 sec        | 6 sec     | 19 sec
Complex query, 5 days of data  | 11,452         | 7,306         | 31 sec        | 11 sec    | 20 sec
Complex query, 10 days of data | 26,641         | 15,834        | 79 sec        | 25 sec    | 54 sec

CAR℗ is a patent-pending product of Merrill Lynch.
What did we learn?

• It is feasible to build high-performance, highly available web farms using relatively cheap components
• Component-based architectures drive down IT costs even more with time
• Clusters, shared-nothing database architectures & stateless components provide scalability & failover where your business needs it most
• High performance is achievable with easy, flexible development tools (VB, Visual InterDev, ...)
• Interoperability exists in a multi-vendor environment
• DB2 was the first DBMS to get us there
Questions and Answers

Session: A4, May 15, 8:30am - 10am
Scaling DB2 / NT Databases For Explosive Growth

Contact Information

Christi Fortier, VP Merrill Lynch
Phone: 609-274-1171
Fax: 609-274-0206
Email: [email protected]

Sonia Ang, IBM Certified Senior Consulting IT Specialist - Data Management
Tel: 732-926-2539, Fax: 732-926-2455
Pager: 1-800-946-4646, PIN 7127243
Cell: 908-403-0290
Email: [email protected]

Content protected under Copyright 2001. CAR℗ is a patent-pending product of Merrill Lynch.