forcing updates one row at a time.

Download Report

Transcript forcing updates one row at a time.

Communicating with
the Outside
Overview
Package several SQL statements
within one call to the database server
• Embedded procedural language
(Transact SQL) with control flow
facilities.
• Use User Defined Functions (UDFs) when
they select out a high number of records.
• Do not use positioned updates, which update
rows that are obtained as the result of a
query, forcing updates one row at a time.
Packing sqls into one call to db
server
create table Temp1 (parent varchar(200))
create table Temp2(parent varchar(200))
create table Ancestor (person varchar(200))
/∗ Temp2will hold the latest generation discovered. ∗/
INSERT INTO Temp1
SELECT parent
FROM Parental
WHERE child = ’Nicholas Bennet’;
WHILE EXISTS(SELECT * FROM Temp1)
BEGIN
INSERT INTO Ancestor
SELECT * FROM Temp1;
INSERT INTO Temp2
SELECT * FROM Temp1;
DELETE FROM Temp1;
INSERT INTO Temp1
SELECT Parental.parent
FROM Parental, Temp2
WHERE Parental.child = Temp2.parent;
DELETE FROM Temp2;
IF EXISTS (
SELECT *
FROM Ancestor
WHERE person = ’Carol Diane’
)
PRINT ’Carol Diane is an ancestor of Nicholas Bennet.’
ELSE
PRINT ’Carol Diane is not an ancestor of Nicholas
Bennet.’
T-SQL to determines whether
Carol Diane is an ancestor
ofNicholas Bennet. Assume
a genealogical database
containing at least the relation
Parental(parent, child).
User Defined Functions
Throughput (queries/msec)
6
UDF
5
application
function
4
3
2
1
0
20%
80%
• Function computes the
number of working days
between two dates.
• Function executed either
on the database site
(UDF) or on the
application site
• Applying the UDF yields
good performances when
it helps reduce
significantly the amount
of data sent back to the
application.
Throughput (queries/msec)
Retrieve Needed Columns Only
• Avoid transferring
unnecessary data
• Might prevent the use of
a covering index, i.e.
index-only scan
• In the experiment the
subset contains ¼ of the
attributes.
1.75
1.5
1.25
all
covered subset
1
0.75
0.5
0.25
0
no index
index
– Reducing the amount of
data that crosses the
application interface yields
significant performance
improvement.
Retrieve Needed Rows Only
• If the user is only viewing a small subset of a very
large result set, it is best to
– Only transfer that subset
– Only compute that subset
• In cases when users only want a ‘feel’ of the data,
use TOP or FETCH FIRST to fetch just a few rows
• Applications that allow the formulation of ad-hoc
queries(An Ad Hoc Query is a query that cannot be
determined prior to the moment the query is issued I )
should permit users to cancel them.
Throughput (queries/sec)
Minimize the Number of
Query Compilations
• Prepared execution yields
better performance when
the query is executed
more than once:
0.6
0.4
– No compilation
– No access to catalog.
0.2
direct
prepared
0
0
1
2
3
4
5
6
Experiment performed on
Oracle8iEE on Windows 2000.
• Prepared execution plans
become obsolete if
indexes are added or the
size of the relation
changes.
Direct: ODBC direct execution
Prepared: ODBC prepared command, then execute it repeatedly .
Tuning the Application Interface
• Avoid user interaction within a transaction
• Minimize the number of roundtrips
between the application and the database
• Retrieve needed columns only
• Retrieve needed rows only
• Minimize the number of query
compilations
Bulk Loading Data
• Purpose: loading large volumes of data into a
database
• Tools
– SQL server: bcp and a Transact-SQL command BULK
INSERT.
– Oracle: SQL ∗ Loader
– DB2: Load utility
• Tool parameters:
–
–
–
–
–
Bypass query engine
Avoid logging
No index update
No constraint check
Frequency of commits
Use Direct Path for Bulk
Loading
sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl
load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
(
L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER,
L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX,
L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYYMM-DD", L_COMMITDATE DATE "YYYY-MM-DD",
L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT,
L_SHIPMODE, L_COMMENT
)
Direct Path
Throughput (rec/sec)
50000
40000
30000
20000
10000
65
0
conventional
direct path
insert
Experiment performed on
Oracle8iEE on Windows 2000.
• Loading 600000 records
into the lineitem relation
from TPCH
• Direct path loading
bypasses the query
engine and the storage
manager. It is orders of
magnitude faster than
conventional path (with a
commit every 100 records)
and inserts (with a
commit for each record).
Batch Size
Throughput (records/sec)
5000
4000
3000
2000
1000
0
0
100000
200000
300000 400000
500000
Experiment performed on
SQL Server 2000
on Windows 2000.
600000
• Bulk load of 600000
records.
• Throughput increases
steadily when the batch
size increases to 100000
records.Throughput
remains constant
afterwards.
• Trade-off between
performance and amount
of data that has to be
reloaded in case of
problem.
• Bulk load of 600000
records.
• As expected:
8000
6000
4000
Experiment performed on
IBM DB2 UDB V7.1
on Windows 2000.
incremental
index
maintenance
statistics
0
non
recoverable
2000
recoverable no statistics rebuild index
THroughput (rec/sec)
Storage Engine Parameters
– Turning off logging
helps.
– Collecting statistics
hurts
– Maintaining indexes
incrementally hurts a
lot.
Connecting to
Multiple Databases
• Shared connection to reduce start-up cost
– Connection pooling
• Pass-through statements when
performance is CPU bound
– E.g. dblink in Oracle
– Eliminates query rewriting, translation and
postprocesing to adapt to specific SQL dialect
• Transfer large block of data to minimize
roundtrips when performance is network
bound,
summary