Slide - CSE, IIT Bombay

Download Report

Transcript Slide - CSE, IIT Bombay

Exploiting Asynchronous IO using
the Asynchronous Iterator Model
Suresh Iyengar *
S. Sudarshan
Santosh Kumar #
Raja Agrawal &
IIT Bombay
Current affiliations: * Microsoft Hyderabad,
#
Guruji.com,
& SAP
Agenda

AIO Background

Exploiting AIO in query processing
•
Asynchronous Iterator model

Asynchronous Index Nested Loops Join

Asynchronous versions of other operators

Performance results

Related Work

Conclusion
COMAD 2008 , IIT Bombay
IO Processing : Traditional way
Application
Kernel
System call
Read ()
Context switch
data
Initiate IO
Read
response
Application
Blocked !
 CPU is idle most of the time waiting for an IO completion
COMAD 2008 , IIT Bombay
IO Processing : Async. way
Application
Kernel
System call
Initiate IO
AIO Read ()
Notify
Do other
Read
response
data
work !!
COMAD 2008 , IIT Bombay
IO Processing : Async. Way
 Asynchronous approach
• Overlap of CPU and IO processing
• Application can generate multiple IO requests
 Allows IO subsystem to reorder access to data on disk
• Important in RAID environments
COMAD 2008 , IIT Bombay
Asynchronous IO Interface
Linux 2.6 kernel
( File descriptor, offset, buffer,
numBytes, … )
aio_read ( aio structure)
Request an AIO read operation
aio_error ( aio structure )
Check the status of an AIO request
lio_listio ( array of aio
structures )
Initiate a list of AIO operations
• We use list AIO in our implementation
• Can initiate multiple IO read operations in one system call
COMAD 2008 , IIT Bombay
Handling AIO completion
 Signal-based handler
• A signal is generated on IO completion
 Callback using interrupts
Call completion
handler
• An interrupt is generated on IO completion
 Concurrent access to completion handler and shared
data structures in both of above methods
 Polling
• Store IO requests in pending queue and poll periodically
for completion
• Our experiments show polling beats signal/interrupt
based approach
COMAD 2008 , IIT Bombay
Demand-Driven Iterator
• Open()
• Next()
NL
J
sca
n
• Close()
sca
n
Blocking
call !
Table A
Table B
 Bottom level nodes perform operations such as
sequential scans or index scans.
 Upper level nodes are join nodes or other operator
nodes such as sort or aggregate.
COMAD 2008 , IIT Bombay
Agenda

AIO Background

Exploiting AIO in query processing
•
Asynchronous Iterator model

Asynchronous Index Nested Loop (INL) Joins

Asynchronous versions of other operators

Performance results

Related Work

Conclusion
COMAD 2008 , IIT Bombay
Asynchronous Iterator
• Open()
• Next()
NL
J
sca
n
• Close()
sca
n
• I don’t have the tuple
available in the memory !!
Table A
Table B
• Issue AIO read operation
• Return “LATER”
Non- Blocking
call !
COMAD 2008 , IIT Bombay
Asynchronous Iterator Model (AIM)
 Allow a node to return a status “LATER” to the parent
• Instead of blocking for IO completion.
 The parent operator could
• Perform other work, such as fetching data from another
input
• Simply return a LATER status to its parent node
• Or just loop, reinvoking the child operator till it returns a
tuple
 E.g. root of the execution plan tree
 Exact action depends on operator
• Asynchronous versions of different operators
• Focus on Asynchronous Indexed Nested Loops join
COMAD 2008 , IIT Bombay
Asynchronous INL Joins
 Original state of Indexed Nested Loops (INL) node
• Left and right subplans and qualifier lists
 Augmented state for async INL node
• An array of outer tuples each having a queue of matching
inner TIDs
 AIO may have been issued for some already, others later
• A workqueue for outer slots which already have AIO issued
for their matching inner TIDS
• An IO queue recording all pending AIO requests made by
the node
 Used to poll for completion of AIO requests
COMAD 2008 , IIT Bombay
Asynchronous INL Join (contd.)
 We divide the async INL join operations into two stages
• Stage 1: Fetch outer tuples and issues AIO requests
• Stage 2: Check for AIO completion, process AIO results
and return join results.
 Stages are interleaved
• Stage 1 may be in progress for some tuples, and Stage 2
for others
COMAD 2008 , IIT Bombay
Asynchronous INL Join (contd.)
Stage 1
Fetch outer tuples
For each outer tuple
Find the matching inner
TIDs for each outer tuple
Put the outer tuple in
workqueue
Issue LIST AIO for matching inner
TIDS of all outer tuples in workqueue
(subject to BATCH_SIZE)
COMAD 2008 , IIT Bombay
Asynchronous INL Join (contd.)
 Rules
• Batch size
 BATCH_SIZE: max number of outstanding AIO requests
 Why? OS limits, efficiency issues
 We set the MAX_BATCH_SIZE per node to 200 in our
experiments
 Scale BATCH_SIZE in powers of 2 till MAX_BATCH_SIZE so
that async INL can output tuples quickly at the onset
• Case where outer tuple matches a large number of inner
tuples is handled appropriately
• Keeping the AIO queue filled
 We issue further AIO requests (fetching outer tuples as
required) if 10 % of earlier AIO requests have completed
COMAD 2008 , IIT Bombay
Asynchronous INL Join (contd.)
For each outer tuple in workqueue
Stage 2
Check if any matching inner
TIDs are present in memory
Present ?
No
Yes
• Remove that inner TID from outer
tuple’s TID array
• Perform join and add to result
• if join result found break from loop
Update workqueue
Next page ..
COMAD 2008 , IIT Bombay
Asynchronous INL Join (contd.)
Prev page..
Any join results?
Yes
Return result
to parent node
Back to start of
Stage 2
Yes
No
No
Is no outstanding outer tuples &
reached end of outer tuple
Yes
Poll for AIO completion
Is tuple found or parent node
cannot handle LATER
No
tupStat = END_OF_RESULT
result = NULL
tupstat = LATER
result = NULL
Return result and tupStat to parent node
COMAD 2008 , IIT Bombay
Async. versions of other operators
 Async Sequential scan
• Check if next tuple is in the in-memory buffer
• If its present, return the tuple
• Else initiate an async read. Set tupStat = LATER and return
 Out of order sequential scan
• Start returning the tuples of a particular relation which are
already there in the memory
 even if out of order
• Concurrently, issue AIO for other tuples
COMAD 2008 , IIT Bombay
Async. versions of other operators
I can start the sorting
of other input !
Merge
Join
sort
sort
Seq scan
Seq scan
T1
T2
Initiate
AIO read
COMAD 2008 , IIT Bombay
Performance Results
 Experiments with TPC-H database with scale factors of
1 and 10 in three different setups
• Core 2 duo P4 with:
 1GB RAM and TPC-H - 1 GB database (single disk)
 1GB RAM and TPC-H – 10 GB database (single disk)
 3.2GB RAM and TPC-H – 10 GB database (4 disks / RAID 10)
 We use PostgreSQL 8.1.3 as the code base
 Compare it with our modified version of the same code
base, incorporating asynchronous iterator model
• with async INL and async seq. scan
COMAD 2008 , IIT Bombay
Performance Results: 1GB RAM
Query 1a: select l_orderkey, l_quantity from orders, lineitem
where o_orderkey=l_orderkey and l_orderkey%100=2 and l_linestatus=’F’
TPCH
1 GB
TPCH
10 GB
COMAD 2008 , IIT Bombay
Performance Results: 1 GB RAM
Query 2a: select l_orderkey,l_quantity from orders,lineitem,customer
where o_orderkey=l_orderkey and o_custkey=c_custkey
and l_orderkey%100=2 and l_linestatus=’F’
TPCH
1 GB
TPCH
10 GB
COMAD 2008 , IIT Bombay
Performance Results : 1GB RAM
Query 2a : Join of orders, lineitem and customer with filter (TPCH 1GB )
Startup effect
COMAD 2008 , IIT Bombay
Performance Results: 1 GB RAM
Query 2b: select l_orderkey,l_quantity from myorders,lineitem,customer
where o_orderkey=l_orderkey and o_custkey=c_custkey
-- No tight selection
TPCH
1 GB
TPCH
10 GB
1GB RAM
COMAD 2008 , IIT Bombay
Performance Results: 3.2 GB + RAID
Query 1a : Join of orders and
lineitem with filter
Query 2a : Join of orders, lineitem
and customer with filter
TPC-H 10GB / 3.2GB RAM / 4 disks RAID10
COMAD 2008 , IIT Bombay
Performance Results: 3.2 GB + RAID
Query 1b : Join of myorders, lineitem
Query 2b : Join of myorders, lineitem
and customer
TPC-H 10GB / 3.2GB RAM / 4 disks RAID10
COMAD 2008 , IIT Bombay
Performance Results
TPC-H Q12:
select l_shipmode,sum(...)
from orders,lineitem
where o_orderkey = l_orderkey and <several selection>
group by l_shipmode order by l_shipmode
Original INL
Async INL
Gain
TPCH 1GB
1GB RAM
64.7 sec
48 sec
25 %
TPCH 10 GB
1GB RAM
687 sec
431 sec
37 %
TPCD 10GB
RAID 10
4 disks, 3.2 GB
RAM
164 sec
147 sec
10 %
COMAD 2008 , IIT Bombay
Related Work
 Graefe’s generalized spool iterator (Graefe [ BTW03 ])
• Pre-fetches multiple
outer tuples
INL
• Issue AIO for matching
inner TIDS
Spool operator
• Can be replenished
when empty or when
one tuple is joined
scan
Index
lookup
COMAD 2008 , IIT Bombay
Related Work
 AIO used in database products
• Microsoft SQL Server, IBM DB2, Oracle
• No public documentation on how these systems use AIO
 Asynchronous iteration for evaluating web queries
(R.Goldman and J. Widom [ SIGMOD 2000 ] )
• They report results only on web queries
COMAD 2008 , IIT Bombay
Conclusion
 Proposed the Asynchronous Iterator Model (AIM)
 Presented asynchronous versions of INL and some
operators
 Showed gains of over 50 % in some cases
 AIM can be useful in web-service access and in data
integration systems like IBM DataJoiner
 Future work
• Implementing async versions for index lookup, sub plan,
sort and merge operator
• Performing async IO in the presence of ordering
constraints
COMAD 2008 , IIT Bombay
Thank You
Questions ?
COMAD 2008 , IIT Bombay
Plans
 Query 1a :
• Seq scan on lineitem, probe on orders
• Merge Join
-> Index Scan on orders
-> Sort lineitem
-> Seq Scan on lineitem
 Query 2a:
• Nested Loop
-> Nested Loop
-> Seq Scan on lineitem
-> Index Scan on orders
-> Index Scan on customer
COMAD 2008 , IIT Bombay
Plans
 Query 2a
Merge Join
-> Sort orders
-> Merge Join
-> Index Scan on orders
-> Sort on lineitem
-> Seq Scan on lineitem
-> Index Scan on customer
 Query 2b :
Nested Loop
-> Nested Loop
-> Seq Scan on lineitem
-> Index Scan on myorders
-> Index Scan on customer
COMAD 2008 , IIT Bombay