September 2010 - NVidia GPU Technology Conference

Download Report

Transcript September 2010 - NVidia GPU Technology Conference

SQL vs GPU
Richard Wilton
Johns Hopkins University
Department of Physics and Astronomy
1
Rationale
• Terabyte-scale data is best managed in a DBMS
• Some classes of computation are best implemented
using a GPU
2
How to do GPU computation in a terabytescale database
• Do the computation “outside” the database (export the
data / compute / re-import the data)
• Do the computation “inside” the database
• Something in between
3
SQLCLR out-of-process server
The basic concept
• Implement computational functionality in a separate process from the SQL Server process
• Access that functionality using interprocess communication (IPC)
SQL Server
SQL code
SQLCLR procedure
or function
Out-of-process server
IPC
Special-case
functionality
Why?
• Avoid memory, threading, and permissions restrictions on SQLCLR implementations
• Load dynamic-link libraries
• Invoke native-code methods
• Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL
computational resources
SQLCLR out-of-process server  implementation
SQL Server
SQL code
Out-of-process server
IPC
SQLCLR procedure
CUDA functionality
SQL
declare @sql nvarchar(max)
set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128'
exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08'
SQLCLR implementation (in C#)
• Allocates IPC buffer
• Initializes IPC buffer with the result set from a specified SQL query
• Signals out-of-process server
• Waits for response from out-of-process server
• Returns data from the IPC buffer as a SQL result set
Out-of-process server implementation (in C#)
• Loads the CUDA implementation (dynamic link library)
• Invokes a method in the DLL with a reference to the IPC buffer
• Waits for completion
• Signals the SQLCLR implementation
• Stand-alone application or Windows service
CUDA implementation (in C++)
• Implements GPU functionality
• Uses the IPC buffer for both incoming and outgoing data
5
A test case: genomic sequence data
• Nucleic acid sequence data
• Data is processed in a “loose” workflow by a
sequence of software tools
• Example:
– Align short sequences to a long reference sequence
– Identify alignment regions of interest
– Map regions of interest to annotation data (genome
map, known variations, etc.)
6
A test case: genomic sequence data
• Typical data-management scenario:
– All data is maintained in file-system files
– Processing software reads and writes files
– “Database-like” operations (e.g. joins) are performed
using command-line tools (e.g. Perl scripts)
• Can we use a DBMS as a repository?
– All data is maintained in the DBMS
– Processing software reads and writes tables
– “Database-like” operations (e.g. joins) are performed
using SQL
7
A test case: genomic sequence data
• Nucleic acid sequence data
• DBMS: data repository
• GPU: implementation of sequence alignment algorithms
– Gapped alignment: Smith Waterman
– Non-gapped alignment: bit vector mapping through a hash table
8
A test case: genomic sequence data
• Results: so far, so good …
– Sequence data can be managed using a DBMS
• Use low-level (C#, C++) implementations to manipulate string and binary
data
– Sequence data from the DBMS can be processed using GPU
implementations
• GPU sequence alignment implementations can exploit task parallelism and
execute 10x (or more) faster than equivalent CPU implementations
• But …
– DBMS data abstractions (tables, columns) must be mapped to
and from native GPU data formats and layouts
– Data movement out of and into the DBMS is slow
9
Going forward…
• Compromise: maintain GPU-formatted data as BLOBs in
the file system
– BLOBs are in database “native” format
– BLOBs correspond to database tables or table partitions
• Much programming and testing remain to be done…
10