Transcript slides
Speeding up k-Means by GPUs
YOU LI
SUPERVISOR: DR. CHU XIAOWEN
CO-SUPERVISOR: PROF. LIU JIMING
THURSDAY, MARCH 11, 2010
Outline
Introduction
Efficiency of data mining -> GPGPU -> k-Means on GPU;
Related work
Method
Research Plan
Efficiency of Data Mining
Data mining faces an efficiency challenge as data volumes keep growing.
[Fig. 1, Fig. 2: the growth of data, and parallel data mining]
GPGPU
General-purpose, high-performance parallel hardware;
Supplies another platform for parallelizing data mining algorithms.
[Fig. 3: peak performance in GFLOPS, Jan 2003 to Jul 2007, of NVIDIA GPUs (NV30, NV35, NV40, G70, G70-512, G71, GeForce 8800 GTX, Quadro FX 5600, Tesla C870) versus Intel CPUs (3.0 GHz Pentium 4, 3.0 GHz Core 2 Duo, 3.0 GHz Core 2 Quad); inset: CPU vs. GPU architecture (Control, Cache, ALU, DRAM)]
k-Means on GPU
Programming on the GPU
CUDA: an integrated CPU+GPU programming model based on C (see the sketch below);
k-Means
Widely used in statistical data analysis, pattern recognition, etc.;
Easy to implement on the CPU and well suited to the GPU.
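As a rough illustration of this integrated model (not code from this work), the sketch below puts host code and a device kernel in one C-style source file; the kernel name `scale` and all parameters are made up for the example.

// Minimal CUDA sketch: host (CPU) and device (GPU) code in one C-style file.
#include <cstdio>
#include <cuda_runtime.h>

// Device code: each thread scales one array element.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d_x = NULL;
    cudaMalloc((void **)&d_x, n * sizeof(float));   // allocate GPU global memory
    cudaMemset(d_x, 0, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);  // host code launches the kernel
    cudaDeviceSynchronize();                        // wait for the GPU to finish
    cudaFree(d_x);
    printf("done\n");
    return 0;
}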
Outline
Introduction
Related work
UV_k-Means, GPUMiner and HP_k-Means;
Method
Research Plan
Related work
Speed of k-Means on low-dimension data, in seconds
(NVIDIA GTX 280 GPU; Intel(R) Core(TM) i5 CPU)

n          k    d   MineBench (CPU)   HP k-Means   UV k-Means   GPUMiner
2 million  100  2   19.36             1.45         2.84         61.39
2 million  400  2   70.93             2.16         5.96         63.46
2 million  100  8   39.81             2.48         6.07         192.05
2 million  400  8   152.25            4.53         16.32        226.79
4 million  100  2   38.74             2.88         5.64         130.36
4 million  400  2   141.84            4.38         11.94        126.38
4 million  100  8   79.60             4.95         12.85        383.41
4 million  400  8   304.46            9.03         34.54        474.83

[Bar chart: the same running times for MineBench on CPU, HP k-Means, UV k-Means, and GPUMiner]
Outline
Introduction
Related work
Method and Results
k-Means (three steps) -> step 1 -> step 2 -> step 3;
Experiments;
Research Plan
k-Means algorithm
Given n data points and k centroids, each iteration has three steps:
Step 1 (O(nkd)): compute the distance between every point n_i and every centroid k_j;
Step 2 (O(nk)): find the closest centroid for each point;
Step 3 (O(nd)): compute the new centroids.
If any centroid changed, repeat from Step 1; otherwise, end.
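For concreteness, here is a minimal CUDA sketch of Steps 1 and 2, assuming one thread per data point and row-major layouts; the kernel and variable names are illustrative, not the implementation presented in this work.

// Hypothetical sketch of Steps 1-2: one thread per data point computes the
// squared distance to every centroid (O(nkd) work) and records the closest.
__global__ void assign_points(const float *points,    // n x d, row-major
                              const float *centroids, // k x d, row-major
                              int *labels, int n, int k, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float best = 1e30f;
    int best_j = 0;
    for (int j = 0; j < k; ++j) {                     // Step 1: distance(n_i, k_j)
        float dist = 0.0f;
        for (int t = 0; t < d; ++t) {
            float diff = points[i * d + t] - centroids[j * d + t];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; best_j = j; } // Step 2: keep the closest
    }
    labels[i] = best_j;
}
// example launch:
// assign_points<<<(n + 255) / 256, 256>>>(d_points, d_centroids, d_labels, n, k, d);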
Memory Mechanism of GPU
Global memory: large size; long latency.
Registers: small size; short latency; not under user control.
Shared memory: medium size; short latency; under user control.
Each of these appears in the sketch below.
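The sketch below shows how the three memory spaces surface in CUDA code; the kernel is a toy example (it reverses a 256-element tile through shared memory), not part of this work.

__global__ void memory_spaces(const float *in, float *out) {
    // `in` and `out` point into global memory: large but long-latency.
    __shared__ float tile[256];  // shared memory: medium size, short latency,
                                 // explicitly managed by the user
    float r = in[threadIdx.x];   // `r` lives in a register: small, fast,
                                 // allocated by the compiler, not the user
    tile[threadIdx.x] = r;
    __syncthreads();             // make the shared-memory writes visible
    out[threadIdx.x] = tile[255 - threadIdx.x];  // reverse the tile
}
// launch with exactly 256 threads per block, e.g.:
// memory_spaces<<<1, 256>>>(d_in, d_out);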
k-Means on GPU
Key idea
Increase the number of computing operations per global memory access;
Adopt methods from matrix multiplication and parallel reduction (a textbook reduction sketch follows).
Dimension is a key parameter:
For low dimension: use registers;
For high dimension: use shared memory.
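The reduction method referred to here, in its textbook CUDA form (a sketch, not this work's code): each block sums a tile in shared memory with a synchronized tree, so every element costs one global read and every block one global write.

__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float buf[256];                  // one tile per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // one global read per element
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction in shared memory
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = buf[0]; // one global write per block
}
// launch with 256 threads per block (a power of two), e.g.:
// block_sum<<<(n + 255) / 256, 256>>>(d_in, d_partial, n);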
k-Means on GPU
For low dimension:
Read each data point from global memory only once (see the register-based sketch below).
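A hypothetical register-based variant for d = 2 (known at compile time): the thread loads its point into registers once, then reuses those registers for all k distance computations, so the point is read from global memory only once. Names and layout are assumptions.

__global__ void assign_points_2d(const float *points, const float *centroids,
                                 int *labels, int n, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float px = points[2 * i];      // read the point from global memory once...
    float py = points[2 * i + 1];  // ...and keep it in registers
    float best = 1e30f;
    int best_j = 0;
    for (int j = 0; j < k; ++j) {  // registers are reused for all k centroids
        float dx = px - centroids[2 * j];
        float dy = py - centroids[2 * j + 1];
        float dist = dx * dx + dy * dy;
        if (dist < best) { best = dist; best_j = j; }
    }
    labels[i] = best_j;
}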
k-Means on GPU
For high dimension:
Read each data point from global memory only once, using shared memory (see the sketch below).
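A hypothetical shared-memory variant for high dimension: here the current centroid, rather than the data point, is staged through shared memory, so its d values are fetched from global memory once per block instead of once per thread; the exact scheme in this work may differ.

__global__ void assign_points_hd(const float *points, const float *centroids,
                                 int *labels, int n, int k, int d) {
    extern __shared__ float cj[];  // d floats: one centroid, staged per loop
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float best = 1e30f;
    int best_j = 0;
    for (int j = 0; j < k; ++j) {
        // All threads cooperatively copy centroid j into shared memory.
        for (int t = threadIdx.x; t < d; t += blockDim.x)
            cj[t] = centroids[j * d + t];
        __syncthreads();
        if (i < n) {
            float dist = 0.0f;
            for (int t = 0; t < d; ++t) {
                float diff = points[i * d + t] - cj[t];
                dist += diff * diff;
            }
            if (dist < best) { best = dist; best_j = j; }
        }
        __syncthreads();  // keep cj stable until all threads are done with it
    }
    if (i < n) labels[i] = best_j;
}
// launch with dynamic shared memory sized to one centroid, e.g.:
// assign_points_hd<<<(n + 255) / 256, 256, d * sizeof(float)>>>(...);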
Experiments
The experiments were conducted on a PC with an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU.
The GTX 280 has 30 SIMD multiprocessors; each contains eight processors and runs at 1.29 GHz. The GPU has 1 GB of memory with a peak bandwidth of 141.7 GB/s.
The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with a peak bandwidth of 5.6 GB/s. We use Visual Studio 2008 to write and compile all the source code. The CUDA version is 2.3.
We measure the running time of the application after file I/O, in order to show the speedup effect more clearly (see the timing sketch below).
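One common way to take such a measurement is with CUDA events; the snippet below is an assumption about how the timing could be done, not the talk's actual harness.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    // ... k-Means kernels would run here, after file I/O has completed ...
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the stop event is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // GPU time in milliseconds
    printf("measured time: %.3f ms\n", ms);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}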
Experiments
On low-dimension data
Compared with HP, UV, and GPUMiner; the data is generated randomly.
[Bar chart: running times of our k-Means, HP k-Means, and UV k-Means]
Four to ten times faster than HP
Experiments
On high-dimension data
Compared with UV and GPUMiner; the data is from KDD Cup 1999.
[Bar chart: running times of our k-Means, UV k-Means, and GPUMiner]
Four to eight times faster than UV
Experiments
Compared with the CPU
The results show that our algorithm compares very favorably with existing algorithms:
forty to two hundred times faster than the CPU version.
Outline
Introduction
Related work
Method
Research Plan
Research Plan
Detailed analysis of k-Means on the GPU
GFLOPS
Deal with even larger data sets
Other data mining algorithms on the GPU
k-NN
SDP (widely used in protein identification)
Q&A
Thanks very much