
A Fast FPGA Implementation of a Unique Multi-level Tree-based Image Classifier
Geoffrey Wall, Faizal Iqbal, Xiuwen Liu, and Simon Foo
Center for Applied Vision and Imaging Science
Machine Intelligence Lab
Florida State University
MAPLD 2005
Outline
• Need for fast hardware data classification
• Current hardware-based classifiers
• Our classifier: the classification tree
• Training
• Classification
• Results
• Conclusions
• Future work
• References
Need for fast hardware data classification
• Law Enforcement / Security
– Real-time face recognition
• Military
– Rapid Target Detection and Recognition
• Intelligence / Security
– Signal Intelligence, Voice Recognition, Data Mining
• Aerospace
– Satellite-based signal processing and data classification
• Consumer Electronics
– Personal digital assistants: pattern intelligence, data mining
Need for fast hardware data classification
• Current Pattern Recognition algorithms
– Floating point arithmetic
– Serial instruction-based computation in μP
– Use of transcendental functions
• Neural networks, SVMs, statistical parametric and non-parametric methods, other kernel-based methods
– Use exponential kernel functions, which are slow in hardware
Need for fast hardware data classification
• Need an algorithm suited to current FPGA architectures
– Floating-point arithmetic
• Slow; takes up many logic cells to implement
– Fixed-point arithmetic
• Fast; maps onto embedded DSP and MAC blocks
– Transcendental functions
• Slow; require CORDIC* iterations to compute exponentials
– Need a method that uses neither floating-point arithmetic nor transcendental functions
*Meyer-Base, A., Watzel, R., Meyer-Base, U., and Foo, S., “A parallel CORDIC architecture dedicated to compute the Gaussian potential function in neural networks,” Engineering Applications of Artificial Intelligence, vol. 16 (2003), pp. 595–605.
Need for fast hardware data classification
• Possible solution
– Nearest Neighbor
• Classifies an input pattern by using the Euclidean distance metric to find the nearest neighbors whose classes are known a priori
[Diagram: an unknown input point surrounded by labeled (known) neighbor points.]
– Problem
• Must calculate the Euclidean distance to every labeled training point, then search those distances for the smallest one (see the sketch below)
– Computationally expensive
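A minimal C sketch of this brute-force nearest-neighbor rule; the array sizes, names, and data layout are illustrative assumptions, not the authors' implementation:

#include <float.h>

#define DIM  10    /* pattern dimension (illustrative)  */
#define NPTS 200   /* number of labeled training points */

/* Squared Euclidean distance; omitting the square root does not
 * change which neighbor is nearest. */
static double dist2(const double *x, const double *y)
{
    double d = 0.0;
    for (int k = 0; k < DIM; k++) {
        double t = x[k] - y[k];
        d += t * t;
    }
    return d;
}

/* Return the label of the nearest labeled neighbor: O(NPTS * DIM)
 * work per query, which is what makes the rule expensive. */
static int nearest_neighbor(const double x[DIM],
                            const double train[NPTS][DIM],
                            const int label[NPTS])
{
    double best = DBL_MAX;
    int best_label = -1;
    for (int n = 0; n < NPTS; n++) {
        double d = dist2(x, train[n]);
        if (d < best) {
            best = d;
            best_label = label[n];
        }
    }
    return best_label;
}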
Need for fast hardware data classification
Solution: calculate all possible distances from any given input pattern to all labeled training data ahead of time
– Challenges
• Choosing optimal features (metrics) for a given problem
• Restricting the input pattern to a finite fixed-point representation
• Transforming the input pattern space in such a way that classification results are accurate
• Building the lookup table
The Classification Tree
• Chooses an optimized set of features based on the training data
• Restricts the input pattern to a finite fixed-point representation
– Linear shift, scaling, and quantization
• Transforms the input pattern space in such a way that classification results are accurate
– Transformation matrices
– Multilevel classifier
• Several stages; the input space is further divided and refined as we descend through the tree
Training Overview
• Input pattern dimension reduction
– Principal component analysis
– Stochastic gradient search
• Clustering classes into groups
– 2 or more classes – non-terminal nodes
– 1 class – leaf node
• Quantizing & Table Population
– Map feature space to indices
– Apply the nearest-neighbor rule to all possible combinations of indices to fill the table
Training - Intro
We define a training set
$$T = [A, B] = [a_1, a_2, \ldots, a_N, b_1, b_2, \ldots, b_M]$$
with labels
$$\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_k\}, \qquad a_i, b_j \in \mathbb{R}^d.$$
Now we wish to find a matrix
$$W \in \mathbb{R}^{l \times d}, \quad l \ll d,$$
which maximizes the function
$$F = \sum_{c=1}^{N} \frac{1}{1 + e^{-2\sigma(h_c - 1)}},$$
where $\hat{a}_k = W a_k$ and
$$h_c = \frac{\max_{m,n} D(\hat{a}_m, \hat{a}_n)}{\min_{i,j} D(\hat{a}_i, \hat{a}_j)} \qquad \text{(max inter-class distance over min intra-class distance)},$$
with $\hat{a}_i, \hat{a}_j, \hat{a}_m \in A_c$ and $\hat{a}_n \in A_b$, $b \neq c$.
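A small C sketch of evaluating F from precomputed per-class ratios h_c; SIGMA and the class count are illustrative assumptions, and computing each h_c itself requires the class-wise distances defined above:

#include <math.h>

#define NCLASS 40     /* number of classes (illustrative) */
#define SIGMA  1.0    /* sigmoid sharpness (assumption)   */

/* F sums a sigmoid in (h_c - 1) over the classes: each term tends
 * toward 1 when class c is well separated and toward 0 otherwise. */
static double eval_F(const double h[NCLASS])
{
    double F = 0.0;
    for (int c = 0; c < NCLASS; c++)
        F += 1.0 / (1.0 + exp(-2.0 * SIGMA * (h[c] - 1.0)));
    return F;
}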
Training - Gradient
• Starting point in the W search space
– Principal Component Analysis (Karhunen-Loève transform)
• A linear transformation
– Reduces the dimensionality of a dataset
– Retains those features which contribute most to the variance
• Stochastic gradient search
– Update matrix W using a finite-difference gradient estimate:
$$\frac{\partial F}{\partial W_{i,j}} \approx \frac{F(W + \Delta W) - F(W)}{\Delta W_{i,j}}, \qquad W_{i,j} \leftarrow W_{i,j} + \gamma \frac{\partial F}{\partial W_{i,j}}$$
– If the new W yields inferior results (lower F), go back to the original W, add some noise, and try a new W. Repeat until some tolerable error threshold is reached (see the sketch below).
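A minimal C sketch of this search loop, assuming an externally supplied criterion function F; the step sizes DELTA and GAMMA and all array shapes are illustrative assumptions, not the authors' values:

#include <stdlib.h>

#define L 3        /* projected dimension l (illustrative) */
#define D 10       /* input dimension d (illustrative)     */
#define DELTA 1e-4 /* finite-difference step (assumption)  */
#define GAMMA 1e-2 /* learning rate gamma (assumption)     */

typedef double (*criterion_fn)(double W[L][D]);

/* One iteration: estimate dF/dW_ij by a forward difference in each
 * entry, then step uphill (we are maximizing F). */
static double gradient_step(double W[L][D], criterion_fn F)
{
    double F0 = F(W);
    double grad[L][D];

    for (int i = 0; i < L; i++)
        for (int j = 0; j < D; j++) {
            double saved = W[i][j];
            W[i][j] += DELTA;                 /* perturb one entry  */
            grad[i][j] = (F(W) - F0) / DELTA; /* forward difference */
            W[i][j] = saved;                  /* restore            */
        }

    for (int i = 0; i < L; i++)
        for (int j = 0; j < D; j++)
            W[i][j] += GAMMA * grad[i][j];
    return F(W);
}

/* If the step lowered F, the slide's recipe is: restore the old W
 * and jitter it with small random noise before trying again. */
static void add_noise(double W[L][D], double scale)
{
    for (int i = 0; i < L; i++)
        for (int j = 0; j < D; j++)
            W[i][j] += scale * ((double)rand() / RAND_MAX - 0.5);
}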
Training - finding super classes
• Have optimized W
– Reduce the pattern size of all training patterns:
$$[A_W, B_W] = W[A, B], \qquad A_W \in \mathbb{R}^{l \times N}, \; B_W \in \mathbb{R}^{l \times M}, \; l \ll d$$
– Try classifying all patterns in $B_W$ using $A_W$ as the labeled nearest-neighbor points
– If $B_W$ is not classified perfectly, start building super classes
• Find
$$\min_{b,c}\; \max_{i,j}\; D(\hat{a}_i, \hat{a}_j)$$
• where $\hat{a}_i$ is labeled $\lambda_b$ and $\hat{a}_j$ is labeled $\lambda_c$
(i.e., choose to combine the two classes whose maximum cross-distance is the smallest; see the sketch below)
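A C sketch of this selection, under an illustrative class-major data layout (an assumption, not the authors' structures):

#include <float.h>

#define NCLASS 10   /* number of classes (illustrative)    */
#define MAXPTS 20   /* max patterns per class (assumption) */
#define L 3         /* projected dimension                 */

static double dist2(const double *x, const double *y)
{
    double d = 0.0;
    for (int k = 0; k < L; k++) { double t = x[k] - y[k]; d += t * t; }
    return d;
}

/* pts[c][i] is the i-th projected pattern of class c; npts[c] counts
 * them. Pick the pair (b, c) whose maximum cross-distance is smallest. */
static void choose_merge(const double pts[NCLASS][MAXPTS][L],
                         const int npts[NCLASS],
                         int *out_b, int *out_c)
{
    double best = DBL_MAX;
    for (int b = 0; b < NCLASS; b++)
        for (int c = b + 1; c < NCLASS; c++) {
            double worst = 0.0;               /* max over (i, j) */
            for (int i = 0; i < npts[b]; i++)
                for (int j = 0; j < npts[c]; j++) {
                    double d = dist2(pts[b][i], pts[c][j]);
                    if (d > worst) worst = d;
                }
            if (worst < best) {               /* min over (b, c) */
                best = worst;
                *out_b = b;
                *out_c = c;
            }
        }
}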
Training – building super classes
• We now retry our classification
– If we still do not achieve perfect results
• Build super classes until we achieve perfect results
[Diagram: Classes 1, 13, 9, and 7 before and after merging; Class 1 and Class 13 combine into Super Class 1, while Class 9 and Class 7 remain unchanged.]
Training – building super classes
• Have classes grouped into super classes
– Almost ready to build the LUT
[Diagram: example groupings of classes into super classes, e.g. Super Class 1 containing Classes 5, 2, and 7, and Super Class 2 containing Classes 1 and 3.]
Training - Quantization
• To build the LUT we must
– Quantize the input from real values to integers in some finite range
– Our LUT is multidimensional, with number of dimensions equal to l, the parameter chosen for our projection matrix W
– Choose a finite number of bins per dimension, b
• Gives us $b^l$ unique entries in our lookup table
– Now quantize each dimension of the input to an integer in the range [0, b−1]
Training - Quantization
• Find the min and max along each dimension
– $\mathrm{range}_l = \max_l - \min_l$
• Quantize each dimension of the input by shift, dilation, and rounding:
$$q_l = \mathrm{round}\!\left(\frac{(b-1)(x_l - \min_l)}{\mathrm{range}_l}\right)$$
– This maps the real-valued input space to an integer-valued space: $\min_l \mapsto 0$, $\max_l \mapsto (b-1)$
• Can now build the LUT (a quantizer sketch follows below)
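A C sketch of this per-dimension quantizer; the bin count and dimensions are illustrative, and the clamping of out-of-range inputs is an added assumption:

#include <math.h>

#define B 10   /* bins per dimension (illustrative)   */
#define L 3    /* projected dimensions (illustrative) */

/* Map each real component x[l] to an integer index in [0, B-1] by
 * shifting by the training min, scaling by (B-1)/range, and rounding. */
static void quantize(const double x[L],
                     const double minv[L], const double range[L],
                     int q[L])
{
    for (int l = 0; l < L; l++) {
        int v = (int)round((B - 1) * (x[l] - minv[l]) / range[l]);
        if (v < 0) v = 0;            /* clamp inputs outside the     */
        if (v > B - 1) v = B - 1;    /* training range (assumption)  */
        q[l] = v;
    }
}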
Training - Building LUT
• For every possible cell in the LUT
– Find the Euclidean distance to every quantized point of $A_W$ and $B_W$
– The one with the smallest distance is the nearest neighbor
– Copy its label into that entry of the table (see the sketch below)

Example entries (indices 628 to 637 of a 1000-entry table):

Index   Stored label
628     Super Class 1
629     Super Class 0
630     Super Class 0
631     Super Class 0
632     Super Class 2
633     Super Class 2
634     Super Class 2
635     Super Class 2
636     Super Class 1
637     Super Class 1
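A C sketch of the population loop for a 3-dimensional, 10-bin table; the brute-force scan and data layout are illustrative assumptions:

#include <float.h>

#define B 10          /* bins per dimension                       */
#define L 3           /* table dimensions                         */
#define NTRAIN 200    /* quantized training points (illustrative) */

static double dist2(const int *x, const int *y)
{
    double d = 0.0;
    for (int k = 0; k < L; k++) { double t = x[k] - y[k]; d += t * t; }
    return d;
}

/* For every cell, find the nearest quantized training point and store
 * its (super) class label at that address. */
static void build_lut(const int train[NTRAIN][L], const int label[NTRAIN],
                      unsigned char lut[B * B * B])
{
    int cell[L];
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++)
            for (int k = 0; k < B; k++) {
                cell[0] = i; cell[1] = j; cell[2] = k;
                double best = DBL_MAX;
                int best_label = 0;
                for (int n = 0; n < NTRAIN; n++) {
                    double d = dist2(cell, train[n]);
                    if (d < best) { best = d; best_label = label[n]; }
                }
                /* flat index, e.g. 634 for cell [6 3 4] with B = 10 */
                lut[(i * B + j) * B + k] = (unsigned char)best_label;
            }
}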
Training – Top level Node
• We now have the top-level node in our tree
– Each super class is a child node
[Diagram: Node 0 with children Super Class 1, Super Class 2, and Super Class 3.]
• Repeat the process used to build the top node
– Fewer classes to handle at each node
– Each node has its own projection matrix W and LUT
[Diagram: the full tree, Node 0 at the root with internal Nodes 1 through 5 and leaf Classes 1 through 10; the same tree is shown on the next slide.]
Classification Tree Example
[Diagram: classification tree rooted at Node 0, with internal Nodes 1 through 5 and leaves Class 1 through Class 10; Node 2, for example, branches to Class 3, Class 8, and Node 5.]
Example tree with 6 nodes for classifying 10 classes
Classification Tree Example
Each node in the tree has several pieces of information associated with it:
– An n-dimensional lookup table whose entries tell us which branch to proceed down in the tree
– n projection vectors (which can be written as an n × m matrix A)
– Vectors B and M (n × 1 each) used for quantization
At each new node we must take the input vector (m × 1) and project it using A:
A * input = P
where P (n × 1) is the projected result. We can then quantize P using vectors B and M. The result of this quantization yields an index into the lookup table, which tells us which child node to proceed to. We repeat this process until we reach a terminal node, which yields a classification result. (A compact sketch of this walk follows below.)
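A C sketch of the walk just described. The node layout is an assumption; for simplicity every table entry here selects a child node and class labels live only at terminal nodes, whereas the example later in the slides lets some entries map directly to classes:

#include <math.h>
#include <stddef.h>

#define M_DIM 10   /* input dimension m (illustrative)     */
#define N_DIM 3    /* projected dimension n (illustrative) */
#define BINS  10   /* entries per table axis               */

struct node {
    double A[N_DIM][M_DIM];              /* projection vectors      */
    double Mq[N_DIM], Bq[N_DIM];         /* quantization params M, B */
    unsigned char lut[BINS*BINS*BINS];   /* branch index per cell   */
    struct node *child[3];               /* up to 3 children        */
    int class_label;                     /* valid at terminal nodes */
};

static int classify(const struct node *nd, const double x[M_DIM])
{
    while (nd->child[0] != NULL) {           /* until a terminal node */
        int idx = 0;
        for (int k = 0; k < N_DIM; k++) {
            double p = 0.0;
            for (int j = 0; j < M_DIM; j++)  /* P = A * x             */
                p += nd->A[k][j] * x[j];
            int q = (int)round((p - nd->Mq[k]) / nd->Bq[k]);
            if (q < 0) q = 0;                /* clamp (assumption)    */
            if (q > BINS - 1) q = BINS - 1;
            idx = idx * BINS + q;            /* e.g. q=[6 3 4] -> 634 */
        }
        nd = nd->child[nd->lut[idx]];        /* descend chosen branch */
    }
    return nd->class_label;
}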
Classification Tree Example
Each node has a set of projection vectors, which project the input pattern onto a smaller subspace.

Suppose x = [ 15 90 207 3 35 52 51 154 69 51 ]^T.

Now suppose we have transformation matrix A =

[   4 238 216 171 174 128  78 174  38 219 ]
[ 190 119 134 214  97 181  48  77 178 218 ]
[ 113 107  52   5 212 109  49 138  96 151 ]

Then the resulting projection:

A * x = [ 124016 92453 73268 ]^T
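As a check (a worked step added here, not on the original slide), the first component is the dot product of the first row of A with x:
$$4 \cdot 15 + 238 \cdot 90 + 216 \cdot 207 + 171 \cdot 3 + 174 \cdot 35 + 128 \cdot 52 + 78 \cdot 51 + 174 \cdot 154 + 38 \cdot 69 + 219 \cdot 51 = 124016.$$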
We then translate this projection to an index into a lookup table.
When we design our tree we must choose two parameters relating to the table: the number of dimensions in the table, and the number of discrete entries along each dimension.
Suppose we choose 3 dimensions and 10 discrete entries along each dimension. This yields a table containing $10^3 = 1000$ distinct entries.
Classification Tree Example
Associated with each row of projection matrix A are two parameters which are computed ahead of time; these parameters allow us to translate the projected input into indices into a lookup table. Let us call the two parameters B and M. These values allow for quantization from the space $\mathbb{Z}^n$ to a finite set of integer values.
We do this with the following formula:
$$q_k = \mathrm{round}\!\left(\frac{P_k - M_k}{B_k}\right)$$
where $P_k$ is the kth component of the projected input vector, $M_k$ and $B_k$ are the quantization parameters associated with the kth row of matrix A, and $q_k$ is the kth quantized result.
For example, suppose
B = [ 21134 35749 23412 ]^T and M = [ −1432 −6785 −9674 ]^T.
Then we have
q = [ 6 3 4 ]^T,
which, with 10 entries along each of the 3 dimensions, corresponds to the flat table index 6·10² + 3·10 + 4 = 634.
Classification Tree Example
We can now find the value stored in the lookup table. But what does this tell us? Below is an example lookup table; the left column is the index, and the right column shows what is stored at that address.

Index   Stored value
628     1
629     0
630     0
631     0
632     2
633     2
634     2
635     2
636     1
637     1

[Diagram: Node 2 with branches to Class 3, Class 8, and Node 5.]

Suppose that the above lookup table belongs to Node 2 in the tree. For this example we have chosen that each node can have up to 3 children, so we see only 3 distinct values in the table: {0, 1, 2}. For index 634 (calculated on the previous slide) we see the value 2. A value of 2 tells us to go down the rightmost branch to Node 5; a 0 entry yields Class 3, and an entry of 1 yields Class 8.
Results Hardware
• Timing diagram
– ModelSim simulation
[Waveform screenshot: the start-classification signal, the done signal, and the output indicating which child node to traverse next.]
Results - Software
• ORL face dataset
– 40 classes, 200 images
• 99.8 percent correctly classified
– C simulation
Results – Overall
• 8 clock cycles per node in the tree
– Let's assume an average tree depth of 3 levels
• About 24 clock cycles until we reach a terminal node, i.e., one complete classification
• If we clock at 400 MHz, we can do about 16.7 million classifications per second (400 × 10⁶ / 24)
Conclusions
• Speed, Performance, Accuracy
• Comparison to other hardware and software classifiers
• Portability?
• Scalability?
Future Work
• Real-time video interface
• Classify many objects in real time
• Larger data sets
• Better, faster hardware
• Real-time face and object detection
References
[1] Liu, X. and Wang, D. L. (2003) Texture classification using spectral histograms. IEEE Trans. on Image Processing, 12(6), pp. 661–670.
[2] Randen, T. and Husøy, J. H. (1999) Filtering for texture classification: a comparative study. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(4), pp. 291–310.
[3] Knittel, G. (1996) A PCI-compatible FPGA-coprocessor for 2D/3D image processing. IEEE Symposium on FPGAs for Custom Computing Machines Proceedings, pp. 136–145.
[4] Compton, K. and Hauck, S. (2002) Reconfigurable computing: a survey of systems and software. ACM Computing Surveys, 34(2), pp. 171–210.
[5] Iqbal, F. (2004) Wavelet transform based image compression on FPGA. MS thesis, Florida State University.
[6] Meyer-Baese, U. (2004) Digital Signal Processing with Field Programmable Gate Arrays, 2nd edition, Springer-Verlag.
[7] Avnet Design Services (2003) Xilinx Virtex-II Pro Development Kit, Avnet Design Services Inc.
[8] Meyer-Base, A., Watzel, R., Meyer-Base, U., and Foo, S. (2003) A parallel CORDIC architecture dedicated to compute the Gaussian potential function in neural networks. Engineering Applications of Artificial Intelligence, vol. 16, pp. 595–605.