F3 CHREC All-Hands - University of Florida


CHREC F3: Target Tracking
Rafael Garcia
11/26/08
F3 Goals, Motivations, & Challenges
Goals
- Analyze & examine available multi-FPGA platforms and tools for scalable system design
- Develop applications & design strategies for scalable architectures from case studies
Motivations
- Meet performance requirements in HPC/HPEC scenarios by mapping across multiple FPGAs
- Exploit multi-FPGA platforms to develop larger, more complex designs and algorithms
- Increase understanding of performance prediction, power, and usability for scalable apps
Challenges
- Perform multilevel algorithm partitioning, analysis, and optimization for multi-FPGA systems
- Determine the influence of application characteristics on the selection of platforms, tools, and languages
(Diagram: F3 project flow; labels: Formulation, Design, Translation, Execution, Insights)

Kalman Filter Overview
- Traditional Kalman filters estimate the state of a dynamic system in a noisy environment
- Commonly used in target prediction and can be extended to multiple dimensions, targets, and models
- Excellent target tracker when an accurate model is known
- Useful even if an accurate model is not known
Current Architecture
- 4 tightly coupled FPGAs mapped to 4 quadrants
- System is driven by two global clocks
  - 100 MHz inter-FPGA communication links
  - 50 MHz data-processing clock
- Inter-FPGA communication occurs when a target crosses a quadrant boundary
  - The current state of the target is passed along
- 2-step processing cycle returns results at 25 MSa/s
  - Non-pipelined design
  - Of the two cycles, one depends on the result of the previous cycle and the other depends on pseudo-sensor data from the host CPU
- Low frequency and the lack of pipeline registers are expected to lower power consumption
- 2-cycle design simplifies the communication network
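The 25 MSa/s figure follows from the clocking: a non-pipelined design that needs two 50 MHz cycles per result delivers 50/2 = 25 million results per second. The Python sketch below reproduces that arithmetic and illustrates the quadrant ownership rule, where state moves between FPGAs only when a target crosses a boundary. The quadrant numbering, the center origin, and the function names are hypothetical, and the motion step is plain dead reckoning rather than the Kalman update.

```python
from dataclasses import dataclass
from typing import Tuple

DATA_CLOCK_HZ = 50_000_000  # 50 MHz data-processing clock
CYCLES_PER_RESULT = 2       # non-pipelined 2-step processing cycle

# A non-pipelined design produces one result every CYCLES_PER_RESULT clocks.
RESULT_RATE = DATA_CLOCK_HZ // CYCLES_PER_RESULT  # 25_000_000 results/s = 25 MSa/s


@dataclass
class TargetState:
    x: float   # horizontal position
    y: float   # vertical position
    vx: float  # horizontal velocity
    vy: float  # vertical velocity


def quadrant_of(x: float, y: float) -> int:
    """Hypothetical mapping of the field to FPGAs 0-3 (origin at the center)."""
    if x >= 0:
        return 0 if y >= 0 else 3
    return 1 if y >= 0 else 2


def advance(state: TargetState, owner: int, dt: float) -> Tuple[TargetState, int, bool]:
    """Move the target one time step; report whether its state must be handed off."""
    nxt = TargetState(state.x + state.vx * dt, state.y + state.vy * dt,
                      state.vx, state.vy)
    new_owner = quadrant_of(nxt.x, nxt.y)
    # Inter-FPGA traffic is needed only when the target leaves its quadrant;
    # the current state is then passed to the neighboring FPGA.
    return nxt, new_owner, new_owner != owner


s, owner, handoff = advance(TargetState(-0.5, 1.0, 1.0, 0.0), owner=1, dt=1.0)
print(RESULT_RATE, owner, handoff)
```

Because only one target lives in each quadrant, a handoff is a single 4-value state transfer, which is consistent with the slide's point that the 2-cycle design keeps the communication network simple.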
Current Architecture
Resource                      M4K RAMs   DSPs   ALUTs
Stratix II: EP2S180F1020C3    1%         15%    2%
- Continuously receives pseudo-sensor data and returns condensed information
- Limited to a single target per quadrant
- Sensor sampling rate fixed at 25 MSa/s
Simplified Algorithm
- Assumes steady-state operation
- Target must closely follow the given movement model for accurate results
- Model tracks four parameters
  - Horizontal position
  - Vertical position
  - Horizontal velocity
  - Vertical velocity
- Allows for precomputed covariance and Kalman-gain terms
RCML Representation
(Diagram: RCML representation of the Kalman filter. Labels: Start/Initialize; Time-Update (“Predict”) with Next-state prediction and Update error covariance; Measurement-Update (“Correct”) with Compute Kalman gain, Correct prediction, and Update error covariance; BCast; Gather; Time-Step Advance; Generate Sensor Readings; Report Current Results; loops "for each C value in PredictionVector" and "for each D value in MeasurementVector"; i=4.)
Data Set: PredictionVector
Element Type: fixed Position, fixed Acceleration
Num Elements := 4
Data Set: MeasurementVector
Element Type: fixed Position, fixed Acceleration
Num Elements := 4
Algorithm Changes
- Remove the hardcoded terms, increasing prediction accuracy during non-steady-state situations
- Modify the model to include Z-axis parameters for airborne targets
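Because the simplified algorithm assumes steady-state operation, the covariance and Kalman gain converge to constants that can be computed once, offline. The Python sketch below assumes the four tracked parameters split into two independent axes (horizontal and vertical), each with a constant-velocity model, position-only measurements, and piecewise-constant white acceleration noise of variance q; the function names and noise values are hypothetical. It iterates the covariance recursion until the gain stops changing, then runs the per-sample update with the fixed gains, so only multiplies and adds remain at runtime.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def predict_cov(P: Matrix, dt: float, Q: Matrix) -> Matrix:
    """P_pred = F P F^T + Q for the constant-velocity model F = [[1, dt], [0, 1]]."""
    fp = [[P[0][0] + dt * P[1][0], P[0][1] + dt * P[1][1]],
          [P[1][0], P[1][1]]]
    return [[fp[0][0] + dt * fp[0][1] + Q[0][0], fp[0][1] + Q[0][1]],
            [fp[1][0] + dt * fp[1][1] + Q[1][0], fp[1][1] + Q[1][1]]]


def steady_state_gain(dt: float, q: float, r: float,
                      tol: float = 1e-12, max_iter: int = 10_000) -> Tuple[float, float]:
    """Iterate the covariance recursion offline until the Kalman gain settles."""
    # Piecewise-constant white acceleration noise with variance q (an assumption).
    Q = [[q * dt**4 / 4, q * dt**3 / 2], [q * dt**3 / 2, q * dt**2]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    k = (0.0, 0.0)
    for _ in range(max_iter):
        Pp = predict_cov(P, dt, Q)
        s = Pp[0][0] + r                         # innovation variance, H = [1, 0]
        k_new = (Pp[0][0] / s, Pp[1][0] / s)
        # P = (I - K H) P_pred
        P = [[Pp[0][0] - k_new[0] * Pp[0][0], Pp[0][1] - k_new[0] * Pp[0][1]],
             [Pp[1][0] - k_new[1] * Pp[0][0], Pp[1][1] - k_new[1] * Pp[0][1]]]
        if abs(k_new[0] - k[0]) < tol and abs(k_new[1] - k[1]) < tol:
            return k_new
        k = k_new
    return k


def constant_gain_step(p: float, v: float, z: float, dt: float,
                       k_pos: float, k_vel: float) -> Tuple[float, float]:
    """Runtime step with precomputed gains: multiplies and adds only."""
    p_pred = p + dt * v                          # next-state prediction
    innov = z - p_pred                           # measurement residual
    return p_pred + k_pos * innov, v + k_vel * innov


k_pos, k_vel = steady_state_gain(dt=1.0, q=0.01, r=1.0)
p, v = 0.0, 0.0
for k in range(1, 301):
    # Noiseless constant-velocity target: position 5 + 2k at step k.
    p, v = constant_gain_step(p, v, 5.0 + 2.0 * k, 1.0, k_pos, k_vel)
print(k_pos, k_vel, p, v)
```

The same pair of gains serves both axes when their noise parameters match; the trade-off named on the slide is that the fixed gains are only optimal once the filter has reached steady state.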
New Module Types
Sensor / Target          Precision   Resource   Kernel
Low Power / Slow         Fixed       Low        Kalman Filter
Fast Sampling / Fast     Fixed       Low        Kalman Filter
Airborne                 Floating    High
Noisy                    Floating    Medium
Multiple                 Floating    High
Kernels listed for the floating-point module types: MKS Kalman Filter, Feature Selection, Multi-Scale, High-Noise, Selective Kalman Filter
Kalman Filter
- Estimates the state of a dynamic system in a noisy environment
  - In this case, the ‘dynamic system’ is a moving target
- Commonly used in target prediction and can be extended to multiple dimensions, targets, and models
- Assumes sensor noise is white Gaussian noise
- Requires a pre-programmed model describing the target’s motion
- Works in a continuous 2-cycle loop
Developed in 1960 by Rudolf E. Kalman (a UF professor from 1971-1992!)
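The continuous 2-cycle loop corresponds to the standard discrete-time Kalman recursion, written below in common textbook notation: state estimate \hat{x}, error covariance P, state-transition matrix F, measurement matrix H, measurement z, and process and measurement noise covariances Q and R. These symbols are not defined on the slides and no control input is assumed. The first two lines are the time update ("predict") and the last three the measurement update ("correct"); under the simplified steady-state algorithm, P and K settle to constants that can be precomputed.

```latex
\begin{aligned}
\hat{x}_{k|k-1} &= F\,\hat{x}_{k-1|k-1} && \text{(predict state)}\\
P_{k|k-1} &= F\,P_{k-1|k-1}\,F^{\mathsf{T}} + Q && \text{(predict error covariance)}\\
K_k &= P_{k|k-1}H^{\mathsf{T}}\bigl(H\,P_{k|k-1}H^{\mathsf{T}} + R\bigr)^{-1} && \text{(Kalman gain)}\\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\bigl(z_k - H\,\hat{x}_{k|k-1}\bigr) && \text{(correct prediction)}\\
P_{k|k} &= \bigl(I - K_k H\bigr)P_{k|k-1} && \text{(update error covariance)}
\end{aligned}
```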
Kalman System Models
(Diagram: Noisy Samples → Kalman Filter → Accurate Samples / Mostly Accurate Samples; example motion models labeled "Follows Road", "-9.8 m/s²", and "NE wind at 23 mph")
- The Kalman filter can be viewed as a simple black box
  - An input stream of samples measuring a target’s position is contaminated with noisy samples
  - The output is a stream of samples with most of the noisy samples filtered out
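The black-box view can be sketched as a Python generator that consumes a stream of noisy position samples and yields filtered ones, with one predict/correct pass per sample. This is a minimal one-dimensional sketch assuming a random-walk motion model and white Gaussian noise with variances q and r; the function name and parameter values are hypothetical, and the gain is computed on the fly rather than precomputed.

```python
import random
from typing import Iterable, Iterator


def kalman_stream(samples: Iterable[float], q: float, r: float,
                  x0: float = 0.0, p0: float = 1e3) -> Iterator[float]:
    """Scalar Kalman filter used as a black box: noisy samples in, filtered samples out.

    Assumed model: random-walk state x_k = x_{k-1} + w_k with Var(w_k) = q, and
    measurements z_k = x_k + v_k with Var(v_k) = r (white Gaussian noise).
    """
    x, p = x0, p0
    for z in samples:
        p = p + q                 # time update ("predict"): uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # measurement update ("correct")
        p = (1.0 - k) * p
        yield x


rng = random.Random(0)
TRUE_POS = 10.0
noisy = [TRUE_POS + rng.gauss(0.0, 1.0) for _ in range(200)]
filtered = list(kalman_stream(noisy, q=1e-4, r=1.0))

raw_err = sum(abs(z - TRUE_POS) for z in noisy) / len(noisy)
filt_err = sum(abs(x - TRUE_POS) for x in filtered) / len(filtered)
print(f"mean |error|: raw {raw_err:.3f}, filtered {filt_err:.3f}")
```

Because the generator holds only two numbers of state (x and p), it also illustrates why the filter needs so little memory.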
Reasons for sensor noise
- Battery Power
  - Variable battery voltage
  - Voltage regulators cost money, draw power, and are not perfect
- Sensors
  - Low-quality sensors: cost-cutting for mass production sometimes requires cheap sensors
  - Incorrectly deployed sensors: bad orientation, obstructed sensor
- Environment
  - Environmental conditions: rain, dust, night-time tracking, snow
- Multiple targets
  - Misinterpreted samples from neighboring targets during multiple-target tracking
  - Sensor processing stage must ensure proper target isolation
- Wireless signal
  - Bad data from neighboring sensors due to a weak wireless signal
Kalman Filter example
PR Virtual Architecture with Kalman Filters
- Sensor records samples
- Image processing step extracts specific features
  - Target size, vertical position, horizontal position, target bearing, elevation, etc.
- Kalman filters remove sensor noise
- Results are sent to a central location to be displayed
(Diagram: communication architecture of the PR virtual architecture on a VLX25 device. Switches 1-5 connect the Sensor Interface, five Kalman filter modules (each attached through a module interface), and the Display Interface.)
FPGA and PR benefits for the Kalman Filter
- FPGA-amenable features
  - Low memory requirements
    - Simple filter with streaming inputs and outputs
  - Requires only multiplication and addition
    - No complex, time-consuming operations such as division, square root, differentiation, etc.
    - Can be implemented using only logic and MAC units
  - Low bandwidth requirements
    - Filter receives/produces a stream of coordinates, not a stream of images
- PR-amenable features
  - Optimum resource usage
    - The right filter type for the right job
  - Swapping modules does not halt execution
    - Active filters are never disturbed
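With the gain precomputed, each filter step reduces to multiply-accumulate operations on fixed-point values, which is what lets the filter map onto logic and MAC units without dividers. The Python sketch below emulates one axis of such a datapath using a hypothetical Q16.16 format; the format, the function names, and the gain values (0.36 and 0.08) are assumptions for illustration, not parameters of the F3 design.

```python
FRAC_BITS = 16                # hypothetical Q16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    return int(round(value * ONE))


def to_float(value: int) -> float:
    return value / ONE


def fmul(a: int, b: int) -> int:
    # Full-width product, then drop FRAC_BITS fraction bits. The arithmetic shift
    # rounds toward minus infinity, matching discarding the low bits of a
    # two's-complement product.
    return (a * b) >> FRAC_BITS


def step_fixed(p: int, v: int, z: int, dt: int, k_pos: int, k_vel: int):
    """One predict/correct step for one axis using only multiplies, adds, and shifts."""
    p_pred = p + fmul(dt, v)                                    # MAC: predicted position
    innov = z - p_pred                                          # measurement residual
    return p_pred + fmul(k_pos, innov), v + fmul(k_vel, innov)  # two MACs


# Example gains (hypothetical, e.g. produced offline by a steady-state computation).
DT, K_POS, K_VEL = to_fixed(1.0), to_fixed(0.36), to_fixed(0.08)

p, v = 0, 0
for k in range(1, 101):
    z = to_fixed(5.0 + 2.0 * k)    # noiseless constant-velocity target
    p, v = step_fixed(p, v, z, DT, K_POS, K_VEL)
print(to_float(p), to_float(v))    # approximately 205.0 and 2.0
```

Each step uses three multiplies and a handful of adds, so several such filters can share a device's dedicated multipliers, in line with the multiplier-heavy test designs described later.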
Experimental FPGA Power Measurements
Experimental Setup
GiDEL Host Specifications
- Dual Xeon 3.00 GHz processors (Pentium 4 era)
- 2 GB RAM
- Single 500 GB hard drive
- CD drive
- 600 W max power supply
- (Kappa clone)
ProcStar II Power Characteristics
- Main board supply rated at 7.6 A at 3.3 V
- 7.6 A × 3.3 V = 25.08 W maximum power available to:
  - Stratix II EP2S180 FPGA (4x)
  - 2 GB SODIMM DDR memory (2x) (only 1 used for tests)
  - 64 MB SRAM memory (8x)
  - Miscellaneous oscillators, peripherals, controllers, etc.
- This means roughly 5 W max available to each FPGA
Test Design Characteristics
- Kalman tracking filters
  - Heavy multiplier usage, no block RAMs, minimal logic usage (w/ dedicated multipliers)
- In all cases, the design runs at 33 MHz
Methodology
- GiDEL host system measured without FPGA board
  - P3 Kill-A-Watt AC power meter used for measurements
    - 0.2% documented accuracy
    - Accurate to within 1 W
  - 7 different test cases with varying power utilization
- GiDEL host system measured with FPGA board
  - Same 7 test cases were used (without loading an FPGA design)
  - This provides a minimum power-use baseline for the ProcStar II
- GiDEL board is loaded with a computationally intensive FPGA design
  - CPU is kept idle
  - Power consumption under a regular design is measured (@ 33 MHz)
    - 2% logic use, 15% multiplier use, 1 filter instance per FPGA
  - Power consumption under a maximum-multiplier-use design is measured (@ 33 MHz)
    - 4% logic use, 88% multiplier use, 7 filter instances per FPGA
  - Power consumption under a maximum-logic-use design is measured (@ 33 MHz)
    - 77% logic use, 0% multiplier use, 34 filter instances per FPGA
Results: Baseline ProcStar II
Test case                                   Without ProcStar II   With ProcStar II
1. Server off (not standby)                 8 W                   8 W
2. Idle                                     127 W                 137 W
3. Idle with CD-ROM spinning                131 W                 141 W
4. Full HDD load (defrag)                   132 W                 143 W
5. Full CPU load (1 thread)                 188 W                 198 W
6. Full CPU load (4 threads)                255 W                 257 W
7. Full CPU/HDD load (3 threads, defrag)    258 W                 264 W
(Charts: "GiDEL Server Power Consumption", power consumed (watts) for test cases 1-7 without and with ProcStar II; "GiDEL Server Power Consumption (Difference)", difference in power (watts) for each test case)
- Threads are simple while(1) loops
- Although only 2 cores are present, 4 threads were used to get around Hyper-Threading and OS scheduling effects
  - HDD load is an exception, since defrag requires its own thread to be effective
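The difference chart can be regenerated directly from the baseline table by subtracting the two columns. The Python sketch below does only that; the list layout and function name are hypothetical, while the wattages are the measured values from the table. The differences cluster around 10 to 11 W for cases 2 to 5 and shrink for cases 6 and 7.

```python
# (test case, without ProcStar II, with ProcStar II), wall power in watts.
BASELINE_W = [
    ("Server off (not standby)", 8, 8),
    ("Idle", 127, 137),
    ("Idle with CD-ROM spinning", 131, 141),
    ("Full HDD load (defrag)", 132, 143),
    ("Full CPU load (1 thread)", 188, 198),
    ("Full CPU load (4 threads)", 255, 257),
    ("Full CPU/HDD load (3 threads, defrag)", 258, 264),
]


def board_overhead(rows):
    """Extra wall power with the ProcStar II installed (no FPGA design loaded)."""
    return [(case, with_board - without) for case, without, with_board in rows]


diffs = board_overhead(BASELINE_W)
for case, delta in diffs:
    print(f"{case:40s} {delta:+d} W")
```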

Results: Kalman Filters on ProcStar II
Power estimates
- 12.5% toggle rate assumed @ 33 MHz
Experimental numbers below assume the FPGAs consume all of the power (i.e., ProcStar II memories, glue logic, etc. consume 0 W)
Design 1
- 140 W total power
- ~3.25 W per FPGA
- 15% mult., 2% logic
- 1 filter instance, high Fmax
Design 2
- 140 W total power
- ~3.25 W per FPGA
- 88% mult., 4% logic
- 7 filter instances, high Fmax
Design 3
- 152 W total power
- ~6.25 W per FPGA
- 0% mult., 77% logic
- 34 filter instances, low Fmax
Results: Kalman Filter in ProcStar II
(Chart: "Altera EP2S180 FPGA Power Comparison (single FPGA)", power consumption (watts) of Designs 1-3 from the Early Estimator Spreadsheet, PowerPlay, and Measured Power*)
*Measured power is derived by subtracting the baseline power consumption of the idle host without the ProcStar II board (127 W) from the measured power consumption and dividing by 4
- Power consumed by other board components is not separated out, so actual FPGA power consumption is lower
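Using the idle host without the ProcStar II (127 W, test case 2) as the baseline reproduces the per-FPGA figures quoted for the designs: (140 - 127)/4 = 3.25 W and (152 - 127)/4 = 6.25 W. The Python sketch below encodes that derivation; the constant and function names are hypothetical. Because board memories and glue logic are folded into the result, the outputs are upper bounds on the true FPGA power.

```python
IDLE_HOST_WITHOUT_BOARD_W = 127  # test case 2 baseline: idle server, no ProcStar II
NUM_FPGAS = 4                    # four EP2S180 devices on the ProcStar II


def per_fpga_power(total_w: float,
                   baseline_w: float = IDLE_HOST_WITHOUT_BOARD_W,
                   n_fpgas: int = NUM_FPGAS) -> float:
    """Split everything above the host baseline evenly across the FPGAs.

    Board memories and glue logic are counted as FPGA power, so the
    result is an upper bound on the true per-FPGA consumption.
    """
    return (total_w - baseline_w) / n_fpgas


designs = {"Design 1": 140, "Design 2": 140, "Design 3": 152}
per = {name: per_fpga_power(w) for name, w in designs.items()}
for name, watts in per.items():
    print(f"{name}: ~{watts:.2f} W per FPGA")
```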
Questions?
