F3 CHREC All-Hands - University of Florida
CHREC F3: Target Tracking
Rafael Garcia
11/26/08
F3 Goals, Motivations, & Challenges
Goals
Analyze & examine available multi-FPGA platforms and tools for scalable system design
Develop applications & design strategies for scalable architectures from case study
Motivations
Meet performance requirements in HPC/HPEC scenarios by mapping across multiple FPGAs
Exploit multi-FPGA platforms to develop larger, complex designs and algorithms
Increase understanding of performance prediction, power, and usability for scalable apps
Challenges
Perform multilevel algorithm partitioning, analysis, and optimization for multi-FPGA systems
Determine influence of application characteristics on selection of platforms, tools and languages
[Figure: F3 cycle with stages Formulation, Design, Translation, Execution, and Insights]
Kalman Filter Overview
Traditional Kalman filters estimate the state of a dynamic
system in a noisy environment
Commonly used in target prediction and can be extended to
multiple dimensions, targets, and models
Excellent target tracker when an accurate model is known
Useful even if an accurate model is not known
Current Architecture
4 tightly coupled FPGAs mapped to 4 quadrants
System is driven by two global clocks
100 MHz inter-FPGA communication links
50 MHz data-processing clock
Inter-FPGA communication occurs when a target crosses a quadrant boundary
Current state of target is passed along
2-step processing cycle returns results at 25 MSa/s
Non-pipelined design
2-step cycle where one step depends on the previous one and the other step depends on pseudo-sensor data from host CPU
Low frequency and lack of pipeline registers are expected to lower power consumption
2-cycle design simplifies communication network
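The 25 MSa/s figure follows from the 50 MHz data-processing clock and the 2-step cycle. A minimal sketch of that arithmetic, assuming one result per complete cycle (the function name is illustrative, not from the slides):

```python
def sample_rate_hz(clock_hz: float, steps_per_result: int) -> float:
    """Results per second for a non-pipelined design that needs
    `steps_per_result` clock cycles to produce each result."""
    return clock_hz / steps_per_result


# 50 MHz data-processing clock, 2-step processing cycle
rate = sample_rate_hz(50e6, 2)
print(f"{rate / 1e6:.0f} MSa/s")
```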
Current Architecture
Resource utilization (Stratix II EP2S180F1020C3)
M4K RAMs: 1%
DSPs: 15%
ALUTs: 2%
Continuously receiving pseudo-sensor data and returning condensed information
Limited to a single target per quadrant
Set sensor sampling rate of 25MSa/s
Simplified Algorithm
Assumes steady-state operation
Allows for precomputed covariance and Kalman-gain terms
Target must closely follow given movement model for accurate results
Model tracks four parameters
Horizontal position
Vertical position
Horizontal velocity
Vertical velocity
RCML Representation
[Figure: RCML diagram of the filter. Blocks: Start/Initialize, Generate Sensor Readings, Time-Update ("Predict") with next-state prediction and update error covariance, Measurement-Update ("Correct") with compute Kalman gain, correct prediction, and update error covariance, Time-Step Advance, Report Current Results. BCast and Gather steps run for each C value in PredictionVector and for each D value in MeasurementVector (i=4).]
Data Set: PredictionVector
Element Type: fixed Position, fixed Acceleration
Num Elements := 4
Data Set: MeasurementVector
Element Type: fixed Position, fixed Acceleration
Num Elements := 4
Algorithm Changes
Remove the hardcoded terms, increasing prediction accuracy during non-steady-state situations
Modify model to include Z-axis parameters for airborne targets
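The simplified steady-state algorithm can be illustrated with a fixed-gain filter over the four tracked parameters. This is only a sketch under stated assumptions: the time step DT, the per-element GAIN values, and the choice to measure all four parameters directly are hypothetical and do not come from the slides. A real steady-state gain would be derived offline from the noise model.

```python
# Steady-state (fixed-gain) Kalman filter sketch for a constant-velocity target.
# State and measurement layout: [x, y, vx, vy].
# DT and GAIN are illustrative assumptions, not values from the slides.

DT = 1.0                       # time step between samples (assumed)
GAIN = [0.5, 0.5, 0.3, 0.3]    # precomputed per-element gain (assumed)


def predict(state):
    """Time update: advance position by velocity * DT, keep velocity."""
    x, y, vx, vy = state
    return [x + vx * DT, y + vy * DT, vx, vy]


def correct(predicted, measurement):
    """Measurement update: blend prediction and measurement with the fixed gain."""
    return [p + k * (z - p) for p, k, z in zip(predicted, GAIN, measurement)]


def track(initial_state, measurements):
    """Run the two-step predict/correct cycle over a measurement sequence."""
    state = list(initial_state)
    estimates = []
    for z in measurements:
        state = correct(predict(state), z)
        estimates.append(state)
    return estimates


# Noise-free measurements of a target that follows the model exactly
samples = [[1, 2, 1, 2], [2, 4, 1, 2], [3, 6, 1, 2]]
estimates = track([0, 0, 1, 2], samples)
print(estimates[-1])
```

Because the gain is a constant, each update needs only multiplications and additions, and no covariance or gain computation happens at run time.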
New Module Types
Module          Sensor   Target     Precision   Resource   Kernel
Low Power       Slow     -          Fixed       Low        Kalman Filter
Fast Sampling   Fast     -          Fixed       Low        Kalman Filter
Multi-Scale     -        Airborne   Floating    High       MKS
High-Noise      -        Noisy      Floating    Medium     Kalman Filter
Selective       -        Multiple   Floating    High       Feature Selection
Kalman Filter
Estimates state of a dynamic system in a noisy
environment
In this case, the ‘dynamic system’ is a moving target
Commonly used in target prediction and can be extended
to multiple dimensions, targets, and models
Assumes sensor noise is white Gaussian noise
Requires a pre-programmed model describing the target’s motion
Works in a continuous 2-cycle loop
Developed in 1960 by Rudolf E. Kalman (a UF professor from 1971-1992!)
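The continuous 2-cycle loop can be sketched for the simplest case, a single scalar quantity whose model says it stays constant. The noise variances q and r and the initial values below are assumptions chosen for illustration only.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=0.01, r=1.0):
    """Scalar Kalman filter for a nearly constant value.

    q: process-noise variance, r: measurement-noise variance
    (white Gaussian noise assumed). Returns the estimates and the
    gain used at each step.
    """
    x, p = x0, p0
    estimates, gains = [], []
    for z in measurements:
        # Time update ("predict"): the model says the value stays the same.
        x_pred = x
        p_pred = p + q
        # Measurement update ("correct").
        k = p_pred / (p_pred + r)        # Kalman gain
        x = x_pred + k * (z - x_pred)    # corrected estimate
        p = (1.0 - k) * p_pred           # updated error covariance
        estimates.append(x)
        gains.append(k)
    return estimates, gains


estimates, gains = kalman_1d([5.0] * 200, x0=5.0)
print(f"gain after 1 step: {gains[0]:.4f}, after 200 steps: {gains[-1]:.4f}")
```

With constant q and r the gain settles to a fixed value after enough iterations, which is why a steady-state design can precompute it and skip the division at run time.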
Kalman System Models
Kalman Filter can be viewed as a simple black box
An input stream of samples measuring a target’s position is contaminated with noisy samples
The output is a stream of samples with most of the noisy samples filtered out
[Figure: noisy samples enter the Kalman filter and mostly accurate samples come out; example system models shown: NE wind at 23 mph, -9.8 m/s, follows road]
Reasons for sensor noise
Battery power: variable battery voltage; voltage regulators cost money, draw power, and are not perfect
Sensors: low quality sensors; cost-cutting for mass production sometimes requires cheap sensors; incorrectly deployed sensors (bad orientation, obstructed sensor)
Environment: environmental conditions (rain, dust, night-time tracking, snow)
Multiple targets: misinterpreted samples from neighboring targets during multiple-target tracking; sensor processing stage must ensure proper target isolation
Wireless signal: bad data from neighboring sensors due to a weak wireless signal
Kalman Filter example
PR Virtual Architecture with Kalman Filters
Sensor records samples
Image processing step extracts specific features
Target size, vertical position, horizontal position, target bearing, elevation, etc.
Kalman filters filter out sensor noise
Results are sent to a central location to be displayed
[Figure: Communication architecture on VLX25. A sensor interface and a display interface connect through Switches 1-5 to five module interfaces, each hosting a Kalman filter.]
FPGA and PR benefits for the Kalman Filter
FPGA amenable features
Low memory requirements
Simple filter with streaming inputs and outputs
Requires only multiplication and addition
No complex time-consuming operations such as division, square-root, differentiation, etc.
Can be implemented using only logic and MAC units
Low bandwidth requirements
Filter receives/produces a stream of coordinates, not a stream of images
PR amenable features
Optimum resource usage
The right filter type for the right job
Swapping modules does not halt execution
Active filters are never disturbed
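The claim that the filter needs only multiplication and addition can be illustrated with a fixed-point correction step. This is a sketch, assuming a Q16.16 number format and a precomputed gain; the format and helper names are hypothetical and not taken from the design.

```python
FRAC_BITS = 16             # Q16.16 fixed-point format (assumed for illustration)
ONE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    """Convert a float to Q16.16."""
    return int(round(value * ONE))


def to_float(value: int) -> float:
    """Convert Q16.16 back to a float."""
    return value / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 numbers; the shift rescales the product."""
    return (a * b) >> FRAC_BITS


def fixed_correct(predicted: int, measured: int, gain: int) -> int:
    """predicted + gain * (measured - predicted), using only
    integer multiply, add/subtract, and shift."""
    return predicted + fixed_mul(gain, measured - predicted)


result = fixed_correct(to_fixed(10.0), to_fixed(12.0), to_fixed(0.5))
print(to_float(result))
```

Each call maps onto one multiply-accumulate operation, which is the kind of work a MAC unit performs directly.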
Experimental FPGA Power Measurements
Experimental Setup
GiDEL Host Specifications
Dual Xeon 3.00 GHz processors (Pentium 4 era)
2GB RAM
Single 500GB hard drive
CD Drive
600W max power supply
(Kappa clone)
ProcStar II Power Characteristics
Main board supply rated at 7.6A at 3.3V
7.6A × 3.3V = 25.08W maximum power available to:
Stratix II EP2S180 FPGA (4x)
2 GB SODIMM DDR memory (2x) (only 1 used for tests)
64MB SRAM memory (8x)
Miscellaneous oscillators, peripherals, controllers, etc.
This means roughly 5W max available to each FPGA
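The power budget above can be checked with a few lines of arithmetic. The split between FPGAs and other board components is not given in the slides, so the sketch below only computes the total budget and the upper bound per FPGA if nothing else on the board drew power.

```python
SUPPLY_CURRENT_A = 7.6   # main board supply rating
SUPPLY_VOLTAGE_V = 3.3
NUM_FPGAS = 4

board_budget_w = SUPPLY_CURRENT_A * SUPPLY_VOLTAGE_V
per_fpga_ceiling_w = board_budget_w / NUM_FPGAS

print(f"Board budget: {board_budget_w:.2f} W")
print(f"Per-FPGA ceiling with no other loads: {per_fpga_ceiling_w:.2f} W")
```

The ceiling is about 6.27 W per FPGA. The slide's figure of roughly 5 W is lower because the memories, oscillators, and controllers share the same supply.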
Test Design Characteristics
Kalman tracking filters
Heavy multiplier usage, no block rams, minimal logic usage (w/ dedicated multipliers)
In all cases, design runs at 33MHz
Methodology
GiDEL host system measured without FPGA board
P3 Kill-A-Watt AC power meter used for measurements
0.2% documented accuracy
Accurate to within 1 Watt
7 different test cases with varying power utilization
GiDEL host system measured with FPGA board
Same 7 test cases were used (without loading an FPGA design)
This provides minimum power-use baseline for ProcStar II
GiDEL board is loaded with FPGA-computationally intensive design
CPU is kept idle
Power consumption under regular design is measured (@ 33 MHz)
2% logic use (per FPGA)
15% multiplier use (per FPGA)
1 filter instance per FPGA
Power consumption under maximum-multiplier-use design is measured (@ 33 MHz)
4% logic use
88% multiplier use
7 filter instances per FPGA
Power consumption under maximum-logic-use design is measured (@ 33 MHz)
77% logic use
0% multiplier use
34 filter instances per FPGA
Test case                                  Without ProcStar II   With ProcStar II
1. Server off (not standby)                8 W                   8 W
2. Idle                                    127 W                 137 W
3. Idle with CDROM spinning                131 W                 141 W
4. Full HDD load (defrag)                  132 W                 143 W
5. Full CPU load (1 thread)                188 W                 198 W
6. Full CPU load (4 threads)               255 W                 257 W
7. Full CPU/HDD load (3 threads, defrag)   258 W                 264 W
Results: Baseline ProcStar II
[Figure: GiDEL Server Power Consumption. Power consumed (Watts) vs. case number 1-7, without and with ProcStar II.]
[Figure: GiDEL Server Power Consumption (Difference). Difference in power (Watts) vs. case number 1-7.]
Threads are simple while(1) loops
Although only 2 cores are present, 4 threads were used to bypass Hyper-threading and OS scheduling
HDD load is an exception since defrag requires its own thread to be effective
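The difference plotted for each test case is the "with ProcStar II" reading minus the "without ProcStar II" reading. A short sketch using the measured values from the table (variable names are illustrative):

```python
# Measured AC power in watts for test cases 1-7
without_board = [8, 127, 131, 132, 188, 255, 258]
with_board = [8, 137, 141, 143, 198, 257, 264]

differences = [w - wo for w, wo in zip(with_board, without_board)]

for case, diff in enumerate(differences, start=1):
    print(f"Case {case}: {diff} W")
```

The idle board adds about 10 W in cases 2 through 5. The smaller differences in cases 6 and 7 are close to the meter's 1 W resolution relative to the total load, so they are harder to interpret.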
Results: Kalman Filters on ProcStar II
Power estimates
12.5% toggle rate assumed @ 33 MHz
Experimental numbers below assume FPGAs consume all power (i.e., ProcStar II memories, glue logic, etc. consume 0 W)
Design 1
140 W total power
~3.25 W per FPGA
15% mult., 2% logic
1 filter instance, high Fmax
Design 2
140 W total power
~3.25 W per FPGA
88% mult., 4% logic
7 filter instances, high Fmax
Design 3
152 W total power
~6.25 W per FPGA
0% mult., 77% logic
34 filter instances, low Fmax
Results: Kalman Filter in ProcStar II
[Figure: Altera EP2S180 FPGA Power Comparison (single FPGA). Power consumption (Watts) for Designs 1-3 from the Early Estimator Spreadsheet, PowerPlay, and Measured Power*.]
*Measured power is derived by subtracting baseline power consumption on ProcStar II board from measured power consumption and dividing by 4
Power consumed by board components is not accounted for, so actual FPGA power consumption is lower
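The per-FPGA derivation described in the footnote can be sketched as follows. One assumption is made here: the baseline is taken to be the 127 W idle reading of the host without the ProcStar II, because that value reproduces the ~3.25 W and ~6.25 W figures reported for the three designs. The slides do not state the baseline number explicitly.

```python
HOST_IDLE_BASELINE_W = 127   # assumed baseline: idle host without ProcStar II
NUM_FPGAS = 4

total_power_w = {"Design 1": 140, "Design 2": 140, "Design 3": 152}

per_fpga_w = {
    name: (total - HOST_IDLE_BASELINE_W) / NUM_FPGAS
    for name, total in total_power_w.items()
}

for name, watts in per_fpga_w.items():
    print(f"{name}: {watts:.2f} W per FPGA")
```

Since this attributes all power above the baseline to the four FPGAs, the results are upper bounds on actual FPGA consumption.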
Questions?