Packet Switched, General Purpose FPGA


FPGA based instrumentation for
Correlators, Spectrometers, and VLBI
(how to build eight radio astronomy instruments in two years)
Dan Werthimer
University of California, Berkeley
http://seti.berkeley.edu
Our research group is really 3 groups
• SETI
(plus primordial black holes, HI mapping)
• Public Participation Scientific Computing
• CASPER – Center for Astronomy Signal
Processing and Electronics Research
UC Berkeley SETI Programs
Name        Time Scale      Search Type
SERENDIP    seconds         radio sky survey
SETI@home   ms - seconds    radio sky survey
Astropulse  ns - ms         radio sky survey
SEVENDIP    ns              visible targeted
SPOCK       1000 seconds    visible targeted
DYSON                       IR targeted
The SETI@home Client
SETI@home Statistics
TOTAL                                       RATE
5,464,550 participants (in 226 countries)   2,000 per day
2.3 million years computer time             1,200 years per day
4*10^21 floating point operations           200 Tera-flops
Public Participation Supercomputing Group
David Anderson, Rom Walton, SETI Group
• aka Distributed Computing
• aka “edge resource aggregation”
BOINC:
NSF
• Berkeley Open Infrastructure
for Network Computing
– General-purpose distributed
computing framework.
– Open source.
– Will make distributed computing
accessible to those who need it.
(Starting from scratch is hard!)
Projects
• Astronomy
– SETI@home (Berkeley)
– Astropulse (Berkeley)
– Einstein@home: gravitational pulsar search (Caltech,…)
– PlanetQuest (SETI Institute)
– Stardust@home (Berkeley, Univ. Washington,…)
• Earth science
– Climateprediction.net (Oxford)
• Biology/Medicine
– Folding@home, Predictor@home (Stanford, Scripps)
– FightAIDSathome: virtual drug discovery
• Physics
– LHC@home (CERN)
• Other
– Web indexing/search
– Internet Resource mapping (UC Berkeley)
Where's the computing power?
your computers
home PCs
academic
business
• 2010: 1 billion Internet-connected PCs
• 55% privately owned
• If 100M participate:
– 100 PetaFLOPs, 1 Exabyte (10^18) storage
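The aggregate figures above imply a modest contribution per machine. A minimal arithmetic sketch, assuming the totals are simply divided evenly across 100 million participating PCs (the variable names are illustrative only):

```python
# Totals quoted on the slide.
participants = 100e6            # 100M participating PCs
total_flops = 100e15            # 100 PetaFLOPs
total_storage_bytes = 1e18      # 1 Exabyte

# Assumption: every PC contributes an equal share.
flops_per_pc = total_flops / participants
storage_per_pc_bytes = total_storage_bytes / participants

print(f"per PC: {flops_per_pc / 1e9:.0f} GFLOPs, "
      f"{storage_per_pc_bytes / 1e9:.0f} GB storage")
```

Under that assumption each PC would need to supply about 1 GFLOPs of sustained compute and 10 GB of disk.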
CASPER:
Center for Radio Astronomy Signal Processing and Electronics Research
Henry Chen, Daniel Chapman, Pat Crescini, Pierre Droz, Kirsten Meder,
Vinayak Nagpal, Arash Parsa, Aaron Parsons, Andrew Siemion, Dan Werthimer
Radio Astronomy Lab:
Don Backer, Paul Demorest, Matt Dexter,
Carl Heiles, David McMahon, Mel Wright, Lynn Urry
Berkeley Wireless Research Center:
Bob Brodersen, Chen Chang, John Wawrzynek
SETI Institute:
Dave DeBoer, Gerry Harp
Collaborators:
Jeff Mock, NAIC, NRAO, ATNF, JPL/DSN, Harvard/Smithsonian/CFA,
MIT/Haystack, GMRT, Caltech, South Africa KAT
CASPER Real-time Signal Processing Instrumentation
(NSF ATI, MRI)
• Low NRE, shared by the community
• Rapid development (8 instruments / 2 years)
• Open-source, collaborative
• Reusable, platform-independent gateware
• Modular, upgradeable hardware
• Industry standard communication protocols
• Low Cost
MOTIVATION
ATA, SKA, Focal Plane Arrays, SETI,
need >> PetaOp/sec
Instruments take a long time to
build, very high NRE
The Radio Revolution
Allen Telescope Array
• 6.1-meter offset Gregorian
(2.4-meter secondary)
ATA-42 Operational This Fall
The Problem with the Current
Hardware Development Model
• Takes 5 years
• Cost Dominated by NRE because of
custom Boards, Backplanes, Protocols
• Antiquated by the time it’s released.
Solution:
• Modular Hardware
– Low number of board designs
– Can be upgraded piecemeal or all together
– Reusable
– Standard signal processing model which
is consistent between upgrades.
Solution: use FPGAs
1 FPGA ≈ 100 Pentiums, at 1/500 the power per operation
Computational Density Comparison
Moore's Law for FPGAs
[Figure: two log-scale plots against release date, covering roughly 1995 to 2004. Legend: Processor Peak; FPGA 32-bit int MAC. Axis labels: MOPS (32 bit MAC); (MOPS/MHz)*lambda^2. Annotation: 3X improvement per year!]
FPGA maximum sustained performance
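To give a feel for the "3X improvement per year" annotation, the sketch below compounds that rate over five years and compares it with a doubling every two years. The two-year doubling period is an assumption used here as a stand-in for Moore's Law; it is not taken from the plot.

```python
def compounded_gain(factor_per_year, years):
    """Total improvement after compounding an annual factor."""
    return factor_per_year ** years

YEARS = 5
fpga_gain = compounded_gain(3.0, YEARS)           # 3X per year, from the slide
moore_gain = compounded_gain(2.0 ** 0.5, YEARS)   # assumed: 2X every two years

print(f"after {YEARS} years: 3X/year gives {fpga_gain:.0f}x, "
      f"2X per two years gives {moore_gain:.1f}x")
```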
Compute Module Diagram
[Block diagram: 5 FPGAs (2VP70FF1704). Each FPGA is drawn with its fabric, a memory controller with four DRAM modules, and MGT serial links.]
• Memory: 4GB DDR2 DRAM, 12.8GB/s (400DDR)
• FPGA to FPGA links: 138 bits 300MHz DDR 41.4Gb/s; 64 bit @ 300 DDR
• External links: IB4X/CX4 at 40Gbps (one at 20Gbps), 100BT Ethernet
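The bandwidth labels on the compute module diagram can be checked with simple arithmetic. The sketch below assumes that "300MHz DDR" and "400DDR" both mean the effective transfer rate (300 and 400 million transfers per second), that each DRAM channel is 64 bits wide, and that the 12.8GB/s figure is the sum over four DRAM channels. These are interpretations of the labels, not statements from the slide.

```python
import math

def link_rate_gbps(width_bits, transfers_per_sec):
    """Raw link rate in gigabits per second."""
    return width_bits * transfers_per_sec / 1e9

# FPGA to FPGA link: 138 bits at an assumed 300 million transfers/s.
inter_fpga_gbps = link_rate_gbps(138, 300e6)

# DRAM: assumed 64-bit channel at 400 million transfers/s, four channels.
dram_channel_gbytes = link_rate_gbps(64, 400e6) / 8
dram_total_gbytes = 4 * dram_channel_gbytes

print(f"inter-FPGA link: {inter_fpga_gbps:.1f} Gb/s")
print(f"DRAM: {dram_channel_gbytes:.1f} GB/s per channel, "
      f"{dram_total_gbytes:.1f} GB/s total")
```

With these assumptions the numbers reproduce the 41.4Gb/s and 12.8GB/s labels.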
Platform-Independent,
Parameterized Gateware
• What is Gateware?
– Design logic of FPGAs
(between hardware and software)
• Need libraries for signal processing
which don’t have to be rewritten
every hardware generation.
• Matlab Simulink!
Biplex Pipelined FFT
• Uses 1/6 the resources of the Xilinx module.
FFT controls
Simulink Library – Aaron Parsons
Verilog Library – Jeff Mock
• Transform length
• Bandwidth
• Complex or Real
• Number of Polarizations
• Input bit width and output bit width
• twiddle coefficient bit width
• Run-time programmable down-shifting
• Decimate option
Filter Response: PFB vs. FFT
Additional PFB controls
(Aaron Parsons, Jeff Mock)
• Filter overlap
• Width of filter coefficients
• Window function for filter (hamming, hanning, etc.)
• Import filter coefficients for custom filter performance
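The difference between a PFB and a plain FFT can be illustrated numerically. The sketch below is not the CASPER gateware: it is a floating-point model that assumes 64 channels, 4 taps, a Hamming-windowed sinc as the filter, and a test tone placed halfway between bins 10 and 11. A direct DFT is used instead of a pipelined FFT to keep it dependency-free, and all function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def pfb_window(n_chan, n_taps):
    """Hamming-windowed sinc spanning n_taps blocks of n_chan samples."""
    length = n_chan * n_taps
    win = []
    for i in range(length):
        u = (i - (length - 1) / 2) / n_chan   # time in units of one block
        sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        win.append(sinc * hamming)
    return win

def pfb_spectrum(x, n_chan, n_taps):
    """Weight n_taps blocks, sum them into one block, then transform."""
    win = pfb_window(n_chan, n_taps)
    folded = [sum(win[p * n_chan + i] * x[p * n_chan + i]
                  for p in range(n_taps))
              for i in range(n_chan)]
    return dft(folded)

def leakage(spectrum, far_bin):
    """Magnitude in a distant bin relative to the strongest bin."""
    mags = [abs(v) for v in spectrum]
    return mags[far_bin] / max(mags)

N_CHAN, N_TAPS = 64, 4
# Complex tone halfway between bins 10 and 11 (worst case for leakage).
tone = [cmath.exp(2j * math.pi * 10.5 * t / N_CHAN)
        for t in range(N_CHAN * N_TAPS)]

leak_fft = leakage(dft(tone[:N_CHAN]), far_bin=30)
leak_pfb = leakage(pfb_spectrum(tone, N_CHAN, N_TAPS), far_bin=30)
print(f"leakage into bin 30: FFT {leak_fft:.4f}, PFB {leak_pfb:.4f}")
```

In this model the PFB leaks much less power into the distant bin than the unwindowed FFT. Changing the number of taps or the window corresponds loosely to the filter overlap and window function controls listed above.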
Digital Down-Converter
• Selectable # of FIR taps
• On-the-fly programmable mix frequency
• Selectable FIR coefficients
• Agile sub-band selection.
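A digital down-converter with these controls can be modelled in a few lines. This is a floating-point sketch, not the gateware block: the Hamming-windowed sinc filter, the parameter names, and the test frequencies (a 51 MHz tone sampled at 200 MHz, mixed down by 50 MHz and decimated by 16) are all assumptions chosen for illustration.

```python
import cmath
import math

def lowpass_taps(n_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff in cycles per sample."""
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        u = 2 * cutoff * (i - mid)
        sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]          # unity gain at DC

def ddc(samples, mix_freq, sample_rate, n_taps, decimation):
    """Mix to baseband, low-pass filter, keep every decimation-th output."""
    step = -2j * math.pi * mix_freq / sample_rate
    mixed = [s * cmath.exp(step * n) for n, s in enumerate(samples)]
    taps = lowpass_taps(n_taps, cutoff=0.5 / decimation)
    out = []
    # Only evaluate outputs where the filter fully overlaps the data.
    for start in range(0, len(mixed) - n_taps + 1, decimation):
        out.append(sum(t * mixed[start + i] for i, t in enumerate(taps)))
    return out

FS, F_MIX, OFFSET, DECIM = 200e6, 50e6, 1e6, 16
signal = [math.cos(2 * math.pi * (F_MIX + OFFSET) * n / FS)
          for n in range(2048)]
baseband = ddc(signal, F_MIX, FS, n_taps=129, decimation=DECIM)

# Phase advance per output sample reveals the residual frequency.
phase_step = cmath.phase(baseband[1] * baseband[0].conjugate())
residual_hz = phase_step * (FS / DECIM) / (2 * math.pi)
print(f"{len(baseband)} output samples, residual tone near {residual_hz / 1e6:.2f} MHz")
```

Changing `mix_freq` between calls plays the role of the on-the-fly programmable mix frequency; `n_taps` and the tap values correspond to the selectable FIR length and coefficients.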
X-Engine Correlation Architecture
(Lynn Urry, Aaron Parsons)
X-Engine Architecture:
applied to an arbitrary sized
antenna array
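The slide does not spell out the X-engine design, so the sketch below shows only the underlying operation that the X stage of an FX correlator performs: for every antenna pair, multiply one channelized signal by the complex conjugate of the other and accumulate over time. The data layout and function name are hypothetical, and nothing here reflects the actual pipelined architecture.

```python
import cmath

def x_engine(spectra):
    """Cross-multiply and accumulate.

    spectra[t][a][c] is the complex sample for time t, antenna a, channel c.
    Returns vis[(a, b)][c] summed over time for every pair with a <= b.
    """
    n_ant = len(spectra[0])
    n_chan = len(spectra[0][0])
    vis = {(a, b): [0j] * n_chan
           for a in range(n_ant) for b in range(a, n_ant)}
    for frame in spectra:
        for (a, b), acc in vis.items():
            for c in range(n_chan):
                acc[c] += frame[a][c] * frame[b][c].conjugate()
    return vis

# Synthetic input: 4 antennas, 8 channels, 16 time samples.
N_TIME, N_ANT, N_CHAN = 16, 4, 8
spectra = [[[cmath.rect(1 + a, 0.1 * t * (a + 1) + 0.2 * c)
             for c in range(N_CHAN)]
            for a in range(N_ANT)]
           for t in range(N_TIME)]

vis = x_engine(spectra)
print(f"{N_ANT} antennas give {len(vis)} baselines (including autocorrelations)")
```

The number of products grows as N(N+1)/2 with the number of antennas N, which is why the cross-multiply stage dominates the cost for large arrays.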
Hardware and Software Libraries
[Diagram of library layers; only the legend entry “Applications” is recoverable]
Global Interconnects
• Commercial 10GbE switch from HP, Fujitsu, Foundry, Extreme Networks, Force 10
– Packet switched, nonblocking
– <= 224 ports (4X) per chassis
– Up to 10,000 ports in a system
– 200~1000 ns switch latency
– 400~1200 ns FPGA to FPGA latency
– ~ 2.88Tbps full duplex constant cross section bandwidth
– $600 per port
[Diagram: Compute Node #1 through Compute Node #N connected through an Ethernet switch; a second label reads Infiniband Crossbar Switch]
Beowulf Cluster Like General Purpose Architecture
Dynamic Allocation of Resources, need not be FPGA based
Reconfigurable Compute Cluster
[Diagram: ADCs feed FPGA DSP modules running polyphase filter banks (PFB). These connect through a commercial off-the-shelf multicast 10 Gbps (10GE or InfiniBand) switch to the back-end processors: FPGA DSP modules and general-purpose CPUs, with the labelled functions correlator, beamformers/spectrometers, and pulsar timer.]
Targeted Applications
• Moderate to high-bandwidth problems
– For low bandwidths, just use CPUs
• Lower to mid-scale computation
– For very large applications (SKA), may be
more cost effective to design ASICs
• Rapid Development
Applications
• VLBI Mark 5B data recorder - Haystack – 500 MHz
• Beamforming – SMA –
Vinayak Nagpal, Jonathan Weintroub
• SETI – Arecibo (UCB)
JPL/UCB DSN
(Preston, Gulkis, Levin, Jones)
• Correlators and Imagers:
ATA (Mel Wright)
Reionization Experiment (Backer, Bradley…)
CARMA Next Gen (Dave Hawkins, Caltech)
SKA demonstrator South Africa (Justin Jonas)
VLBI Digitizer-Channelizer for Mark5
Haystack: Shep Doeleman, Brian Fanous,
Alan Rogers, Alan Whitney
UCB: Henry Chen, Aaron Parsons, Pierre Droz
• Interfaces to MARK 5 data recorder
• 500 MHz bandwidth * 2 IFs
(Only 1 IF now)
• 16 or 32 channels per IF
• Polyphase Filter Bank
VLBI Mark 5B Front End
500 MHz BW, 32 channel filter bank
1 GHz bandwidth
“Pocket Spectrometer”
• Using ATMEL ADCs at 2 Gsamples/sec
• Performing 4 real FFTs in 1 (complex) biplex
pipelined FFT module.
• 2048 channels
• Uses just 1 ADC, 1 IBOB, and your laptop.
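Getting several real FFTs out of one complex FFT relies on a standard trick: pack two real signals into the real and imaginary parts of one complex input, transform once, and separate the two spectra using conjugate symmetry. A biplex FFT processes two complex streams, so applying the trick to each stream would give four real transforms. The sketch below shows the trick for one stream with a direct DFT; whether the actual module is built exactly this way is an assumption, and the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, adequate for a small illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def two_real_dfts(a, b):
    """Spectra of two real sequences from a single complex DFT."""
    n = len(a)
    z = dft([complex(ai, bi) for ai, bi in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n):
        zc = z[(-k) % n].conjugate()        # conj(Z[-k])
        spec_a.append((z[k] + zc) / 2)      # part from the real input
        spec_b.append((z[k] - zc) / 2j)     # part from the imaginary input
    return spec_a, spec_b

N = 16
a = [math.sin(0.3 * t) + 0.5 for t in range(N)]
b = [math.cos(1.1 * t) * t for t in range(N)]
spec_a, spec_b = two_real_dfts(a, b)

err_a = max(abs(u - v) for u, v in zip(spec_a, dft(a)))
err_b = max(abs(u - v) for u, v in zip(spec_b, dft(b)))
print(f"max deviation from separate DFTs: {err_a:.2e}, {err_b:.2e}")
```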
128 Million Channel SETI
Spectrometer
• 200 MHz Bandwidth, 2 Hz resolution
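The channel count follows from the bandwidth and the target resolution. The sketch below assumes the transform length is rounded up to a power of two and that "128 Million" means 128 x 2^20; both are interpretations rather than statements from the slide.

```python
import math

bandwidth_hz = 200e6
target_resolution_hz = 2.0

# Minimum number of channels for the target resolution.
min_channels = bandwidth_hz / target_resolution_hz

# Assumption: round up to the next power of two.
n_channels = 2 ** math.ceil(math.log2(min_channels))
resolution_hz = bandwidth_hz / n_channels

print(f"{n_channels} channels ({n_channels // 2**20} x 2^20), "
      f"{resolution_hz:.2f} Hz per channel")
```

With these assumptions the spectrometer has 2^27 channels of about 1.5 Hz each, which the slide rounds to 2 Hz.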
Multi-Purpose Spectrometer – Low Bandwidth
Aaron Parsons
[Block diagram: four 200 MHz ADCs (I and Q for Pol. 1, I and Q for Pol. 2) feed a Xilinx Virtex-II 6000 FPGA. Other labelled blocks: 200 Aux. I/O, 256 MB DRAM, a Xilinx Virtex-II 1000 FPGA, Compact PCI Backplane, Software.]
SERENDIP V Spectrometer
SETI Applications
• JPL/UCB/SI DSN Sky Survey (20 GHz Bandwidth)
• Parkes Southern SERENDIP
• ALFA Sky Survey (300 MHz x 7 beams)
• SETI Italia (Bologna)
• SETI@home
Astronomy Applications
• GALFA Spectrometer – Arecibo Multibeam Hydrogen Survey
• Astronomy Signal Processor – ASP – Don Backer, Ingrid Stairs, et al. (pulsars)
• ATA4 Correlator F Engine
• Reionization Experiments (Don Backer, Rich Bradley, Chippendale, Ekers)
• Antenna Holography, ATNF, China
• GMRT correlator
Astronomy Signal Processor: Don Backer, Jeff Mock, Paul Demorest
[Block diagram: 128 MHz Pol. 1 and 128 MHz Pol. 2 feed a SERENDIP V polyphase filter bank, which feeds four servers (each with an EDT card); each server connects through a GbE switch to a group of PCs.]
GALFA Spectrometer
[Block diagram. Quadrature Downconverter Board: IF Pol. 1 and IF Pol. 2 (100 MHz each) are mixed with sin and cos and low-pass filtered (LPF), giving -50 to +50 MHz. Multipurpose Spectrometer Board: a biplex 256 pnt. PFB; a digital mixer (e^(-iωt)) followed by FIR decimate LPF by 16 (labelled 12.5 MHz) and a biplex 8192 pnt. PFB; Stokes outputs go over the cPCI backplane to the CPU.]
Mars Orbiter mm Spectrometer
ASIC based spectrometer (Mars)
• 2W/ADC + 2W/ASIC = 4 Watts
• Use UCB’s “Chip in a Day” software
(compiles FPGA code into ASIC)
• Use rad hard libraries from LBL
1960 – First Radio Astronomy Digital Correlator
21 lags
300 kHz clock
discrete transistors
$19,000
Sandy Weinreb
Correlator processing power
[Figure: correlator processing power (GFlops, log scale from 1 to 10^7) versus year, 1970 to 2015. Plotted correlators: DCB, DAS, DLB, DXB, VLA, EVN/WSRT, SMA, LOFAR, ALMA, EVLA, SKA.]
source: Arnold van Ardenne
Future SETI Spectrometers
Year   Bandwidth     Beams
2015   4 THz         400 beams, 10 GHz each
2020   128 THz       12,800 beams
2025   4000 THz      40,000 beams
2030   128,000 THz   1M beams
Caveats
• Risky
• Simulink new, buggy, not open source
(Verilog, VHDL old)
• Just a bunch of clever students
• We’ve built the easy instruments so far
(not the hard ones); yet to demonstrate packetized
correlator and compute cluster
CASPER the Friendly...
• Group Helping Open-source Signal-processing Technology (GHOST?)
– Goal to help develop signal processing
instrumentation and libraries for the
community.
– Open source hardware, gateware, and
software.
– Provide training and tutorials
– Not so much delivering turn-key instruments.
Selected correlator quotes
Sandy Weinreb
“In 1960 there were no chips; just discrete transistors!
The $19,000 was the cost of the samplers, shift registers, and
counter. It did not include the cost of the 21 accumulators which I
made myself in a few months getting paid $240/month.”
Ray Escoffier
“With correlator performance having gone up by a factor of
922,000 over the last 30 years, its only fair that correlator
design engineers' salaries should have gone up by a similar
factor!!”
Sergei Pogrebenko
“It is desirable that the output data rate from a data processor is
less than the input data rate.”
http://seti.berkeley.edu