mCafe Testing - Massey University
Studies in Parallel & Distributed Systems – 159.735
Parallel Computing Using FPGA (Field Programmable Gate Arrays)
Sohaib Ahmed
15th May, 2009
Outline
FPGAs and their internal structures
Why use FPGAs for parallel computing?
Types of FPGAs
Application Examples and Processing in Applications
FPGAs in Parallel Computing
FPGA Limitations
Design Methods for FPGAs
Conclusion
FPGAs - Introduction
Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented the FPGA in the mid-1980s
Other vendors include Altera, Actel, Lattice Semiconductor and Atmel
Support the notion of reconfigurable computing
Reconfigurable Computing
Uses multiple reconfigurable devices (such as FPGAs) together with one or more microprocessors
The processor(s) execute sequential and non-critical code, while the reconfigurable fabric (the FPGAs) executes the code that can be mapped efficiently to hardware
FPGAs Internal Structure
A semiconductor device consisting of :
Configurable Logic Blocks (CLBs)
Input/Output (I/O) Blocks (IOBs)
Static RAM (SRAM) Blocks
Digital Signal Processing Blocks (DSPBs)
Why use FPGAs?
Speed up
Hardware is faster than software [1]:

  Technology          Clock Speed   Time Taken
  XC2V6000 FPGA       66 MHz        0.36 ms
  Optimized software  2.6 GHz       196.71 ms
FPGAs can support thousand-fold parallelism, especially for low-precision computations
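Taken at face value, the timings in the table imply a speedup of roughly 550x. A quick sanity check of that ratio, using only the numbers from the table:

```python
# Timings reported in [1]: the same kernel on an FPGA vs. optimized software.
fpga_time_ms = 0.36        # XC2V6000 FPGA clocked at 66 MHz
software_time_ms = 196.71  # optimized software at 2.6 GHz

speedup = software_time_ms / fpga_time_ms
print(f"wall-clock speedup: {speedup:.0f}x")  # ~546x

# The FPGA clock is ~39x slower, so per clock cycle the hardware is
# doing several thousand times the work -- the "thousand-fold
# parallelism" the slides mention.
clock_ratio = 2.6e9 / 66e6
print(f"work per cycle advantage: {speedup * clock_ratio:.0f}x")
```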
Cost
Development cost is much lower than for ASICs (application-specific integrated circuits) at lower volumes
Flexibility
FPGAs are flexible compared to ASICs, as they can be reprogrammed
Types of FPGAs
CPLDs ( Complex Programmable Logic Devices)
Programming requires voltage levels that are not usually present in computer systems
Anti-fuse based devices
Can be programmed only once
Static-RAM-based devices
Can be programmed while the device is running
Application Examples
Xilinx devices: Virtex-II Pro, Virtex-4
Recent success of FPGAs in the Tsubame cluster in Tokyo
Improved performance by an additional 25%
Processing in Applications [2]
FPGAs in Parallel Computing
Dynamic matching of a node to the computational requirement of an application
Application specific computers become more flexible
Enables support for multiple modes of parallel computing: MIMD, SIMD, etc.
Partial reconfiguration can allow better hardware resource utilization
Can extend dynamic task allocation scheme to allow for dynamic hardware allocation
Support for variable grain size
FPGAs Limitations
Capacity
Logic blocks are a less dense representation of computation than processor instructions
In a combined system, the conventional processor runs the 90% of the code that accounts for 10% of the execution time
The reconfigurable logic handles the 10% of the code that accounts for 90% of the execution time
Tools
Compilers for reconfigurable logic are still immature
Some operations are hard to implement on FPGAs, such as random access and pointer-based data structures
Design Methods for FPGA
[3]
Use an algorithm optimal for FPGAs
Systolic arrays for correlation are efficient
Use a computing mode appropriate for FPGAs
Streaming, systolic, and arrays of fine-grained automata are preferable
Example: searching biomedical databases for similar sequences
Use appropriate FPGA structures
Example: analyzing DNA or protein sequences with a straightforward systolic array
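The one-multiplier-per-coefficient pipelining idea behind such arrays can be illustrated with a short simulation. This is a sketch in Python rather than an HDL, and it uses the transposed (broadcast-input) form instead of the strictly systolic variant, but it has the same one-MAC-per-tap structure that maps onto a chain of FPGA DSP blocks:

```python
def correlate_stream(x, w):
    """Transposed-form multiply-accumulate pipeline computing the
    correlation y[n] = sum_i w[i] * x[n-i].

    One multiplier, adder and register per coefficient; the input
    sample is broadcast to every tap and partial sums shift one
    stage per clock.
    """
    n = len(w)
    sums = [0] * n              # one partial-sum register per tap
    out = []
    for sample in x:            # one new sample per clock cycle
        sums = [w[i] * sample + (sums[i + 1] if i + 1 < n else 0)
                for i in range(n)]
        out.append(sums[0])     # one result emerges every cycle
    return out

print(correlate_stream([1, 2, 3, 4], [1, 10]))  # [1, 12, 23, 34]
```

Each cycle consumes one sample and emits one result, so throughput is independent of the number of taps — which is why this structure is so efficient on an FPGA.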
Design Methods for FPGA
[3]
Living with Amdahl’s Law
Speeding up an application significantly through an enhancement requires most of the
application to be enhanced
NAMD & ProtoMol framework was designed for computational experimentation
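Amdahl's Law makes this point quantitative. A small sketch (the fractions below are illustrative, not measurements from NAMD or ProtoMol):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the runtime is
    accelerated by a factor s (Amdahl's Law): 1 / ((1 - f) + f / s)."""
    return 1.0 / ((1.0 - f) + f / s)

# A 100x FPGA kernel barely helps if it covers only half the runtime...
print(amdahl_speedup(0.50, 100))  # ~1.98x
# ...so most of the application must be enhanced to see a big win:
print(amdahl_speedup(0.99, 100))  # ~50.25x
```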
Hide the latency of independent functions
Latency hiding is a basic technique for achieving high performance in parallel applications
Functions on the same chip can operate in parallel
Use rate-matching to remove bottlenecks
Function-level parallelism is built in
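Rate-matching can be sketched as a simple calculation: replicate the slow stages until every stage of the pipeline accepts one new item per cycle. The stage names and cycle counts below are made up for illustration:

```python
import math

def rate_match(stage_cycles, interval=1):
    """Copies of each stage needed so the pipeline accepts one new
    item every `interval` cycles: a stage taking c cycles per item
    must be replicated ceil(c / interval) times."""
    return {name: math.ceil(c / interval)
            for name, c in stage_cycles.items()}

# Hypothetical kernel: the 4-cycle MAC stage is the bottleneck, so
# four copies run round-robin while load/store stay single.
print(rate_match({"load": 1, "mac": 4, "store": 1}))
# -> {'load': 1, 'mac': 4, 'store': 1} becomes {'load': 1, 'mac': 4, 'store': 1}
```

On an FPGA this replication is a design-time choice: spend more area on the bottleneck function until all on-chip functions stream at the same rate.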
Design Methods for FPGA
[3]
Take advantage of FPGA-specific hardware
Hard-wired components such as integer multipliers and independently accessible BRAMs (Block RAMs)
The Xilinx VP100 has 400 independently accessible, 32-bit quad-ported BRAMs, which can deliver up to 20 terabytes per second at capacity
Use appropriate arithmetic precision
Use appropriate arithmetic mode
Minimize use of high-cost arithmetic operations
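On an FPGA, precision is a dial: a narrow fixed-point multiplier costs a fraction of the area of a floating-point one. A minimal sketch of the fixed-point arithmetic involved (the choice of 8 fractional bits is arbitrary and for illustration only):

```python
FRAC_BITS = 8  # illustrative Q-format: 8 fractional bits

def to_fixed(x):
    """Quantize a real number to fixed point (round to nearest)."""
    return round(x * (1 << FRAC_BITS))

def fixed_mul(a, b):
    """Multiply two fixed-point values, shifting the 2*FRAC_BITS
    product back down to FRAC_BITS -- the precision the designer
    has chosen to give up in exchange for cheaper hardware."""
    return (a * b) >> FRAC_BITS

a, b = to_fixed(1.5), to_fixed(0.25)   # 384, 64
p = fixed_mul(a, b)                    # 96
print(p / (1 << FRAC_BITS))            # 0.375 == 1.5 * 0.25
```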
Current Progress in Hardware & Software
The SRC-6 and SRC-7 are parallel architectures built around a crossbar switch that can be stacked for scalability
High-performance computing vendors such as Silicon Graphics Inc. (SGI), Cray and Linux Networx have incorporated FPGAs into their parallel architectures [4]
VHDL and Verilog are used to create hardware kernels
Higher-level languages such as Carte C, Carte Fortran, Impulse C, Mitrion-C and Handel-C are also used
Annapolis Micro Systems' CoreFire, Starbridge Systems' Viva, Xilinx System Generator and DSPlogic's Reconfigurable Computing Toolbox are high-level graphical programming development tools [5]
Conclusion
Using FPGAs in parallel computing offers the following benefits:
Application acceleration
Flexibility in terms of application domain
Potential cost benefits over ASICs
The ability to exploit variable levels and modes of parallelism
More effective use of hardware resources
References
[1] Todman, T.J., Constantinides, G.A., Wilton, S.J.E., Mencer, O., Luk, W. & Cheung, P.Y.K. (2005). Reconfigurable computing: architectures and design methods.
[2] Altera Corporation White Paper (2007). Accelerating high-performance computing with FPGAs. October 2007.
[3] Herbordt, M.C., VanCourt, T., Gu, Y., Sukhwani, B., Conti, A., Model, J. & DiSabello, D. (2007). Achieving high performance with FPGA-based computing.
[4] Buell, D., El-Ghazawi, T., Gaj, K. & Kindratenko, V. (2007). High-performance reconfigurable computing. IEEE Computer Society, March 2007.
[5] El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V. & Buell, D. (2008). The promise of high-performance reconfigurable computing. IEEE Computer Society, February 2008, pp. 69-76.
Any Questions?
Thank You