ECE 697F
Reconfigurable Computing
Lecture 1
Course Introduction
Prof. Russell Tessier
Lecture 1: Course Introduction
September 8, 2004
What is Reconfigurable Computing?
• Computation using hardware that can adapt at the
logic level to solve specific problems
° Why is this interesting?
• Some applications are poorly suited to
microprocessors.
• VLSI “explosion” provides increasing resources.
• Hardware/Software
• Relatively new research area.
° Acknowledgement: Wolf text
Background needed
• Basic VLSI – transistors, delay models.
• Basic algorithms – graph algorithms, searches
• Computer Architecture – ALU, microprocessor
• Digital Design – adder, counter, etc.
Topic self-contained!
Course Organization
• Two lectures a week (about 27 overall)
• 3 Homework assignments (20%)
• Final project (35%)
• Transcription assignment (20%)
• Mid-term (25%)
• Required text – FPGA-based System Design (Wolf)
Microprocessor-based Systems
[Figure: data storage (register file) and ALU, with signals A, B, C and width 64]
• Generalized to perform many functions well.
• Operates on fixed data sizes.
• Inherently sequential.
Reconfigurable Computing
[Figure: functional unit with inputs A, B and outputs H, L]
if (A > B) {
    H = A;
    L = B;
}
else {
    H = B;
    L = A;
}
• Create specialized hardware for each application.
• Functional units optimized to perform a special task.
Example: Bubblesort
[Figure: five functional units, each with inputs A, B and outputs H, L, interconnected so that the Smallest and Largest values emerge at the outputs]
• Adapt interconnect to problem.
• Take advantage of parallelism.
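The compare unit and a network built from it can be modeled in software. The C sketch below is only an illustration: the names compare_unit and bubble_network are hypothetical, and the wiring follows a plain bubble sort rather than the exact figure on the slide, which cannot be recovered from this transcript. In hardware, each call to compare_unit would be a separate physical unit and the loops would be unrolled into fixed interconnect, so units that touch disjoint wires could operate at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of the compare unit: after the call,
   *h holds the larger input and *l the smaller one. */
void compare_unit(int a, int b, int *h, int *l)
{
    if (a > b) {
        *h = a;
        *l = b;
    } else {
        *h = b;
        *l = a;
    }
}

/* Bubble-sort-style network over v[0..n-1], ascending order.
   Each pass compares neighbours (i, i+1), so the largest remaining
   value moves to the top of the still-unsorted part. */
void bubble_network(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t i = 0; i + 1 < n - pass; i++) {
            int h, l;
            compare_unit(v[i], v[i + 1], &h, &l);
            v[i] = l;       /* smaller value stays on the lower wire */
            v[i + 1] = h;   /* larger value moves to the upper wire  */
        }
    }
}
```

For four inputs the loops expand to six compare units. A processor would execute them one after another, whereas a reconfigurable fabric could instantiate all of them and wire them directly.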
Implementation Spectrum
[Figure: implementation spectrum: Microprocessor, Reconfigurable Hardware, ASIC]
• ASIC gives high performance at cost of inflexibility.
• Processor is very flexible but not tuned to the application.
• Reconfigurable hardware is a nice compromise.
What does it look like?
Moore’s Law
° Gordon Moore: co-founder of Intel.
° Predicted that number of transistors per chip
would grow exponentially (double every 18
months).
° Exponential improvement in technology is a
natural trend: steam engines, dynamos,
automobiles.
Moore’s Law plot
The cost of fabrication
° Current cost: $2-3 billion.
° Typical fab line occupies about 1 city block,
employs a few hundred people.
° New fabrication processes require 6-8 month
turnaround.
° Most profitable period is first 18 months-2 years.
Reconfigurable Hardware
[Figure: logic element with one-bit inputs A, B, C, D and output Out]
• Each logic element operates on four one-bit inputs.
• Output is one data bit.
• Can perform any Boolean function of four inputs:
$2^{2^4}$ = 64K functions!
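One way to see where the 64K figure comes from is to model the logic element as a 16-entry look-up table. The sketch below is an assumption-laden illustration, not a description of any vendor's cell: the type lut4, the function lut4_eval, and the bit ordering of the inputs are all hypothetical choices.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT is fully described by 16 configuration bits:
   bit i of config is the output for input pattern i. */
typedef struct {
    uint16_t config;
} lut4;

/* Evaluate the LUT. The inputs are packed as D C B A with A in bit 0;
   this ordering is an assumption of the sketch. */
int lut4_eval(lut4 lut, unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 1 && b <= 1 && c <= 1 && d <= 1);
    unsigned index = (d << 3) | (c << 2) | (b << 1) | a;
    return (int)((lut.config >> index) & 1u);
}
```

With this encoding, config = 0x8000 gives a four-input AND and config = 0x6996 gives a four-input XOR. Because every one of the 16 bits can be set independently, there are 2^16 = 65,536 distinct configurations, one per Boolean function of four inputs.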
Field-Programmable Gate Array
[Figure: array of logic elements (LE) with routing tracks between them]
• Each logic element outputs one data bit.
• Interconnect programmable between elements.
• Interconnect tracks grouped into channels.
FPGA Architecture Issues
[Figure: logic element]
• Need to explore architectural issues.
• How much functionality should go in a logic element?
• How many routing tracks per channel?
• Switch “population”?
Real World Physical Issues
[Figure: wire with two elements labeled S. Wires have real cost.]
• Modelling FPGA delay.
• Improving performance through buffering/segmentation.
• Technology dependent.
• The cost of reconfigurability.
Translating a Design to an FPGA
[Figure: C program containing C = A+B, translated to a circuit (adder with inputs A, B and output C) and mapped onto the array]
• CAD to translate circuit from text description to physical
implementation well understood.
• CAD to translate from C program to circuit not well understood.
• Very difficult for application designers to successfully write high-performance applications.
Need for design automation!
High-level Compilers
• Difficult to estimate hardware resources.
• Some parts of program more appropriate for processor
(hardware/software codesign).
• Compiler must parallelize computation across many resources.
• Engineers like to write in C rather than pushing little blocks
around.
[Figure: the statement C = A+B mapped to an adder with inputs A, B and output C, shown next to the loop below]
for (i = 0; i < n; i++)
{
.
.
}
Design abstractions
Abstraction level (representation): typical measures
• Specification (English): function, cost
• Behavior (executable program): throughput, design time
• Register-transfer (sequential machines): function units, clock cycles
• Logic (logic gates): literals, logic depth
• Circuit (transistors): nanoseconds
• Layout (rectangles): microns
Circuit Compilation
1. Technology Mapping
[Figure: LUT]
2. Placement
[Figure: LUT with location marked “?”]
Assign a logical LUT to a physical location.
3. Routing
Select wire segments and switches for interconnection.
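To make the placement step concrete, the C sketch below scores a candidate assignment of logical LUTs to grid locations. Everything in it is an assumption chosen for illustration: the types site and net, the function placement_cost, the restriction to two-pin nets, and the use of Manhattan distance as the cost. It is not presented as the cost function of any particular tool. A placer would search for an assignment that keeps such a cost low before routing picks the actual wire segments and switches.

```c
#include <assert.h>
#include <stdlib.h>

/* Physical location of a logic element on the FPGA grid. */
typedef struct { int x, y; } site;

/* A two-pin net connecting logical LUT src to logical LUT dst. */
typedef struct { size_t src, dst; } net;

/* Estimated wiring cost of a placement: the sum, over all nets, of the
   Manhattan distance between the sites assigned to the two endpoints.
   placement[i] is the site chosen for logical LUT i. */
long placement_cost(const site *placement, size_t n_luts,
                    const net *nets, size_t n_nets)
{
    long cost = 0;
    for (size_t k = 0; k < n_nets; k++) {
        assert(nets[k].src < n_luts && nets[k].dst < n_luts);
        const site a = placement[nets[k].src];
        const site b = placement[nets[k].dst];
        cost += labs((long)a.x - b.x) + labs((long)a.y - b.y);
    }
    return cost;
}
```

For three LUTs connected in a chain, placing them in adjacent locations gives a lower cost than scattering them, which is the kind of comparison a placement algorithm makes repeatedly.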
Two Bit Adder
Made of Full Adders
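[Figure: full adder (FA) with inputs A, B, Ci and outputs S, Co]
A+B = D
Logic synthesis tool reduces circuit to SOP form
$S = \overline{A}\,\overline{B}\,C_i + \overline{A}\,B\,\overline{C_i} + A\,\overline{B}\,\overline{C_i} + A\,B\,C_i$
$C_o = \overline{A}\,B\,C_i + A\,\overline{B}\,C_i + A\,B\,\overline{C_i} + A\,B\,C_i$
[Figure: two LUTs, each with inputs A, B, Ci; one outputs S, the other Co]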
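The step from sum-of-products form to LUT contents can be sketched in C. The code below assumes the standard four-minterm forms of the full-adder sum and carry, evaluates each on all eight input patterns to obtain the configuration bits of a 3-input LUT, and chains two full adders into a two-bit adder. The function names and the bit packing of the LUT index are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Full-adder sum as a four-minterm SOP. Inputs must be 0 or 1. */
unsigned fa_sum(unsigned a, unsigned b, unsigned ci)
{
    assert(a <= 1 && b <= 1 && ci <= 1);
    unsigned na = a ^ 1u, nb = b ^ 1u, nci = ci ^ 1u;
    return (na & nb & ci) | (na & b & nci) | (a & nb & nci) | (a & b & ci);
}

/* Full-adder carry-out as a four-minterm SOP. Inputs must be 0 or 1. */
unsigned fa_carry(unsigned a, unsigned b, unsigned ci)
{
    assert(a <= 1 && b <= 1 && ci <= 1);
    unsigned na = a ^ 1u, nb = b ^ 1u, nci = ci ^ 1u;
    return (na & b & ci) | (a & nb & ci) | (a & b & nci) | (a & b & ci);
}

/* Build the 8 configuration bits of a 3-input LUT by evaluating f on
   every input pattern. Bit index = (ci << 2) | (b << 1) | a; this
   packing is an assumption of the sketch. */
uint8_t lut3_config(unsigned (*f)(unsigned, unsigned, unsigned))
{
    assert(f != 0);
    uint8_t config = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned a = i & 1u, b = (i >> 1) & 1u, ci = (i >> 2) & 1u;
        if (f(a, b, ci))
            config |= (uint8_t)(1u << i);
    }
    return config;
}

/* Two-bit adder made of two chained full adders; returns a 3-bit result. */
unsigned add2(unsigned a, unsigned b)
{
    assert(a < 4 && b < 4);
    unsigned a0 = a & 1u, a1 = (a >> 1) & 1u;
    unsigned b0 = b & 1u, b1 = (b >> 1) & 1u;
    unsigned s0 = fa_sum(a0, b0, 0), c0 = fa_carry(a0, b0, 0);
    unsigned s1 = fa_sum(a1, b1, c0), c1 = fa_carry(a1, b1, c0);
    return (c1 << 2) | (s1 << 1) | s0;
}
```

Under this packing the sum LUT holds 0x96 and the carry LUT holds 0xE8, so each full adder occupies two LUTs and the two-bit adder occupies four.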
FPGA design
° FPGA manufacturer creates an FPGA fabric;
system designer uses the fabric.
° FPGA fabric design issues:
• Study sample user designs.
• Select interconnect topology.
• Create logic element structures.
• Design circuits, layout.
Why do we care about layout?
° We won’t design layout.
° Layout determines:
• Logic delay.
• Interconnect delay.
• Energy consumption.
° We want to understand sources of FPGA
characteristics.
Design validation
° Must check at every step that errors haven’t been
introduced: the longer an error remains, the more
expensive it becomes to remove it.
° Forward checking: compare results of less- and
more-abstract stages.
° Back annotation: copy performance numbers to
earlier stages.
Processor + FPGA
Three possibilities
[Figure: processor (Proc) chip and FPGA daughtercard connected by a backplane bus (e.g. PCI)]
1. FPGA serves as coprocessor for data-intensive applications – possible project.
[Figure: processor (Proc) and FPGA on one chip]
2. FPGA serves as embedded computer for low-latency transfer. “Reconfigurable Functional Unit”
Processor + FPGA (cont.)
3. Processor integration
[Figure: processor containing a register file (RF), ALU, and FPGA]
• FPGA logic embedded inside processor.
• A number of problems with 2 and 3.
- Process technology an issue.
- ALU much faster than FPGA generally.
- FPGA much faster than the entire processor.
Multi-FPGA Systems
[Figure: nine interconnected FPGAs, each labeled F]
• Most applications don’t fit on one device.
• Create need for partitioning designs across many devices.
• Effectively a “netlist computer”
Each FPGA is a logic processor interconnected in a given topology.
Dynamic Reconfiguration
[Figure: device containing four blocks labeled L]
• What if I want to exchange part of the design in the device with
another piece?
• Need to create architectures and software to incrementally
change designs.
• Effectively a “configuration cache”
Examples: encryption, filtering.
Research Areas
• Storing configuration info inside device.
• Architecture evaluation.
- Size and performance tradeoff.
• Layout of a new logic element.
• Algorithm for place and route.
• Apply an application to FPGA logic.
Versatile Place and Route
• Written by Vaughn Betz at the University of Toronto
• Performs FPGA placement and routing.
• Written in C
• Runs on Suns, Alphas, Linux
• Estimates device sizes and performance.
Xilinx XC4000 Cell
• 2 4-input look-up tables
• 1 3-input look-up table
• 2 D flip flops
Xilinx XC4000 Routing
Altera Flex10K
Altera Flex10K
Xilinx Virtex-II Pro
Altera Stratix
Xilinx Virtex CLB
Embedded RAM
° Xilinx – Block SelectRAM
• 18Kb dual-port RAM arranged in columns
° Altera – TriMatrix Dual-Port RAM
• M512 – 512 x 1
• M4K – 4096 x 1
• M-RAM – 64K x 8
Xilinx: Embedded Multipliers
Summary
° Reconfigurable computing relies heavily on
new VLSI technology
° Device architectures maturing
° Application development progressing at rapid
pace
° Integration of hardware and software a difficult
challenge
° Active area of research at UMass.