ECE506 Week 1

ECE 506
Reconfigurable Computing
http://www.ece.arizona.edu/~ece506
Lecture 3
Reconfigurable Architectures
Ali Akoglu
Complex Programmable Logic Device
° Hierarchical design against size explosion of PLAs
• Combinational logic with Flip Flops (registered output)
• Organized into logic blocks connected in an interconnect matrix
• Usually enough logic for simple counters, state machines, decoders, etc.
Xilinx CoolRunner II CPLD
° PLA and Macrocell combination
° 1.8V device, estimated standby current of less than 100 microamps
° Up to 12,000 gates, 512 MacroCells
CPLD
° Multiple Function Blocks (FBs) and I/O Blocks (IOBs)
• Fully interconnected (FB outputs and input signals to the FB Inputs)
• Each FB provides programmable logic with 54 inputs and 18 outputs.
° The IOB provides buffering for device inputs and outputs.
° Output enable signals drive directly to the IOBs.
Function Block
° Comprised of 18 independent macrocells.
• Each can implement a combinatorial or registered function.
° Logic within the FB is implemented using a sum-of-products representation.
• Fifty-four inputs (108 true and complement signals) feed the programmable AND-array to form 90 product terms.
• Any number of these product terms can be allocated to each macrocell by the product term allocator.
° How many product terms would you assign for each macrocell?
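As a rough illustration of the sum-of-products structure described above, the following Python sketch models a programmable AND array feeding one macrocell's OR gate. The helper names `product_term` and `macrocell_or` and the small 3-input function are illustrative assumptions, not taken from any device documentation; only the figures of 90 product terms and 18 macrocells come from the slide.

```python
# Sketch of a sum-of-products macrocell (hypothetical helper names).

def product_term(inputs, literals):
    """AND together the selected literals.

    literals maps an input index to True (use the true signal)
    or False (use the complemented signal).
    """
    return all(inputs[i] == polarity for i, polarity in literals.items())

def macrocell_or(inputs, terms):
    """OR the product terms allocated to one macrocell."""
    return any(product_term(inputs, t) for t in terms)

# Example: f = A.B + (not A).C with inputs ordered (A, B, C)
terms = [{0: True, 1: True}, {0: False, 2: True}]
f_values = [macrocell_or((a, b, c), terms)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Even split of the FB's product terms across its macrocells
TERMS_PER_FB = 90
MACROCELLS_PER_FB = 18
direct_terms = TERMS_PER_FB // MACROCELLS_PER_FB

print(f_values)
print(direct_terms)
```

An even split of the function block's product terms gives each macrocell five direct terms.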
Macrocell
° The Product Term Allocator selects how the 5 direct product terms are used:
• as primary data inputs to the OR gate for combinatorial functions, or
• as control inputs (clock, clock enable, set, reset, output enable).
° Each macrocell can be configured for a combinatorial or registered function.
Product Term Allocator
° Controls how the five direct product terms are assigned to each MC.
• For example, all five direct terms can drive the OR function.
Product Term Allocator
° Can re-assign other product terms within
the FB to increase the logic capacity of a
macrocell beyond five direct terms.
° Any macrocell requiring additional
product terms can access uncommitted
product terms in other macrocells within
the FB.
° Up to 15 product terms can be available
to a single macrocell with only a small
incremental delay (tPTA)
Product Term Allocator
° Can re-assign product terms from any
macrocell within the FB by combining
partial sums of products over several
macrocells
• What is the incremental delay in this example? 2 × tPTA
° If all 90 product terms are available to any macrocell, what is the maximum incremental delay?
Programmability Options
° PLDs, CPLDs have different types of programmability.
• initial programming and reprogramming
° One-time programmable:
• device is programmed once and holds its programming "forever"
• usually uses fuses to make/break links
• not reusable, but usually the cheapest
• discard device if changes are to be made
Programmability Options
° UV-Erasable (EPROM)
• a floating gate positioned between regular MOS transistor control gate and the
channel.
• floating gate is uncharged
° To program the cell:
• a high voltage (e.g. 14 volts) applied to the control gate (drain is at ~12 volts).
• causes current to flow between the source and drain.
• accelerates electrons to high velocity and a small fraction of them traverse the
thin oxide and become trapped on the floating gate.
• floating gate, surrounded by an insulating layer, becomes “permanently”
negatively charged and the transistor is permanently turned off.
° “Permanent” means about 10 years at 125 degrees C; at
higher temperatures this time is reduced.
° Cells erased by Ultra-Violet (UV) light.
• electrons on floating gates are excited and discharged to the substrate.
Programmability Options
° Electrically Erasable (EEPROM)
• uses a floating gate structure with a control gate on top.
• both erasing and reprogramming are accomplished with an electrical current
• device can be programmed/erased on circuit board, no special
packaging or IC socket is needed
• erase time is much faster than UV erase
• programming retained after power down
- non-volatile
• programming/erasing limited to 1000s of cycles
Electrically Erasable PLDs
° Conventional PLDs are either
• One-time programmable
• UV Erasable
° Must be placed in a programmer to program them
° EE PLDs can be programmed and erased in place
• A small (four wire) connection to a computer is needed
• Once programmed, will retain program indefinitely
• Never have to take the chip out of its circuit
FPGA
° Introduced in 1985 by Xilinx
° Similar to CPLDs
° A function to be implemented in FPGA
• Partitioned into modules, each implemented in a logic block.
• Logic blocks connected with the programmable interconnection.
FPGA Technology
° 1) Antifuse-based
• Realization of interconnections
° 2) Memory-based.
• realization of interconnections and computation
• FLASH, SRAM
FPGA Technology
° Antifuse FPGAs:
• configured by burning a set of fuses.
• once configured, cannot be altered any more
• bug fixes and updates possible for new PCBs, but hardly for already
manufactured boards.
• ASIC replacement for small volumes.
° Flash FPGAs
• may be re-programmed several thousand times and are non-volatile
• Expensive, re-configuration takes several seconds
° SRAM FPGAs
• dominating technology
• unlimited re-programming
• additional circuitry is required to load the configuration into the FPGA
after power on
• re-configuration is very fast
• some devices even allow partial re-configuration during operation
Antifuse (Actel FPGA)
° An antifuse is normally an open circuit.
° Two-terminal element: the terminals connect to the upper and lower layers of the antifuse, with a dielectric (Oxide-Nitride-Oxide, ONO) layer in the middle
° Initial state:
• High resistance of dielectric does not allow any current to flow.
° Applying a high voltage:
• causes large power dissipation and melts the dielectric
• Drastically reduces the resistance
• a link can be built, which permanently connects the two layers.
Antifuse chips
° Advantages
• Small area
- With metal-to-metal antifuses, no silicon area is required to make connections, decreasing the area overhead of programmability.
• Much lower resistance and parasitic capacitance than transistor-based switches.
- possible to include more switches per device
- reduces the RC delays in the routing.
• No bitstream can be intercepted in the field (no bitstream transfer)
- A scanning electron microscope is needed even to attempt to read the antifuse states (an Actel AX2000 antifuse FPGA contains 53 million antifuses with only 2-5% programmed in an average design)
• Interconnect structure is naturally “rad hard”
- relatively immune to the effects of radiation (except flip-flops!)
- by contrast, an SRAM-based component can be “flipped” if hit by radiation
Antifuse chips
° Disadvantages
• one-time programmable, so not suitable for devices that must be frequently reprogrammed
• special programmers must be used to program a device before it is mounted on a final product
• programming involves significant changes to the properties of the materials in the fuse
- leads to scaling challenges when new IC fabrication processes are considered
Programmability Options
° Static Random Access Memory (SRAM) Programming:
• Switch is a pass transistor controlled by the state of the SRAM bit
• Logic block configuration bits are stored in SRAM
• can be reprogrammed an unlimited number of times
• use of standard CMOS process technology
- SRAM cells are created using exactly the same CMOS technologies as the rest of the device.
- No special processing steps are required in order to create these components.
- SRAM-based devices benefit from the increased integration, higher speeds and lower dynamic power consumption of new processes with smaller minimum geometries.
Programmability Options
° SRAM Volatility
• programming contents NOT retained after power down
• external non-volatile memory device required on power up
° SRAM Size
• SRAM cell requires either 5 or 6 transistors and the
programmable element used to interconnect signals requires at
least a single transistor.
° SRAM Security
• Since the configuration information must be loaded into the
device at power up, there is the possibility that the configuration
information could be intercepted and stolen for use in a
competing system.
Programmability Options
° Flash Programming:
• alternative that addresses some of the shortcomings of SRAM
° Use of floating gate programming technologies
• inject charge onto a gate that “floats” above the transistor.
° Non-volatile
• eliminates the need for the external storage for configuration data
• can function immediately upon power-up
° Area efficiency
• Area overhead: The programming circuitry (high and low voltage buffers)
needed to program the cell,
• Cost is relatively modest as it is amortized across numerous
programmable elements.
Programmability Options
° Cannot be reprogrammed an infinite number of times.
• Charge buildup in the oxide eventually prevents a flash-based device
from being properly erased and programmed
° Non-standard CMOS process.
• around five additional process steps on top of standard CMOS
• process technology lags behind SRAM-based devices by one or more generations.
° Programming time is about three times that of an SRAM-based component.
° High resistance and capacitance due to the use of
transistor-based switches.
° Solution: on-chip flash memory to provide non-volatile
storage with SRAM cells to control the programmable
elements in the design.
Programmability Options
° An ideal technology would be
• non-volatile
• reprogrammable
• built using a standard CMOS process
• able to offer low on-resistances and low parasitic capacitances.
FPGA Components
° How can we implement any circuit in an FPGA?
• Example: Half adder
- Combinational logic represented by truth table
- What kind of hardware can implement a truth table?

Inputs | Out      Inputs | Out
A  B   | S        A  B   | C
0  0   | 0        0  0   | 0
0  1   | 1        0  1   | 0
1  0   | 1        1  0   | 0
1  1   | 0        1  1   | 1
FPGA Components
° Lookup Table (LUT)
° Implement truth table in small memories (LUTs)
• Usually SRAM
° A function is implemented by writing all possible values that the function can take in the LUT
° The input values are used to address the LUT and retrieve the value of the function corresponding to the input values
[Figure: two 2-input, 1-output LUTs addressed by A and B. The S LUT stores 0, 1, 1, 0 at addresses 00, 01, 10, 11; the C LUT stores 0, 0, 0, 1.]
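The addressing idea can be sketched in a few lines of Python. This is only a minimal model: each LUT is represented as a 4-entry list standing in for the SRAM, and the names `SUM_LUT`, `CARRY_LUT` and `lut_read` are illustrative choices, with A assumed to be the most significant address bit.

```python
# Two 2-input, 1-output LUTs; each is a 4-entry memory indexed by address AB.
SUM_LUT = [0, 1, 1, 0]      # contents at addresses 00, 01, 10, 11
CARRY_LUT = [0, 0, 0, 1]

def lut_read(lut, a, b):
    """Use inputs A (MSB) and B (LSB) as the address into the LUT."""
    return lut[(a << 1) | b]

def half_adder(a, b):
    """Return (S, C) by reading both LUTs with the same address."""
    return lut_read(SUM_LUT, a, b), lut_read(CARRY_LUT, a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

No gate is evaluated at read time; changing the function means rewriting the stored bits only.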
FPGA Components
° Alternatively, could have used a 2-input, 2-output LUT
• Outputs commonly use same inputs
[Figure: a single 2-input, 2-output LUT addressed by A and B. Each address stores the pair (S, C): 00 → 0 0, 01 → 1 0, 10 → 1 0, 11 → 0 1.]
FPGA Components
° Slightly bigger example: Full adder
• Combinational logic can be implemented in a LUT that has the same number of inputs and outputs as the logic
- 3-input, 2-output LUT

Truth Table
Inputs     | Outputs
A  B  Cin  | S  Cout
0  0  0    | 0  0
0  0  1    | 1  0
0  1  0    | 1  0
0  1  1    | 0  1
1  0  0    | 1  0
1  0  1    | 0  1
1  1  0    | 0  1
1  1  1    | 1  1

[Figure: 3-input, 2-output LUT addressed by A, B and Cin, with outputs S and Cout.]
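A LUT's contents can be generated mechanically by evaluating the logic once for every input combination. The sketch below does this for the full adder; `build_lut` and `full_adder_logic` are hypothetical helper names, and the address is assumed to be formed with A as the most significant bit.

```python
from itertools import product

def build_lut(func, num_inputs):
    """Fill a LUT by evaluating func on every input combination.

    Entry k holds the output tuple for the inputs whose bits,
    read MSB first, form the address k.
    """
    return [func(*bits) for bits in product((0, 1), repeat=num_inputs)]

def full_adder_logic(a, b, cin):
    """Reference behavior used only to fill the table."""
    total = a + b + cin
    return total & 1, total >> 1   # (S, Cout)

FULL_ADDER_LUT = build_lut(full_adder_logic, 3)   # 8 entries, 2 bits each

def full_adder(a, b, cin):
    """Look up (S, Cout) using A, B, Cin as a 3-bit address."""
    return FULL_ADDER_LUT[(a << 2) | (b << 1) | cin]

print(FULL_ADDER_LUT)
```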
FPGA Components
° LUT Example: Implement the function ABD+BCD+ABC using
• 2-input LUTs
• 3-input LUTs
• 4-input LUTs
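One way to approach this exercise is to write down a candidate decomposition for each LUT size and check it exhaustively against the original function. The sketch below assumes every literal is uncomplemented, exactly as the expression is printed here. The particular factorings are one possible choice and are not claimed to use the minimum number of LUTs.

```python
from itertools import product

def target(a, b, c, d):
    """ABD + BCD + ABC, taking every literal as uncomplemented."""
    return (a & b & d) | (b & c & d) | (a & b & c)

# 4-input LUT: the whole function fits in one 16-entry LUT.
LUT4 = [target(*bits) for bits in product((0, 1), repeat=4)]

# 3-input LUTs: factor out B, so f = B . maj(A, C, D).
def with_lut3(a, b, c, d):
    m = (a & c) | (a & d) | (c & d)   # LUT 1: inputs A, C, D
    return b & m                      # LUT 2: inputs B, m (one input unused)

# 2-input LUTs: f = B . (A.D + C.(A + D)).
def with_lut2(a, b, c, d):
    t1 = a & d        # LUT 1
    t2 = a | d        # LUT 2
    t3 = c & t2       # LUT 3
    t4 = t1 | t3      # LUT 4
    return b & t4     # LUT 5

all_inputs = list(product((0, 1), repeat=4))
matches3 = all(with_lut3(*x) == target(*x) for x in all_inputs)
matches2 = all(with_lut2(*x) == target(*x) for x in all_inputs)
print(matches3, matches2)
```

Under these assumptions the function needs one 4-input LUT, two 3-input LUTs, or five 2-input LUTs, which shows how smaller LUTs trade table size for more logic levels.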
FPGA Components
° LUTs are used as function generators
° How many SRAM locations does a k-input LUT have? $2^k$
° How many different functions can a k-input LUT implement? $2^{2^k}$
[Figure: 2-input, 2-output LUT for the half adder, addressed by A and B, with outputs S and C.]
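Both counts follow from the fact that a single-output LUT stores one independent bit per input combination. A small Python check, with illustrative function names, makes the growth concrete:

```python
from itertools import product

def lut_locations(k):
    """A k-input LUT stores one bit per input combination."""
    return 2 ** k

def lut_function_count(k):
    """Each of the 2**k stored bits can be 0 or 1 independently."""
    return 2 ** lut_locations(k)

# Enumerate every possible content of a 2-input LUT.
all_2_input_luts = list(product((0, 1), repeat=lut_locations(2)))

for k in range(1, 5):
    print(k, lut_locations(k), lut_function_count(k))
```

For k = 2 the enumeration yields 16 distinct tables, one for each 2-input Boolean function (AND, OR, XOR, constants, and so on).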
FPGA Components
° Why aren’t FPGAs just a big LUT?
° Size of truth table
• grows exponentially based on # of inputs
• 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
• Same number of rows in truth table and LUT
• LUTs grow exponentially based on # of inputs
° Number of SRAM bits in a LUT
• = $2^i \times o$
• i = # of inputs, o = # of outputs
• Example: 64-input combinational logic with 1 output would require $2^{64}$ SRAM bits
- approximately $1.84 \times 10^{19}$
° Clearly, not feasible to use large LUTs
• So, how do FPGAs implement logic with many inputs?
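The bit-count formula is easy to evaluate directly. This short Python sketch (the function name `lut_sram_bits` is an illustrative choice) reproduces the 64-input figure:

```python
def lut_sram_bits(num_inputs, num_outputs):
    """SRAM bits needed by one LUT: 2**i rows, o bits per row."""
    return (2 ** num_inputs) * num_outputs

print(lut_sram_bits(3, 2))                # small LUT, e.g. a full adder
print(lut_sram_bits(64, 1))               # exact bit count for 64 inputs
print(f"{lut_sram_bits(64, 1):.2e}")      # same value in scientific notation
```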
FPGA Components
° Fortunately, we can map circuits onto multiple LUTs
• Divide circuit into smaller circuits that fit in LUTs (same # of
inputs and outputs)
• Example: 3-input, 2-output LUTs
FPGA Components
° Large LUTs
• Fast when using all inputs
• Wastes transistors otherwise
° Must also consider total chip area
• Wasting transistors may be ok if there are plenty of LUTs
FPGA Components
° What if circuit doesn’t map perfectly?
• More inputs in LUT than in circuit
- Truth table handles this problem
• More outputs in LUT than in circuit
- Extra outputs simply not used
– Space is wasted, so should use multiple outputs
whenever possible
° Important Point
• The number of gates in a circuit has no effect on the mapping
into a LUT
- All that matters is the number of inputs and outputs
- Unfortunately, it isn’t common to see large circuits with few inputs
[Figure: two circuits, labeled "1 gate" and "1,000,000 gates".]
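To see why gate count does not matter, compare the LUT contents produced by circuits of very different sizes that compute the same function. The circuits below are made-up examples; `padded` stands in for a large netlist by appending a long chain of inverters that cancel out.

```python
from itertools import product

def lut_contents(circuit, num_inputs):
    """Truth table of a circuit, listed in address order."""
    return [circuit(*bits) for bits in product((0, 1), repeat=num_inputs)]

def one_gate(a, b):
    return a ^ b                          # a single XOR gate

def many_gates(a, b):
    # Same function from two inverters, two ANDs and one OR.
    return (a & (1 - b)) | ((1 - a) & b)

def padded(a, b, pairs=1000):
    x = a ^ b
    for _ in range(2 * pairs):            # 2000 inverters in series cancel out
        x = 1 - x
    return x

print(lut_contents(one_gate, 2))
print(lut_contents(many_gates, 2))
print(lut_contents(padded, 2))
```

All three circuits occupy the same single 2-input LUT, because the LUT stores only the input-to-output mapping.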
FPGA Components
° LUT-Realization
° A LUT is basically a multiplexer that evaluates the truth
table stored in the configuration SRAM cells (can be seen
as a one bit wide ROM).
° QUIZ2
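The multiplexer view of a LUT can be sketched as a tree of 2:1 multiplexers whose data inputs are the configuration bits and whose select lines are the LUT inputs. The function names below are illustrative, and real devices may organize the multiplexer differently.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: pass d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def lut_mux_tree(config_bits, inputs):
    """Select one configuration bit using a tree of 2:1 multiplexers.

    config_bits: 2**k stored truth-table bits, index = address.
    inputs: k input bits, most significant first.
    """
    level = list(config_bits)
    # Consume inputs from least significant to most significant;
    # each input halves the number of candidate bits.
    for sel in reversed(inputs):
        level = [mux2(sel, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

XOR_CONFIG = [0, 1, 1, 0]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut_mux_tree(XOR_CONFIG, (a, b)))
```

A k-input LUT built this way uses 2^k - 1 two-input multiplexers arranged in k levels.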
FPGA Components
° Example:
• Determine best LUTs for following circuit
- Choices
– 4-input, 2-output LUT (delay = 2 ns)
– 6-input, 2-output LUT (delay = 3 ns)
- Assume each SRAM cell is 6 transistors
– 4-input LUT = $6 \times 2^4 \times 2$ = 192 transistors
– 6-input LUT = $6 \times 2^6 \times 2$ = 768 transistors
[Figure: circuit to be mapped.]
° Using the 6-input LUT
• Propagation delay = 3 ns
• Total transistors = 768
° Using 4-input LUTs
• Propagation delay = 4 ns
• Total transistors = 384
° 6-input LUTs are 1.3x faster but use twice the area
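The area and delay comparison can be checked with a short calculation. The sketch below evaluates 6 × 2^i × o directly, which gives 192 transistors for a 4-input, 2-output LUT and 768 for a 6-input, 2-output LUT. It assumes that the 4-input option needs two LUTs in series, an inference from the 4 ns delay and 384-transistor total since the circuit figure is not available; all names are illustrative.

```python
TRANSISTORS_PER_SRAM_CELL = 6

def lut_transistors(num_inputs, num_outputs):
    """SRAM transistors in one LUT: 6 per bit, 2**i * o bits."""
    return TRANSISTORS_PER_SRAM_CELL * (2 ** num_inputs) * num_outputs

def option_cost(num_inputs, num_outputs, lut_delay_ns, luts_used, luts_in_series):
    """Return (total transistors, propagation delay in ns) for one mapping."""
    area = luts_used * lut_transistors(num_inputs, num_outputs)
    delay = luts_in_series * lut_delay_ns
    return area, delay

# Assumed mappings: two chained 4-input LUTs versus one 6-input LUT.
four_input = option_cost(4, 2, 2, luts_used=2, luts_in_series=2)
six_input = option_cost(6, 2, 3, luts_used=1, luts_in_series=1)

speedup = four_input[1] / six_input[1]
area_ratio = six_input[0] / four_input[0]
print(four_input, six_input, round(speedup, 2), area_ratio)
```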
FPGA Components
° Problem: How to handle sequential logic
• Truth tables don’t work
° Possible solution:
• Add a flip-flop to the output of LUT
° BLE: the basic logic element
• Circuit can now use output from LUT or from FF
• Where does select come from?
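A basic logic element can be modeled as a LUT, a D flip-flop, and an output multiplexer. In the sketch below the multiplexer select is driven by a configuration bit (`use_ff`), which is a common arrangement but is stated here as an assumption, since the slide leaves the question open. The class and attribute names are hypothetical.

```python
class BLE:
    """LUT followed by a D flip-flop and an output 2:1 mux."""

    def __init__(self, lut_bits, use_ff):
        self.lut_bits = list(lut_bits)  # 2**k configuration bits
        self.use_ff = use_ff            # configuration bit driving the mux select
        self.ff = 0                     # flip-flop state

    def lut_out(self, inputs):
        addr = 0
        for bit in inputs:              # inputs are MSB first
            addr = (addr << 1) | bit
        return self.lut_bits[addr]

    def output(self, inputs):
        """Mux: registered output if use_ff is set, else the LUT output."""
        return self.ff if self.use_ff else self.lut_out(inputs)

    def clock(self, inputs):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut_out(inputs)

# Sequential use: a 1-input LUT acting as an inverter, with the
# registered output fed back to its input, toggles on every clock.
toggler = BLE([1, 0], use_ff=True)
trace = []
for _ in range(4):
    q = toggler.ff                      # feedback through the routing
    toggler.clock((q,))
    trace.append(toggler.output((q,)))
print(trace)

# Combinational use: the same element with the flip-flop bypassed.
xor_ble = BLE([0, 1, 1, 0], use_ff=False)
print(xor_ble.output((1, 0)))
```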