Transcript Notes

Some Embedded Processor Alternatives
Processors for this course: Introduction to Altera FPGAs
Processor Examples
“Harvard architecture” (block diagram: Control, ALU, and I/O, with separate Instruc. and Data memories):
--PIC processor family
“von Neumann architecture” (block diagram: Control, ALU, and I/O, with one Memory holding Data + Instruc.):
--simple processor
--μP 3 processor (Hamblen et al., chapter 9)
--MIPS processor (Hamblen et al., chapter 14)
--NIOS II processor core (Hamblen et al., chapters 15-17)
PIC processor family: processor is fixed, developer programs it
Reference: http://en.wikipedia.org/wiki/PIC_microcontroller
• PIC: peripheral interface controller
• Originally (~1975) for offloading I/O functions from a CPU
• Harvard architecture: data and instructions (“code”) are
stored separately, so a data item and an instruction do not
need to be the same length
(diagram: separate “Code” (Instructions) and Data memories)
• Newer versions have a stack
• One accumulator (referred to as W), but memory is usually
referred to as a “register file”
• Some versions allow a type of indirect addressing
• Usually referred to as a RISC machine; may have up to 70
instructions
• May be able to access external memory (newer versions)
• Many development tools & languages available
UP3 BOARD (labeled board photo: parallel port, VGA port, PS2 port, USB port, serial port, Altera Cyclone chip, SRAM, FLASH, LC Display, user-definable pushbuttons, user-definable LEDs, user-definable DIP switches, on/off switch, power, global reset, invalid input voltage LED, +3.3V supply LED, +5V supply LED)
http://users.ece.gatech.edu/~hamblen/UP3/ and
http://users.ece.gatech.edu/~hamblen/UP3/UP3%20Reference%20Manual.pdf
Some processor architectures:
simple processor:
•Von Neumann architecture
•Only one general purpose register (accumulator)
•Supports direct, indirect, and indexed addressing
•Small instruction set, 2 formats (000-110 or 111)
•Primitive I/O (via accumulator)
•No built-in stack / stack pointer
•No ability to do virtual storage
(Datapath diagram with M, MA, MD, IR, AC, CF, IA, IB, PC, ALU, ABUS, BBUS, ALU OUTPUT, OBUS)
M: memory
MA: memory address register
MD: memory data register
IR: instruction register
AC: accumulator
CF: carry flag
IA, IB: index registers
PC: program counter
μP 3 processor (Hamblen et al., chapter 9)
•Similar to simple processor—von Neumann architecture, 1
accumulator
•Implementation uses < 1% of Altera Cyclone device logic
•Memory and I/O are now each components on the data bus;
all info goes through MDR (fig. 9-1)
•8-bit instructions, 8-bit data in 1 16-bit word, several formats
•Only direct addressing
•Only 5 instructions given (load, store, add, jump, jneg)—can
these support general-purpose computing?
•No stack pointer
•Can it do virtual storage?
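One way to explore the general-purpose computing question is a small software model. The Python sketch below is an illustration only, not the Hamblen et al. design. Its assumptions: instructions are written as (mnemonic, address) pairs rather than real opcodes, word width is not modeled, and a jump to its own address is treated as "halt". The hypothetical program multiplies 3 by 4 by repeated addition, using only load, store, add, jump, and jneg.

```python
def run(memory, max_steps=1000):
    """Run an accumulator machine with direct addressing only.

    memory maps an address to either an (op, addr) instruction or a data value.
    A JUMP to its own address is treated as 'halt' (an assumption of this sketch).
    """
    memory = dict(memory)  # work on a copy so the program can be reused
    acc = 0
    pc = 0
    for _ in range(max_steps):
        op, addr = memory[pc]
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "JUMP":
            if addr == pc:
                return memory
            pc = addr
            continue
        elif op == "JNEG":
            if acc < 0:
                pc = addr
                continue
        else:
            raise ValueError("unknown instruction: " + op)
        pc += 1
    raise RuntimeError("step limit reached")


# Hypothetical program: result = 3 * 4 by adding 3 four times.
MULTIPLY = {
    0: ("LOAD", 10),   # acc = result
    1: ("ADD", 11),    # acc = result + 3
    2: ("STORE", 10),  # result = acc
    3: ("LOAD", 12),   # acc = counter
    4: ("ADD", 13),    # acc = counter + 1
    5: ("STORE", 12),  # counter = acc
    6: ("JNEG", 0),    # repeat while the counter is still negative
    7: ("JUMP", 7),    # halt
    10: 0,             # result
    11: 3,             # value to add
    12: -4,            # counter, counts up from -4 to 0
    13: 1,             # constant one
}


def multiply_demo():
    return run(MULTIPLY)[10]
```

Loops and conditional branches are possible with jneg, but anything that needs a computed address (arrays, subroutine return) would require self-modifying code, since only direct addressing is available.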
MIPS processor (Hamblen et al., chapter 14)
•Widely-used RISC architecture, 1980’s
•32-bit instructions, 3 formats
•32 general-purpose registers
•1-cycle fetch/decode/execute (employs pipelining)
NIOS II processor core (Hamblen et al., chapters 15-17)
Hardware (IP) core—SOPC example; C/C++ compiler
•32-bit datapath
•1-6 pipeline stages
•32 general purpose registers, 6 special-purpose
•Optional instruction cache
•Optional multiply/divide instructions
•Hardware floating point unit can be added
•Hardware can be customized
•Development environment includes:
--C/C++ compiler
--Ability to customize library for the peripheral
devices you need
More about Altera devices and tools:
Generic FPGA architecture:
(Figure: SINGLE FPGA CELL with LOGIC (LOOK-UP TABLE or LUT) and MEMORY (1-BIT); signals CARRY IN, CARRY OUT, CLOCK, RESET, IN BUS, OUT BUS, LOCAL BUS, GLOBAL BUS. FPGA (EXAMPLE) with RAM BLOCK, MEM IN, MEM OUT)
examples (Altera, Xilinx, etc.):
"cell" typically contains LUT (look-up table),
memory, I/O
ex: 3-input LUT ("address" from a,b,c selects the output):
inputs      output
a  b  c     o
0  0  0     f0
0  0  1     f1
0  1  0     f2
0  1  1     f3
1  0  0     f4
1  0  1     f5
1  1  0     f6
1  1  1     f7
"device" consists of cells, local routing, global
routing, specialized memory arrays; manufacturer
provides "families" of devices--different sizes, power
usage, operating conditions, etc.
Example: using a lookup table to describe a gate network:
f(A,B,C) = A'B'C + A'BC' + A'BC + ABC
(001) (010) (011) (111)
Inputs: ABC   out
000           0
001           1
010           1
011           1
100           0
101           0
110           0
111           1
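The lookup-table idea can be checked in software. A minimal Python sketch (the function names are illustrative, not part of any Altera tool): the eight table entries are stored in a list, the inputs A, B, C form the 3-bit "address", and the result is compared against the sum-of-products expression for f.

```python
# Table contents for f, indexed by the 3-bit address formed from A, B, C.
LUT = [0, 1, 1, 1, 0, 0, 0, 1]


def lut_output(a, b, c):
    """Look up f(A,B,C): A is the most significant address bit."""
    return LUT[(a << 2) | (b << 1) | c]


def sum_of_products(a, b, c):
    """f(A,B,C) = A'B'C + A'BC' + A'BC + ABC, evaluated gate by gate."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & nb & c) | (na & b & nc) | (na & b & c) | (a & b & c)


def tables_agree():
    """True if the LUT and the gate network match for all 8 input combinations."""
    return all(
        lut_output(a, b, c) == sum_of_products(a, b, c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1)
    )
```

Changing the function implemented by the cell only means changing the eight stored bits; the lookup hardware itself stays the same.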
Example: A Generic Programmable Logic Device Architecture
(Figure: a PLD cell containing LOGIC (LOOK-UP TABLE) and MEMORY (1-BIT), with CARRY-IN, CLOCK, RESET, IN BUS, OUT BUS, LOCAL BUS, and GLOBAL BUS connections; Altera calls this cell an LE (Logic Element). b. BLOCK OF PLD CELLS with a RAM Block (MEM IN, MEM OUT); Altera calls this block a LAB (Logic Array Block).)
maxplus2 "compiler": netlist extractor
Device families:
Example: “Cyclone”—we will use EP1C6 or EP1C2
features:
» logic elements (LE’s)
» RAM blocks
» Global clock + Phase locked loops for clock configuration
» >= 170 I/O pins
Cyclone LE—figure 3.7
Cyclone LABs and interconnects: figure 3.9
(These references and those that follow are to the reference by Hamblen et al.)
"silicon compilation":
basic idea: restrict possible physical configurations; sacrifice area /
performance for "regularity" of design; use regular physical structures to
enable AUTOMATION of layout
All CAD tools will sacrifice some area/performance for automation and the
ability to do "large" designs, just as software compilers sacrifice some
efficiency for the ability to use a high-level language instead of assembly
language; designer productivity will increase substantially, however
SW Programming:
Write Program (HLL) -> Compile -> Link to Libraries -> Load/Execute
Silicon Programming:
Write Program (HDL/Scm) -> Compile/Link -> Fit -> Simulate -> Program Device/Execute
Altera Project Flow (“Rapid Prototyping”):
1. (Hierarchical) DESIGN
  design entry
  schematic (mydesign.gdf)
  Verilog (mydesign.v)
  other formats (VHDL, AHDL, EDIF, etc.)
  IP cores
2. Compilation
  translation, optimization, synthesis (“netlist”)
  device fitting (placement and routing)
  Floorplan editor: figure 1.23
  Report generation
3. “Execution”
  Timing analysis
  simulation (functional / timing)
  device programming, hardware verification
  information on power usage
After we have completed our design
(schematic or HDL), the compiler
converts it into a design for an actual
working circuit using a "technology
mapping": design library -> technology
library
On the Altera boards we are using, there
is one chip, from the “CYCLONE” family
Technology: SRAM
General description:
http://en.wikipedia.org/wiki/Static_Random_Access_Memory
General information on “programmable” devices:
http://www.tutorial-reports.com/computerscience/fpga/user-programmability.php
CYCLONE chips:
http://www.altera.com/literature/hb/cyc/cyc_c51002.pdf
2900-20,000 LE’s; 10 LE’s grouped into one LAB
Dedicated RAM blocks
Specs: page 2-2
LE architecture: page 2-6
Embedded memory specs: page 2-18
Global clock and up to 2 PLLs (UP3 clock: 48 MHz)
Device grades, operating conditions:
http://www.altera.com/literature/hb/cyc/cyc_c51004.pdf
Physical behavior of devices:
• operating conditions--recommended/absolute maximum temperature
• gate delays
• pin voltage levels
• output loading
• power-supply management
• device programming erasure
• power evaluation: worksheet available at:
www.altera.com/support/devices/estimator/pow-powerplay.html
Reconfigurable computing:
We can create arrays of programmable devices
Programmability will allow us to change the
hardware capabilities "on the fly"--e.g., reprogram
some devices while others are being used for
processing; this allows us to "reconfigure" the
hardware to adapt to specific processing needs, just
as we can now rewrite software
example: Xilinx Virtex FPGA boards
Functional Testing: One more useful Altera option:
note that the devices we have access to will allow us to
produce fairly "large" designs. To adequately test these
designs, we will need to input files of test vectors rather than
relying solely on inputting waveforms (and we will need to do
HIERARCHICAL design AND testing)
A test vector file (myfile.vec) can be created in the text editor.
Here is an example file to test a module with inputs A, B,
RESET, and CLOCK and outputs X,Y,Z.
(Block diagram: module with inputs A, B, RESET, CLOCK and outputs X, Y, Z)
% test vector file for above module %
% units default to ns %
START 0 ; % time to start simulation %
STOP 1000 ; % time to end (in ns) %
INTERVAL 100 ;
INPUTS CLOCK ;
PATTERN % pattern of clock values: CLOCK ticks every 100 ns %
01;
INPUTS A B ;
PATTERN % test every combination of A and B; change A,B at given times %
0> 0 0
220> 1 0
320> 1 1
570> 0 1
720> 1 1
;
INPUTS RESET ;
PATTERN
0> 1
100> 0
;
OUTPUTS X Y Z ;
PATTERN % check output at every Clock pulse --these are expected values (relative time, vector values) %
=XXX
=000
=000
=100
=001
=001
=011
=011
=111
=111
=111
=111;
using the .vec file: open the simulator; then on the "File"
menu choose inputs/outputs; then choose your .vec file;
you must do this BEFORE opening a .scf file
Note: results of the simulation cannot be saved as a .vec
file. To save your results, save them as either a
waveform (.vwf) or a table output (.tbl) file.
Alternative: compile separately with a Verilog compiler on the Sun
workstations and use a testbench; then import into the
Altera environment; this is the standard HDL methodology
Useful Altera functions:
•The UP3 core library
•input and output for the Altera board
•random number generation
We will be covering material from
chapters 4,5,10,11 on I/O
I/O is the "hardest" type of module to
build, since it requires transition between
electrical domain and other energy
domains (e.g., mechanical, light)
We will also discuss a way to generate
pseudo random numbers (Appendix A)
Both I/O and random number generation
will probably be useful for your projects.
UP3 functions:
an IP (intellectual property) core
described in chapter 5 of Hamblen et al.
can be used with schematics, Verilog, or VHDL
8 modules--perform I/O “housekeeping” functions
modules must be “visible” in your path or included in
your design in some way (directly, package, etc.)
module names:
• Debounce (pushbuttons)
• OnePulse (pushbuttons)
• LCD_Display--LCD panel character display (output)
• Clk_Div--gives slower clock speeds
• VGA_Sync--video sync generation (output)
• Char_ROM--codes for display characters (output)
• Keyboard--keyboard connector (input)
• Mouse--mouse connector (input)
COMPONENT LCD_Display
PORT (Hex_Display_Data:
IN STD_LOGIC_VECTOR ((Num_Hex_Digits*4)-1 DOWNTO 0);
reset, clock_48MHz: IN STD_LOGIC;
LCD_RS, LCD_E, LCD_RW: OUT STD_LOGIC;
DATA_BUS: INOUT STD_LOGIC_VECTOR (7 DOWNTO 0));
END COMPONENT;
inputs are 4-bit hex digit values, which are converted to ASCII hex digits and sent to the LCD
display (note: Appendix D contains ASCII to hex table)
Num_Hex_Digits is a Generic parameter which can be given a value in a VHDL file
or in a schematic (16 characters, 2 lines available)
Outputs: PIN (important!)
LCD_RS: 108
LCD_E: 50
LCD_RW: 73
DATA_BUS (7 DOWNTO 0): 113, 106, 104, 102, 100, 98, 96, 94
COMPONENT Debounce
PORT (pb, clk_100Hz:IN STD_LOGIC;
pb_debounced:OUT STD_LOGIC);
END COMPONENT;
pb is the input from a pushbutton (see I/O pins, chapter 2)
since pushbuttons have a mechanical “bounce”, this component samples the input
over several clock cycles and filters out the bounces; it will register the pushbutton
input only when several sequential samples of the input agree
the clock input is used by the bounce filter (see example below)
when “push” is registered, output goes low: it remains low until button is released
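The sampling idea can be modeled in software. A minimal Python sketch, assuming a depth of 4 samples and an output that idles high (both are assumptions; this is a behavioral model, not the VHDL source of the UP3 core): the output changes only when the last 4 samples, taken once per clk_100Hz tick, all agree.

```python
def debounce(samples, depth=4, idle=1):
    """Return the debounced output for each raw sample.

    samples: raw pushbutton levels, one per sampling-clock tick (0 = pushed).
    depth:   number of consecutive samples that must agree (assumed value).
    idle:    output level before any stable run has been seen (assumed high).
    """
    history = []
    out = idle
    result = []
    for level in samples:
        history = (history + [level])[-depth:]  # keep only the newest samples
        if len(history) == depth and all(h == history[0] for h in history):
            out = history[0]                    # stable run: accept new level
        result.append(out)
    return result


# A bouncing press followed by a bouncing release.
RAW = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
CLEAN = debounce(RAW)
```

In this trace the output goes low only after four consecutive low samples and returns high only after four consecutive high samples, so the bounces at both edges never reach the output.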
COMPONENT OnePulse
PORT (PB_debounced, clock:IN STD_LOGIC;
PB_single_pulse:OUT STD_LOGIC);
END COMPONENT;
after the push button signal is “debounced”, this component can be used to ensure
that the output read from the pushbutton is high for only one clock cycle, no matter
how long the pushbutton is held down
this is useful for building finite state machines--an edge-triggered flip-flop can be
used to build a state and each input will be active for only one clock cycle
the “clock” input is the clock signal being used to drive the state machine
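The one-pulse behavior amounts to edge detection. A minimal Python sketch, assuming the debounced input is low while the button is pushed and that the pulse is produced on the first clock cycle where the input is seen low (assumptions; the actual component may differ in detail):

```python
def one_pulse(debounced, idle=1):
    """Return a list that is 1 for exactly one clock cycle per button press.

    debounced: debounced pushbutton level at each clock edge (0 = pushed).
    idle:      level assumed before the first sample.
    """
    previous = idle
    pulses = []
    for level in debounced:
        # Pulse only on the cycle where the input changes from high to low.
        pulses.append(1 if previous == 1 and level == 0 else 0)
        previous = level
    return pulses
```

For example, one_pulse([1, 1, 0, 0, 0, 0, 1, 1, 0, 1]) gives a single 1 for the long press and a single 1 for the short press, so a state machine clocked by the same signal advances once per press.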
COMPONENT Clk_Div
PORT (
clock_48MHz: IN STD_LOGIC;
clock_1MHz, clock_100KHz, clock_10KHz,
clock_1KHz, clock_100Hz, clock_10Hz, clock_1Hz:
OUT STD_LOGIC);
END COMPONENT;
the input is from the (48MHz) on-board clock (pin 29 for the Cyclone chip); JP3
jumper must be set to select the 48MHz USB clock; this is the default setting
the outputs are clock signals of various frequencies which can be used in designs
Note: actual frequency will be
(listed frequency)*(1.007 +/- .005%)
Example (block diagram):
pushbutton -> Debounce -> OnePulse -> fsm
Clock (pin 29) -> Clk_Div, which supplies Clock_100Hz and Clock_1MHz
COMPONENT Mouse
PORT ( clock_48Mhz,reset: IN STD_LOGIC;
mouse_data, mouse_clk:INOUT STD_LOGIC;
left_button,right_button: OUT STD_LOGIC;
mouse_cursor_row,mouse_cursor_column:
OUT STD_LOGIC_VECTOR(9 DOWNTO 0));
END COMPONENT;
the input is from the (48MHz) on-board clock (pin 29 for the Cyclone chip);
mouse_data is pin 13, mouse_clk is pin 12: BIDIRECTIONAL
(also used for keyboard)
cursor outputs give position in 640 x 480 pixel screen (VGA); cursor is initialized to
the middle of the screen
button outputs are high when the corresponding button is pushed
COMPONENT Keyboard
PORT
( keyboard_clk,keyboard_data, clock_48Mhz,
reset, read: IN STD_LOGIC;
scan_code: OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
scan_ready: OUT STD_LOGIC);
END COMPONENT;
Reads PS/2 keyboard scan code; converts serial data from keyboard to parallel
clock input is from the (48MHz) on-board clock (pin 29 for the Cyclone chip);
keyboard_data is pin 13, keyboard_clk is pin 12: INPUTS
(also used for mouse)
read clears the scan_ready signal; reset clears flip-flops for serial-to-parallel conversion
scan_code: table of values in Table 11.3;
--“make” code: key is hit; “break” code: key is released
ex: ‘A’ make = 1C, break = F01C; ‘shift’ make = 12, break = F012
(if key is held down, several makes will be sent before a break)
scan_ready goes high when new scan code is sent and can be used to make sure each scan
code is read only once
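The make/break scheme can be illustrated with a short decoder. A minimal Python sketch, using only the two keys given in the example above (the table and function names are illustrative; a full table such as Table 11.3 would be needed in practice): each byte read from scan_code is either the F0 break prefix or a key code.

```python
BREAK_PREFIX = 0xF0
KEY_NAMES = {0x1C: "A", 0x12: "shift"}  # only the two codes quoted in the notes


def decode(scan_codes):
    """Turn a stream of scan-code bytes into (key, 'make' | 'break') events."""
    events = []
    releasing = False
    for code in scan_codes:
        if code == BREAK_PREFIX:
            releasing = True          # the next byte names the released key
            continue
        name = KEY_NAMES.get(code, hex(code))
        events.append((name, "break" if releasing else "make"))
        releasing = False
    return events


# Shift held, 'A' held long enough to repeat once, then both released.
STREAM = [0x12, 0x1C, 0x1C, 0xF0, 0x1C, 0xF0, 0x12]
EVENTS = decode(STREAM)
```

The repeated 1C shows why a design that counts key presses should wait for the break code rather than count makes.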
COMPONENT VGA_Sync
PORT (clock_48MHz, red, green, blue: IN STD_LOGIC;
red_out, green_out, blue_out,
horiz_sync_out, vert_sync_out: OUT STD_LOGIC;
pixel_row, pixel_column: OUT STD_LOGIC_VECTOR(9 DOWNTO 0));
END COMPONENT;
clock_48MHz signal must come from pin 29 (Cyclone chip)
user logic generates the input “color” (red, green, blue)
Cyclone chip:
horiz_sync --> pin 226, vert_sync --> pin 227
red_out --> pin 228, green_out --> pin 122, blue_out --> pin 170
pixel_row and pixel_column give the pixel address
how many colors are available? how many pixels?
(“dithering”: one color on odd cycles, different on even -> twice as many colors;
example figure: pattern sent (even/odd cycles) vs. pattern observed)
COMPONENT Char_ROM
PORT (clock: IN STD_LOGIC;
character_address: IN STD_LOGIC_VECTOR (5 DOWNTO 0);
font_row, font_col: IN STD_LOGIC_VECTOR (2 DOWNTO 0);
row_mux_output: OUT STD_LOGIC);
END COMPONENT;
generates text for a video display--each character requires an 8 x 8 pixel pattern (see
codes, table 9.1); a memory initialization file, tcgrom.mif, is provided; the font data
can be stored in one M4K memory block
character_address addresses the character to be displayed
font_row and font_col step through the 64 pixels (8x8) needed to display one
character
Clock loads the address register and should be tied to the video pixel_clock
row_mux_output is the pixel value to be output for this character at this position and
can be used to generate the correct RGB pixel color
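The addressing can be sketched in Python. This is a model under stated assumptions, not the contents of tcgrom.mif: the ROM is taken to hold one 8-bit word per font row, font_col 0 is taken to be the leftmost (most significant) bit, and the glyph stored at character address 1 is a made-up example. With a 6-bit character address and 8 rows of 8 bits, the font needs 64 x 8 x 8 = 4096 bits, which is why it fits in one M4K block.

```python
FONT_BITS = 64 * 8 * 8  # 6-bit character address, 8 rows, 8 columns

# Hypothetical glyph used only for this example (not taken from tcgrom.mif).
GLYPH = [
    0b00011000,
    0b00100100,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b01000010,
    0b00000000,
]


def rom_address(character_address, font_row):
    """Form a 9-bit ROM address: 6 bits of character, 3 bits of row."""
    return (character_address << 3) | font_row


def build_rom():
    rom = [0] * 512
    for row, bits in enumerate(GLYPH):
        rom[rom_address(1, row)] = bits  # store the example glyph at address 1
    return rom


def row_mux_output(rom, character_address, font_row, font_col):
    """Select one pixel: assumes font_col 0 is the leftmost (msb) bit."""
    return (rom[rom_address(character_address, font_row)] >> (7 - font_col)) & 1


def render(rom, character_address):
    """Return the 8 x 8 pattern as text, '#' for a lit pixel."""
    return [
        "".join("#" if row_mux_output(rom, character_address, r, c) else "."
                for c in range(8))
        for r in range(8)
    ]
```

In the hardware, font_row and font_col would typically come from the low bits of pixel_row and pixel_column, so each character cell covers an 8 x 8 block of the screen.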
How does output occur (examples: chapter 10):
monitor contains CRT (cathode ray tube)
screen consists of pixels, 640 in a row and 480 in a column (VGA format)
“refresh rate”: how quickly these pixels are scanned
standard rate is 60 times / second (60 Hz)
(human eye can detect “flicker” below 30Hz)
if there are 640 X 480 pixels, with a 60Hz refresh rate, how much time is available
to scan one pixel?
What clock speed is therefore required?
What is the onboard clock speed?
(note: UP3 has PLL which can be used to obtain faster refresh rates)
Sync signals tell when to start a new row or column
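The timing questions above reduce to simple arithmetic, sketched here in Python. The first calculation uses only the visible 640 x 480 pixels. The second uses the standard VGA totals of 800 pixel times per line and 525 lines per frame, which include the blanking intervals around the sync pulses; those totals come from the standard 640 x 480 timing, not from these notes.

```python
def pixel_rate_hz(columns, rows, refresh_hz):
    """Pixels that must be scanned per second."""
    return columns * rows * refresh_hz


def time_per_pixel_ns(columns, rows, refresh_hz):
    """Time available for one pixel, in nanoseconds."""
    return 1e9 / pixel_rate_hz(columns, rows, refresh_hz)


# Visible pixels only: lower bound on the pixel clock.
VISIBLE_RATE = pixel_rate_hz(640, 480, 60)        # pixels per second
VISIBLE_TIME = time_per_pixel_ns(640, 480, 60)    # ns per pixel

# Including horizontal and vertical blanking (standard 640 x 480 timing).
TOTAL_RATE = pixel_rate_hz(800, 525, 60)
```

VISIBLE_RATE is 18,432,000 pixels per second, or about 54 ns per pixel. With blanking included the required pixel clock is about 25 MHz, so the 48 MHz on-board clock has to be divided down or a PLL used.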
random number generation (Appendix A):
actually generates “pseudorandom” numbers
Q: what is the difference?
Method: example: n = 32--will give 32-bit pseudorandom sequence of bits
from table, read “XOR from bits 32,22,2,1” (bits are 32--1, not 31--0)
build a 32-bit shift register that shifts left one bit per cycle
next bit to be input into lsb should be the XOR of bits 32,22,2,1
this will generate a sequence in “pseudorandom order”
initial value in the register is the “seed”; 0 should not be used (why?)
Example: random number generator for n = 8:
8-bit shift register (shifts left; bits numbered Bit 8 down to Bit 1)
Load with SEED which is any nonzero number
shift in XOR of the specified bits (8, 6, 5, 4 for n = 8)
Generate all 255 (2^8 - 1) nonzero numbers in “random” order, e.g.:
SEED=10101000 gives 10101000, 01010001, 10100011, 01000110, …
Example: n = 3--table gives bits 3,2
step   pattern   (bit 3) xor (bit 2)
0      111       0
1      110       0
2      100       1
3      001       0
4      010       1
5      101       1
6      011       1
7      111       0---from here, the sequence will repeat
we have a sequence of the numbers 1-7: 7,6,4,1,2,5,3
this is the longest nonrepeating sequence we can have
order will always be the same, seed only determines where we start
(We MUST XOR the specified bits for the given value of n; it can be proved, using facts
about polynomials over the field with elements 0,1 and addition = XOR, that these bits
will generate all nonzero values.)
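The shift-register method can be tried in software before building it. A minimal Python sketch (function names are illustrative): bits are numbered n down to 1 as in the notes, the register shifts left, and the XOR of the tap bits enters at bit 1.

```python
def lfsr_step(state, n, taps):
    """Shift an n-bit register left by one, feeding in the XOR of the tap bits.

    Bits are numbered n (msb) down to 1 (lsb), matching the table convention.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << n) - 1)


def lfsr_sequence(seed, n, taps, count):
    """Return the first `count` register values, starting with the seed."""
    values = []
    state = seed
    for _ in range(count):
        values.append(state)
        state = lfsr_step(state, n, taps)
    return values


def period(seed, n, taps):
    """Number of steps until the register returns to the seed."""
    state = lfsr_step(seed, n, taps)
    steps = 1
    while state != seed:
        state = lfsr_step(state, n, taps)
        steps += 1
    return steps
```

With n = 3 and taps (3, 2), the seed 111 reproduces the sequence 7, 6, 4, 1, 2, 5, 3 from the table. With n = 8 and taps (8, 6, 5, 4), the seed 10101000 gives 10101000, 01010001, 10100011, 01000110 and repeats after 255 steps. A seed of 0 stays at 0, since the XOR of all-zero bits is 0.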
How good are the random numbers generated?
Reference: Shruthi Narayanan, M.S. 2005, ATI Technologies
Hardware implementation of genetic algorithm modules for
intelligent systems:
(Figures: random numbers generated by one shift register; random numbers generated by multiple shift registers)