4L Group


DMY
16-bit RISC
Microprocessor
Cecilia Florescu
Mojdeh Makabi
Daniel Yee
December 2, 2002
CS M152B
Overview


Purpose: Design a pipelined RISC microprocessor
Design Platform: Xilinx ISE 4.1, ModelSim 5.6, Visual C++ 6.0, Windows 2000 Professional

Pipelining
It acts like an assembly line
[Figure: Ford’s Auto Assembly Line, Stations 1 to 4]
[Figure: Sequential Auto Production vs. Pipelining Auto Production; each timeline shows autos passing through stations 1 to 4 along the Time axis]

Pipelined RISC
RISC is an acronym for Reduced Instruction Set Computer
• It has a reduced and simple instruction set
• It has a large number of general-purpose registers
In our Pipelined RISC Processor:
• Each instruction takes 1 clock cycle for each stage
• The processor can accept 1 new instruction per clock
• Instructions are processed in stages as they pass down the pipeline
• Multiple instructions are in some phase of execution concurrently
• Pipelining doesn't improve the latency of instructions (each instruction still requires the same amount of time to complete)
• It does improve the overall throughput




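The latency and throughput point can be illustrated with a small cycle count. This is only a sketch: it assumes an ideal five-stage pipeline (IF, ID, EX, MEM, WB) with one clock cycle per stage and no stalls or hazards, and the function names are made up for illustration.

```python
def sequential_cycles(n_instructions, n_stages=5):
    """Cycles when each instruction finishes every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles for an ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 4, 100):
        # A single instruction takes n_stages cycles either way (same latency);
        # the gap only opens up as more instructions overlap (better throughput).
        print(n, sequential_cycles(n), pipelined_cycles(n))
```

With these assumptions a single instruction needs 5 cycles in both cases, while 100 instructions need 500 cycles sequentially and 104 cycles pipelined.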
Pipelined RISC Design
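[Figure: block diagram of the pipelined datapath: PC, Instruction Memory, Registers, Sign Exd (sign extend), ALU, Memory, Control Unit, two adders (+), and the pipeline latches IF/ID, ID/EX, EX/MEM, MEM/WB]
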
Instruction Fetch Stage
[Figure: pipelined datapath diagram for the instruction fetch stage; same blocks as the Pipelined RISC Design figure]

Instruction Decode Stage
[Figure: pipelined datapath diagram for the instruction decode stage; same blocks as the Pipelined RISC Design figure]

Execution Stage
[Figure: pipelined datapath diagram for the execution stage; same blocks as the Pipelined RISC Design figure]

Memory Access Stage
[Figure: pipelined datapath diagram for the memory access stage; same blocks as the Pipelined RISC Design figure]

Write Back Stage
[Figure: pipelined datapath diagram for the write back stage; same blocks as the Pipelined RISC Design figure]

Modified Pipelined RISC Design

16-bit ISA
• 16-bit fixed-length instructions, 16 registers
• no “funct” field for R-type, only “op” field
• limited number of operations
• 4-bit “opcode” field => maximum 16 operations

Instruction formats (field widths in bits):
Format             Fields
Suggested R-type   opcode (3) | rs (3) | rt (3) | rd (3) | funct (4)
R-type             opcode (4) | rs (4) | rt (4) | rd (4)
I-type             opcode (4) | rs (4) | rt (4) | address (4)
J-type             opcode (4) | target address (12)

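The 4-bit field layout can be sketched as simple bit packing. The slides give only field names and widths, so placing the opcode in the most significant bits is an assumption here, and the function names are hypothetical.

```python
def _check(value, bits, name):
    """Reject values that do not fit in the given field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_r(opcode, rs, rt, rd):
    """Pack an R-type word: opcode | rs | rt | rd, 4 bits each (bit order assumed)."""
    for name, value in (("opcode", opcode), ("rs", rs), ("rt", rt), ("rd", rd)):
        _check(value, 4, name)
    return (opcode << 12) | (rs << 8) | (rt << 4) | rd


def encode_j(opcode, target):
    """Pack a J-type word: 4-bit opcode followed by a 12-bit target address."""
    _check(opcode, 4, "opcode")
    _check(target, 12, "target")
    return (opcode << 12) | target


def decode(word):
    """Split a 16-bit word into every field a format might use."""
    return {
        "opcode": (word >> 12) & 0xF,
        "rs": (word >> 8) & 0xF,
        "rt": (word >> 4) & 0xF,
        "rd_or_address": word & 0xF,   # rd for R-type, address for I-type
        "target": word & 0xFFF,        # only meaningful for J-type
    }
```

Because the I-type format has the same shape as the R-type one, `encode_r` can also pack an I-type word by passing the 4-bit address in place of `rd`.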
Multiplier Algorithms

“Pencil-and-paper method”
      10101
x      1011
-----------
      10101
     101010
    0000000
+  10101000
-----------
   11100111
• requires M cycles for one NxM multiplication
• implemented with AND, adder, and shift register

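The pencil-and-paper method can be sketched as a loop with one iteration per multiplier bit, which is where the M cycles come from. This is a software illustration for unsigned operands, not the hardware description; the function name is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, m_bits):
    """Unsigned shift-and-add multiplication, one multiplier bit per 'cycle'."""
    product = 0
    for cycle in range(m_bits):
        bit = (multiplier >> cycle) & 1
        # AND stage: the partial product is the multiplicand or zero.
        partial = multiplicand if bit else 0
        # Shift register + adder: align the partial product and accumulate it.
        product += partial << cycle
    return product


if __name__ == "__main__":
    # 10101 x 1011 gives 11100111 (21 x 11 = 231).
    print(bin(shift_add_multiply(0b10101, 0b1011, 4)))
```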
Multiplier Algorithms

Array Multiplier
Multiplier Algorithms

Modified Booth Encoding (MBE)
• reduces the number of partial products by N/2 (from N to N/2) for an MxN multiplication
• encodes in parallel, v. the serial encoding of the original Booth algorithm

Y2i+1   Y2i   Y2i-1   Operation on X
  0      0      0       0 x X
  0      0      1      +1 x X
  0      1      0      +1 x X
  0      1      1      +2 x X
  1      0      0      -2 x X
  1      0      1      -1 x X
  1      1      0      -1 x X
  1      1      1       0 x X

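The encoding table can be exercised with a short sketch that recodes a two's complement multiplier into radix-4 digits and sums the resulting partial products. It assumes an even bit width and an implicit Y(-1) = 0 below the least significant bit; the names are hypothetical.

```python
# (Y2i+1, Y2i, Y2i-1) -> multiple of X, as in the table above.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y, n_bits=8):
    """Radix-4 Booth digits of an n_bits two's complement value, least significant first."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    padded = (y & ((1 << n_bits) - 1)) << 1  # append Y(-1) = 0
    digits = []
    for i in range(n_bits // 2):
        triple = (padded >> (2 * i)) & 0b111  # overlapping 3-bit window
        key = ((triple >> 2) & 1, (triple >> 1) & 1, triple & 1)
        digits.append(BOOTH_TABLE[key])
    return digits


def booth_multiply(x, y, n_bits=8):
    """Sum the n_bits/2 partial products digit * X * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, n_bits)))
```

For an 8-bit multiplier this produces 4 digits, so 4 partial products are summed instead of 8.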
Multiplier Algorithms

Wallace Tree
[Figure: 9-2 Compressor for column j, built from five 3-2 compressors and one 4-2 compressor; inputs P0j through P8j and carries c1j-1 through c6j-1 from column j-1; outputs Sum[j], Carry[j], and carries c1j through c6j]
• increases speed of summing by increased parallelism
• all bits of the partial products (PPs) in each column are added independently and simultaneously
• an x-2 compressor is composed of CSAs, where x is the number of PPs in the column

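The carry-save idea behind the compressors can be sketched at word level. This is a simplified illustration, not the 9-2 column wiring from the figure: it applies 3-2 compression to whole non-negative integers and leaves a single carry-propagating addition for the end. The names are hypothetical.

```python
def compress_3_2(a, b, c):
    """Carry-save add three words: returns (sum, carry) with a + b + c == sum + carry."""
    sum_word = a ^ b ^ c                                # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted one column left
    return sum_word, carry_word


def reduce_partial_products(partial_products):
    """Apply 3-2 compression level by level until two operands remain, then add them."""
    operands = list(partial_products)
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3
        next_level = []
        for i in range(0, full, 3):
            # Each group of three is compressed independently of the others.
            next_level.extend(compress_3_2(*operands[i:i + 3]))
        next_level.extend(operands[full:])  # leftovers pass through to the next level
        operands = next_level
    return sum(operands)  # the only carry-propagating addition
```

Each level shrinks the operand count by roughly a third, and the groups within a level do not depend on each other, which is the parallelism the bullets refer to.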
Multiplier Design

Issues and Solutions
• limited opcode size
    • made NOP instruction ADD $0, $0, $0 => freed one opcode
    • ADD instruction doesn’t change register $0 (constant zero value)
• latency v. simplicity
    • multiplier lies in critical path; must calculate product in one cycle
    • algorithms trade simplicity of control and/or wiring for faster speed
    • multiplier latency not detrimental if n is small enough
    • => 8x8 multiplier
• negative and positive integer multiplication
    • 8 LSBs of each 16-bit operand are taken as a two’s complement number
    • sign detection unit detects the signs of the operands and sets the product sign

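The signed 8x8 scheme can be sketched as follows. The slides do not spell out how the sign detection unit works, so this sketch assumes one common approach: take magnitudes, multiply them unsigned, and negate the result when the operand signs differ. All names are hypothetical.

```python
def low8_signed(operand16):
    """Interpret the 8 LSBs of a 16-bit operand as a two's complement number."""
    low = operand16 & 0xFF
    return low - 0x100 if low & 0x80 else low


def multiply_8x8(op_a, op_b):
    """Signed 8x8 multiply returning a 16-bit two's complement product."""
    a = low8_signed(op_a)
    b = low8_signed(op_b)
    negative = (a < 0) != (b < 0)      # sign detection (assumed behavior)
    magnitude = abs(a) * abs(b)        # unsigned multiplier core
    product = -magnitude if negative else magnitude
    return product & 0xFFFF            # wrap into 16 bits


if __name__ == "__main__":
    # 0x00FD is -3 in the low 8 bits, so -3 * 7 = -21, i.e. 0xFFEB in 16 bits.
    print(hex(multiply_8x8(0x00FD, 0x0007)))
```

The largest magnitude, (-128) x (-128) = 16384, still fits in 16 bits, so the product itself cannot overflow the register width.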
Exception Managing Hardware

Pipeline Modifications
• EPC register tracks the problematic instruction
• EPC_2 register holds the instruction to return to, if allowed
• Control unit expanded to detect the overflow signal and handle the exception
[Figure: pipelined datapath diagram with the added labels EPC, EPC 2, Overflow, Subrt Addr, Clk, and Data Input]

Arithmetic Overflow Handler
Software Support
• Assurance that MEM and WB stages of pipeline continue execution
• Interruption of program
• Request to involve the operating system
• PC will jump to overflow handling subroutine
[Flowchart: ALU performs arithmetic operations -> Is Overflow signal high?
  NO: Instruction continues to MEM stage
  YES: Control Unit has been notified and takes corrective action -> Instruction in MEM_WB latch will continue -> Instructions in IF_ID and ID_EXE latches will be flushed -> Content of EPC will be stored in R$15]
Enhancement of ISA
• “MFCO” - move from coprocessor
• “JR” - jump to address stored in reserved register

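The handler's control flow can be modeled in a few lines. This is a behavioral sketch only: the pipeline state is represented as a plain dictionary, the handler address is passed in as a parameter because the slides do not state it as a constant, and all names are hypothetical.

```python
def handle_overflow(state, overflow, handler_address):
    """Return the pipeline state after the Control Unit reacts to the Overflow signal.

    state keys (assumed): 'pc', 'epc', 'registers' (16 entries),
    'if_id', 'id_exe', 'mem_wb' (latch contents, None when empty).
    """
    if not overflow:
        return state  # NO branch: the instruction simply continues to MEM

    new_state = dict(state)
    new_state["registers"] = list(state["registers"])
    # The older instruction in MEM_WB is left alone so it can complete.
    # Younger instructions in IF_ID and ID_EXE are flushed.
    new_state["if_id"] = None
    new_state["id_exe"] = None
    # The content of EPC is saved in register 15 for the handler.
    new_state["registers"][15] = state["epc"]
    # The PC jumps to the overflow handling subroutine.
    new_state["pc"] = handler_address
    return new_state
```

A handler reached this way would presumably read the saved address and return with the added jump-register instruction, though the slides do not show that code.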
Overflow Example
Instruction stored at address 103: 32 + 65527 = 65559
Note:
• 2^16 = 65536
• 2^16 < 65559
[Figure: simulation waveform with signals Clock, Op A, Op B, ALU Out, Overflow, IF_Flash, ID_Flash, PC, and PC Jump. Readable values: Op A = 32, Op B = 65527, ALU Out = 23 and later 49183, PC = 103, 104, 105, then 49152, 49153.]

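The arithmetic in this example can be checked with a short sketch. It assumes the overflow condition is an unsigned sum that no longer fits in 16 bits, which matches the comparison against 2^16 in the note; the function name is hypothetical.

```python
def add16(a, b):
    """Add two unsigned 16-bit operands; return (16-bit result, overflow flag)."""
    total = a + b
    overflow = total >= (1 << 16)   # the sum needs a 17th bit
    return total & 0xFFFF, overflow


if __name__ == "__main__":
    # 32 + 65527 = 65559, which wraps to 65559 - 65536 = 23 with overflow raised.
    print(add16(32, 65527))
```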
Conclusion
• 16-bit processor, enhanced with a multiplier and able to detect arithmetic overflow
• Harvard Architecture model for memory management
• 14 multipurpose, 2 reserved registers
• Advantages and disadvantages of designed 16-bit ISA

