ELG6163_Proc_archite.. - School of Electrical Engineering and

Download Report

Transcript ELG6163_Proc_archite.. - School of Electrical Engineering and

Processor architectures
SYSC5603 (ELG6163) Digital Signal
Processing Microprocessors, Software
and Applications
Miodrag Bolic
Overview
• Introduction
–
–
–
–
Basic structure of a processor
Basic Operations
Pipelining
Registers
• Example design on an application-specific processor
• General purpose processors
– Example of FIR on a general purpose processor
– Datapath of a MIPS processor
What is “Computer Architecture”?
Application
Operating
System
Compiler
Firmware
Instr. Set Proc. I/O system
Datapath & Control
Digital Design
Circuit Design
Layout
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
Instruction Set
Architecture
Levels of abstraction
• Delving into the depths
reveals more information
• An abstraction omits unneeded detail,
helps us cope with complexity
High-level
language
program
(in C)
swap(int v[], int k)
{int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
C compiler
Assembly
language
program
(for MIPS)
swap:
muli $2, $5,4
add $2, $4,$2
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
jr $31
Assembler
Binary machine
language
program
(for MIPS)
00000000101000010000000000011000
00000000100011100001100000100001
10001100011000100000000000000000
10001100111100100000000000000100
10101100111100100000000000000000
10101100011000100000000000000100
00000011111000000000000000001000
Basic Structure of a Computer
References [Patterson04]
Basic Structure of a Computer (2)
•
Input Unit
– Keyboards, joysticks, trackballs, microphones and mice
•
Output Unit
– Printers and graphic displays
•
Memory Unit
– Primary (cache, RAM, HDD) and secondary (CD-ROM, tape drives)
•
Arithmetic and Logic Unit (ALU)
– Executions completed here and stored in fast-access registers
•
Control Unit (CU)
– Provides control to all other units, including timing signals
References [Patterson04]
Basic Operation of a Computer
1. The computer accepts information in the form of
programs and data through an input unit and
stores it in memory
2. The information stored in memory is fetched,
under program control, and processed in an ALU
3. The processed information leaves the computer
through an output unit
4. All activities inside the computer are directed by
the control unit
References [Patterson04]
Detailed Instruction Cycle
Copied from References [Patterson04]
Detailed Instruction Cycle (2)
•
Instruction address calculation
–
•
Instruction fetch
–
•
Fetches the operand from memory or read it from I/O
Data operation
–
•
Determines the address of the operand (if needed)
Operand fetch
–
•
Analyzes the instruction to determine the type of operation to be
performed and the operand(s) to be used
Operand address calculation
–
•
Reads the instruction from its memory location into the processor
Instruction operation decoding
–
•
Determines the address of the next instruction to be executed
Performs the operation indicated in the instruction
Operand store
–
Write the results into memory or out to I/O
References [Patterson04]
Fast, Pipelined Instruction Interpretation
Next Instruction
NI
Instruction Address
Instruction Fetch
Instruction Register
Decode &
Operand Fetch
Operand Registers
Execute
Result Registers
Store Results
Registers or Mem
Copied from References [Culler-Slides]
NI
IF
NI NI NI
IF IF IF IF
D D D D
E E E
W W
Time
D
E
W
E
W
W
Visualizing Pipelining
Time (clock cycles)
Reg
DMem
Ifetch
Reg
DMem
Reg
ALU
DMem
Reg
ALU
O
r
d
e
r
Ifetch
ALU
I
n
s
t
r.
ALU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch
Copied from References [Culler-Slides]
Ifetch
Reg
Reg
Reg
DMem
Reg
Terminology
• Performance - Time
– MIPS
– MFLOPS
– Cycles per Instruction (CPI)
• Architectures
–
–
–
–
–
RISC – Reduced Instruction Set Computer
CISC – Complex Instruction Set Computer
Scalar
Superscalar
Very-long instruction word
Comparison: CISC, RISC, VLIW
Copied from [Philips]
Sequential application specific processor
•
•
•
•
A processor tuned only for a particular application
Can be used for low-power implementations
Word lengths can be adjusted to the current problem.
Example: FIR filter
Direct form FIR filter
Copied from [Wanhammer99]
Transposed FIR
Copied from [Wanhammer99]
Assignment
• Design an N-tap transposed linear-phase FIR filter as a
sequential application specific processor. Use only one
multiplier and show how processing time can be
decreased twice.
Hint: design a transposed FIR filter structure as in the
previous slide but allow for generating the sums in
reversed order PSN-1, PSN-2, …, PS1, y(n).
Copied from [Wanhammer99]
General purpose processor architecture
• FIR example
• We will study RISC architectures
• Single-cycle processor
– Implementation of add and load instructions
• Pipelined implementation
– Why do all instructions have the same number of cycles
Example: Digital Filtering
• The basic FIR Filter equation is
Where h[k] is an array of constants
y[n]   h[k ].x[n  k ]
y[n]=0;
In C language
For (n=0; n<N;n++)
{
For (k = 0;k<N;k++)
//inner loop
y[n] = y[n] + h[k]*x[n-k];}
Only Multiply and
Accumulate
(MAC) is needed!
MAC using General Purpose Processor (GPP)
R0
11
12
3
11
X
R1
24
1
R2

44
9
2
3
Loop
Clr
A
;Clear Accumulator A
Clr
B
; Clear Accumulator B
Mov
*R0, Y0
; Move data from memory location 1 to register Y0
Mov
*R1,X0
; Move data from memory location 2 to register X0
Mpy
X0,Y0,A
;X0*Y0 ->A
Add
A,B
;A + B -> B
Inc
R0
;R0 + 1 -> R0
Inc
R1
;R1 + 1 -> R1
Dec
N
;Dec N (initially equals to 3)
Tst
N
;Test for the value
Jnz
Loop
;Different than zero loop again
Mov
B,*R2
;Move result to memory
The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three
instruction formats are:
– R-type
31
26
op
rs
6 bits
31
– I-type
26
op
31
16
rt
5 bits
5 bits
21
rs
6 bits
– J-type
21
5 bits
11
6
rd
shamt
funct
5 bits
5 bits
6 bits
16
0
immediate
rt
5 bits
16 bits
26
op
6 bits
• The different fields are:
0
target address
26 bits
– op: operation of the instruction
– rs, rt, rd: the source and destination register
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
Copied from References [Shulte-Slides]
0
Translating MIPS Assembly into Machine Language
• Humans see instructions as words (assembly language), but
the computer sees them as ones and zeros (machine
language).
• An assembler translates from assembly language to
machine language.
• For example, the MIPS instruction add $t0, $s1, $s2 is
translated as follows
Assembly
Comment
add
$t0
$s1
$s2
op = 0, shamt = 0, funct = 32
rd = 8
rs = 17
rt = 18
000000
10001 10010
01000 00000
op
rs
rt
rd
Copied from References [Shulte-Slides]
shamt
100000
funct
MIPS Addressing Modes/Instruction Formats
• All MIPS instructions are 32 bits wide - fixed length
Register (direct)
op
rs
rt
add $s1, $s2, $s3
rd
register
Immediate
Base+index
op
rs
rt
immed
op
rs
rt
immed
register
addi $s1, $s2, 200
Memory
+
lw $s1, 200($s2)
PC-relative
op
rs
rt
PC
immed
Memory
+
beq $s1, $s2, 200
Copied from References [Shulte-Slides]
Architecture of the MIPS core
Clk
PC
Instruction address
Instruction
Instruction
Rt
Rs
Memory Rd
5
5
5
Rw
32
Ra
32 32-bit
registers
Clk
Rb
Imm
16
32
Data
32 address
Data in
32
Data
Memory
32
Clk
Copied from [Meerbergen-Slides]
Data out
Example 1 : R - type : add instruction
31
26
21
16
11
6
0
Op
rs
rt
rd
shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
Rd Rs
Reg Wr 5 5
add rd, rs, rt
• mem[PC]
• R[rd] = R[rs] + R[rt]
• PC = PC + 4
Rt
5
ALUctr
Rw Ra Rb
Bus W 32
BusA
32
32 Result
32 32-bit
registers
Clk
Copied from [Meerbergen-Slides]
BusB
32
Critical path R-type operation
Clk
PC
Instruction address
Instruction
Instruction
Rt
Rs
Memory Rd
5
5
5
Rw
32
Ra
Rb
Imm
16
32
Data
32 address
32 32-bit
registers
Data in
Data
Memory
Clk
32
Clk
Copied from [Meerbergen-Slides]
Data out
Critical path R-type operation
Clock
Clock-to-Q
PC
New value
Old value
Instruction memory access time
Rs, rt, rd
op, funct
New value
Old value
RFile access time
Bus A,B
New value
Old value
ALU delay
Bus W
Old value
New value
Set up + skew
Write into RFile
Copied from [Meerbergen-Slides]
Example 2 : I-type : load word
31
26
21
16
Op
rs
rt
6 bits 5 bits 5 bits
0
immediate
16 bits
Rd
Rt
RedDst
Rs dc (Rt)
Reg Wr
5
5
5
lw rs, rt, imm16
• mem[PC]
• addr = R[rs] + ext[imm16]
• R[rt] = mem[addr]
• PC = PC + 4
ALUctr
MemtoReg
BusA
Rw Ra Rb
32
Bus W 32
32 32-bit
registers
Clk
Imm 16
16
Extender
ExtOp
Copied from [Meerbergen-Slides]
32 Result
MemWr
BusB
32
32
ALUSrc
WrEn Adr
Data In
32
Clk
Data
Memory
Critical path load operation
Clock
Clock-to-Q
PC Old value
New value
Instruction memory access time
Rs, rt, rd
op, funct
New value
Old value
RFile access time
Bus A,B
New value
Old value
ALU delay
address
New value
Old value
Mem access time
Bus W
Old value
New value
set up+skew
Copied from [Meerbergen-Slides]
Architecture of the MIPS core
• problem : long critical path
defined by the slowest instruction (load)
• solution ?
= pipelining
• break the instruction into smaller steps
• all steps have about the same critical path
E.g. load
cycle 1
cycle 2 cycle 3
cycle 4 cycle 5
Ifetch
RF read
dmem RF write
ALU
5 stages
Copied from [Meerbergen-Slides]
Pipelining lw instructions
[Hennessy&Patterson]
lw
cycle 1
cycle 2 cycle 3
cycle 4 cycle 5 cycle 6 cycle 7
Ifetch
RF read
ALU
dmem RF write
Ifetch
RF read
ALU
Ifetch
RF read
lw
lw
dmem RF write
ALU
dmem RF write
• One instructions enters the pipeline every clock cycle
• One instructions leaves the pipeline every clock cycle
=> CPI = 1 (Cycles per Instruction)
Copied from [Meerbergen-Slides]
Pipelining lw instructions
I R A M W
Instructions
I
Data
R A M
I R A
I R
I
W
M
A
R
I
W
M W
A M W
R A M W
Current CPU cycle
Copied from [Meerbergen-Slides]
4 stages of R-type instruction
E.g. ADD
cycle 1
cycle 2 cycle 3
cycle 4
Ifetch
RF read
RF write
Copied from [Meerbergen-Slides]
ALU
Pipelining lw and R-type instructions
[Hennessy&Patterson]
lw
cycle 1
cycle 2 cycle 3
cycle 4 cycle 5 cycle 6 cycle 7
Ifetch
RF read
ALU
dmem RF write
Ifetch
RF read
add
ALU
RF write
Resource conflict
on the write port of the Rfile
Copied from [Meerbergen-Slides]
Solution: stretch R-type to 5 stages
Ifetch
RF read
ALU
dmem RF write
Dummy op (noop)
lw
cycle 1
cycle 2 cycle 3
cycle 4 cycle 5 cycle 6 cycle 7
Ifetch
RF read
ALU
dmem RF write
Ifetch
RF read
ALU
Ifetch
RF read
add
add
Copied from [Meerbergen-Slides]
dmem RF write
ALU
dmem RF write
Ifetch
mem
exec
Reg/dec
RegWr
wr
branch
Next PC
+4
Rfile
Rs
Rt
Prog
mem
BusA
Ra
flags
Rb
BusB
adr
ext.
Data
mem
Di Rw
Imm16
Rt
Rd
[Hennessy&Patterson]
Copied from [Meerbergen-Slides]
Din
RegDst
ALUSrc
ExtOp
ALUop
MemWr
Dout
MemtoReg
Data dependencies : R-type instructions
[Hennessy&Patterson]
R1 = ...
IM
… = R1 + ...
… = R1 + ...
DM
RF
IM
DM
RF
IM
… = R1 + ...
… = R1 + ...
Copied from [Meerbergen-Slides]
RF
DM
RF
IM
RF
DM
RF
IM
RF
RF
RF
DM
RF
References
[Culler-Slides] D. E. Culler, Computer Architecture, Lecture slide,
Computer Science at Berkeley.
[Hamacher01] C. Hamacher, Z. Vranesic, S. Zaky, Computer
Organization, McGraw-Hill Science/Engineering/Math; 5th edition,
August 2, 2001.
[Patterson04] D. A. Patterson, J. L. Hennessy, Computer Organization
and Design: The Hardware/Software Interface, Morgan Kaufmann;
3rd edition, August 2, 2004.
[Shulte-Slides] M. Schulte Computer Architecture ECE 201, Lecture
slides.
The other reference can be found at:
www.site.uottawa.ca/~mbolic/elg6131/References.htm