CS 230 Chapter 3 Arithmetic for Computers

Download Report

Transcript CS 230 Chapter 3 Arithmetic for Computers

CS 230: Computer Organization
and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering
School of Computing and Informatics
Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary
Jane Irwin, PSU, Ande Carle, UCB
M
C L
Announcements
• Project 3
– MIPS Assembler
• Project 4
– MIPS Simulator
– Due Nov 10, 2009
• Quiz 4
– Nov 5, 2009
– Single-cycle implementation
• Finals
–
–
–
–
Tuesday, Dec 08, 2009
Please come on time (You’ll need all the time)
Open book, notes, and internet
No communication with any other human
M
C L
Single Cycle - Abstract View
• Abstract View
– elements that operate on data values (combinational)
– elements that contain state (sequential)
• Implementation
– Design the datapath
– Design the control
Instruction
Memory
PC
Address
Instruction
Write Data
Register Read
Data
Reg Addr
File
Reg Addr
Read
Data
Reg Addr
Address
ALU
Data
Memory Read Data
Write Data
M
C L
Single cycle Datapath
26
Shift
left 2
32
28
Jump
1
PC+4[31-28]
0
Add
4
Shift
left 2
RegWrite
Instruction
Memory
PC
Read
Address
Instruction
MemWrite
MemtoReg
Address
ALU
Data
Memory Read Data
Write Data
Data 2
Sign
16 Extend
PCSrc
ALUSrc ALU control
ovf
zero
Read Addr 1
Register Read
Read Addr 2 Data 1
File
Write Addr
Read
Write Data
Add
MemRead
32
M
C L
Single cycle Datapath + Control
Instr[25-0]
Shift
left 2
26
1
28
32
0
PC+4[31-28]
0
Add
Jump
ALUOp
Add
Shift
left 2
4
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Address
Instr[31-0]
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Address
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
32
Instr[5-0]
ALU
control
M
C L
Single cycle Control Unit
Instr
R-type
RegDst
ALUSrc
MemtoReg RegWr
MemRd
MemWr
Branch
1
0
0
0
1
X
X
ALUOp1 ALUOp0
1
X
0
0
1
X
1
1
1
0
0
0
0
1
X
0
X
1
0
0
0
0
X
0
X
0
1
X
1
000000
lw
100011
sw
101011
beq
000100
• Completely determined by the instruction opcode field
– Note that a multiplexor whose control input is 0 has a definite action,
even if it is not used in performing the operation
M
C L
Disadvantages of Single Cycle Implementation
• Uses the clock cycle inefficiently – the clock cycle
must be timed to accommodate the slowest
instruction
– especially problematic for more complex instructions like
floating point multiply
• Is wasteful of area since some functional units must
be duplicated since they can not be “shared” during
an instruction execution
– e.g., need separate adders to do PC update and branch
target address calculations, as well as an ALU to do Rtype arithmetic/logic operations and data memory
address calculations
M
C L
How to make it fast?
• Parallelism
• Short-cuts or Caching, or Bypassing
• Prediction
• Skip some work
• First form of parallelism is Pipelining
M
C L
Pipelining: Its Natural!
• Laundry Example
– Ann,
Brian,
Cathy,
Dave
each have one load of clothes
to wash, dry, and fold
A
B
C
D
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
M
C L
Sequential Laundry
6 PM
7
8
9
10
11 Midnight
Time
T
a
s
k
O
r
d
e
r
30 40 20 30 40 20 30 40 20 30 40 20
A
B
C
D
• Sequential laundry takes 6 hours for 4 loads
M
C L
Pipelined Laundry
6 PM
7
8
9
10
11 Midnight
Time
T
a
s
k
O
r
d
e
r
30 40 40 40 40 20
A
B
Note:
More time to do project 4
C
D
Pipelined laundry takes 3.5 hours for 4 loads
M
C L
Pipelining Lessons
6 PM
T
a
s
k
O
r
d
e
r
7
8
30 40 40 40
A
B
C
D
9
• Multiple tasks operating
simultaneously
Time
• Pipelining
doesn’t
help
40 20 latency of single task, it
helps throughput of entire
workload
• Pipeline rate limited by
slowest pipeline stage
• Potential speedup = Number
pipe stages
• Unbalanced lengths of pipe
stages reduces speedup
• Also, need time to “fill” and
“drain” the pipeline.
M
C L
Pipelining: Some terms
• If you’re doing laundry or implementing a mP, each
stage where something is done called a pipe stage
– In laundry example, washer, dryer, and folding table are
pipe stages; clothes enter at one end, exit other
– In a mP, instructions enter at one end and have been
executed when they leave
– Another example: auto assembly line
• Throughput is how often stuff comes out of a
pipeline
M
C L
Technical details
• If times for all S stages are equal to T:
– Time for one initiation to complete still ST
– Time between 2 initiates = T not ST
– Initiations per second = 1/T
• Pipelining:
sequence
Overlap multiple executions of same
– Improves THROUGHPUT, not the time to perform a single
operation
• Other examples:
– Automobile assembly plant, chemical factory, garden
hose, cooking
M
C L
More technical details
• Book’s approach to draw pipeline timing diagrams…
– Time runs left-to-right, in units of stage time
– Each “row” below corresponds to distinct initiation
– Boundary b/t 2 column entries: pipeline register
• (i.e. hamper)
– Must look at column contents to see what stage is doing what
0
1
Wash 1
2
3
4
5
6
Dry 1
Fold 1
Pack 1
Wash 2
Dry 2
Fold 2
Pack 2
Wash 3
Dry 3
Fold 3
Pack 3
Wash 4
Dry 4
Fold 4
Pack 4
Wash 5
Dry 5
Fold 5
Wash 6
Dry 6
Time for N initiations to complete: NT + (S-1)T
Throughput: Time per initiation = T + (S-1)T/N  T!
M
C L
Ideal pipeline speedup
Unpipelined
combinational
logic
delay = t
Latch
Pipelined
combinational
logic
delay = t
combinational
logic
delay = t
delay for 1 piece of data = 4t + latch setup (assume small)
approximate delay for 1000 pieces of data = 4000t
combinational
logic
delay = t
Latch
combinational
logic
delay = t
combinational
logic
delay = t
combinational
logic
delay = t
combinational
logic
delay = t
delay for 1 piece of data = 4(t + latch setup)
approximate delay for 1000 pieces of data = 3t + 1000t
4000
speedup for 1000 pieces of data = 1003 = ~ 4
Ideal speedup = # of pipeline stages
Latch
Latch
M
C L
The “new look” dataflow
4
IF/ID
ADD
ID/EX
EX/MEM
MEM/WB
M
u
x
PC
Comp.
IR6...10
Inst.
Memory
Data must be
stored from one
stage to the next
in pipeline
registers/latches.
hold temporary
values between
clocks and needed
info. for
execution.
M
u
x
IR11..15
MEM/
WB.IR
Register
File
M
u
x
Branch
taken
ALU
Data
Mem.
M
u
x
Sign
Extend
16
32
M
C L
Another way to look at it…
2
3
4
5
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
DM
Reg
Reg
DM
Reg
Reg
ALU
DM
Reg
ALU
Inst. i
1
ALU
Inst. #
ALU
Clock Number
DM
Inst. i+1
Inst. i+2
Program execution order (in instructions)
Inst. i+3
IM
Reg
IM
IM
IM
Reg
6
7
8
WB
Time
Reg
M
C L
Questions about control signals
• Following discussion relevant to a single instruction
• Q: Are all control signals active at the same time?
• Q: Can we generate all these signals at the same
time?
M
C L
Passing control w/pipe registers
• Analogy: send instruction with car on assembly line
– “Install Corinthian leather interior on car 6 @ stage 3”
strip off signals for
execution phase
strip off signals for
memory phase
Instruction
WB
IF/ID
Control
Generation
M
WB
EX
M
ID/EX
strip off signals for
write-back phase
WB
RegDst
Branch
MemtoReg
ALUOp
MemRead
RegWrite
ALUSrc
MemWrite
EX/MEM
MEM/WB
M
C L
Pipelined datapath w/control signals
PCSrc
ID/EX
Instruction
Memory
Read
reg 2
Read
data 1
Write
reg
Read
data 2
Zero
M
u
x
Write
data Registers
Inst[15-0]
WB
MemtoReg
M
Add
Shift
left 2
Read
reg 1
MEM/WB
MemWrite
RegWrite
ALUSrc
WB
RegDst
Read
address
ALUOp
EX
IF/ID
PC
M
Branch
Control
Add
EX/MEM
WB
MemRead
M
u
x
ALU
Read
addr
Write
addr
Write
data
Read
data
M
u
x
Data
Memory
ALU
control
Sign
extend
Inst[20-16]
Inst[15-11]
M
u
x
M
C L
A Pipelined Processor
• Pipeline latches: pass the status and result of the current instruction to
next stage
• Comparison:
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Clock
Single-cycle
Ifetch Dec/Reg Exec
Mem
Wr
Ifetch Dec/Reg Exec
sw
lw
Ifetch Dec/Reg Exec
pipelined
Mem
Mem
Ifetch Dec/Reg Exec
Wr
Mem
Ifetch Dec/Reg Exec
Wr
Mem
Wr
M
C L
Yoda says…
Ohhh. Great warrior. Wars not make one great
M
C L