Multithreading 1


COMP25212
CPU Multi Threading
• Learning Outcomes: to be able to:
– Describe the motivation for multithreading support in CPU hardware
– Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
– Explain when multithreading is inappropriate
– Describe a multithreading implementation
– Estimate the performance of these implementations
– State the important assumptions of this performance model
Revision: Increasing CPU Performance

[Figure: a pipelined CPU with Fetch Logic, Decode Logic, Exec Logic, Mem Logic and Write Logic stages, an instruction cache, a data cache and a clock; labels a–f mark the points discussed on the next slide.]

How can throughput be increased?
Increasing CPU Performance
a) By increasing clock frequency
b) By increasing Instructions per Clock
c) Minimizing memory access impact – data cache
d) Maximising Inst issue rate – branch prediction
e) Maximising Inst issue rate – superscalar
f) Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution
g) (What does lengthening the pipeline do?)
Increasing Program Parallelism
– Keep issuing instructions after a branch?
– Keep processing instructions after a cache miss?
– Process instructions in parallel?
– Write a register while a previous write is pending?
• Where can we find additional independent instructions?
– In a different program!
Revision – Process States

[Figure: process state diagram. New → Ready (waiting for a CPU); Ready → Running on a CPU via dispatch (scheduler); Running → Ready when pre-empted (e.g. timer); Running → Blocked when it needs to wait (e.g. I/O); Blocked → Ready when the I/O occurs; Running → Terminated.]
Revision – Process Control Block
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
Revision: CPU Switch

[Figure: a context switch between Process P0 and Process P1 via the operating system. P0 runs; the OS saves state into PCB0 and loads state from PCB1; P1 runs; the OS saves state into PCB1 and loads state from PCB0; P0 resumes.]
What does CPU load on dispatch?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
What does CPU need to store on deschedule?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
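The answer the two questions above are driving at can be sketched in C: only the state held in CPU hardware (PC, stack pointer, general registers) must be copied on a switch, while bookkeeping fields such as the PID or open-file list already live in memory. The types and function names here are illustrative assumptions, not a real OS interface.

```c
#include <string.h>

enum { NUM_GPRS = 16 };  /* assumed register count */

/* Hypothetical hardware-visible CPU state and a minimal PCB. */
typedef struct { unsigned long pc, sp, gprs[NUM_GPRS]; } cpu_t;
typedef struct { int pid; unsigned long pc, sp, gprs[NUM_GPRS]; } pcb_t;

/* On deschedule: save the hardware state into the outgoing PCB. */
void deschedule(const cpu_t *cpu, pcb_t *out) {
    out->pc = cpu->pc;
    out->sp = cpu->sp;
    memcpy(out->gprs, cpu->gprs, sizeof out->gprs);
}

/* On dispatch: load the hardware state from the incoming PCB. */
void dispatch(cpu_t *cpu, const pcb_t *in) {
    cpu->pc = in->pc;
    cpu->sp = in->sp;
    memcpy(cpu->gprs, in->gprs, sizeof cpu->gprs);
}
```

A full OS switch would also change the memory-management state (the VA mapping), which is exactly what the multithreaded pipeline below duplicates in hardware.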
CPU Support for Multithreading

[Figure: the pipeline from earlier, with the per-thread state duplicated: two program counters (PCA, PCB), two register sets (GPRsA, GPRsB) and two virtual-address mappings (VA MappingA, VA MappingB) feeding the address translation; the instruction cache, data cache and the Fetch, Decode, Exec, Mem and Write logic are shared.]
How Should OS View Extra Hardware Thread?
• A variety of solutions
• Simplest is probably to declare an extra CPU
• Need a multiprocessor-aware OS
CPU Support for Multithreading

[Figure: the same multithreaded pipeline, with duplicated PCs, GPRs and VA mappings.]

Design Issue: when to switch threads
Coarse-Grain Multithreading
• Switch Thread on “expensive” operation:
– E.g. I-cache miss
– E.g. D-cache miss
• Some are easier than others!
Switch Threads on Icache miss

Cycle:   1     2     3     4     5     6     7
Inst a:  IF    ID    EX    MEM   WB
Inst b:        IF    ID    EX    MEM   WB
Inst c:              IF*   -     -     -     -
Inst X:                    IF    ID    EX    MEM
Inst Y:                          IF    ID    EX
Inst Z:                                IF    ID

(* Instruction c's fetch misses in the I-cache, so the pipeline switches to the other thread: X, Y, Z issue in the slots where c and its successors d, e, f would have run.)
Performance of Coarse Grain
• Assume (conservatively)
– 1GHz clock (1ns clock tick!), 20ns memory (= 20 clocks)
– 1 i-cache miss per 100 instructions
– 1 instruction per clock otherwise
• Then, time to execute 100 instructions without multithreading:
– 100 + 20 clock cycles
– Inst per Clock = 100 / 120 = 0.83
• With multithreading, time to execute 100 instructions:
– 100 [+ 1] clock cycles
– Inst per Clock = 100 / 101 = 0.99
Switch Threads on Dcache miss

Cycle:   1     2     3     4       5     6     7    ...
Inst a:  IF    ID    EX    M-Miss  MISS  MISS  MISS ... MEM  WB
Inst b:        IF    ID    EX      -     -     -     } Abort
Inst c:              IF    ID      -     -           } these
Inst d:                    IF      -                 }
Inst X:                            IF    ID    EX    MEM  WB
Inst Y:                                  IF    ID    EX   MEM

(Instruction a's data access misses at the MEM stage. The in-flight instructions b, c, d behind it are aborted, and the pipeline switches to the other thread: X, Y issue in the slots where e, f would have run.)
Performance: similar calculation (STATE ASSUMPTIONS!)
Where to restart after the memory cycle? I suggest instruction "a" – why?