Using different models of computation to model microprocessors
Download
Report
Transcript Using different models of computation to model microprocessors
Modeling CPU’s using Different
MOC’s: a Case Study
Trevor C. Meyerowitz
Advisor: Alberto Sangiovanni-Vincentelli
290n Final Presentation
May 15 2002
Outline
Introduction
Motivation
The Simple CPU to be modeled
The Domains Investigated
Modeling a Non-Pipelined Processor
Modeling a Pipelined Processor
Demo
Conclusions
2
Motivation
Processor Designs are becoming much larger and more
complicated
Many instructions in flight at a single time
Strange Orderings, Speculation
This can be very hard to verify
We are developing a methodology to help alleviate
these problems.
Using Different Models of Computation can Potentially
Simplify the Design Task
PtolemyII Allows us to Compare a Variety of these MOC’s
in a Unified Framework
3
The Simple CPU
Processor Statistics
Small Instruction Set
ADD, SUB, ADDI, SUBI, and BNE
Only Integer Operations
128 registers, 128 entry instruction memory
This is enough to be interesting
Data dependencies
Control flow
4
The Domains Investigated
Process Networks
Untimed Model
Kahn-Macqueen
Semantics
Infinite Queue’s
Blocking Reads
Fully Deterministic
Schedule
Independent
Synchronous Reactive
Untimed Model
Instantaneous
Communication and
Computation
Iterates Until a Fixed
Point is Found
Signals must be
monotonic
5
The Nonpipelined Processor
Code and netlist reusued for both domains (I.e. these
are domain polymorphic actors)
Represented in PtolemyII as: Fetch, Regfile, Execute
and a Delay.
Fetch only after previous instruction has completed
6
Non-Pipelined Processor
Pseudocode (Fetch + Regfile)
public class Fetch … {
public class Reg … {
…
…
public fire() {
if (read_mode) {
inst = input_get_op_codes();
<rs_v, rt_v> = read_regs();
output_regs.send(0, inst);
output_regs.send(rs_v, rt_v);
}
else {
rd_v = input_get_write_vals();
write_values();
}
read_mode = !read_mode;
}
…
public fire() {
pc = input_pc.get(0);
<rs, rt, rd, val, inst> =
readIMEM(pc);
output_inst.send(0, inst);
output_regs.send(rs, rt); …
}
…
}
}
7
Non-Pipelined Processor
Pseudocode (Execute)
public class Exec … {
…
public fire() {
if (write_mode=false) {
reg_vals = input_reg_vals();
inst_type = read_inst();
results = exec_inst(inst_type, reg_vals);
}
else {
write_values(rd, results);
write_next_pc(results);
}
write_mode = !write_mode;
}
…
}
8
Non-Pipelined Processor:
Differences between Domains
SR required that we put the register read and
register write in different iterations as well as split
of execution and writing its results
Process networks cannot query port status
SR requires use of prefire and postfire conditions
We shared code between the two domains, SR
probably has more flexibility.
9
Pipelined Processor
Only required recoding of fetch behavior
Fetch every “iteration”
Only stall after branches (no branch prediction)
No forwarding logic is required!?
This is because two register reads can’t occur without a
register write happening between them
Due to PN deterministic requirement
Also true because of SR because of states
Probably could structure SR to require forwarding logic
(lower level of abstraction!!)
10
Pipelined Processor – Fetch
pseudo-code
public class Fetch … {
…
public fire() {
if (initial_firing ||
prev_inst_is_branch) {
pc = input_pc.get(0);
}
<rs, rt, rd, val, inst> =
readIMEM(pc);
output_inst.send(0, inst);
output_regs.send(rs, rt); …
pc = pc+1;
}
…
}
Causes you to stall
until the branch is
finished.
Immediately fires
again if there is no
branch!
11
Pipelining and Forwarding (t=0)
Program:
Inst.
id
Inst_1
Inst_2
Assembly
code
Logical
meaning
ADD R1,
R2, R3
R1 = R2 + R3
ADD R3,
R1, R1
R3 = R1 + R1
Fetch
Reg
Exec
File
Inst_2: R3 =
R1(?) + R1(?)
Inst_1: R1 =
R2(4) + R3(5)
Register File State:
R1 = 2
R2 = 4
R3 = 5
12
Pipelining and Forwarding (t=1)
Program:
Inst.
id
Inst_1
Inst_2
Assembly
code
Logical
meaning
ADD R1,
R2, R3
R1 = R2 + R3
ADD R3,
R1, R1
R3 = R1 + R1
Fetch
Reg
Exec
File
Register File State:The PN and SR models
don’t have this problem
R1 = 2
because they enforce
R2 = 4
the order: read inst_1,
R3 = 5
write inst_1, read inst_2
Inst_2: R3 =
R1(2) + R1(2)
Inst_1: R1(9)
= R2(4) +
R3(5)
This is an error!! It should read
R1 as 9. We can solve this by
adding forwarding logic, or
stalling the pipeline
13
Pipelined Processor with Branch
Prediction
Still in order, but branches are predicted instead of stalling.
Requires recoding of Fetch and the Register File
Fetch
Performs branch prediction
Handles mispredicts
Register File
Keeps a queue of instructions
Stall on dependencies
Only write resolved instructions to regfile
This represents one refinement path
Biased towards Process Networks
14
Program Code: Inst RD, RS, RT (Val)
ADD 5 5 5
ADD 6 5 5
BNE 5 20 -3
ADD 7 6 6
ADD 8 7 7
ADD 9 8 8
ADD 10 9 9
SUB 11 10 50
15
Outline
Introduction
Modeling a Non-Pipelined Processor
Modeling a Pipelined Processor
Demo
Conclusions
Other Architectural Features
Observations
Future Work
16
Other Architectural Features
Out of Order Execution
Requires “breaking” of PN model
Superscalar execution
Multiple fetches at once.. Might be
problematic to do in PN.
Memory systems
Initially simple, more complicated when
refinements are added.
17
Observations
Process Networks are relatively easy to use and
are quite predictable.
Process Networks are great for initial abstract
models.
Synchronous Reactive is simpler than DE to work
with, but more complicated to design than PN’s.
PN doesn’t deal well with ordering refinements,
but SR can handle them better.
We envision a methodology where you start with a
PN model and then move to an SR model.
18
Future Work
Look at implementing other architectural
features
Examine relaxing PN’s requirements
Look at domain specific actors
Examine composing different MOC’s
Introduce timing
19
20