Using different models of computation to model microprocessors

Download Report

Transcript Using different models of computation to model microprocessors

Modeling CPU’s using Different
MOC’s: a Case Study
Trevor C. Meyerowitz
Advisor: Alberto Sangiovanni-Vincentelli
290n Final Presentation
May 15 2002
Outline





Introduction
 Motivation
 The Simple CPU to be modeled
 The Domains Investigated
Modeling a Non-Pipelined Processor
Modeling a Pipelined Processor
Demo
Conclusions
2
Motivation



Processor Designs are becoming much larger and more
complicated
 Many instructions in flight at a single time
 Strange Orderings, Speculation
 This can be very hard to verify
 We are developing a methodology to help alleviate
these problems.
Using Different Models of Computation can Potentially
Simplify the Design Task
PtolemyII Allows us to Compare a Variety of these MOC’s
in a Unified Framework
3
The Simple CPU


Processor Statistics
 Small Instruction Set
 ADD, SUB, ADDI, SUBI, and BNE
 Only Integer Operations
 128 registers, 128 entry instruction memory
This is enough to be interesting
 Data dependencies
 Control flow
4
The Domains Investigated

Process Networks
 Untimed Model
 Kahn-Macqueen
Semantics
 Infinite Queue’s
 Blocking Reads
 Fully Deterministic
 Schedule
Independent

Synchronous Reactive
 Untimed Model
 Instantaneous
Communication and
Computation
 Iterates Until a Fixed
Point is Found
 Signals must be
monotonic
5
The Nonpipelined Processor



Code and netlist reusued for both domains (I.e. these
are domain polymorphic actors)
Represented in PtolemyII as: Fetch, Regfile, Execute
and a Delay.
Fetch only after previous instruction has completed
6
Non-Pipelined Processor
Pseudocode (Fetch + Regfile)
public class Fetch … {
public class Reg … {
…
…
public fire() {
if (read_mode) {
inst = input_get_op_codes();
<rs_v, rt_v> = read_regs();
output_regs.send(0, inst);
output_regs.send(rs_v, rt_v);
}
else {
rd_v = input_get_write_vals();
write_values();
}
read_mode = !read_mode;
}
…
public fire() {
pc = input_pc.get(0);
<rs, rt, rd, val, inst> =
readIMEM(pc);
output_inst.send(0, inst);
output_regs.send(rs, rt); …
}
…
}
}
7
Non-Pipelined Processor
Pseudocode (Execute)
public class Exec … {
…
public fire() {
if (write_mode=false) {
reg_vals = input_reg_vals();
inst_type = read_inst();
results = exec_inst(inst_type, reg_vals);
}
else {
write_values(rd, results);
write_next_pc(results);
}
write_mode = !write_mode;
}
…
}
8
Non-Pipelined Processor:
Differences between Domains




SR required that we put the register read and
register write in different iterations as well as split
of execution and writing its results
Process networks cannot query port status
SR requires use of prefire and postfire conditions
We shared code between the two domains, SR
probably has more flexibility.
9
Pipelined Processor


Only required recoding of fetch behavior
 Fetch every “iteration”
 Only stall after branches (no branch prediction)
No forwarding logic is required!?
 This is because two register reads can’t occur without a
register write happening between them
 Due to PN deterministic requirement
 Also true because of SR because of states
 Probably could structure SR to require forwarding logic
(lower level of abstraction!!)
10
Pipelined Processor – Fetch
pseudo-code
public class Fetch … {
…
public fire() {
if (initial_firing ||
prev_inst_is_branch) {
pc = input_pc.get(0);
}
<rs, rt, rd, val, inst> =
readIMEM(pc);
output_inst.send(0, inst);
output_regs.send(rs, rt); …
pc = pc+1;
}
…
}
Causes you to stall
until the branch is
finished.
Immediately fires
again if there is no
branch!
11
Pipelining and Forwarding (t=0)
Program:
Inst.
id
Inst_1
Inst_2
Assembly
code
Logical
meaning
ADD R1,
R2, R3
R1 = R2 + R3
ADD R3,
R1, R1
R3 = R1 + R1
Fetch
Reg
Exec
File
Inst_2: R3 =
R1(?) + R1(?)
Inst_1: R1 =
R2(4) + R3(5)
Register File State:
R1 = 2
R2 = 4
R3 = 5
12
Pipelining and Forwarding (t=1)
Program:
Inst.
id
Inst_1
Inst_2
Assembly
code
Logical
meaning
ADD R1,
R2, R3
R1 = R2 + R3
ADD R3,
R1, R1
R3 = R1 + R1
Fetch
Reg
Exec
File
Register File State:The PN and SR models
don’t have this problem
R1 = 2
because they enforce
R2 = 4
the order: read inst_1,
R3 = 5
write inst_1, read inst_2
Inst_2: R3 =
R1(2) + R1(2)
Inst_1: R1(9)
= R2(4) +
R3(5)
This is an error!! It should read
R1 as 9. We can solve this by
adding forwarding logic, or
stalling the pipeline
13
Pipelined Processor with Branch
Prediction



Still in order, but branches are predicted instead of stalling.
Requires recoding of Fetch and the Register File
 Fetch
 Performs branch prediction
 Handles mispredicts
 Register File
 Keeps a queue of instructions
 Stall on dependencies
 Only write resolved instructions to regfile
This represents one refinement path
 Biased towards Process Networks
14
Program Code: Inst RD, RS, RT (Val)
ADD 5 5 5
ADD 6 5 5
BNE 5 20 -3
ADD 7 6 6
ADD 8 7 7
ADD 9 8 8
ADD 10 9 9
SUB 11 10 50
15
Outline





Introduction
Modeling a Non-Pipelined Processor
Modeling a Pipelined Processor
Demo
Conclusions
 Other Architectural Features
 Observations
 Future Work
16
Other Architectural Features
Out of Order Execution
 Requires “breaking” of PN model
 Superscalar execution
 Multiple fetches at once.. Might be
problematic to do in PN.
 Memory systems
 Initially simple, more complicated when
refinements are added.

17
Observations





Process Networks are relatively easy to use and
are quite predictable.
Process Networks are great for initial abstract
models.
Synchronous Reactive is simpler than DE to work
with, but more complicated to design than PN’s.
PN doesn’t deal well with ordering refinements,
but SR can handle them better.
We envision a methodology where you start with a
PN model and then move to an SR model.
18
Future Work
Look at implementing other architectural
features
 Examine relaxing PN’s requirements
 Look at domain specific actors
 Examine composing different MOC’s
 Introduce timing

19
20