Transcript here

Adding custom instructions to
Simplescalar/GCC architecture
Somasundaram
Agenda
Motivation
 GCC overall architecture
 Simplescalar architecture
 Adding a custom instruction
 Conclusion

Motivation

Extensible processors
• What regular ISA instructions can be
combined?
• Which regular ISA instructions are to be
combined into a CFU instruction?
• Retarget the compiler to produce
optimised code with CFU instructions
• Simulate the extended processor with
CFU instructions
GNU Compiler Collection

Many front-ends
•C
• Fortran
• C++/Java/Ada

Backend targeted at many
processors
• x86, Alpha, Sparc
• ARC, ARM, MIPS . . .
GCC Compiler Flow
Program
Language Front-end
GIMPLE IR
RTL?
High-level IR
Optimisations
Combine small RISC ISA like
patterns into bigger CISC ISA
Are we
RTL IR
like patterns
interested in
everything?
Low-level IR Optimisations
Scheduled assembly
code
Machine dependent files
[.c, .h,.md]
GCC – Low Level Optimisation

Uses Lisp like RTL as IR
Example:
Tip: use –da compiler option to get the IR output
(insn 48 47 50 (set (reg/v:SI 36)
(mult:SI (reg:SI 42)
(reg:SI 41))) 41 {mulsi3} (nil)
(nil))
(call_insn 94 93 97 (parallel[
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("printf")) 0)
(const_int 0 [0x0])))
(clobber (reg:SI 14 lr))
] ) -1 (nil)
(nil)
(expr_list (use (reg:SI 1 r1))
(expr_list (use (reg:SI 0 r0))
(nil))))
GCC - Target Machine Description

Use a similar language in md [machine
description] file
(define_insn "mulsi3"
[(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
(mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
(match_operand:SI 1 "s_register_operand" "%?r,0")))]
""
"mul%?\\t%0, %2, %1"
[(set_attr "type" "mult")])
GCC Combine Phase
Combines some standard IR pattern
into a single user-defined IR pattern
 User-defined IR patterns are defined
in the target.md file

Operand constraints should be
satisfied
 Example: MAC (Multiply-Accumulate)

Merge mulsi3 and addsi3  mulsi3addsi
GCC Combine Phase

How is it done?
Let us assume that the following patterns
are defined in the machine description
addsi3  Matches C=A+B (all 32-bit regs)
mulsi3  Matches C=A*B (all 32-bit regs)
mulsi3addsi  Matches D=A*B+C (all 32-bit regs)
mulsi4addsi  Matches E=A*B+C*D (all 32-bit regs)
GCC Combine Phase
Assume this DDG sub-graph
mem
mem
mem
mem
47
45
52
50
48
53
mulsi3
mulsi3
55
addsi3
GCC Combine Phase
mem
mem
mem
mem
47
45
52
50
48
53
mulsi3
mulsi3
55
addsi3
Try to combine 48, 55 and see if a
pattern which multiplies two
operands and adds a third operand
to the result exists
mem
mem
mem
mem
47
45
52
50
Try 55,45:No matching pattern
Try 55,47:No matching pattern
53
mulsi3
55
mulsi3addsi
Try 55,53:We have a match
GCC Combine phase
mem
mem
mem
mem
47
45
52
50
Try 55,52:No matching pattern
Try 55,50:No matching pattern
55
mulsi4addsi
Try 55,45:No matching pattern
Try 55,47:No matching pattern
Try 55,52,50: No matching pattern
Try 55,52,45: No matching pattern
Try 55,52,47: No matching pattern
Try 55,50,45: No matching pattern
Try 55,50,47: No matching pattern
Try 55,47,45: No matching pattern
Cannot try to combine more than 3
patterns! Hence, stop!
GCC Combine phase: Summary
Can combine upto 3 instructions
together
 Can recursively combine more
instructions
 Deletes a smaller instruction once
combined
 Always works on a function

Retargetting GCC for CFU

Build a better Combiner phase
• Write a new combiner with better
pattern merger which works on inputs
from RTL
• Replace existing combiner with this
combiner
New patterns for the CFU instruction
in the target.md file
 Changes in GAS (included in binutils
package) to generate insn. word

SimpleScalar is
Instruction Set simulator
 Profiles programs
 Simulates micro-architectural
features
 Different levels of speed of
simulation Vs accuracy trade-off
 Written in C
 Easily retargettable

Simplescalar: CFU issues

More arguments than used by RISC
instructions
• Out-of-order execution needs to take
care of the increase in dependencies

New instructions in decode tree
• Easy to add new instructions to the
decode tree (machine.def)
Let us add a new instruction
Achieve the operation E=A*B+C*D
using one instruction
 4 input operands and 1 output
operand
 Extension to ARM ISA
 Provide

• Compiler
• Assembler
• Simulator
Pattern for the instruction

gcc/config/arm/arm.md
(define_insn "*mulsi4addsi"
[(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI
(mult:SI (match_operand:SI 2 "s_register_operand" "r")
(match_operand:SI 1 "s_register_operand" "r"))
(mult:SI (match_operand:SI 4 "s_register_operand" "r")
(match_operand:SI 3 "s_register_operand" "r"))))]
""
"ml2a%?\\t%0, %2, %1, %4, %3"
[(set_attr "type" "mult")])
Simplescalar changes

Instruction Decode Tree
• Chain of decoders: Each looking at a set
of bits

target-arm/arm.def
• New chain of decoder macros for CFU
class of instructions
• Increase the number of input
dependencies in all the instructio
macros from 5 to 6 (predication in ARM)
Simplescalar changes

sim-outorder.c
• Increase the number of input
dependencies to be monitored in the
reservation unit
• Both macros and code has to be
changed


Other files need to be changed for
the same purpose
Compile ‘test program’ and verify!
Summary
Identify the ways to add new
instructions to Simplescalar and GCC
 Determine the capabilities of the
current combiner in GCC
 Demonstrate the addition of a new
custom instruction
 Understand GCC to some extent!
