Build a GCC Cross Compiler for a Specific CPU

Chia-Tsun Wu
D92943007
Outline
- Introduction to SoC
- Motivation and project goal
- Design a CPU
  - Tools used to design CPU hardware
  - CPU specification
  - CPU design flow
  - Simulation and results
- Build a GCC Cross Compiler
  - GCC structure
  - Knowledge to port GCC
  - Build flow
  - Build a GCC cross assembler and cross linker
  - Build a GCC cross compiler
  - A simple test program
- Summary
Introduction to SoC
SoC: System on a Chip.
- Highly integrated; includes:
  - CPU
  - System bus
  - Peripherals
  - Co-processor
  - ...
- Low cost, low area, high performance.
What is SOC?
Portable / reusable IP
Embedded CPU
Embedded Memory
Real World Interfaces
(USB, PCI, Ethernet)
Software (both on-chip and off)
Mixed-signal Blocks
Programmable HW (FPGAs)
> 500K gates
SOC Design Flow
[Figure: SoC design flow. Boxes shown, in reading order:]
- System specs
- HW/SW partitioning
- Hardware description
- HW synthesis and configuration
- Configuration modules
- Software description
- Software generation & parameterization
- Interface synthesis
- Hardware components
- HW/SW interfaces
- Software modules
- HW/SW integration and cosimulation
- Integrated system
- System evaluation
- Design coverification
- System validation
Motivation and project goal
Motivation:
- SoC has been the major trend in recent years.
- The CPU is one of the key cores of an SoC design.
- The development environment is the most important thing for a CPU.
Goal:
- Design a simple 32-bit RISC CPU.
- Build a cross assembler and cross linker for a specific CPU.
- Build a cross compiler for a specific CPU.
Design a CPU
Specification
- 32-bit RISC-based CPU
- General-purpose register architecture
- 32-bit (4 Gbyte) addressing
- 32-bit fixed instruction length (excluding immediate data)
- MSB first
- Reset address 0x000ffffc
- No pipeline; one instruction cycle takes four clock cycles:
  - Instruction fetch
  - Instruction decode and data fetch
  - Execution
  - Write back
- No interrupt
- No timer
Registers
- General-purpose registers R0~R15
  - R13: accumulator
  - R14: memory data pointer
  - R15: stack pointer
- Program counter (PC) (0x000ffffc after reset)
- Program status (PS) (Sign flag, Zero flag, oVerflow flag, Carry flag)
Instruction formats
- General: OP Rn1, Rn2
  - OP: 8 bits
  - n: register number (0000: R0, 1111: R15)
- Immediate: OP #data, Rn2
  - OP: 8 bits
  - n: register number (0000: R0, 1111: R15)
  - #data: 32-bit data
- Branch: OP Addr
  - OP: 16 bits (low byte = 0x00)
  - Addr: 32-bit branch address
Instruction sets
- ADD Rn1,Rn2
  - Rn2=Rn1+Rn2
  - Flag: SZVC
  - Machine code: 00000000 Rn1 Rn2
- ADDC Rn1,Rn2
  - Rn2=Rn1+Rn2
  - Flag: SZVC
  - Machine code: 00000001 Rn1 Rn2
- SUB Rn1,Rn2
  - Rn2=Rn2-Rn1
  - Flag: SZVC
  - Machine code: 00000010 Rn1 Rn2
- SUBC Rn1,Rn2
  - Rn2=Rn2-Rn1
  - Flag: SZVC
  - Machine code: 00000011 Rn1 Rn2
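The "Flag: SZVC" entries can be illustrated with a small C model of the adder. This is only a sketch: the slides do not say how each flag is computed, so the usual conventions are assumed here (S is bit 31 of the result, Z means a zero result, V is signed overflow, C is the carry out of bit 31). It is also an assumption that ADDC adds the previous carry. The names `ps_flags` and `alu_add` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the program status (PS) flags. */
typedef struct {
    unsigned s, z, v, c;   /* Sign, Zero, oVerflow, Carry */
} ps_flags;

/* Rn2 = Rn1 + Rn2, plus carry_in when modelling ADDC (an assumption).
   The flag definitions below are conventional, not taken from the slides. */
static uint32_t alu_add(uint32_t rn1, uint32_t rn2, unsigned carry_in,
                        ps_flags *ps)
{
    assert(carry_in <= 1u);
    uint64_t wide = (uint64_t)rn1 + (uint64_t)rn2 + carry_in;
    uint32_t result = (uint32_t)wide;

    ps->s = result >> 31;                  /* bit 31 of the result */
    ps->z = (result == 0);                 /* result is zero */
    ps->c = (unsigned)(wide >> 32) & 1u;   /* carry out of bit 31 */
    /* Signed overflow: both inputs share a sign that the result lacks. */
    ps->v = ((~(rn1 ^ rn2) & (rn1 ^ result)) >> 31) & 1u;
    return result;
}
```

Calling `alu_add(0x7FFFFFFF, 1, 0, &ps)` under these assumptions yields 0x80000000 with S and V set, which is the case a signed comparison has to watch for.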
Instruction sets
- LDI #data,Rn2
  - Rn2=data
  - Flag: (none)
  - Machine code: 00000100 0000 Rn2 #Data (opcode as used in the test vectors)
- MOV Rn1,Rn2
  - Rn2=Rn1
  - Flag: (none)
  - Machine code: 00000101 Rn1 Rn2
- RET
  - PC=[SP--]
  - Flag: (none)
  - Machine code: 00000110 00000000
- JMP #Addr
  - PC=[Addr]
  - Flag: (none)
  - Machine code: 00000111 00000000 #Addr
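The encodings above can be checked mechanically with a minimal C sketch. The opcode values are read from the machine-code lines and from the test vectors (the LDI opcode 0x04 in particular comes from the vectors). The zero upper half of each 32-bit word is an assumption based on those vectors, and the names `encode` and `encode_ldi` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit OP field values as listed for each instruction. */
enum {
    OP_ADD = 0x00, OP_ADDC = 0x01, OP_SUB = 0x02, OP_SUBC = 0x03,
    OP_LDI = 0x04, OP_MOV  = 0x05, OP_RET = 0x06, OP_JMP  = 0x07
};

/* Pack OP (8 bits), Rn1 (4 bits) and Rn2 (4 bits) into the low 16 bits of
   a 32-bit instruction word; the upper 16 bits stay zero. */
static uint32_t encode(unsigned op, unsigned rn1, unsigned rn2)
{
    assert(op <= 0xFFu && rn1 <= 15u && rn2 <= 15u);
    return ((uint32_t)op << 8) | ((uint32_t)rn1 << 4) | (uint32_t)rn2;
}

/* LDI #data,Rn2 takes two words: the instruction, then the immediate. */
static void encode_ldi(uint32_t data, unsigned rn2, uint32_t out[2])
{
    out[0] = encode(OP_LDI, 0, rn2);
    out[1] = data;
}
```

For example, `encode(OP_ADDC, 2, 3)` gives 0x0123, which matches the bit pattern listed for ADDC R2,R3.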
Tools used
- Synopsys Design Compiler
- Mentor Graphics ModelSim
- Synopsys Apollo
- TSMC 0.25um standard cell libraries
Design Flow
- CPU specifications
- RTL coding
- Function simulation (test bench)
- Design Compiler (constraints)
- Gate-level simulation (test bench)
- Apollo (constraints)
- Post-layout simulation (test bench)
- Tape out
Test vectors
LDI #0x0,R0
00000000000000000000010000000000 00000000000000000000000000000000
LDI #0x1,R1
00000000000000000000010000000001 00000000000000000000000000000001
LDI #0x2,R2
00000000000000000000010000000010 00000000000000000000000000000010
LDI #0x3,R3
00000000000000000000010000000011 00000000000000000000000000000011
LDI #0x4,R4
00000000000000000000010000000100 00000000000000000000000000000100
LDI #0x5,R5
00000000000000000000010000000101 00000000000000000000000000000101
LDI #0x6,R6
00000000000000000000010000000110 00000000000000000000000000000110
LDI #0x7,R7
00000000000000000000010000000111 00000000000000000000000000000111
LDI #0x8,R8
00000000000000000000010000001000 00000000000000000000000000001000
LDI #0x9,R9
00000000000000000000010000001001 00000000000000000000000000001001
LDI #0xa,R10
00000000000000000000010000001010 00000000000000000000000000001010
LDI #0xb,R11
00000000000000000000010000001011 00000000000000000000000000001011
LDI #0xc,R12
00000000000000000000010000001100 00000000000000000000000000001100
LDI #0xd,R13
00000000000000000000010000001101 00000000000000000000000000001101
LDI #0xe,R14
00000000000000000000010000001110 00000000000000000000000000001110
LDI #0xf,R15
00000000000000000000010000001111 00000000000000000000000000001111
ADD R0,R1
00000000000000000000000000000001
ADDC R2,R3
00000000000000000000000100100011
SUB R4,R5
00000000000000000000001001000101
SUBC R6,R7
00000000000000000000001101100111
MOV R8,R9
00000000000000000000010110001001
JMP 0x000000
00000000000000000000011100000000 00000000000000000000000000000000
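Hand-written vectors like these are easy to mistype, so a small decoder is useful for checking them. The sketch below parses one bit string and splits it into fields. It assumes the field layout seen in the vectors (OP in bits 15..8, Rn1 in bits 7..4, Rn2 in bits 3..0); `parse_bits`, `decode`, and `decoded` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned op, rn1, rn2;
} decoded;

/* Convert a string of '0'/'1' characters (MSB first, at most 32 of them)
   into a 32-bit word. */
static uint32_t parse_bits(const char *bits)
{
    uint32_t word = 0;
    unsigned n = 0;
    for (; *bits != '\0'; ++bits, ++n) {
        assert(n < 32u && (*bits == '0' || *bits == '1'));
        word = (word << 1) | (uint32_t)(*bits - '0');
    }
    return word;
}

/* Split an instruction word into its OP, Rn1 and Rn2 fields. */
static decoded decode(uint32_t word)
{
    decoded d;
    d.op  = (word >> 8) & 0xFFu;
    d.rn1 = (word >> 4) & 0xFu;
    d.rn2 = word & 0xFu;
    return d;
}
```

Decoding the vector listed for MOV R8,R9 this way gives op 5, Rn1 8, Rn2 9.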
Simulation result
Synthesis results
TSMC 0.25um
- Area: 0.35 mm^2
- Clock: 400 MHz
- Power: 1.73 mW

UMC 0.18um
- Area: 0.19 mm^2
- Clock: 600 MHz
- Power: 1 mW

Build a GCC Cross Compiler







GCC Execution
[Figure: GCC execution, from input file to output file. Boxes shown: gcc, g++, cpp, cc1, gas (assembler), ld (linker).]
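The Structure of Compiler
[Figure: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> intermediate code generator -> code optimizer -> code generator -> target program.]
- Front-end: lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator
- Back-end: code optimizer, code generator
- The symbol-table manager and the error handler are shared by all phases.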
The Structure of GCC
[Figure: front ends (C, C++, ObjC, Fortran) -> Parsing -> TREE -> RTL -> passes listed below -> Assembly. The Machine Description and Macro Definition feed the RTL stages.]
- Global Optimizations
  - Jump Optimization
  - Common Subexpr. Elimination
  - Loop Optimization
  - Data Flow Analysis
- Instruction Combining
- Instruction Scheduling
- Register Class Preferencing
- Register Allocation
- Peephole Optimizations
GCC Code Generation
- Backend machine description patterns match the intermediate format (RTL).
- A machine description is like a template.
- A machine description includes:
  - type bit widths, memory alignment
  - instruction patterns, register classes
  - peephole optimization rules
GCC Code Generation (cont’d)
Intermediate format (RTL):
(set (reg:SF 12)
     (minus:SF (reg:SF 13)
               (reg:SF 14)))
Machine description:
(define_insn "subsf3"
  [(set (match_operand:SF 0 "register_operand" "=f")
        (minus:SF (match_operand:SF 1 "register_operand" "f")
                  (match_operand:SF 2 "register_operand" "f")))]
  ""
  "subf\\t%0,%1,%2")
Output assembly:
subf r1, r2, r3
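The last step, turning the output template into assembly text, amounts to substituting operand names for %0, %1, %2. The C sketch below is a toy illustration of that substitution only. It is not GCC's implementation, and `expand_template` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `tmpl` into `out`, replacing each %N (N = 0..9) with operands[N].
   Toy illustration of output-template expansion; not GCC code. */
static void expand_template(const char *tmpl, const char *const operands[],
                            size_t n_operands, char *out, size_t out_size)
{
    size_t used = 0;
    assert(out_size > 0);
    for (const char *p = tmpl; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9') {
            size_t idx = (size_t)(p[1] - '0');
            assert(idx < n_operands);
            size_t len = strlen(operands[idx]);
            assert(used + len < out_size);   /* leave room for the NUL */
            memcpy(out + used, operands[idx], len);
            used += len;
            ++p;                             /* skip the operand digit */
        } else {
            assert(used + 1 < out_size);
            out[used++] = *p;
        }
    }
    out[used] = '\0';
}
```

With operands "r1", "r2", "r3" and the template "subf\t%0,%1,%2", the buffer ends up holding "subf\tr1,r2,r3".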
Example of RTL
(plus:SI (reg:SI 8) (const_int 123))
- Adds two 4-byte integer (SImode) operands.
- First operand is a register.
  - Register is also a 4-byte integer.
  - Register number is 8.
- Second operand is a constant integer.
  - Value is “123”.
  - Mode is VOIDmode (not given).
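One way to picture this expression is as a small tree whose nodes carry a code, a mode, and operands. The C sketch below builds such a tree and prints it back in the parenthesised notation. The types `rtx_code`, `rtx_mode`, and `rtx`, and the function `print_rtx`, are hypothetical, greatly simplified stand-ins and do not reproduce GCC's real data structures.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical stand-ins for RTL codes and machine modes. */
typedef enum { RTX_REG, RTX_CONST_INT, RTX_PLUS } rtx_code;
typedef enum { MODE_VOID, MODE_SI } rtx_mode;

typedef struct rtx {
    rtx_code code;
    rtx_mode mode;
    long value;                    /* register number or constant value */
    const struct rtx *op0, *op1;   /* operands of PLUS, otherwise NULL */
} rtx;

static const char *mode_name(rtx_mode m)
{
    return m == MODE_SI ? "SI" : "VOID";
}

/* Write the textual form of `x` into `out` (capacity `size`). */
static void print_rtx(const rtx *x, char *out, size_t size)
{
    char a[64], b[64];
    int n = -1;
    assert(x != NULL);
    switch (x->code) {
    case RTX_REG:
        n = snprintf(out, size, "(reg:%s %ld)", mode_name(x->mode), x->value);
        break;
    case RTX_CONST_INT:   /* VOIDmode constant: no mode suffix is printed */
        n = snprintf(out, size, "(const_int %ld)", x->value);
        break;
    case RTX_PLUS:
        print_rtx(x->op0, a, sizeof a);
        print_rtx(x->op1, b, sizeof b);
        n = snprintf(out, size, "(plus:%s %s %s)", mode_name(x->mode), a, b);
        break;
    }
    assert(n >= 0 && (size_t)n < size);   /* the text must fit */
}
```

Building a PLUS node in SImode over register 8 and the constant 123 and printing it reproduces the expression shown on the slide.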
Templates
Used for three purposes:
- Generating RTL from parse tree.
- Generating machine insns from RTL.
- Specifying parameters about instructions.
Sample template for a RISC machine:
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %0,%1,%2"
  [(set_attr "type" "arith")])
GCC Porting and Retargeting
Porting to new machines/processors
- Covered by the “Using and Porting the GCC” book; self-contained.
- Done by describing the machine, not how to compile for the machine.
Using GCC as a backend for other languages
- Not well documented.
- Few examples.
- See the GNAT, GNU Cobol, and Fortran ports.
In both cases, copy from similar ports.
How to port GCC
In directory gccxxx/gcc/config/machine/
- machine.h
  - Contains C macros that define general attributes of the machine.
- machine.md
  - Contains RTL expressions that define the instruction set.
  - Input to programs that produce .h and .c files.
- machine.c
  - Machine-dependent functions; normally things too large to cleanly put into the above two files.
How to port GCC (cont’d)
- Study the book "Using and Porting GCC"
- Study the target-machine specification
- Find an approximate machine description
  - Not found: create target.h, target.c, target.md
  - Found: modify target.h, target.c, target.md
- Test
gcc/config
--Architecture characteristic key
H A hardware implementation does not exist.
M A hardware implementation is not currently being manufactured.
S A Free simulator does not exist.
L Integer registers are narrower than 32 bits.
Q Integer registers are at least 64 bits wide.
N Memory is not byte addressable, and/or bytes are not eight bits.
F Floating point arithmetic is not included in the instruction set
I Architecture does not use IEEE format floating point numbers
C Architecture does not have a single condition code register.
B Architecture has delay slots.
D Architecture has a stack that grows upward.
l Port cannot use ILP32 mode integer arithmetic.
gcc/config
--Architecture characteristic key (cont’d)
q Port can use LP64 mode integer arithmetic.
r Port can switch between ILP32 and LP64 at runtime. (Not necessarily
supported by all subtargets.)
c Port uses cc0.
p Port does not use define_peephole.
f Port does not define prologue and/or epilogue RTL expanders.
g Port does not define TARGET_ASM_FUNCTION_(PRO|EPI)LOGUE.
m Port does not use define_constants.
b Port does not use '"* ..."' notation for output template code.
d Port uses DFA scheduler descriptions.
h Port contains old scheduler descriptions.
a Port generates multiple inheritance thunks using
TARGET_ASM_OUTPUT_MI(_VCALL)_THUNK.
t All insns either produce exactly one assembly instruction, or trigger a
define_split.
e <arch>-elf is not a supported target.
s <arch>-elf is the correct target to use with the simulator in /cvs/src.
define_peephole
- In addition to instruction patterns, the `md' file may contain definitions of machine-specific peephole optimizations.
- The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them.
- For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.
define_splits
- Often you can rewrite a single insn as a list of individual insns, each corresponding to one machine instruction.
- The compiler splits the insn if there is a reason to believe that it might improve instruction or delay slot scheduling.
- Splits are evaluated after the combiner pass and before the scheduling passes.
- Splits can optimize for speed and instruction length, so they are the perfect place to put this intelligence.
- Ex: If we are loading a small negative constant, we can save space and time by loading the positive value and then sign extending it.
define_expand
- On some target machines, some standard pattern names for RTL generation cannot be handled with a single insn, but a sequence of RTL insns can represent them.
- For these target machines, you can write a `define_expand' to specify how to generate the sequence of RTL.
- A `define_expand' is an RTL expression that looks almost like a `define_insn'; but, unlike the latter, a `define_expand' is used only for RTL generation and it can produce more than one RTL insn.
- The combiner pass only cares about reducing the number of instructions; it does not care about instruction lengths or speeds.
define_insn
- Push and pop
  - movsi_push
  - movsi_pop
- Move
  - movqi_unsigned_register_load
  - movqi_signed_register_load
  - *movqi_internal
  - movhi
  - movhi_unsigned_register_load
  - movhi_signed_register_load
  - *movhi_internal
  - movsi
  - movsi_internal
  - movdi
  - *movdi_insn
  - movsf
  - *movsf_internal
  - *movsf_constant_store
- Signed conversions from a smaller integer to a larger integer
  - extendqisi2
  - extendhisi2
- Zero extensions
  - zero_extendqisi2
  - zero_extendhisi2
- Addition
  - add_to_stack
  - addsi3
  - addsi_regs
  - addsi_small_int
  - addsi_big_int
  - *addsi_for_reload
- Subtraction
  - subsi3
- Multiplication
  - mulsidi3
  - umulsidi3
  - mulhisi3
  - umulhisi3
  - mulsi3
- Negation
  - negsi2
- Shifts
  - ashlsi3
  - ashrsi3
  - lshrsi3
define_insn (cont’d)
- Logical Operations
  - andsi3
  - iorsi3
  - xorsi3
  - one_cmplsi2
- Comparisons
  - cmpsi
  - *cmpsi_internal
- Branches
  - beq
  - bne
  - blt
  - ble
  - bgt
  - bge
  - bltu
  - bleu
  - bgtu
  - bgeu
  - *branch_true
  - *branch_false
- Calls & Jumps
  - call
  - call_value
  - jump
  - indirect_jump
  - tablejump
- Function Prologues and Epilogues
  - prologue
  - epilogue
  - return_from_func
  - leave_func
  - enter_func
- Miscellaneous
  - nop
  - blockage
define_insn “addsi_regs”
(define_insn "addsi_regs"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %2, %0"
)
; (set value x)                      chapter 9.15 p110
;   value = x
; (plus:m x y)
;   x+y, carried out in mode m
define_insn “addsi_regs” (cont’d)
; (match_operand:m n predicate constraint)      chapter 10.4 p131
;   operand n matches when the condition (predicate) is true
;   n counts from 0
;   for each number n, only one match_operand expression
;   predicate is the name of a C function; it returns 0 on failure
;     general_operand: checks that the operand is a constant, a register, or a memory reference
;     register_operand: checks whether the operand is a register
;     immediate_operand: checks whether the operand is immediate data
;   constraint: describes one kind of operand that is permitted
;     r: register
;     m: any kind of memory operand
;     o: only offsettable memory operand
;     V: only non-offsettable memory operand
;     <: memory operand with autodecrement addressing
;     >: memory operand with autoincrement addressing
;     i: immediate integer operand
;     0~9: an operand that matches the specified operand number is allowed
Build a GCC Cross Compiler
- Binutils: configure -> make -> make install
- GCC (with the machine description): configure -> make -> make install
- Result: GCC compiler
Build a GCC Cross Assembler and Cross Linker
- Binutils: ver 2.14
- configure --target=fr30-elf --prefix=dir
- make
- make install

Build a GCC Cross Compiler
- GCC: ver 3.3.1
- ../configure --target=fr30-elf --prefix=dir --enable-languages=c
- make
- make install

A simple C program to test the cross compiler
int test(int i, int j, int k)
{
    int a;
    int b;
    a = 49999999;
    b = 39999999;
    a += k;
    b += j;
    a++;
    b--;
    i += a + b;
    return i;
}

fr30-elf-gcc -S -O2 t.c
A simple C program to test the cross compiler (cont’d)
        .file   "t.c"
        .text
        .p2align 2
        .globl test
        .type   test, @function
test:
        mov     r4, r2              ;00000000000000000000010101000010
        ldi:32  #50000000, r4       ;00000000000000000000010000000100
                                    ;10111110101111000010000000
        ldi:32  #39999998, r1       ;00000000000000000000010000000001
                                    ;10011000100101100111111110
        add     r6, r4              ;00000000000000000000000001100100
        add     r5, r1              ;00000000000000000000000001010001
        add     r1, r4              ;00000000000000000000000000010100
        add     r2, r4              ;00000000000000000000000000100100
        ret
        .size   test, .-test
        .ident  "GCC: (GNU) 3.3.1 (cygming special)"
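The generated assembly loads 50000000 and 39999998, not the 49999999 and 39999999 written in the source, because -O2 folds the `a++` and `b--` into the constants. The C sketch below checks that folding on the host. It assumes the three arguments arrive in r4, r5, and r6, as the register usage in the listing suggests; `test_as_written`, `test_as_compiled`, and `check_folding` are hypothetical names.

```c
#include <assert.h>

/* The function from the slide, with the statements in their original order. */
static int test_as_written(int i, int j, int k)
{
    int a = 49999999;
    int b = 39999999;
    a += k;
    b += j;
    a++;
    b--;
    i += a + b;
    return i;
}

/* What the -O2 listing computes, one line per instruction pair.
   Register roles (i in r4, j in r5, k in r6) are an assumption. */
static int test_as_compiled(int i, int j, int k)
{
    int a = 50000000 + k;   /* ldi:32 #50000000, r4 ; add r6, r4 */
    int b = 39999998 + j;   /* ldi:32 #39999998, r1 ; add r5, r1 */
    return i + a + b;       /* add r1, r4 ; add r2, r4 */
}

/* Confirm that both forms agree for one argument triple. */
static void check_folding(int i, int j, int k)
{
    assert(test_as_written(i, j, k) == test_as_compiled(i, j, k));
}
```

Running `check_folding` over a few small argument triples is a quick sanity check before trusting the cross compiler's output on the simulated CPU.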
Summary
- Studying RTL is more important than studying the MD.
- Build the cross assembler and cross linker before building the cross compiler.
- There is little documentation on porting GCC as a cross compiler.
- Modifying an existing MD is easier than creating a new one.
- “The main goal of GCC was to make a good, fast compiler for machines in the class that the GNU system aims to run on: 32-bit machines that address 8-bit bytes and have several general registers.” -- Richard Stallman.
- It seems that designing a new CPU is easier than building a cross compiler for a GIEE student.
http://gcc.gnu.org