Intermediate Representations

Download Report

Transcript Intermediate Representations

Intermediate
Representations
CS 671
February 12, 2008
Last Time
The compiler must generate code to handle
issues that arise at run time
• Representation of various data types
• Procedure linkage
• Storage organization
1
CS 671 – Spring 2008
Activation Records
A procedure is a control abstraction
• it associates a name with a chunk of code
• that piece of code is regarded in terms of its
purpose and not of its implementation
A procedure creates its own name space
• It can declare local variables
• Local declarations may hide non-local ones
• Local names cannot be seen from outside
2
CS 671 – Spring 2008
Handling Control Abstractions
Generated code must be able to
• preserve current state
– save variables that cannot be saved in registers
– save specific register values
• establish procedure environment on entry
– map actual to formal parameters
– create storage for locals
• restore previous state on exit
This can be modeled with a stack
• Allocate a memory block for each activation
• Maintain a stack of such blocks
• This mechanism can handle recursion
3
CS 671 – Spring 2008
Activation Records
Keeps track of the bottom of
the current activation record
g(…)
{
f(a1,…,an);
fp
}
•g is the caller
•f is the callee
sp
4
CS 671 – Spring 2008
…
Arg n
…
Arg 2
Arg 1
Ret Address
Local vars
Temporaries
Saved regs
Arg m
…
Arg 1
Ret Address
g’s
frame
f’s
frame
next
frame
Activation Records
•Handling nested procedures
– We must keep a static link (vs. dynamic link)
•Registers vs. memory
– Assume infinite registers (for now)
g(…) {
f(a1,…,an);
}
If g saves before call, restores after call
caller-save register
If f saves before using a reg, restores after
callee-save register
5
CS 671 – Spring 2008
Intermediate Code Generation
Source code
Lexical Analysis
lexical
errors
Syntactic Analysis
syntax
errors
Semantic Analysis
semantic
errors
tokens
AST
AST’
Intermediate Code Gen
IR
6
CS 671 – Spring 2008
And Then? The Back End
IR
Trace Formation
Instruction Selection
Register Allocation
Optimizations *
7
CS 671 – Spring 2008
* phase
ordering
problem
IR Motivation
What we have so far...
• An abstract syntax tree
– With all the program information
– Known to be correct
• Well-typed
• Nothing missing
• No ambiguities
What we need...
• Something “Executable”
• Closer to actual machine level of abstraction
8
CS 671 – Spring 2008
Why an Intermediate Representation?
• What is the IR used for?
– Portability
– Optimization
– Component interface
– Program understanding
• Compiler
– Front end does lexical analysis, parsing, semantic
analysis, translation to IR
– Back end does optimization of IR, translation to
machine instructions
• Try to keep machine dependences out of IR for
as long as possible
9
CS 671 – Spring 2008
Intermediate Code
• Abstract machine code – (Intermediate
Representation)
• Allows machine-independent code generation,
optimization
optimize
AST
IR
x86
PowerPC
Alpha
10
CS 671 – Spring 2008
Intermediate Representations
• Any representation between the AST and ASM
– 3 address code: triples, quads (low-level)
– Expression trees (high-level)
MEM
+
a[i];
MEM
a
BINOP
MUL
i
CONST
W
11
CS 671 – Spring 2008
What Makes a Good IR?
• Easy to translate from AST
• Easy to translate to assembly
• Narrow interface: small number of node types
(instructions)
– Easy to optimize
– Easy to retarget
AST (>40 node types)
IR (~15 node types)
x86 (>200 opcodes)
12
CS 671 – Spring 2008
IR selection
Using an existing IR
• cost savings due to reuse
• it must be expressive and appropriate for the
compiler operations
Designing an IR
• decide how close to machine code it should be
• decide how expressive it should be
• decide its structure
• consider combining different kinds of IRs
13
CS 671 – Spring 2008
Optimizing Compilers
• Goal: get program closer to machine code without
losing information needed to do useful optimizations
• Need multiple IR stages
opt
optimize
AST
HIR
optimize
MIR
x86 (LIR)
opt
PowerPC (LIR)
opt
Alpha (LIR)
14
CS 671 – Spring 2008
High-Level IR (HIR)
• used early in the process
• usually converted to lower form later on
• Preserves high-level language constructs
– structured flow, variables, methods
• Allows high-level optimizations based on properties
of source language (e.g. inlining, reuse of constant
variables)
• Example: AST
15
CS 671 – Spring 2008
Medium-Level IR (MIR)
• Try to reflect the range of features in the source
language in a language-independent way
• Intermediate between AST and assembly
• Unstructured jumps, registers, memory locations
• Convenient for translation to high-quality machine
code
• OtherMIRs:
– tree IR: easy to generate, easy to do reasonable
instruction selection
– quadruples: a = b OP c (easy to optimize)
– stack machine based (like Java bytecode)
16
CS 671 – Spring 2008
Low-Level IR (LIR)
• Assembly code + extra pseudo instructions
• Machine dependent
• Translation to assembly code is trivial
• Allows optimization of code for low-level
considerations: scheduling, memory layout
17
CS 671 – Spring 2008
IR Classification: Level
for i := op1 to op2 step op3
instructions
endfor
High-level
i := op1
if step < 0 goto L2
L1: if i > op2 goto L3
instructions
i := i + step
goto L1
L2: if i < op2 goto L3
instructions
i := i + step
goto L2
L3:
Medium-level
18
CS 671 – Spring 2008
IR Classification: Structure
Graphical
• Trees, graphs
• Not easy to rearrange
• Large structures
Linear
• Looks like pseudocode
• Easy to rearrange
Hybrid
• Combine graphical and linear IRs
• Example:
– low-level linear IR for basic blocks, and
– graph to represent flow of control
19
CS 671 – Spring 2008
Graphical IRs
Abstract syntax tree
• High-level
• Useful for source-level information
• Retains syntactic structure
• Common uses
– source-to-source translation
– semantic analysis
– syntax-directed editors
20
CS 671 – Spring 2008
Graphical IRs
Tree, for basic block*
• root: operator
• up to two children: operands
• can be combined
Uses:
• algebraic simplifications
• may generate locally optimal code.
L1: i := 2
assgn, i
add, t1
if t2 goto L1
assgn, i
2
i
1
t1
0
2
*straight-line code with no branches or branch targets.
21
add, t1
gt, t2
t1:= i+1
t2 := t1>0
gt, t2
CS 671 – Spring 2008
1
0
Graphical IRs
Directed Acyclic Graphs (DAGs)
• Like compressed trees
– leaves: variables, constants available on entry
– internal nodes: operators
• annotated with variable names
– distinct left/right children
• Used for basic blocks (DAGs don't show control
flow)
• Can generate efficient code
– Note: DAGs encode common expressions
• But difficult to transform
• Good for analysis
22
CS 671 – Spring 2008
Graphical IRs
Control-Flow Graphs (CFGs)
• Each node corresponds to a
– basic block, or
– part of a basic block, or
• may need to determine facts at specific points
within BB
– a single statement
• more space and time
• Each edge represents flow of control
23
CS 671 – Spring 2008
Control-Flow Graphs
Graphical representation of runtime control-flow
paths
• Nodes of graph: basic blocks (straight-line computations)
• Edges of graph: flows of control
Useful for collecting information about computation
• Detect loops, remove redundant computations, register
allocation, instruction scheduling…
Alternative CFG: Each node contains a single stmt
i =0;
……
….
24
i=0
while (i < 50) {
t1 = b * 2;
a = a + t1;
i = i + 1;
}
if I < 50
t1 := b * 2;
a := a + t1;
i = i + 1;
CS 671 – Spring 2008
……
Graphical IRs: Often Use Basic Blocks
Basic block = a sequence of consecutive statements
in which flow of control enters at the beginning and
leaves at the end without halt or possibility of
branching except at the end
Partitioning a sequence of statements into BBs
1. Determine leaders (first statements of BBs)
– The first statement is a leader
– The target of a conditional is a leader
– A statement following a branch is a leader
2. For each leader, its basic block consists of the leader
and all the statements up to but not including the
next leader.
25
CS 671 – Spring 2008
Basic Blocks
unsigned int fibonacci (unsigned int n) {
unsigned int f0, f1, f2;
f0 = 0;
f1 = 1;
if (n <= 1)
return n;
for (int i=2; i<=n; i++) {
f2 = f0+f1;
f0 = f1;
f1 = f2;
}
return f2;
}
26
CS 671 – Spring 2008
read(n)
f0 := 0
f1 := 1
if n<=1 goto L0
i := 2
L2: if i<=n goto L1
return f2
L1: f2 := f0+f1
f0 := f1
f1 := f2
i := i+1
go to L2
L0: return n
Basic Blocks
entry
Leaders:
read(n)
f0 := 0
f1 := 1
if n<=1 goto L0
i := 2
L2: if i<=n goto L1
return f2
L1: f2 := f0+f1
f0 := f1
f1 := f2
i := i+1
go to L2
L0: return n
27
read(n)
f0 := 0
f1 := 1
n <= 1
return n
i := 2
i<=n
f2 := f0+f1
f0 := f1
f1 := f2
i := i+1
CS 671 – Spring 2008
return f2
exit
Graphical IRs
Dependence graphs : they represents
constraints on the sequencing of operations
• Dependence: a relation between two statements
•
•
that puts a constraint on their execution order
– Control dependence: Based on the program's
control flow
– Data dependence: Based on the flow of data
between statements
Nodes represent statements
Edges represent dependences
Built for specific optimizations, then discarded
28
CS 671 – Spring 2008
Graphical IRs
Dependence graphs
Example:
s1
s2
s3
s4
s5 L1:
s1
s2
true or flow dependence:
s2 uses a value defined in s1
This is read-after-write dependence
a := b + c
if a>10 goto L1
d := b * e
e := d + 1
d := e / 2
s3
s4
control dependence:
s3 and s4 are executed only when a<=10
antidependence:
s4 defines a value used in s3
This is write-after-read dependence
s5
output dependence:
s5 defines a value also defined in s3
This is write-after-write dependence
input dependence:
s5 uses a value also uses in s3
This is read-after-read situation. It places no constraints
in the execution order, but is used in some optimizations.
29
CS 671 – Spring 2008
IR Classification: Structure
Graphical
•
•
•
Trees, graphs
Not easy to rearrange
Large structures
Linear
• Seq of instructions that execute in order of appearance
• Control flow is represented by cond branches and jumps
Hybrid
•
•
30
Combine graphical and linear IRs
Example:
– low-level linear IR for basic blocks, and
– graph to represent flow of control
CS 671 – Spring 2008
Linear IR
Lower level IR before final code generation
• A linear sequence of instructions
• Resemble assembly code for an abstract machine
– Explicit conditional branches and goto jumps
• Reflect instruction sets of the target machine
– Stack-machine code and three-address code
• Implemented as a collection (table or list) of tuples
Stack-machine code
Push 2
Push y
Multiply
Push x
subtract
two-address code
MOV 2 => t1
MOV y => t2
MULT t2 => t1
MOV x => t4
SUB t1 => t4
Linear IR for x – 2 * y
31
CS 671 – Spring 2008
three-address code
t1
t2
t3
t4
t5
:=
:=
:=
:=
:=
2
y
t1*t2
x
t4-t3
Linear IRs
Stack machine code
• Assumes presence of operand stack
• Useful for stack architectures, JVM
• Operations typically pop operands and push results
• Advantages
– Easy code generation
– Compact form
• Disadvantages
– Difficult to rearrange
– Difficult to reuse expressions
32
CS 671 – Spring 2008
Stack-Machine Code
Also called one-address code
• Assumes an operand stack
• Operations take operands from the stack and push
results back onto the stack
Compact, simple to generate and execute
– Most operands do not need names
– Results are transitory unless explicitly moved to
memory
Used as IR for Smalltalk and Java
Stack-machine code for x – 2 * y
33
CS 671 – Spring 2008
Push 2
Push y
Multiply
Push x
subtract
Linear IR: Three-Address Code
Each instruction is of the form
•
•
x := y op z
y and z can be only registers or constants
Just like assembly
Common form of intermediate code
The AST expression x + y * z is translated as
t1 := y * z
t2 := x + t1
• Each subexpression has a “home”
34
CS 671 – Spring 2008
Three-Address Code
• Result, operand, operand, operator
– x := y op z, where op is a binary operator and x,
y, z can be variables, constants or compiler
generated temporaries (intermediate results)
• Can write this as a shorthand
– <op, arg1, arg2, result > -- quadruples
• Statements
– Assignment x := y op z
– Copy stmts x := y
– Goto L
35
CS 671 – Spring 2008
Three Address Code
• Statements (cont)
– if x relop y goto L
– Indexed assignments x := y[j] or s[j] := z
– Address and pointer assignments (for C)
• x := &y
• x := *p
• *x := y
– Parm x;
– call p, n;
– return y;
36
CS 671 – Spring 2008
Three-Address Code
Every instruction manipulates at most two
operands and one result.
• Arithmetic operations: x := y op z | x := op y
• Data movement: x := y [z] | x[z] := y | x := y
• Control flow: if y op z goto x |
goto x
• Function call: param x | return y | call foo
Reasonably compact; reuse of names and values
Three-address code for x – 2 * y
37
CS 671 – Spring 2008
t1
t2
t3
t4
t5
:=
:=
:=
:=
:=
2
y
t1*t2
x
t4-t3
Storing Three-Address Code
Store all instructions in a quadruple table
• Every instr has four fields: op, arg1, arg2, result
• Label of instructions  index of instruction in table
3AC
t1 := - c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
38
Quadruple entries
op
arg1
(0)
Uminus
c
(1)
Mult
b
(2)
Uminus
c
(3)
Mult
b
t3
t4
(4)
Plus
t2
t4
t5
(5)
Assign
t5
CS 671 – Spring 2008
arg2
result
t1
t1
t2
t3
a
Bigger Example
Consider the AST
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
39
CS 671 – Spring 2008
The IR Machine
A machine with
• Infinite number of temporaries (think registers)
• Simple instructions
– 3-operands
– Branching
– Calls with simple calling convention
• Simple code structure
– Array of instructions
• Labels to define targets of branches
40
CS 671 – Spring 2008
Temporaries
The machine has an infinite number of
temporaries
• Call them t0, t1, t2, ....
• Temporaries can hold values of any type
• The type of the temporary is derived from the
generation
• Temporaries go out of scope with each function
41
CS 671 – Spring 2008
Mapping Names to Variables
Variables are names for values
• Names given by programmers in the input program
• Names given by compilers for storing intermediate
results of computation
Reusing temp variables can save space, but mask
context and prevent analysis and optimizations
• Result of t1*t2 is no longer available after t1 is reused
Three-address code for x – 2 * y
t1
t2
t3
t4
t5
:=
:=
:=
:=
:=
t1
t2
t1
t2
t1
2
y
t1*t2
x
t4-t3
2
y
t1*t2
x
t2-t1
Different values reuse
the same name
Different values use
distinct names
42
:=
:=
:=
:=
:=
CS 671 – Spring 2008
Mapping Storage to Variables
Variables are placeholders for values
• Every variable must have location to store its value
– Register, stack, heap, static storage
• Values must be loaded into registers before use
x and y are in registers
Three-address code
for x – 2 * y:
Which vars can be
kept in registers?
Which vars must be
stored in memory?
43
x and y are in memory
t1 := 2
t2 := t1*y
t3 := x-t2
void A(int b, int *p)
{
int a, d;
a = 3; d = foo(a);
}
CS 671 – Spring 2008
t1
t2
t3
t4
t5
:=
:=
:=
:=
:=
2
y
t1*t2
x
t4-t3
*p =b+d;
Summary
• IRs provide the interface between the front
and back ends of the compiler
– Graphical, linear, hybrid
• Should be machine and language independent
• Should be amenable to optimization
44
CS 671 – Spring 2008