Transcript PowerPoint

Modern Compiler Internal
Representations
Silvius Rus
1/23/2002
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Traditional Compiler Organization

Pass: output type
–
–
–
–
–

Read code as text: ASCII characters
Lexical scanner: language words
Syntactic parser: language phrases
Translation: attribute grammar phrases
Output generated code: binary stream
Focus on pipelining due to memory window
constraints
Traditional Compiler Internal
Representation



Grammatical structure not always built
explicitly
Implicit, built-in semantics
Simple data structures:
–
–
Transition tables
Token streams and stacks
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Compiler Challenges

Versatile:
–
–

L+A < L*A
Generated efficient code:
–
–
–
–

Understand multiple languages
Generate output for various architectures
Fast: as fast as coded directly in the output language
Portable: runs on multiple platforms
Verifiable: runs provably within a specified class of behavior
Secure: provably respects certain security requirements
Extendable: need to extend in order to:
–
–
Incorporate new input language and/or target system
Take advantage of advances in run-time environments (such
as ISA changes, multithreading, distributed/parallel execution)
Understand Multiple Languages Output for Multiple Targets

Abstract IR:
–
–

Good points:
–

Same representation for Fortran, C, C++, Java, …
Possible only for conceptually similar languages
Perform complex transformations on a single representation
Bad points:
–
–
Language semantics may either get lost or need additional
particular representation
Specific architecture characteristics are more profitable to use
than common (abstractable) ones
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Staged Compilation

Stage 1:
–
–
–

Save/reload, pipe, HTTP, … text file
–

Load source file (text) into IR1 – machine independent
Optimize IR1
Stream IR1 to text file
SUIF files, Java bytecode, .NET assembly
Stage 2:
–
–
–
Load text file into IR2 – machine dependent
Perform machine specific optimization on IR2
Generate executable code or interpret IR2
Staged Compilation
Stage 1
Stage 2
Examples
Static
Static
SUIF, Promis
Static
Dynamic
SUN JIT, .NET JITer
Static
Static +
Dynamic
DyC, Quicksilver
Staged Compilation

Prepare IR1 so that stage 2 is very cheap
–
Quicksilver



–
Insert templated optimized object code in bytecode
Pack speculative optimization validation predicates in bytecode
Keep method dependence graphs explicitly in bytecode
Microsoft .NET


Explicit type/class information in IL
Preformatted, quickly accessible metadata
–
Strings, tables, heaps
– Custom data

Allow embedding of native code
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Generate Fast And Portable Code

Fast code
–
IR close to machine structure




Portable code
–

Mapping data to registers
Mapping operations to opcodes
Scheduling instructions for superscalar/VLIW processors
Machine description must be totally abstracted
QuickSilver: templated optimized code
Generate Verifiable Code

Microsoft .NET IL
–
–
Static and dynamic type safety - reflections
Managed code


–
Managed data


–
Carries a minimum of information on itself
Usually signed by compiler in Stage 1
Only accessible from managed code
Garbage collected
Managed pointers
Generate Secure Code

Hard to define limits
–
–
Make sure you run what you mean to
Limit rights




Per user
Per software component
QuickSilver: digests
.NET IL:
–
–
Code is signed using encrypting of hashed original
Permissions are set per module
Generate Efficient Code

IR may also provide support for:
–
–
Versioning (Quicksilver, .NET)
Culture (.NET)
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Compiler Internal Representation General Organization

High-level - completely machine independent
–
–
–
–
–

Medium-level - dependent on classes of machines
–

Abstract Syntax Tree
Control Flow Graph
Control Dependence Graph
Data Dependence Graph
Static Single Assignment
Virtual machine code, such as stack machine
Low level - dependent on particular ISA
–
Assembly, machine instruction graphs
Case Study: Polaris

High level representation
–
–
–
–
–

Abstract Syntax Tree
Control Flow Graph
Control Dependence Graph
Data Dependence Graph
Gated Static Single Assignment
Some generality
–
Backends for various parallel execution systems
Case Study: SUIF2

Multiple level representation
–
–
–
–




CFG, CDG, …
Quads
Machsuif
Custom annotations
Multiple frontends: Fortran, C, Java
Multiple backends: SUIF VM, C, assembly
Decoupled passes communicate only via SUIF
Extendable: OSUIF
Case Study: Promis


Switch to Promis organization presentation
Switch to Promis IR presentation
Case Study: KCC

Kook and Associates (KAI) C++ compiler:
–
C++ dedicated internal representation

–
Proprietary C++ specific object format


–
–
Advanced C++ specific optimization
Interprocedural optimization with modular compilation
C++ specific debug information – usable with KDB
Outputs C with calls to proprietary run-time library
Uses GNU gcc to generate machine code
Case Study: Jalapeno QuickSilver

Quasi-static images
–

Java bytecode + proprietary format
Representation allows for optimizations
–
–
–
Explicit method dependence graph
Templated optimized object code
Speculative optimization validation predicates
Case Study: .NET


Advertised 9 digit $$ figure project
CLI (ECMA standard)
–
Common type system

–
–
–
Type info in intermediate code
Common exception system

–
30+
languages
MSIL
Throw in Visual Basic, catch in C++
Support for security, culture, versioning
Support for charging per-use
Custom info can be passed for original
language specific description
native code
Other Compilers – Open Source

GNU compiler:
–
–
–
–

C, Fortran, Java, C++ front-ends
Generates code for all major architectures
Low level internal representation
New version (3.x) has SSA
SGI open source project: discontinued
Other Compilers – Commercial

Fortran, C, C++, Java produced by OS and/or
hardware producers
–

Other commercial compiler producers:
–

HP, SGI, Intel, Microsoft, SUN
Borland, Watcom, etc.
Internal representation – company secret
Presentation Navigator






Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Conclusions

Internal representation evolved
–
–
–
–

Programming paradigms
Changes in hardware
Changes in compiler/run-time system technology
New issues: security, verifiability, culture, versioning
Tendency: E Pluribus Unum