Transcript PowerPoint
Modern Compiler Internal
Representations
Silvius Rus
1/23/2002
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Traditional Compiler Organization
Pass: output type
–
–
–
–
–
Read code as text: ASCII characters
Lexical scanner: language words
Syntactic parser: language phrases
Translation: attribute grammar phrases
Output generated code: binary stream
Focus on pipelining due to memory window
constraints
Traditional Compiler Internal
Representation
Grammatical structure not always built
explicitly
Implicit, built-in semantics
Simple data structures:
–
–
Transition tables
Token streams and stacks
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Compiler Challenges
Versatile:
–
–
L+A < L*A
Generated efficient code:
–
–
–
–
Understand multiple languages
Generate output for various architectures
Fast: as fast as coded directly in the output language
Portable: runs on multiple platforms
Verifiable: runs provably within a specified class of behavior
Secure: provably respects certain security requirements
Extendable: need to extend in order to:
–
–
Incorporate new input language and/or target system
Take advantage of advances in run-time environments (such
as ISA changes, multithreading, distributed/parallel execution)
Understand Multiple Languages Output for Multiple Targets
Abstract IR:
–
–
Good points:
–
Same representation for Fortran, C, C++, Java, …
Possible only for conceptually similar languages
Perform complex transformations on a single representation
Bad points:
–
–
Language semantics may either get lost or need additional
particular representation
Specific architecture characteristics are more profitable to use
than common (abstractable) ones
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Staged Compilation
Stage 1:
–
–
–
Save/reload, pipe, HTTP, … text file
–
Load source file (text) into IR1 – machine independent
Optimize IR1
Stream IR1 to text file
SUIF files, Java bytecode, .NET assembly
Stage 2:
–
–
–
Load text file into IR2 – machine dependent
Perform machine specific optimization on IR2
Generate executable code or interpret IR2
Staged Compilation
Stage 1
Stage 2
Examples
Static
Static
SUIF, Promis
Static
Dynamic
SUN JIT, .NET JITer
Static
Static +
Dynamic
DyC, Quicksilver
Staged Compilation
Prepare IR1 so that stage 2 is very cheap
–
Quicksilver
–
Insert templated optimized object code in bytecode
Pack speculative optimization validation predicates in bytecode
Keep method dependence graphs explicitly in bytecode
Microsoft .NET
Explicit type/class information in IL
Preformatted, quickly accessible metadata
–
Strings, tables, heaps
– Custom data
Allow embedding of native code
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Generate Fast And Portable Code
Fast code
–
IR close to machine structure
Portable code
–
Mapping data to registers
Mapping operations to opcodes
Scheduling instructions for superscalar/VLIW processors
Machine description must be totally abstracted
QuickSilver: templated optimized code
Generate Verifiable Code
Microsoft .NET IL
–
–
Static and dynamic type safety - reflections
Managed code
–
Managed data
–
Carries a minimum of information on itself
Usually signed by compiler in Stage 1
Only accessible from managed code
Garbage collected
Managed pointers
Generate Secure Code
Hard to define limits
–
–
Make sure you run what you mean to
Limit rights
Per user
Per software component
QuickSilver: digests
.NET IL:
–
–
Code is signed using encrypting of hashed original
Permissions are set per module
Generate Efficient Code
IR may also provide support for:
–
–
Versioning (Quicksilver, .NET)
Culture (.NET)
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Compiler Internal Representation General Organization
High-level - completely machine independent
–
–
–
–
–
Medium-level - dependent on classes of machines
–
Abstract Syntax Tree
Control Flow Graph
Control Dependence Graph
Data Dependence Graph
Static Single Assignment
Virtual machine code, such as stack machine
Low level - dependent on particular ISA
–
Assembly, machine instruction graphs
Case Study: Polaris
High level representation
–
–
–
–
–
Abstract Syntax Tree
Control Flow Graph
Control Dependence Graph
Data Dependence Graph
Gated Static Single Assignment
Some generality
–
Backends for various parallel execution systems
Case Study: SUIF2
Multiple level representation
–
–
–
–
CFG, CDG, …
Quads
Machsuif
Custom annotations
Multiple frontends: Fortran, C, Java
Multiple backends: SUIF VM, C, assembly
Decoupled passes communicate only via SUIF
Extendable: OSUIF
Case Study: Promis
Switch to Promis organization presentation
Switch to Promis IR presentation
Case Study: KCC
Kook and Associates (KAI) C++ compiler:
–
C++ dedicated internal representation
–
Proprietary C++ specific object format
–
–
Advanced C++ specific optimization
Interprocedural optimization with modular compilation
C++ specific debug information – usable with KDB
Outputs C with calls to proprietary run-time library
Uses GNU gcc to generate machine code
Case Study: Jalapeno QuickSilver
Quasi-static images
–
Java bytecode + proprietary format
Representation allows for optimizations
–
–
–
Explicit method dependence graph
Templated optimized object code
Speculative optimization validation predicates
Case Study: .NET
Advertised 9 digit $$ figure project
CLI (ECMA standard)
–
Common type system
–
–
–
Type info in intermediate code
Common exception system
–
30+
languages
MSIL
Throw in Visual Basic, catch in C++
Support for security, culture, versioning
Support for charging per-use
Custom info can be passed for original
language specific description
native code
Other Compilers – Open Source
GNU compiler:
–
–
–
–
C, Fortran, Java, C++ front-ends
Generates code for all major architectures
Low level internal representation
New version (3.x) has SSA
SGI open source project: discontinued
Other Compilers – Commercial
Fortran, C, C++, Java produced by OS and/or
hardware producers
–
Other commercial compiler producers:
–
HP, SGI, Intel, Microsoft, SUN
Borland, Watcom, etc.
Internal representation – company secret
Presentation Navigator
Introduction
Challenges
Staged compilation
Generate efficient code
Case studies
Conclusions
Conclusions
Internal representation evolved
–
–
–
–
Programming paradigms
Changes in hardware
Changes in compiler/run-time system technology
New issues: security, verifiability, culture, versioning
Tendency: E Pluribus Unum