Chap. 1 Introduction
Download
Report
Transcript Chap. 1 Introduction
Introduction to Compilers
Related Area
Programming languages
Machine architecture
Language theory
Algorithms
Data structures
Operating systems
Software engineering
Compilers
A compiler is a program that reads a program written
in one language – the source language – and
translate it into an equivalent program in another
language - the target language.
Early compilers - 1950’s
Machine Language, Assembly
Language, High-Level Language
Machine language is the native language of the
computer on which the program is run.
Native code
It consists of bit strings which are interpreted by the
mechanism inside the computer.
Example in IBM 370
Binary: 0001100000110101
Hexadecimal: 1835
Copy the content of Register 5 into Register 3
LR 3, 5
Assembler, assembly language
Machine Language, Assembly
Language, High-Level Language
Example
High-level language
X := Y + Z;
Assembly language
L 3, Y ; Load the working register with Y
A 3, Z ; Add Z
ST 3, X ; Store the result in X
Terminology
Source language
Object language
Java, C, C++
Machine language
Object code
Object file, object module
Target machine
The computer on which the program is to be run
Terminology
Cross compiler
A compiler that generates code for a machine that is
different from the machine on which the compiler runs.
Example:
A compiler which can be run on a IBM PC but which
compiles to the machine language of a specialpurpose embedded system.
Compilers and Interpreters
Compiler
Translates the high-level program to the target
program.
Interpreter
Executes the program.
The Environment of the
Compiler
The Environment of the
Compiler
Example
COMP myprog ; Compiles the program
LINK myprog ; Links the program
RUN myprog ; Runs the program
Phases of a Compiler
Five (six) phases of compilation
Lexical analysis
Syntactic analysis
(Semantic analysis)
Intermediate code generation
Optimization
Object code generation
Phases of a Compiler
Language Processing System
Phases of a Compiler
Two Parts of Compilation
Analysis
breaks up the source program
creates an intermediate representation
Synthesis
constructs the desired target program from the
intermediate representation
Analysis
Lexical Analysis
linear analysis, scanning
Syntax Analysis
parsing, hierarchical analysis
Semantic Analysis
Intermediate Code Generation
Advantage of dividing analysis
simple design
compiler efficiency
compiler portability
Analysis of the Source
Program
Lexical analysis (linear analysis)
Syntax analysis (hierarchical analysis)
the streams of characters making up the source
program is read from left-to-right and grouped into
tokens
characters or tokens are grouped hierarchically into
nested collections with collective meaning
Semantic analysis
certain checks are performed to ensure that the
components of a program fit together meaningfully
Lexical Analysis
Linear analysis, scanning
Reads the stream of characters in the source
program from left to right, and groups into
tokens
Tokens
are sequences of characters having a collective
meaning
Example: Lexical Analysis
position := initial + rate * 60
id1 := id2 + id3 * 60
Syntax Analysis
Parsing
Hierarchical analysis
Groups the tokens of the source program into
grammatical phrases represented by parse tree that
are used by the compiler to synthesize output.
Example: Syntax Analysis
Semantic Analysis
Checks the source program for semantic errors
and gathers type or semantic information for
the subsequent code generation phase
Example: Semantic Analysis
Error Handler
When each phases of compilation encounters
error, a phase must somehow deal with that
error.
Error in Lexical Phase
Error in Syntax Phase
The characters in the input do not form any token of
the language.
The token stream violates the structure rules (syntax)
of the language.
Error in Semantic Phase
Constructs have the right syntactic structure, but no
meaning to the operation involved.
S/W Tools
Performing Analysis
Structure editors
Pretty printers
indentation, fonts
Static checkers
a sequence of command => a source program
a program => discover bugs without run
Interpreters
performing operations
Performing Analysis
Text formatters
Silicon compilers
typeset text
circuit design
Query interpreters
DB
Intermediate Code Generation
Explicit intermediate representation
Two properties of intermediate code
A program for an abstract machine
Easy to produce
Easy to translate into the target program
Intermediate form
Three address form (quadruples, triples)
Two address form
Three-Address Code
Has at most three operands
Each three-address instruction has at most one
operator in addition to the assignment
The compiler must generate a temporary name
to hold the value computed by each instruction
May have fewer than 3 operands
Example: Three-Address Code
Example:
Intermediate Code
Synthesis Part
The synthesis part
constructs the
desired target
program from the
intermediate
representation.
Code Optimization
Improve the intermediate code to get the fastrunning machine code
Optimizing compiler
Example: Code Optimization
Code Generation
Generates the target codes
re-locatable machine code
assembly code
Example: Code Generation
System Support
There is a certain amount of supporting code to
be supplied to the compilation.
Symbol table management
Error handling
System Support
Symbol table handler
The central repository of information about the names
or identifiers in the program
Error handling
Implements the compiler’s response to errors in the
code it is compiling.
Diagnostics
Where the error was found and what kind of error it was
Passes, Front End, Back End
The compiler makes one or more passes
through the program.
A pass consists of reading a version of the
program from a file and writing a new version of
it to an output file.
A pass normally comprises more than one phase,
but the number of passes, and the phases they
cover, varies.
Passes, Front End, Back End
Front End
Dependent on the source language and have little or
no concern with the target machine
Lexical analysis
(Semantic analysis)
Intermediate code generation
Back End
Machine-dependent
Code optimization
Target code generation
Writing a Compiler
The first compiler was written in assembly
language; there was no other alternative.
High-level language compilers
Cross compiler
Useful tools – “compiler compilers”
Lex
Yacc
Retargetable Compilers
In many cases, a compiler writer will want to
adapt a compiler for use with a new target
A compiler that can be modified in this way is
said to be retargetable.
Cross compiler
Alternative approaches
Distinction between Front End and Back End
Compiler for imaginary machine (virtual machine)
Cousins of the Compiler
Preprocessors
Assemblers
Loaders and Link-Editors
Preprocessor
A preprocessor is a simple translator that is
applied to the source program before it is
submitted to the compiler.
Before the program is compiled it is passed
through the preprocessor, which replaces all
occurrences of the pre-defined expression with
the defined sequence of instructions.
Example
Functions of Preprocessors
Macro processing
File inclusion
Language extensions
DB query languages embedded in highlevel languages
Example: Preprocessors
The C Programming Language
Assemblers
Assemblers
Two-pass assembly
Loaders and Link-Editors
Assemblers
Assembly code
a mnemonic version of machine code, in which
names are used instead of binary codes for
operations, and
names are also given to memory addresses
Two-Pass Assembly(1)
In the first pass,
All the identifiers that denote storage
locations are found and stored in a symbol
table
Identifiers are assigned storage locations as
they are encountered for the first time
Example: b := a + 2
Two-Pass Assembly (2)
In the second pass,
The assembler scans the input again.
It translates each operation code into the
sequence of bits representing that operation in
machine language
It translates each identifier representing a
location into the address given for that identifier
in the symbol table
The output of the second pass is usually
relocatable machine code
Example:
Relocatable Addresses
Altering the relocatable address to
absolute or unrelocatable machine code:
Suppose that the address space containing the
data is to be loaded starting at location “L
=00001111”
L must be added to the address of the
instruction
Loaders and Link-Editors
Loader
Performs the two functions of loading and link-editing
Loading
Consists of taking relocatable machine code,
altering the relocatable addresses, and
placing the altered instructions and data in memory at
the proper locations
Link-Editors
Link-editor
Allows us to make a single program from several files
of relocatable machine code.
These files may have been the result of several
different compilations, and
One or more may be library files.
External references
In which the code of one file refers to a location in
another file.
Summary
A quick overall picture of
what a compiler does,
what goes into it, and
how it is organized