Chap. 1 Introduction

Download Report

Transcript Chap. 1 Introduction

Introduction to Compilers
Related Area







Programming languages
Machine architecture
Language theory
Algorithms
Data structures
Operating systems
Software engineering
Compilers


A compiler is a program that reads a program written
in one language – the source language – and
translate it into an equivalent program in another
language - the target language.
Early compilers - 1950’s
Machine Language, Assembly
Language, High-Level Language

Machine language is the native language of the
computer on which the program is run.



Native code
It consists of bit strings which are interpreted by the
mechanism inside the computer.
Example in IBM 370





Binary: 0001100000110101
Hexadecimal: 1835
Copy the content of Register 5 into Register 3
LR 3, 5
Assembler, assembly language
Machine Language, Assembly
Language, High-Level Language

Example



High-level language
X := Y + Z;
Assembly language



L 3, Y ; Load the working register with Y
A 3, Z ; Add Z
ST 3, X ; Store the result in X
Terminology

Source language


Object language




Java, C, C++
Machine language
Object code
Object file, object module
Target machine

The computer on which the program is to be run
Terminology

Cross compiler


A compiler that generates code for a machine that is
different from the machine on which the compiler runs.
Example:

A compiler which can be run on a IBM PC but which
compiles to the machine language of a specialpurpose embedded system.
Compilers and Interpreters

Compiler


Translates the high-level program to the target
program.
Interpreter

Executes the program.
The Environment of the
Compiler
The Environment of the
Compiler

Example



COMP myprog ; Compiles the program
LINK myprog ; Links the program
RUN myprog ; Runs the program
Phases of a Compiler

Five (six) phases of compilation






Lexical analysis
Syntactic analysis
(Semantic analysis)
Intermediate code generation
Optimization
Object code generation
Phases of a Compiler
Language Processing System
Phases of a Compiler
Two Parts of Compilation

Analysis



breaks up the source program
creates an intermediate representation
Synthesis

constructs the desired target program from the
intermediate representation
Analysis

Lexical Analysis


linear analysis, scanning
Syntax Analysis

parsing, hierarchical analysis

Semantic Analysis

Intermediate Code Generation

Advantage of dividing analysis



simple design
compiler efficiency
compiler portability
Analysis of the Source
Program

Lexical analysis (linear analysis)


Syntax analysis (hierarchical analysis)


the streams of characters making up the source
program is read from left-to-right and grouped into
tokens
characters or tokens are grouped hierarchically into
nested collections with collective meaning
Semantic analysis

certain checks are performed to ensure that the
components of a program fit together meaningfully
Lexical Analysis



Linear analysis, scanning
Reads the stream of characters in the source
program from left to right, and groups into
tokens
Tokens

are sequences of characters having a collective
meaning
Example: Lexical Analysis

position := initial + rate * 60

id1 := id2 + id3 * 60
Syntax Analysis

Parsing

Hierarchical analysis

Groups the tokens of the source program into
grammatical phrases represented by parse tree that
are used by the compiler to synthesize output.
Example: Syntax Analysis
Semantic Analysis

Checks the source program for semantic errors
and gathers type or semantic information for
the subsequent code generation phase
Example: Semantic Analysis
Error Handler


When each phases of compilation encounters
error, a phase must somehow deal with that
error.
Error in Lexical Phase


Error in Syntax Phase


The characters in the input do not form any token of
the language.
The token stream violates the structure rules (syntax)
of the language.
Error in Semantic Phase

Constructs have the right syntactic structure, but no
meaning to the operation involved.
S/W Tools
Performing Analysis

Structure editors


Pretty printers


indentation, fonts
Static checkers


a sequence of command => a source program
a program => discover bugs without run
Interpreters

performing operations
Performing Analysis

Text formatters


Silicon compilers


typeset text
circuit design
Query interpreters

DB
Intermediate Code Generation

Explicit intermediate representation


Two properties of intermediate code



A program for an abstract machine
Easy to produce
Easy to translate into the target program
Intermediate form


Three address form (quadruples, triples)
Two address form
Three-Address Code




Has at most three operands
Each three-address instruction has at most one
operator in addition to the assignment
The compiler must generate a temporary name
to hold the value computed by each instruction
May have fewer than 3 operands
Example: Three-Address Code
Example:
Intermediate Code
Synthesis Part

The synthesis part
constructs the
desired target
program from the
intermediate
representation.
Code Optimization


Improve the intermediate code to get the fastrunning machine code
Optimizing compiler
Example: Code Optimization
Code Generation

Generates the target codes


re-locatable machine code
assembly code
Example: Code Generation
System Support

There is a certain amount of supporting code to
be supplied to the compilation.


Symbol table management
Error handling
System Support

Symbol table handler


The central repository of information about the names
or identifiers in the program
Error handling


Implements the compiler’s response to errors in the
code it is compiling.
Diagnostics

Where the error was found and what kind of error it was
Passes, Front End, Back End



The compiler makes one or more passes
through the program.
A pass consists of reading a version of the
program from a file and writing a new version of
it to an output file.
A pass normally comprises more than one phase,
but the number of passes, and the phases they
cover, varies.
Passes, Front End, Back End

Front End





Dependent on the source language and have little or
no concern with the target machine
Lexical analysis
(Semantic analysis)
Intermediate code generation
Back End



Machine-dependent
Code optimization
Target code generation
Writing a Compiler




The first compiler was written in assembly
language; there was no other alternative.
High-level language compilers
Cross compiler
Useful tools – “compiler compilers”


Lex
Yacc
Retargetable Compilers


In many cases, a compiler writer will want to
adapt a compiler for use with a new target
A compiler that can be modified in this way is
said to be retargetable.


Cross compiler
Alternative approaches


Distinction between Front End and Back End
Compiler for imaginary machine (virtual machine)
Cousins of the Compiler

Preprocessors

Assemblers

Loaders and Link-Editors
Preprocessor



A preprocessor is a simple translator that is
applied to the source program before it is
submitted to the compiler.
Before the program is compiled it is passed
through the preprocessor, which replaces all
occurrences of the pre-defined expression with
the defined sequence of instructions.
Example
Functions of Preprocessors

Macro processing

File inclusion

Language extensions

DB query languages embedded in highlevel languages
Example: Preprocessors

The C Programming Language
Assemblers

Assemblers

Two-pass assembly

Loaders and Link-Editors
Assemblers

Assembly code



a mnemonic version of machine code, in which
names are used instead of binary codes for
operations, and
names are also given to memory addresses
Two-Pass Assembly(1)

In the first pass,



All the identifiers that denote storage
locations are found and stored in a symbol
table
Identifiers are assigned storage locations as
they are encountered for the first time
Example: b := a + 2
Two-Pass Assembly (2)

In the second pass,




The assembler scans the input again.
It translates each operation code into the
sequence of bits representing that operation in
machine language
It translates each identifier representing a
location into the address given for that identifier
in the symbol table
The output of the second pass is usually
relocatable machine code
Example:
Relocatable Addresses

Altering the relocatable address to
absolute or unrelocatable machine code:


Suppose that the address space containing the
data is to be loaded starting at location “L
=00001111”
L must be added to the address of the
instruction
Loaders and Link-Editors

Loader


Performs the two functions of loading and link-editing
Loading



Consists of taking relocatable machine code,
altering the relocatable addresses, and
placing the altered instructions and data in memory at
the proper locations
Link-Editors

Link-editor




Allows us to make a single program from several files
of relocatable machine code.
These files may have been the result of several
different compilations, and
One or more may be library files.
External references

In which the code of one file refers to a location in
another file.
Summary

A quick overall picture of



what a compiler does,
what goes into it, and
how it is organized