Transcript Lecture 15
Compiler Construction
(CS-636)
Muhammad Bilal Bashir
UIIT, Rawalpindi
Outline
1. The Symbol Table
   1. The Structure of the Symbol Table
   2. Declarations
   3. Scope Rules and Block Structure
   4. Interaction of Same Level Declarations
2. Data Types & Type Checking
3. Summary
Semantic Analysis
Lecture: 15-16
The Symbol Table
The symbol table is the major data structure in a compiler after the syntax tree.
In some languages the symbol table is involved during parsing and even lexical analysis, which may need to add information to it or look something up in it.
In a carefully designed language such as Pascal or Ada, however, it is possible and reasonable to put off symbol table operations until after a complete parse, when the program being translated is known to be syntactically correct.
The principal symbol table operations are insert, lookup, and delete.
Insert is used to store the information provided by name declarations.
Lookup is needed to retrieve the information associated with a name.
Delete is needed to remove the information provided by a declaration when that declaration no longer applies.
Typically the symbol table stores data type information, information on the region of applicability (scope), and information on the eventual location in memory.
The Structure of the Symbol Table
The symbol table in a compiler is a typical dictionary data structure.
The efficiency of the three basic operations (insert, lookup, and delete) varies according to the organization of the data structure.
Typical implementations of dictionary structures include linear lists, various search tree structures, and hash tables.
Linear lists are a good basic data structure that can provide easy and direct implementations of the three basic operations.
Insert takes constant time; lookup and delete take time linear in the size of the list.
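The three operations on a linear list can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the names `Entry`, `st_insert`, `st_lookup`, and `st_delete` are hypothetical, and the data type is stored as a plain string for simplicity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One symbol table entry; the fields are illustrative assumptions. */
typedef struct Entry {
    const char *name;      /* identifier (not copied; the caller keeps it alive) */
    const char *type;      /* data type, kept as a string for simplicity */
    struct Entry *next;
} Entry;

/* Insert at the head of the list: constant time. Returns the new head. */
Entry *st_insert(Entry *head, const char *name, const char *type) {
    assert(name != NULL);
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->name = name;
    e->type = type;
    e->next = head;
    return e;
}

/* Lookup scans from the head: linear time in the length of the list. */
Entry *st_lookup(Entry *head, const char *name) {
    assert(name != NULL);
    for (Entry *e = head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Delete the first entry with the given name: linear time. Returns the new head. */
Entry *st_delete(Entry *head, const char *name) {
    assert(name != NULL);
    Entry **link = &head;
    while (*link != NULL) {
        if (strcmp((*link)->name, name) == 0) {
            Entry *dead = *link;
            *link = dead->next;
            free(dead);
            break;
        }
        link = &(*link)->next;
    }
    return head;
}
```

Because insert only touches the head of the list, its cost does not depend on how many names are already stored, while lookup and delete may have to walk the whole list.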
Linear lists can be a good choice for implementations where compilation speed is not a major concern.
Search tree structures are somewhat less useful for the symbol table, partly because they do not provide best-case efficiency, but also because of the complexity of the delete operation.
The hash table often provides the best choice for implementing the symbol table.
All three basic operations can be performed in almost constant time, and the hash table is the structure used most frequently in practice.
A hash table is an array of entries, called buckets, indexed by an integer range, usually from 0 to the table size minus one.
A hash function turns the search key (the identifier name) into an integer hash value in the index range, and the item corresponding to the search key is stored in the bucket at this index.
The hash function should distribute the key indices as uniformly as possible over the index range, since hash collisions cause a performance degradation in the lookup and delete operations.
An important question is how a hash table deals with collisions (often called collision resolution).
One method allocates only enough space for a single item in each bucket and resolves collisions by inserting new items in successive buckets (this is sometimes called open addressing).
In this case the contents of the hash table are limited by the size of the array used for the table, and as the array fills, collisions become more and more frequent.
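Open addressing with successive buckets (linear probing) can be sketched as follows. This is an illustration under stated assumptions: the table size `OA_SIZE`, the toy hash `oa_hash`, and the names `OpenTable`, `oa_insert`, and `oa_lookup` are all hypothetical, and duplicate keys are not detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OA_SIZE 8   /* deliberately tiny table, assumed for illustration */

typedef struct {
    const char *slots[OA_SIZE];   /* NULL marks an empty bucket */
} OpenTable;

/* Toy hash: sum of the character codes, scaled to the table size. */
static size_t oa_hash(const char *key) {
    size_t h = 0;
    for (; *key != '\0'; key++)
        h += (unsigned char)*key;
    return h % OA_SIZE;
}

/* Insert by probing successive buckets. Returns the index used, or -1 if the table is full. */
int oa_insert(OpenTable *t, const char *key) {
    assert(t != NULL && key != NULL);
    size_t start = oa_hash(key);
    for (size_t step = 0; step < OA_SIZE; step++) {
        size_t i = (start + step) % OA_SIZE;
        if (t->slots[i] == NULL) {
            t->slots[i] = key;
            return (int)i;
        }
    }
    return -1;   /* every bucket is occupied */
}

/* Lookup follows the same probe sequence until it finds the key or an empty bucket. */
int oa_lookup(const OpenTable *t, const char *key) {
    assert(t != NULL && key != NULL);
    size_t start = oa_hash(key);
    for (size_t step = 0; step < OA_SIZE; step++) {
        size_t i = (start + step) % OA_SIZE;
        if (t->slots[i] == NULL)
            return -1;
        if (strcmp(t->slots[i], key) == 0)
            return (int)i;
    }
    return -1;
}
```

With this toy hash, the names "ab" and "ba" collide, so the second one inserted lands in the bucket after its home bucket. Once all `OA_SIZE` buckets are occupied, `oa_insert` can only report failure, which is the capacity limit described above.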
The best choice for compilers is the alternative to open addressing, called separate chaining.
In the separate chaining method, each bucket is actually a linear list.
Collisions are resolved by inserting the new item into the bucket's list.
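A minimal sketch of separate chaining follows. The bucket count `SC_SIZE`, the toy hash `sc_hash`, and the names `Node`, `ChainTable`, `sc_insert`, and `sc_lookup` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SC_SIZE 8   /* small bucket count, assumed for illustration */

typedef struct Node {
    const char *name;
    const char *type;
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[SC_SIZE];   /* each bucket is the head of a linear list */
} ChainTable;

/* Toy hash: sum of the character codes, scaled to the table size. */
static size_t sc_hash(const char *key) {
    size_t h = 0;
    for (; *key != '\0'; key++)
        h += (unsigned char)*key;
    return h % SC_SIZE;
}

/* A collision simply adds another node to the front of the bucket's list. */
void sc_insert(ChainTable *t, const char *name, const char *type) {
    assert(t != NULL && name != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    size_t i = sc_hash(name);
    n->name = name;
    n->type = type;
    n->next = t->buckets[i];
    t->buckets[i] = n;
}

/* Lookup scans only the one list selected by the hash value. */
const Node *sc_lookup(const ChainTable *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (const Node *n = t->buckets[sc_hash(name)]; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}
```

Unlike open addressing, the number of stored names is not bounded by the array size; a crowded table only makes the individual lists longer.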
One question still remains: how does the hash function work?
The hash function converts a character string into an integer in the range 0…size-1 in three steps.
First, each character in the string is converted into a nonnegative integer.
Second, these nonnegative integers are combined in some way to form a single integer.
Finally, the resulting integer is scaled into the range 0…size-1.
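The three steps can be sketched in a few lines. The combining rule used here (shift the running value left and add the next character) and the shift amount of 4 are assumed choices for illustration; the name `hash_name` is hypothetical. Scaling is applied inside the loop rather than once at the end so that the running value stays small.

```c
#include <assert.h>
#include <stddef.h>

/* Hash an identifier into the range 0..size-1. */
size_t hash_name(const char *key, size_t size) {
    assert(key != NULL && size > 0);
    size_t h = 0;
    for (; *key != '\0'; key++) {
        size_t c = (unsigned char)*key;   /* step 1: character -> nonnegative integer */
        h = ((h << 4) + c) % size;        /* step 2: combine; step 3: scale into range */
    }
    return h;
}
```

Because each character is shifted by a different amount, names that are permutations of each other (such as "ab" and "ba") generally receive different hash values, which a plain sum of character codes cannot achieve.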
Declarations
The behavior of the symbol table depends heavily on the properties of the declarations of the language being translated.
These properties determine how the insert and delete operations act on the symbol table, when these operations need to be called, and what attributes are inserted into the table.
There are four basic kinds of declarations:
1. Constant declaration
2. Type declaration
3. Variable declaration
4. Procedure/Function declaration
It is easiest to use one symbol table to hold the names from all the different kinds of declarations.
This applies when the programming language prohibits the use of the same name in different kinds of declarations.
Occasionally it is easier to use a different symbol table for each kind of declaration.
For example, all type declarations are contained in one symbol table, whereas all variable declarations are in a different symbol table, and so on.
The attributes bound to a name by a declaration vary with the kind of declaration.
Constant declarations associate values with names; for this reason constant declarations are sometimes called value bindings.
Type declarations bind names to newly constructed types and may also create aliases for existing named types.
Variable declarations most often bind names to data types. Besides the data type, a variable declaration may bind further attributes implicitly, e.g. the scope of the variable.
Procedure/function declarations may bind the return type and the parameters as attributes.
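One way to hold kind-dependent attributes in a single entry is a tagged union. The sketch below is only an illustration: `DeclKind`, `SymbolEntry`, `make_var`, and every field name are hypothetical, and types are represented as strings to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DECL_CONST, DECL_TYPE, DECL_VAR, DECL_FUNC } DeclKind;

typedef struct {
    const char *name;
    DeclKind kind;                                       /* selects the valid union member */
    union {
        struct { int value; } constant;                  /* value binding */
        struct { const char *definition; } type;         /* new type or alias */
        struct { const char *type; int level; } var;     /* data type plus scope level */
        struct {
            const char *return_type;
            const char *const *param_types;
            size_t nparams;
        } func;                                          /* return type and parameters */
    } attr;
} SymbolEntry;

/* Build the entry for a variable declaration made at the given nesting level. */
SymbolEntry make_var(const char *name, const char *type, int level) {
    assert(name != NULL && type != NULL && level >= 0);
    SymbolEntry e = {0};
    e.name = name;
    e.kind = DECL_VAR;
    e.attr.var.type = type;
    e.attr.var.level = level;
    return e;
}
```

For example, `make_var("temp", "char", 1)` would describe a `char` variable declared one block deep; code that reads an entry is expected to check `kind` before touching `attr`.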
Scope Rules and Block Structure
Scope rules vary widely from language to language, but some rules are common to many languages.
Here we discuss two of these rules: declaration before use and the most closely nested rule for block structure.
Declaration before use
A name must be declared in the text of the program prior to any reference to the name.
This rule permits the symbol table to be built as parsing proceeds, and lookups to be performed as soon as a name reference is encountered in the code.
Block structure
Block structure is a common property of programming languages.
A language is block structured if it permits the nesting of blocks inside other blocks and if the scope of declarations in a block is limited to that block and the blocks contained in it, subject to the most closely nested rule.
Most closely nested rule: given several different declarations for the same name, the declaration that applies to a reference is the one in the block most closely nested around the reference.
To implement nested scopes and the most closely nested rule, the symbol table insert operation must not overwrite previous declarations.
Instead, insert should hide the previous declaration, so that lookup finds only the most recent one.
Similarly, the delete operation must not delete all declarations corresponding to a name, but only the most recent one, uncovering any previous declaration.
Symbol table construction can then proceed by performing insert operations for all declared names on entry into a block and delete operations on exit from the block.
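The hide-and-uncover behavior described above can be sketched with a stack-like list in which every entry records its nesting level. This is an illustration under assumptions, not the lecture's implementation: `ScopedTable`, `scope_enter`, `scope_exit`, `scoped_insert`, and `scoped_lookup` are hypothetical names, and a real compiler would more likely combine this idea with a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Decl {
    const char *name;
    const char *type;
    int level;             /* nesting depth of the declaring block */
    struct Decl *next;     /* older declarations, possibly of the same name */
} Decl;

typedef struct {
    Decl *head;            /* most recent declaration first */
    int level;             /* current nesting depth; 0 is the global scope */
} ScopedTable;

void scope_enter(ScopedTable *t) {
    assert(t != NULL);
    t->level++;
}

/* Insert never overwrites: the new entry goes in front and hides older ones. */
void scoped_insert(ScopedTable *t, const char *name, const char *type) {
    assert(t != NULL && name != NULL);
    Decl *d = malloc(sizeof *d);
    if (d == NULL)
        abort();
    d->name = name;
    d->type = type;
    d->level = t->level;
    d->next = t->head;
    t->head = d;
}

/* The first match from the front is the most closely nested declaration. */
const Decl *scoped_lookup(const ScopedTable *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (const Decl *d = t->head; d != NULL; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* On block exit, delete only the declarations made in that block. */
void scope_exit(ScopedTable *t) {
    assert(t != NULL && t->level > 0);
    while (t->head != NULL && t->head->level == t->level) {
        Decl *dead = t->head;
        t->head = dead->next;
        free(dead);
    }
    t->level--;
}
```

Calling `scope_enter` at each opening brace and `scope_exit` at each closing brace keeps the table in step with the block structure: an inner `j` hides the outer one while its block is open, and the outer `j` becomes visible again as soon as that block is closed.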
Build the symbol table for the following code:
int i, j;
int f(int size)
{ char i, temp;
  …
  { double j;
    …
  }
  …
  { char * j;
    …
  }
}
Interaction of Same Level Declarations
One main issue that relates to scope is the interaction among declarations at the same level.
One typical requirement in many languages is that the same name cannot be reused in declarations at the same level.
To check this requirement, a compiler must perform a lookup before each insert and determine by some mechanism whether any preexisting declaration with the same name is at the same level or not.
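One possible mechanism for this check is to store the nesting level with every entry and compare it during the lookup that precedes the insert. The sketch below assumes that approach; `Sym`, `find`, and `declare` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Sym {
    const char *name;
    int level;             /* nesting level of the declaration */
    struct Sym *next;
} Sym;

/* Most recent declaration of name, or NULL if there is none. */
static const Sym *find(const Sym *head, const char *name) {
    for (const Sym *s = head; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Lookup before insert: reject the declaration if the visible declaration
   of the same name was made at the same nesting level. */
bool declare(Sym **head, const char *name, int level) {
    assert(head != NULL && name != NULL);
    const Sym *prev = find(*head, name);
    if (prev != NULL && prev->level == level)
        return false;              /* redeclaration at the same level */
    Sym *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->name = name;
    s->level = level;
    s->next = *head;
    *head = s;
    return true;
}
```

Declaring `a` twice at level 0 is rejected by this check, while declaring `a` again at level 1 is accepted because it is a legal redeclaration in a nested block.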
Somewhat more difficult is the question of how much information the declarations in a sequence at the same level have available about each other.
Consider the following code:
int a = 1;
void f(void)
{ int a = 2, j = a+1;
  …
} // which 'a' is used to assign a value to 'j'?
If each declaration is added to the symbol table as it is processed, this is called sequential declaration.
If all the declarations are processed simultaneously and added to the symbol table at once at the end of a declaration section, this is called collateral declaration.
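C itself uses sequential declaration: the scope of a local name begins right after its declarator, so the initializer of `j` already sees the local `a`. The variant below adapts the code above so that the result can be observed; returning `j` from `f` and the `assert` are additions made for illustration.

```c
#include <assert.h>

int a = 1;   /* global a */

/* Returns the value that j receives inside the block. */
int f(void) {
    int a = 2, j = a + 1;   /* the local a is already in scope here */
    assert(j == 3);         /* j was computed from the local a, not the global one */
    return j;
}
```

Under collateral declaration, the `a` in `a + 1` would instead refer to the global `a`, and `j` would receive the value 2.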
For each recursive declaration of a function or procedure, the compiler must insert the name of the function or procedure into the symbol table as soon as it finds the declaration, before it processes the body.
Otherwise the compiler may treat a recursive call as an error (use before declaration).
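The effect of the insertion order can be shown with a toy model in which a function body is reduced to the list of names it calls. Everything here is an assumption made for illustration: `NameTable`, `declared`, `declare_name`, `check_function`, and the `insert_first` flag do not correspond to any real compiler interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 16

typedef struct {
    const char *names[MAX_NAMES];
    int count;
} NameTable;

static bool declared(const NameTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return true;
    return false;
}

static void declare_name(NameTable *t, const char *name) {
    assert(t->count < MAX_NAMES);
    t->names[t->count++] = name;
}

/* Check a function whose body calls the names in calls[0..ncalls-1].
   insert_first selects whether the function's own name enters the table
   before or after its body is checked. Returns the number of
   use-before-declaration errors found in the body. */
int check_function(NameTable *t, const char *fname,
                   const char *const calls[], int ncalls, bool insert_first) {
    assert(t != NULL && fname != NULL);
    int errors = 0;
    if (insert_first)
        declare_name(t, fname);
    for (int i = 0; i < ncalls; i++)
        if (!declared(t, calls[i]))
            errors++;
    if (!insert_first)
        declare_name(t, fname);
    return errors;
}
```

For a function `fact` whose body calls `fact`, checking with `insert_first` set to `true` reports no errors, while checking with `false` reports one spurious use-before-declaration error for the recursive call.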
Data Types & Type Checking
One of the principal tasks of a compiler is the computation and maintenance of information on data types (type inference).
The compiler uses this information to ensure that each part of the program makes sense under the type rules of the language (type checking).
Data type information can occur in a program in several different forms.
Theoretically, a data type is a set of values, or more precisely a set of values with certain operations on those values.
Summary
Any Questions?