Chapter 5 - Basic Semantics

Download Report

Transcript Chapter 5 - Basic Semantics

Chapter 5 - Basic Semantics
Programming Languages:
Principles and Practice, 2nd Ed.
Kenneth C. Louden
© Kenneth C. Louden, 2003
1
Attributes

Properties of language entities, especially
identifiers used in a program.
 Important examples:
–
–
–
–
–
Value of an expression
Data type of an identifier
Maximum number of digits in an integer
Location of a variable
Code body of a function or method

Declarations ("definitions") bind attributes
to identifiers.
 Different declarations may bind the same
identifier to different sets of attributes.
Chapter 5
Louden, Programming Languages
2
Binding times can vary widely:





Value of an expression: during execution or
during translation (constant expression).
Data type of an identifier: translation time
(Java) or execution time (Smalltalk, Lisp).
Maximum number of digits in an integer:
language definition time or language
implementation time.
Location of a variable: load or execution
time.
Code body of a function or method:
translation time or link time or execution
time.
Chapter 5
Louden, Programming Languages
3
Symbol table and environment

A dictionary or table is used to maintain the
identifier/attribute bindings.

It can be maintained either during translation or
execution or both. (Pre-translation entities are
entered into the initial or default table.)

During translation this table is called the symbol
table.

During execution this table is called the
environment.

If both are maintained, the environment can
usually dispense with names, keeping track only
of locations (names are maintained implicitly).
Chapter 5
Louden, Programming Languages
4
Semantic functions

Formally, we can think of the symbol table and
environment as functions.

If the symbol table and environment are
maintained separately (compiler):
SymbolTable: Names  Static Attributes
Environment: Names  Locations
Memory: Locations  Values

If the symbol table and environment are
maintained together (interpreter):
Environment: Names 
Attributes (including locations and values)
Chapter 5
Louden, Programming Languages
5
Declarations





Declarations bind identifiers to attributes.
The collection of bound attributes (including the
identifier) can be viewed as equivalent to the
declaration itself.
Attributes may be explicit or implicit in a
declaration.
A declaration may fail to fully specify all
necessary attributes—a supplemental declaration
is then necessary elsewhere.
By abuse of language, we sometimes refer to the
declaration of x, or just x—using the identifier to
stand for the declaration.
Chapter 5
Louden, Programming Languages
6
Examples of declarations (C)

int x = 0;
Explicitly specifies data type and initial
value. Implicitly specifies scope (see next
slide) and location in memory.

int f(double);
Explicitly specifies type (double  int).
Implicitly specifies nothing else: needs
another declaration specifying code.

The former is called a definition in C, the
latter is simply a declaration.
Chapter 5
Louden, Programming Languages
7
Scope




The scope of a declaration is the region of the
program to which the bindings established by the
declaration apply. (If individual bindings apply over
different regions: scope of a binding.)
Scope is typically indicated implicitly by the
position of the declaration in the code, though
keywords can modify it.
In a block-structured language, the scope is
typically the code from the end of the declaration to
the end of the "block" (indicated by braces {…} in C
and Java) in which the declaration occurs.
Scope can extend backwards to the beginning of
the block in certain cases (class declarations in
Java and C++, top-level declarations in Scheme).
Chapter 5
Louden, Programming Languages
8
Lexical vs. dynamic scope





Scope is maintained by the properties of the lookup
operation in the symbol table or environment.
If scope is managed statically (prior to execution),
the language is said to have static or lexical scope
("lexical" because it follows the layout of the code
in the file).
If scope is managed directly during execution, then
the language is said to have dynamic scope.
The next slide has an example showing the
difference.
It is possible to maintain lexical scope during
execution (i.e. by the environment), but it requires
extra links and a somewhat unusual lookup
operation (see Chapter 8). (Scheme does this.)
Chapter 5
Louden, Programming Languages
9
Java scope example
public class Scope
{ public static int x = 2;
public static void f()
{ System.out.println(x); }
public static void main(String[] args)
{ int x = 3;
f();
}
}

Of course, this prints 2, but under dynamic scope it
would print 3 (the most recent declaration of x in the
execution path is found).
Chapter 5
Louden, Programming Languages
10
Dynamic scope evaluated

Almost all languages use lexical scope: with
dynamic scope the meaning of a variable cannot be
known until execution time, thus there cannot be
any static checking.
 In particular, no static type checking.
 Originally used in Lisp. Scheme could still use it, but
doesn't. Some languages still use it: VBScript,
Javascript, Perl (older versions).
 Lisp inventor (McCarthy) now calls it a bug.
 Still useful as a pedagogical tool to understand the
workings of scope. In some ways a lot like dynamic
binding of methods.
Chapter 5
Louden, Programming Languages
11
Scope holes

Under either lexical or dynamic scope, a
nested or more recent declaration can
mask a prior declaration. Indeed, in slide
10, the local declaration of x in main
masks the static declaration of x in the
Scope class.
 How would you access the static x inside
main in Java?
 Use Scope.x in place of x:
public static void main(String[] args)
{ int x = 3; Scope.x = 4; ...}

Exercise: how to do this for non-statics?
Chapter 5
Louden, Programming Languages
12
Symbol table structure

A table of little stacks of declarations under each
name. For example the table for the Scope class
of slide 10 would look as follows inside main
(using lexical scope):
name
x
int local to
main
f
public static
void method
main
public static
void method
args
Chapter 5
bindings
public
static int
String[]
parameter of main
Louden, Programming Languages
13
Symbol table structure (2)

Alternatively, a stack of little tables, one for each
scope. For example, the previous example would
look as follows (lexical scope):
Symbol table for Scope
name
x
f
main
bindings
public
static int
Can be deleted
after leaving f
public static
void method
Symtab:
name
(empty)
Current table
inside main
public static
void method
Symtab:
name
args
x
Chapter 5
bindings
Louden, Programming Languages
bindings
String[]
parameter
int local
14
Symbol table structure (3)





Symbol table is constructed as declarations are
encountered (insert operation).
Insertions follow static structure of source code with
lexical scope.
Insertions follow execution path with dynamic
scope.
Lookups occur as names are encountered in
dynamic scope (in symbol table to that point).
In lexical scope, lookups occur either as names are
encountered in symbol table to that point
(declaration before use—C), or all lookups are
delayed until after the symbol table is fully
constructed and then performed (Java class—scope
applies backwards to beginning of class).
Chapter 5
Louden, Programming Languages
15
Symbol table structure (4)

Using dynamic scope, the same example would
look as follows:
Current table
inside f
Current table
inside main
Chapter 5
Louden, Programming Languages
16
Symbol table structure evaluated

Which organization is better?
 Table of little stacks is simpler (C, Pascal).
 Stack of little tables is more versatile, and
helpful when you need to recover outer
scopes from within inner ones or from
elsewhere in the code (Ada, Java, C++).
 Normally, no specific table structure is
part of a language specification: any
structure that provides the appropriate
properties will do.
Chapter 5
Louden, Programming Languages
17
Ada example (Fig. 5.17, p.151):
(global symbol table:)
name
ex
bindings
procedure
symtab
name
bindings
x
integer
y
character
p
procedure
symtab
name
bindings
x
fl oat
A
block
symtab
name
y
Chapter 5
Louden, Programming Languages
bindings
array (1..10) of
integer
18
Overloading

Overloading is a property of symbol tables that
allows them to successfully handle declarations
that use the same name within the same scope.
 It is the job of the symbol table to pick the correct
choice from among the declarations for the same
name in the same scope. This is called overload
resolution.
 It must do so by using extra information, typically
the data type of each declaration, which it
compares to the probable type at the use site,
picking the best match.
 If it cannot successfully do this, a static semantic
error occurs.
Chapter 5
Louden, Programming Languages
19
Overloading (2)

Overloading typically applies only to functions or
methods.
 Overloading must be distinguished from dynamic
binding in an OO language.
 Overloading is made difficult by weak typing,
particularly automatic conversions.
 In the presence of partially specified types, such
as in ML, overload resolution becomes even more
difficult, which is why ML disallows it.
 Scheme disallows it for a different reason: there
are no types on which to base overload
resolution, even during execution.
Chapter 5
Louden, Programming Languages
20
Overloading (3)

An example in Java:
public class Overload {
public static int max(int x, int y)
{ return x > y ? x : y;}
public static double max(double x, double y)
{ return x > y ? x : y;}
public static int max(int x, int y, int z)
{ return max(max(x,y),z);}
public static void main(String[] args)
{ System.out.println(max(1,2));
System.out.println(max(1,2,3));
System.out.println(max(4,1.3));
}}

Adding more max functions that mix double and
int parameters is ok. But adding ones that mix
double and int return values is not!
Chapter 5
Louden, Programming Languages
21
Overloading (4)



C++ and Ada are even more challenging for
overload resolution: C++, because it allows many
more automatic conversions, Ada because the
return type is also used to resolve overloading
(Ada gets away with this only because it allows
no automatic conversions).
It is possible for languages to also keep different
symbol tables for different kinds of declarations.
In Java these are called "name spaces," and they
also represent a kind of overloading.
Java is particularly ugly in this respect: there are
different name spaces for classes, methods,
params/vars, labels, and even packages -- see
Figure 5.23, page 158.
Chapter 5
Louden, Programming Languages
22
The Environment

Can be constructed entirely statically (Fortran):
all vars and functions have fixed locations for the
duration of execution.
 Can also be entirely dynamic: functional
languages like Scheme and ML.
 Most language use a mix: C, C++, Java, Ada.
 Consists of three components:
– A fixed area for static allocation
– A stack area for lifo allocation (usually the
processor stack)
– A "heap" area for on-demand dynamic
allocation (with or without garbage collection)
Chapter 5
Louden, Programming Languages
23
Typical
environment
organization
(possible C)
[Figure 5.25, p. 165)]
© 2003 Brooks/Cole - Thomson Learning™
Chapter 5
Louden, Programming Languages
24
The Runtime Stack

Used for:
– Procedure/function/method calls
– temporaries
– local variables

Temporaries: intermediate results that cannot be
kept in registers; not considered here further.

Procedure calls: Chapter 8.

Local variables: part of calls, but can be
considered independently, showing LIFO
behavior for nested scopes (next slide).
Chapter 5
Louden, Programming Languages
25
Example of stack-based allocation in
C within a procedure:
(1)
(2)
(4)
(5)
(7)
(8)
(9)
(11)
(12)
(14)
(16)
(18)
Chapter 5
A: {
int x;
char y;
B: { double x;
int a;
} /* end B */
C: { char y;
int b;
D: { int x;
double y;
} /* end D */
} /* end C */
} /* end A */
Louden, Programming Languages
Point #1
Point #2
26
Stack at Point #1:
© 2003 Brooks/Cole - Thomson Learning™
Chapter 5
Louden, Programming Languages
27
Stack at Point #2:
© 2003 Brooks/Cole - Thomson Learning™
Chapter 5
Louden, Programming Languages
28
An alternative: a flat local space





All local variables allocated at once, regardless of
nesting.
Wastes some space, but not critically.
With this approach, only complete
function/method calls get allocated on the stack.
Even with the previous approach, the primary
structure of the stack is still the call structure: a
complete record of a call on the stack is called an
activation record or frame, and the stack is
referred to as the call stack. (Chapter 8)
Java promotes a flat space by forbidding nested
redeclarations, but this is not an essential
property: a symbol table can easily distinguish
nested declarations as A.x, A.y, A.B.x, A.B.a, etc.
Chapter 5
Louden, Programming Languages
29
Heap Allocation

In "standard" languages (C, C++, Java) heap
allocation requires a special operation: new.

Any kind of data can be allocated on the heap in
C/C++; in Java all objects and only objects are
allocated on the heap.
Even with heap allocation available in Java &
C/C++, the stack is still used to represent calls.
In C/C++, deallocation is typically by hand
(destructors), but it is hard to do right.
Java uses a garbage collector that periodically
sweeps the heap looking for data that cannot be
accessed any more by the program and adding it
back to free space.



Chapter 5
Louden, Programming Languages
30
Heap Allocation (2)



In functional languages (Scheme, ML) heap
allocation is performed automatically, and
virtually everything, including function calls, are
allocated on the heap.
Of course, functional languages also use garbage
collection, since deallocation is automatic as
well. (Indeed, SML/NJ as its default still quaintly
announces calls to the garbage collector,
including some statistics.)
A lot of study and effort has been made by both
the functional language and OO language
community to make garbage collection efficient in
both time and space. Sadly, C and C++ still lack
standard garbage collectors.
Chapter 5
Louden, Programming Languages
31
Benefit (of system controlled storage management):





ability to delay the binding of a storage segment's size and/or location
reuse of a storage segment for different jobs (from system supervisor
point of view)
reuse of storage for different data structures
increased generality, not have to specify maximum data structure size
dynamic data structures
recursive procedures - garbage collection is automatic

Benefits of programmer controlled storage management

Disadvantage: burden on programmer & may interfere with necessary
system control
May lead to subtle errors
May interfere with system-controlled storage management
 Advantage: difficult for system to determine when storage may be most
effectively allocated and freed
Chapter 5
Louden, Programming Languages
32
Heap management
– Single-size cells vs. variable-size cells
– Reference counters (eager approach) vs.
garbage collection (lazy approach)
1. Reference counters: maintain a counter in every
cell that store the number of pointers currently
pointing at the cell
– Disadvantages: space required, complications
for cells connected circularly
Expensive - when making a pointer assignment p=q
1.
decrement count for old value of p
2.
if 0, return to free storage. Check if contains references to other
blocks. Could be recursive
3.
do pointer assignment
4.
Increment reference count for q
Chapter 5
Louden, Programming Languages
33
One-bit reference counting




Another variation on reference counting, called one-bit
reference counting, uses a single bit flag to indicate whether
each object has either "one" or "many" references.
If a reference to an object with "one" reference is removed,
then the object can be recycled.
If an object has "many" references, then removing references
does not change this, and that object will never be recycled. It
is possible to store the flag as part of the pointer to the object,
so no additional space is required in each object to store the
count.
One-bit reference counting is effective in practice because
most actual objects have a reference count of one.
Chapter 5
Louden, Programming Languages
34
2. Garbage collection: allocate until all available cells
allocated; then begin gathering all garbage
– Every heap cell has an extra bit used by collection
algorithm
– All cells initially set to garbage
– All pointers traced into heap, and reachable cells
marked as not garbage
– All garbage cells returned to list of available cells
Chapter 5
Louden, Programming Languages
35
Disadvantages: when you need it most, it works
worst (takes most time when program needs
most of cells in heap)
Chapter 5
Louden, Programming Languages
36
Mark-Sweep - Java uses
 In a mark-sweep collection, the collector first examines the program
variables; any blocks of memory pointed to are added to a list of
blocks to be examined.
 For each block on that list, it sets a flag (the mark) on the block to
show that it is still required, and also that it has been processed. It
also adds to the list any blocks pointed to by that block that have not
yet been marked. In this way, all blocks that can be reached by the
program are marked.
 In the second phase, the collector sweeps all allocated memory,
searching for blocks that have not been marked. If it finds any, it
returns them to the allocator for reuse
 Can find circular references.
 Easy if regular use of pointers (Like in LISP)
 All elements must be reachable by a chain of pointers which begins
outside the heap
 Have to be able to know where all pointers are - both inside the heap
and outside. How can a chain be followed from a pointer if there is no
predefined location for that pointer in the pointed-to cell?
Chapter 5
Louden, Programming Languages
37
Conservative garbage collection
 Although garbage collection was first invented in 1958, many
languages have been designed and implemented without the
possibility of garbage collection in mind. It is usually difficult to add
normal garbage collection to such a system, but there is a technique,
known as conservative garbage collection, that can be used.
 The usual problem with such a language is that it doesn't provide the
collector with information about the data types, and the collector
cannot therefore determine what is a pointer and what isn't.
 A conservative collector assumes that anything might be a pointer. It
regards any data value that looks like a pointer to or into a block of
allocated memory as preventing the recycling of that block.
 You might think that conservative garbage collection could easily
perform quite poorly, leaving a lot of garbage uncollected. In practice,
it does quite well, and there are refinements that improve matters
further.
 Because references are ambiguous, some objects may be retained
despite being actually unreachable. In practice, this happens rarely,
and refinements such as black-listing can further reduce the odds.
Chapter 5
Louden, Programming Languages
38
Lifetime/Extent

The lifetime or extent of a program entity is the
duration of its allocation in the environment.

Allocation is static when the lifetime is the
duration of the entire program execution.

Lifetime is related to but not identical to scope.
With scope holes, lifetime can extend to regions
of the program where the program entity is not
accessible.

It is also possible for scope to exceed lifetime
when a language allows locations to be
manipulated directly (as for example manual
deallocation). This is of course very dangerous!
Chapter 5
Louden, Programming Languages
39
Variables and Constants

A variable is an object whose stored value
can change during execution.
 A constant is an object whose value does
not change throughout its lifetime.
 Constants are often confused with literals:
constants have names, literals do not.
 Constants may be:
– compile-time static (may not ever be allocated)
– load-time static
– dynamic
Chapter 5
Louden, Programming Languages
40
Constants (2)

Compile-time constant in Java:
static final int zero = 0;

Load-time constant in Java:
static final Date now = new Date();

Dynamic constant in Java:
any non-static final assigned in a constructor.
 Java takes a very general view of constants,
since it is not very worried about getting rid of
them during compilation.
 C takes a much stricter view of constants,
essentially forcing them to be capable of
elimination during compilation.
Chapter 5
Louden, Programming Languages
41
Aliases, Dangling References, and
Garbage



An alias occurs when the same object is bound to
two different names at the same time. This is
fairly common with Java objects.
A dangling reference is a location that has been
deallocated from the environment, but is still
accessible within the program. Dangling
references are impossible in a garbage-collected
environment with no direct access to addresses.
Garbage is memory that is still allocated in the
environment but has become inaccessible to the
program. Garbage can be a problem in a nongarbage collected environment, but is much less
serious than dangling references.
Chapter 5
Louden, Programming Languages
42