Chapter 5 - Basic Semantics


Introduction to Semantics
The meaning of a language
Overview
This section will cover:
- What is the semantics of a computer language?
- Three forms of formal semantics and their purpose.
- Classification of errors.
What is semantics?
Semantics describes the meaning of the elements that
make up a programming language.
- Syntax, by contrast, describes the form of program elements.

Syntax:
ident ::= [A-Za-z_]\w*
declaration ::= datatype ident ['=' value]
Semantics:
- You can only declare an identifier once per scope!
- You must declare every identifier before you use it in a
statement.
- An identifier cannot be a reserved word.
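The three rules above are static semantic checks, which a compiler typically performs with a symbol table. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical single-scope table (`Scope`, `declare`, `use`) limited to 16 names and a short reserved-word list. A real compiler uses hash tables and a stack of nested scopes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result codes for the semantic checks. */
enum { SEM_OK = 0, SEM_RESERVED, SEM_REDECLARED, SEM_UNDECLARED, SEM_FULL };

/* A small illustrative subset of C reserved words. */
static const char *const reserved[] = { "int", "float", "if", "else", "while", "return" };

/* One scope: up to 16 declared names (hypothetical fixed limit). */
typedef struct {
    const char *names[16];
    int count;
} Scope;

static bool is_reserved(const char *name) {
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(reserved[i], name) == 0)
            return true;
    return false;
}

static bool in_scope(const Scope *s, const char *name) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

/* Check and record a declaration such as "int x;". */
int declare(Scope *s, const char *name) {
    if (is_reserved(name)) return SEM_RESERVED;     /* not a reserved word */
    if (in_scope(s, name)) return SEM_REDECLARED;   /* at most once per scope */
    if (s->count == 16) return SEM_FULL;            /* limit of this sketch */
    s->names[s->count++] = name;                    /* stores the pointer only */
    return SEM_OK;
}

/* Check a use of an identifier in a statement: it must be declared first. */
int use(const Scope *s, const char *name) {
    return in_scope(s, name) ? SEM_OK : SEM_UNDECLARED;
}
```

Here is an example. Declaring "x" twice in the same `Scope` returns `SEM_REDECLARED`. Calling `use(&s, "y")` before any declaration of y returns `SEM_UNDECLARED`.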
What is semantics? (cont'd)

Semantic definition of syntax raises more semantic
questions...
What is a scope?
What is a data type?
What operations are permitted on data types?
Syntax:
term ::= factor { (*|/) factor }
Semantics (for integer arithmetic):
- Division by zero raises an exception.
- If the result is outside the range of 32-bit two's complement
  integers, the higher-order bits are discarded!
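In C itself, signed overflow and division by zero are undefined behavior, so these Java-like semantics have to be spelled out explicitly. The sketch below is one way to do that, under stated assumptions. The helper names `to_int32`, `term_mul` and `term_div` are hypothetical. The product is computed on the unsigned bit patterns, whose low 32 bits C defines exactly. Division returns an error flag in place of raising an exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret 32 bits as a two's complement value without relying
   on implementation-defined conversions. */
static int32_t to_int32(uint32_t bits) {
    if (bits <= (uint32_t)INT32_MAX)
        return (int32_t)bits;
    return (int32_t)((int64_t)bits - 4294967296LL);  /* subtract 2^32 */
}

/* factor * factor: keep only the low 32 bits of the product. */
int32_t term_mul(int32_t a, int32_t b) {
    uint64_t full = (uint64_t)(uint32_t)a * (uint32_t)b;  /* exact, fits in 64 bits */
    return to_int32((uint32_t)full);                      /* discard higher-order bits */
}

/* factor / factor: returns false, in place of "raising an exception",
   when dividing by zero. */
bool term_div(int32_t a, int32_t b, int32_t *result) {
    if (b == 0)
        return false;                    /* division by zero */
    if (a == INT32_MIN && b == -1) {     /* 2^31 is out of range: wraps */
        *result = INT32_MIN;
        return true;
    }
    *result = a / b;
    return true;
}
```

For example, `term_mul(INT32_MAX, 2)` yields -2. The true product is 2^32 - 2, and discarding its highest-order bit leaves the bit pattern of -2.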
The parts of semantics
Semantics defines many characteristics of a language;
semantics is intertwined with:
- names and scope of names
- binding time
- the type system
- protocol for subprograms
These issues will be covered later.
Semantics: Type System
Semantics defines the type system:
- type system: the data types and the operations allowed on them
- can programmers define their own data types, or
  aliases for existing data types?
- the range of each data type and the result of operations
Example:

Syntax: intvalue ::= 0 | [-](1|...|9){ digit }
Semantics: int values are stored as 32-bit binary values in two's
complement form, with a range of $-2^{31}$ to $2^{31}-1$.
Syntax: expr ::= term { (+|-) term }
Semantics: integer + and - are performed modulo $2^{32}$. If the result
of a calculation is outside the range of int values, the higher-order
bits of the result are discarded.
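The modular rule for `+` and `-` can be written out directly: compute the exact result in 64 bits, then reduce it modulo 2^32 into the int range. This is a minimal sketch. The names `wrap32`, `expr_add` and `expr_sub` are hypothetical, and the range is taken from `INT32_MIN` and `INT32_MAX` in `<stdint.h>`.

```c
#include <stdint.h>

/* Reduce an exact result modulo 2^32 into the int range
   INT32_MIN (-2^31) .. INT32_MAX (2^31 - 1). */
static int32_t wrap32(int64_t exact) {
    int64_t v = (uint32_t)exact;      /* conversion to unsigned: value mod 2^32 */
    if (v > INT32_MAX)
        v -= 4294967296LL;            /* subtract 2^32 */
    return (int32_t)v;
}

/* expr ::= term { (+|-) term } with the modular semantics above.
   The exact sum or difference always fits in 64 bits. */
int32_t expr_add(int32_t a, int32_t b) { return wrap32((int64_t)a + b); }
int32_t expr_sub(int32_t a, int32_t b) { return wrap32((int64_t)a - b); }
```

With these semantics, `expr_add(INT32_MAX, 1)` wraps around to `INT32_MIN`.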
Semantics: Identifiers

Identifiers are names. Names can identify:
- variables and constants
- points in the program (labels)
- functions and procedures (subprograms)
- programmer-defined data types, including classes
Syntax: identifier ::= [A-Za-z_]\w*
Semantics:
- variable and constant names must be unique within a scope!
- reserved words cannot be used as identifiers
- names are case sensitive
- can a function (method) have the same name as a variable?
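In C, the answer to the last question depends on scope. A function and a variable cannot share a name at file scope. Inside a block, however, a local variable may reuse a function's name and hide it. The sketch below uses hypothetical names to illustrate both this and case sensitivity.

```c
/* Case sensitivity: count and Count are two different variables. */
int count = 1;
int Count = 2;

int total(void) {
    return count + Count;            /* 1 + 2 */
}

/* A file-scope variable named "total" would be a redeclaration error,
   but a local variable may reuse the name inside a block. */
int shadow(void) {
    int total = 10;                  /* hides the function total() here */
    return total + 1;                /* refers to the local variable */
}
```

Java answers the question differently. Fields and methods live in separate namespaces, so a class may have both a field and a method with the same name.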
Semantics: Scope
Scope is a range of statements.
- The scope of an identifier is the range of statements where the
  identifier is visible, or known.
- Semantics defines the scope of identifiers.
Scope Examples
In Pascal, Fortran, and traditional C (C89), variables are declared at the
top of a subprogram (in C89, at the top of a block). Their scope is the
entire subprogram, so the X in ADD and the X in MAIN are different variables.
      REAL FUNCTION ADD(A, B)
      REAL X
      X = A + B
      ADD = X
      RETURN
      END

      PROGRAM MAIN
      REAL X, Y, A
      X = 20.0
      Y = 5.0
      A = ADD(X, Y)
      PRINT *, "Sum is ", A
      END
Scope Examples
C++, C#, Java, and C since C99: variables can be declared anywhere; their
scope is from the point of declaration to the end of the
smallest enclosing block, { ... }. The idea is based on Algol.
C (C89 style)
#include <stdio.h>
float sum(int max) {
    float x, sum = 0;
    int k;
    for (k = 1; k <= max; k++) {
        scanf("%f", &x);
        sum = sum + x;
    }
    return sum;
}
C++
#include <stdio.h>
float sum(int max) {
    float sum = 0;
    for (int k = 1; k <= max; k++) {
        float x;              // scope: the loop body only
        scanf("%f", &x);
        sum = sum + x;
    }
    return sum;
}
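Block scope also means an inner declaration can hide an outer one until the end of its block. The short C99 sketch below shows this; the function name `scope_demo` is hypothetical.

```c
/* Each x is visible from its declaration to the end of the
   smallest enclosing block { ... }. */
int scope_demo(void) {
    int x = 1;
    int result = 0;
    {
        int x = 10;          /* hides the outer x in this block */
        result += x;         /* uses the inner x: 10 */
    }                        /* the inner x goes out of scope here */
    result += x;             /* uses the outer x: 1 */
    for (int k = 0; k < 3; k++)
        result += k;         /* k exists only inside the loop: 0 + 1 + 2 */
    return result;           /* 10 + 1 + 3 = 14 */
}
```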
Semantics: Binding Time

Semantics also defines when names are "bound" to
properties.
C Example: int n = 1000;
"int" bound by C language definition
"int is 32-bits" bound by compiler implementation
"n is an int" bound when program is compiled
address of n is bound when program is loaded (static
var) or when function is executed (stack local var)
"n = 1000" is bound when program is loaded (static)
or each time function is executed
Semantics: Subprograms

Semantics defines the meaning of subprograms.
In particular, how parameters are passed and values
returned.
/* C and C++ default is
 * to pass parameters
 * by value
 */
void swap(int a, int b) {
    int tmp;
    tmp = a;
    a = b;
    b = tmp;
}   /* only the local copies are swapped */
/* C++ lets you
 * pass parameters
 * by reference
 * (C# uses the "ref" keyword)
 */
void swap(int& a, int& b) {
    int tmp;
    tmp = a;
    a = b;
    b = tmp;
}
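C has no reference parameters. The usual C idiom is to pass pointers, which are themselves passed by value. The minimal sketch below uses hypothetical function names. It shows that the by-value swap leaves the caller's variables unchanged, while the pointer version swaps them.

```c
/* By value: a and b are copies, so the caller's variables are unchanged. */
void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
    (void)a; (void)b;        /* the swapped copies are simply discarded */
}

/* C idiom for "by reference": pass the addresses (the pointers are copied). */
void swap_by_pointer(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Usage: with `int x = 1, y = 2;`, calling `swap_by_value(x, y)` leaves x and y unchanged. Calling `swap_by_pointer(&x, &y)` makes x 2 and y 1.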
Formal Semantics

Three mathematical notations for semantics exist.
- Operational semantics describes the effect of each
  semantic element on the state of a hypothetical computer.
- Axiomatic semantics describes assertions (or axioms)
  of what must be true before and after a statement is
  executed.
- Denotational semantics describes semantic elements
  as mathematical functions, such as functions from
  program states to program states. It may use recursive functions.
Formal Semantics Examples
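Consider assignment: target = source
Operational semantics: s is the state of the computer, v is any value
of the source, and $\overline{\cup}$ (U-bar) is overriding union:
$$\frac{s(\text{source}) \Rightarrow v}{s(\text{target} = \text{source}) \Rightarrow s \mathbin{\overline{\cup}} \{(\text{target}, v)\}}$$
Axiomatic semantics: if the precondition true holds in state s, then
after target = source is executed, the new state s' satisfies:
$$\{\,\text{true}\,\}\quad \text{target} = \text{source} \quad \{\, s'(\text{target}) = s(\text{source}) \,\}$$
Denotational semantics: M maps a statement S and a program state s
to a new program state:
$$M : \text{Statement} \times \text{State} \to \text{State}$$
$$M(S, s) = s \mathbin{\overline{\cup}} \{(S.\text{target},\ s(S.\text{source}))\}$$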
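The overriding union in these rules becomes concrete if a state is modeled as a finite map from variable names to values. The minimal sketch below rests on stated assumptions. `State` is a hypothetical type holding at most 8 integer variables. `override` computes s U-bar {(target, v)}. `assign` plays the role of M for an assignment whose source is a single variable.

```c
#include <string.h>

#define MAX_VARS 8

/* A program state s: a finite map from variable names to values. */
typedef struct {
    const char *name[MAX_VARS];
    int value[MAX_VARS];
    int n;
} State;

/* s(x): the value of variable x in state s (0 if absent, an assumption). */
int lookup(const State *s, const char *x) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->name[i], x) == 0)
            return s->value[i];
    return 0;
}

/* Overriding union s U-bar {(target, v)}: the new pair replaces any pair
   for target; all other pairs are kept. s is a copy, so the caller's
   state is unchanged, as in the mathematical definition. */
State override(State s, const char *target, int v) {
    for (int i = 0; i < s.n; i++)
        if (strcmp(s.name[i], target) == 0) {
            s.value[i] = v;
            return s;
        }
    if (s.n < MAX_VARS) {            /* new variable: add the pair */
        s.name[s.n] = target;        /* (silently dropped if full) */
        s.value[s.n++] = v;
    }
    return s;
}

/* M(target = source, s) = s U-bar {(target, s(source))}. */
State assign(State s, const char *target, const char *source) {
    return override(s, target, lookup(&s, source));
}
```

Because `State` is passed and returned by value, each assignment produces a new state, and the old state stays unchanged. This mirrors the state-to-state function M.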
Why Formal Semantics?
1. Avoid ambiguities that lead to different implementations.
- Ambiguities can lead different compilers to produce
  different executable programs from the same source.
- Ada had an ambiguity in the implementation of "in out"
  parameters. In some programs, different
  compilers produced different results!
2. Enable formal proof of program correctness, at least
in some situations.
3. Enable verification that a compiler adheres to the
language specification.
Static and Dynamic Characteristics

Aspects of a computer language can be defined as
static or dynamic. You often hear "dynamic memory
allocation" or "static binding".
Static - something that is done or known before the
program executes, including things done while the
program is being loaded for execution.
Dynamic - something that is done or known while the
program executes.
Static/Dynamic Examples

- Syntax checking for compiled languages is static.
- A division by zero error is dynamic
  (unless you insult the compiler by writing "x/0").
- The definition of data types like "int", "float" is static.
- Allocating memory for function calls is dynamic.
- The scope of a variable can be static or dynamic,
  depending on the language... but usually static.
Classifying Errors

It is helpful to classify errors by type and by when they are
detected.

- Lexical errors are detected by the compiler: static.
- Syntax errors are detected by the compiler: static.
- Semantic errors may be:
  detected by the compiler (in Java; C accepts this, perhaps
  with a warning, and converts 2.5 to 2):
    int n = 2.5;
  detected by the linker (in C, there is no function SQRT;
  the library function is sqrt):
    r = SQRT(x*x+y*y);
  detected at run time:
    /* java */
    for(k=-1; ;k++) sum +=a[k];
- Logic errors may be:
  detected at run time
  not detected at all
Find 9 errors in this program

classify as: lexical, syntax, static semantic, dynamic
semantic, or logical. Indicate when error is detected.
include <stdio.h>
/* return maximum of x and y */
int max( integer x, integer y ) {
if (x > y) return y;
else return x;
}
int main( ) {
int a, b;
printf("Input two integers: ");
scanf("%f %f", a, b);
printf("The max of %d and %d is %d\n", a, b,
MAX(x,y);
return;
}
Find 9 errors in this program: solution
include <stdio.h>
1. Syntax: missing "#" detected by compiler at "<" symbol
int max( integer x, integer y ) {
2. Static semantic: "integer" isn't a data type; detected by the compiler.
if (x > y) return y;
else return x;
3. Logic error, not detected!: this returns the minimum of x and y.
scanf("%f %f", a, b);
4. Dynamic semantic error: "%f" should be "%d". This may be a
run-time error, or it may not be detected at all.
5. Semantic error: scanf needs the addresses of a and b (&a, &b).
The compiler should detect this, but it may not (gcc did not), because
scanf takes a variable argument list whose types the C language does not
check; gcc warns only when format checking (-Wformat, part of -Wall) is
enabled. Likely a run-time error.
printf("The max of %d and %d is %d\n", a, b,
MAX(x,y);
6. Static semantic error: "MAX" should be "max". The linker
will report an "unresolved external symbol" error because it can't
find a function named "MAX".
7. Static semantic error: (x,y) should be (a,b). The compiler will
report the use of undeclared variables x and y.
8. Syntax error: missing ")" to close printf( ... ). The compiler reports
this as a syntax error.
return;
9. Static semantic error: main is declared "int main", but this return
statement has no value. The semantics require the returned value to match
the return type in the function header. Detected by the compiler.
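Applying fixes 1 through 9 gives a program along the lines of the sketch below. This is one possible correction, not the only one. To make it testable without keyboard input, two hypothetical helpers are introduced. `read_two_ints` parses a string with `sscanf`. `report_max` stands in for `main`. The original reads from standard input with `scanf("%d %d", &a, &b)`.

```c
#include <stdio.h>                   /* fix 1: "#include" */

/* return maximum of x and y (fix 2: int; fix 3: return the larger) */
int max(int x, int y) {
    if (x > y) return x;
    else return y;
}

/* Hypothetical helper: "%d" for int (fix 4) and the addresses
   of a and b (fix 5). Returns 1 if both integers were read. */
int read_two_ints(const char *text, int *a, int *b) {
    return sscanf(text, "%d %d", a, b) == 2;
}

/* Hypothetical stand-in for main. Fixes 6 to 8: call max with a and b
   and close printf( ... ). Fix 9: return an int value. */
int report_max(const char *text) {
    int a, b;
    if (!read_two_ints(text, &a, &b))
        return 1;
    printf("The max of %d and %d is %d\n", a, b, max(a, b));
    return 0;
}
```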
Find 7 errors in this program

classify as: lexical, syntax, static semantic, dynamic
semantic, or logical. Indicate when error is detected.
#include <stdlib.h>
/* return x modulo y, return 0 if y is 0. */
int mod( int x, int y ) {
if ( y = 0 ) return 0;
else return x # y;
}
void main( ) {
int a, b;
printf("Input two integers: ");
scanf("%d %d", a, b);
printf("%d mod %d is %d\n", a, b, mod(b,a);
return;
}
Find 7 errors: partial solution
#include <stdlib.h>
Static semantic error: we didn't #include <stdio.h>, so the
compiler should give an error when scanf and printf are used.
However, gcc may only warn about an implicit declaration and compile anyway.
if ( y = 0 ) return 0;
// should be ( y == 0 )
Logic error: the C language allows any expression to be used
as a test condition in "if". The assignment sets y to 0, and the value of
the expression is 0, so the "if" test is always false. The else branch then
computes x modulo 0, a division by zero error at run time. Java doesn't allow
conversion of other data types to boolean, so in Java this would be a
compile-time type error (a static semantic error).
void main( ) {
Static semantic error: the C language says that main should
return an int. Compiler reports this error.
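For comparison, here is a minimal sketch of a corrected `mod` function. Besides the `==` fix noted above, it uses C's remainder operator `%`. C's `%` truncates toward zero, so `mod(-7, 3)` is -1 rather than 2.

```c
/* return x modulo y, return 0 if y is 0. */
int mod(int x, int y) {
    if (y == 0)            /* comparison, not assignment */
        return 0;
    return x % y;          /* note: INT_MIN % -1 is still undefined in C */
}
```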
Attributes
Properties of language entities, especially identifiers.
- Examples:
  - Value of an expression
  - Data type of an identifier
  - Number of digits in a numeric data type
  - Memory location of a variable
  - Code body of a function or method
- Declarations ("definitions") bind attributes to identifiers.
- Different declarations may bind the same identifier to
  different sets of attributes.
Binding

Binding means "an association":
- associate names with values
- associate symbols with operations
Binding time describes when this association occurs.
Example: int count;
- the name "int" was bound by the C language definition
  (along with the meanings of operators +, -, ... for int)
- the size (and set of possible values) of "int" was
  bound at compiler design time
- the identifier "count" is bound to "int" at compile time
- the location of count is bound at load or execution time
Binding Times
Louden gives 6 possible binding times:
- language definition time: Java defines the size of int (32 bits);
  C leaves it to the implementation. In C, an "int" must be at least
  16 bits; it is 16 bits on some platforms and 32 bits on most. The
  standard header <stdint.h> (C99) provides exact-width typedefs; on a
  platform with a 16-bit short and a 32-bit int they could be defined as:
    typedef short int int16_t;
    typedef int int32_t;
- language implementation time: when the compiler
  or interpreter is written
- translation time (compile time)
- link time, for compiled programs
- load time
- execution time
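The width of `int` is bound at implementation time, and a program can observe it. Code that needs a particular width can use the exact-width types instead, which are always exactly 16 or 32 bits wherever the implementation provides them. The small check below uses hypothetical function names.

```c
#include <limits.h>
#include <stdint.h>

/* int's width is chosen by the implementation (at least 16 bits)... */
int width_of_int(void)   { return (int)(sizeof(int) * CHAR_BIT); }

/* ...while int16_t and int32_t, when provided, have exactly these widths. */
int width_of_int16(void) { return (int)(sizeof(int16_t) * CHAR_BIT); }
int width_of_int32(void) { return (int)(sizeof(int32_t) * CHAR_BIT); }
```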
Load Time versus Execution Time

How are count and sum different?

C example:
int count;          /* an external variable is static */
int sub( ) {
    int sum;        /* a local variable, dynamically allocated */
    /* do something */
}
count is allocated storage at load time (and exists for
the life of the program).
sum is allocated storage at execution time, i.e. each
time sub is executed.
The scope of count and sum is also different: count is visible
from its declaration to the end of the file, while sum is visible
only inside sub.
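The difference in storage binding shows up as a difference in lifetime. The minimal sketch below is a hypothetical variant of `sub` that updates both variables and reports them.

```c
int count = 0;      /* external variable: storage bound at load time */

/* Hypothetical variant of sub(): returns count * 10 + sum. */
int sub(void) {
    int sum = 0;    /* local: new storage, re-initialized on every call */
    count++;        /* count keeps its value between calls */
    sum++;          /* sum is 1 on every call */
    return count * 10 + sum;
}
```

The first call returns 11 and the second returns 21. count grows across calls, while sum starts over each time.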
Static and Dynamic Binding
Static Binding - occurs before the program is run
Dynamic Binding - occurs while the program is running

A symbol can have both static and dynamic attributes:
/* Binding time example */          Type binding   Storage binding
int count;    /* external var */    static         static
int sub( ) {
    int sum = 0;                    static         dynamic
    static int last = 0;            static         static
    int *x;                         static         dynamic
    void *p;                        static         dynamic
    p = (double *)malloc(...);      dynamic        dynamic  (the allocated object)
Exercise

For each of these attributes, indicate the binding time
in C and Java as precisely as possible.
1. number of significant digits in a "float"
2. the meaning of "char"
3. the size of an array variable
4. the memory location of a local variable
5. the value of a constant (C "const int", Java "final")
6. the memory location of a function or method
Hint: C and Java differ at least in items 1 and 5.
So now you know...

When someone asks, "Are method names statically or
dynamically bound to actual code?"
/* Java */
class Pet {
    public void talk( ) {
        System.out.println("hello");
    }
}
class Dog extends Pet {
    public void talk( ) {
        System.out.println("woof");
    }
}
...
Pet p = new Dog( );
p.talk( );

/* C++ */
#include <iostream>
class Pet {
public:
    void talk( ) {
        std::cout << "hello" << std::endl; }
};
class Dog: public Pet {
public:
    void talk( ) {
        std::cout << "woof" << std::endl; }
};
...
Pet *p; Dog dog; p = &dog;
p->talk( );
- In C++, method names are statically bound to code,
  unless "virtual" is specified. So the C++ example prints "hello";
  declaring Pet::talk as virtual would make it print "woof".
- In Java, all methods are dynamically bound to actual
  code, so the Java example prints "woof", except in these cases:
  - "private" methods are statically bound
  - "static" methods are statically bound
  - "final" methods are statically bound
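C has no methods, but dynamic binding can be imitated with function pointers. This is roughly how C++ implements virtual functions. The minimal sketch below rests on stated assumptions. Hypothetical `Pet` objects each carry a pointer to their own talk function, and the call through that pointer is resolved at run time.

```c
typedef struct Pet Pet;

/* Each object stores a pointer to its talk function (a one-entry "vtable"). */
struct Pet {
    const char *(*talk)(const Pet *self);
};

static const char *pet_talk(const Pet *self) { (void)self; return "hello"; }
static const char *dog_talk(const Pet *self) { (void)self; return "woof"; }

/* "Constructors" bind the function pointer when the object is created. */
Pet new_pet(void) { Pet p = { pet_talk }; return p; }
Pet new_dog(void) { Pet d = { dog_talk }; return d; }

/* Static binding: always calls pet_talk, like a non-virtual C++ method. */
const char *talk_static(const Pet *p) { return pet_talk(p); }

/* Dynamic binding: the function is found at run time through the pointer,
   like a Java method or a C++ virtual method. */
const char *talk_dynamic(const Pet *p) { return p->talk(p); }
```

For a dog object, `talk_dynamic` returns "woof" while `talk_static` returns "hello". This matches the Java and non-virtual C++ examples.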
Variables and Constants
- A variable is a name for a memory location whose value
  can change during execution.
- A constant is an object whose value does not change
  throughout its lifetime.
- Literals are data values (without names) used in a program.
  In int buffer[80]; the 80 is a numeric literal.
Constants may be:
- substituted for values by the compiler (never allocated)
- compile-time static (the compiler can set the value)
- load-time static (value determined at load time)
- dynamic (value determined at run time)
Binding of Constants

C "const" can be a compile-time, load-time, or run-time
constant:
const int MaxSize = 80;          /* compile time */
void mysub( const int n ) {
    const time_t now = time(0);  /* run time: time(0) is called on each call */
    const int LastN = n;         /* dynamic */
    ...
}
In Java, "final" means a variable cannot be changed
after the first assignment. Otherwise, it is the same as a variable.
static final int MAX = 1000;     /* compile time: constant expression */
void mysub ( int n ) {
    final int LastN = n;         /* runtime */
    ...
}
Constants (2)

Compile-time constant in Java:
static final int zero = 0;

Load-time constant in Java:
static final Date now = new Date();

Dynamic constant in Java:
any non-static final variable whose value is computed at run time.

Java "final" identifiers are variables with a restriction
(no reassignment).

C "const" is more strict: compiler has the option to
eliminate them during compilation.