Formalizing the Dynamic Semantics of Java


3
Variables and Storage
 A simple storage model.
 Simple and composite variables.
 Copy semantics vs reference semantics.
 Lifetime.
 Pointers.
© 2004, D.A. Watt, University of Glasgow
3-1
Variables and Storage (cont’d)
 Commands.
 Expressions with side effects.
 Implementation notes.
3-2
An abstract model of storage (1)
 In functional/logic PLs (as in mathematics), a “variable”
stands for a fixed but unknown value.
 In imperative/OO PLs, a variable is a container for a
value, which may be inspected and updated as often as
desired.
 Such a variable can be used to model a real-world object
whose state changes over time.
3-3
An abstract model of storage (2)
 To understand such variables,
assume a simple abstract model of
storage:
• A store is a collection of storage cells.
Each storage cell has a unique address.
• Each storage cell is either allocated or
unallocated.
• Each allocated storage cell contains
either a simple value or undefined.
[Figure: a store in which some cells are unallocated and the allocated cells contain 7, true, 3.14, ‘X’, and ? (undefined).]
3-4
Simple vs composite variables
 A simple value is one that can be stored in a single storage
cell (typically a primitive value or a pointer).
 A simple variable occupies a single allocated storage cell.
 A composite variable occupies a group of allocated
storage cells.
3-5
Simple variables
 When a simple variable is declared, a storage cell is
allocated for it.
 Assignment to the simple variable updates that storage cell.
 At the end of the block, that storage cell is deallocated.
 Animation (Ada):
declare
n: Integer;
begin
n := 0;
n := n+1;
end;
[Animation: the storage cell n initially contains ? (undefined), then 0, then 1.]
3-6
Composite variables (1)
 A variable of a composite type has the same structure as a
value of that type. For instance:
• A record variable is a tuple of component variables.
• An array variable is a mapping from an index range to a group of
component variables.
 The component variables can be inspected and updated
either totally or selectively.
3-7
Composite variables (2)
 Animation (Ada):
declare
type Date is
record
y: Year_Number;
m: Month;
d: Day_Number;
end record;
xmas, today: Date;
begin
xmas.d := 25;
xmas.m := dec;
xmas.y := 2004;
today := xmas;
end;
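[Animation: xmas and today each occupy three storage cells (y, m, d), all initially undefined. The three selective updates fill xmas with 2004, dec, 25; the total update today := xmas then copies all three components into today.]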
3-8
Total vs selective update
 Total update of a composite variable means updating it
with a new (composite) value in a single step:
today := xmas;
 Selective update of a composite variable means updating a
single component:
today.y := 2004;
3-9
Static vs dynamic vs flexible arrays
 A static array is an array variable whose index range is
fixed by the program code.
 A dynamic array is an array variable whose index range is
fixed at the time when the array variable is created.
• In Ada, the definition of an array type must fix the index type, but
need not fix the index range. Only when an array variable is
created must its index range be fixed. Ada arrays are therefore
dynamic.
 A flexible array is an array variable whose index range is
not fixed at all, but may change whenever a new array
value is assigned.
3-10
Example: C static arrays
 Array variable declarations:
float v1[] = {2.0, 3.0, 5.0, 7.0};  // index range is {0, …, 3}
float v2[10];                       // index range is {0, …, 9}
 Function (a C array doesn’t know its own length, so n must be passed):
void print_vector (float v[], int n) {
// Print the array v[0], …, v[n-1] in the form “[… … …]”.
int i;
printf("[%8.2f", v[0]);
for (i = 1; i < n; i++)
printf(" %8.2f", v[i]);
printf("]");
}
…
print_vector(v1, 4);
print_vector(v2, 10);
3-11
Example: Ada dynamic arrays
 Array type and variable declarations:
type Vector is
array (Integer range <>) of Float;
v1: Vector(1 .. 4) := (1.0, 0.5, 5.0, 3.5);
v2: Vector(0 .. m) := (0 .. m => 0.0);
 Procedure:
procedure print_vector (v: in Vector) is
-- Print the array v in the form “[… … …]”.
begin
put('['); put(v(v'first));
for i in v'first + 1 .. v'last loop
put(' '); put(v(i));
end loop;
put(']');
end;
…
print_vector(v1); print_vector(v2);
3-12
Example: Java flexible arrays
 Array variable declarations:
float[] v1 = {1.0f, 0.5f, 5.0f, 3.5f};  // index range is {0, …, 3}
float[] v2 = {0.0f, 0.0f, 0.0f};        // index range is {0, …, 2}
…
v1 = v2;  // v1’s index range is now {0, …, 2}
 Method:
static void printVector (float[] v) {
// Print the array v in the form “[… … …]”.
System.out.print("[" + v[0]);
for (int i = 1; i < v.length; i++)
System.out.print(" " + v[i]);
System.out.print("]");
}
…
printVector(v1);
printVector(v2);
3-13
Copy semantics vs reference semantics
 What exactly happens when a composite value is assigned
to a variable of the same type?
 Copy semantics: All components of the composite value
are copied into the corresponding components of the
composite variable.
 Reference semantics: The composite variable is made to
contain a pointer (or reference) to the composite value.
 C and Ada adopt copy semantics.
 Java adopts copy semantics for primitive values, but
reference semantics for objects.
3-14
Example: Ada copy semantics (1)
 Declarations:
type Date is
record
y: Year_Number;
m: Month;
d: Day_Number;
end record;
dateA: Date := (2004, jan, 1);
dateB: Date;
 Effect of copy semantics:
dateB := dateA;
dateB.y := 2005;
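[Animation: dateA contains (2004, jan, 1) and dateB is initially undefined. After dateB := dateA, dateB contains its own copy (2004, jan, 1). After dateB.y := 2005, dateB contains (2005, jan, 1) while dateA is unchanged.]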
3-15
Example: Java reference semantics (1)
 Declarations:
class Date {
int y, m, d;
public Date (int y, int m, int d) { … }
}
Date dateR = new Date(2004, 1, 1);
Date dateS = new Date(2004, 12, 25);
 Effect of reference semantics:
dateS = dateR;
dateR.y = 2005;
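[Animation: dateR points to an object (2004, 1, 1) and dateS points to an object (2004, 12, 25). After dateS = dateR, both variables point to the same object. After dateR.y = 2005, that shared object is (2005, 1, 1), whichever variable is used to inspect it.]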
3-16
Example: Ada copy semantics (2)
 We can achieve the effect of reference semantics in Ada by
using explicit pointers:
type Date_Pointer is access Date;
dateP: Date_Pointer := new Date;
dateQ: Date_Pointer := new Date;
…
dateP.all := dateA;
dateQ := dateP;
3-17
Example: Java reference semantics (2)
 We can achieve the effect of copy semantics in Java by
cloning:
Date dateR = new Date(2004, 4, 1);
Date dateT = dateR.clone();  // assumes Date overrides clone() to return a Date
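The contrast between sharing and cloning can be sketched as follows. This is only an illustration: the class name SimpleDate, its Cloneable support and its clone() override are assumptions, not part of the slide's Date class.

```java
// Hypothetical stand-in for the slide's Date class (the name SimpleDate is an assumption).
class SimpleDate implements Cloneable {
    int y, m, d;

    SimpleDate(int y, int m, int d) {
        this.y = y;
        this.m = m;
        this.d = d;
    }

    // Object.clone() copies the object field by field. All fields here are
    // primitive, so the clone shares nothing with the original.
    @Override
    public SimpleDate clone() {
        try {
            return (SimpleDate) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: SimpleDate is Cloneable
        }
    }
}

class CloneDemo {
    public static void main(String[] args) {
        SimpleDate dateR = new SimpleDate(2004, 4, 1);
        SimpleDate dateS = dateR;          // reference semantics: same object
        SimpleDate dateT = dateR.clone();  // copy semantics: a distinct object
        dateR.y = 2005;
        System.out.println(dateS.y);       // 2005 (dateS sees the update)
        System.out.println(dateT.y);       // 2004 (dateT is unaffected)
    }
}
```

If a class had components that were themselves objects, this field-by-field clone would share them, so a deeper copy would have to be written by hand.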
3-18
Lifetime (1)
 Every variable is created (or allocated) at some definite
time, and destroyed (or deallocated) at some later time
when it is no longer needed.
 A variable’s lifetime is the interval between its creation
and destruction.
 A variable occupies storage cells only during its lifetime.
When the variable is destroyed, the storage cells that it
occupied may be deallocated (and subsequently allocated
for some other purpose).
3-19
Lifetime (2)
 A global variable’s lifetime is the program’s run-time. It is
created by a global declaration.
 A local variable’s lifetime is an activation of a block. It is
created by a declaration within that block, and destroyed
on exit from that block.
 A heap variable’s lifetime is arbitrary, but bounded by the
program’s run-time. It can be created at any time, by a
command or expression, and may be destroyed at any later
time. It is accessed through a pointer.
3-20
Example: Ada global and local variables (1)
 Outline of Ada program:
procedure main is
g1: Integer; g2: Float;
begin … P; … Q; … end;
procedure P is
p1: Float; p2: Integer;
begin … Q; … end;
procedure Q is
q: Integer;
begin … end;
3-21
Example: Ada global and local variables (2)
 Lifetimes of global and local variables:
[Figure: timeline from start to stop. g1 and g2 live for the whole run; p1 and p2 live from the call of P to the return from P; q lives once during the call of Q from P, and again during the later call of Q from main.]
 Global and local variables’ lifetimes are nested.
3-22
Example: Ada local variables
of recursive procedure (1)
 Outline of Ada program:
procedure main is
g: Integer;
begin
… R; …
end;
procedure R is
r: Integer;
begin
… R; …
end;
3-23
Example: Ada local variables
of recursive procedure (2)
 Lifetimes of global and local variables
(assuming 3-deep recursive activation of R):
[Figure: timeline from start to stop. g lives for the whole run; each of the three nested activations of R has its own r, which lives from that call of R to the matching return from R.]
3-24
Example: Ada heap variables (1)
 Outline of Ada program:
procedure main is
type IntNode;
type IntList is access IntNode;
type IntNode is record
elem: Integer;
succ: IntList;
end record;
odds, primes: IntList := null;
function cons (h: Integer; t: IntList)
return IntList is
begin
return new IntNode'(h, t);
end;
3-25
Example: Ada heap variables (2)
 Outline of Ada program (continued):
procedure A is
begin
odds := cons(3, cons(5, cons(7, null)));
primes := cons(2, odds);
end;
procedure B is
begin
odds.succ := odds.succ.succ;
end;
begin
… A; … B; …
end;
3-26
Example: Ada heap variables (3)
 After call and return from A:
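[Figure: primes points to the 2-node, whose successor is the 3-node; odds points to the 3-node, which is followed by the 5-node and the 7-node. All four nodes are heap variables.]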
 After call and return from B:
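[Figure: primes points to the 2-node, followed by the 3-node and the 7-node; odds points to the 3-node, followed by the 7-node. The 5-node still points to the 7-node but is now unreachable.]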
3-27
Example: Ada heap variables (4)
 Lifetimes of global and heap variables:
[Figure: timeline from start to stop. primes and odds live for the whole run; the 7-, 5-, 3- and 2-nodes are created during the call of A; the 5-node's lifetime ends when B makes it unreachable, while the other nodes live until the program stops.]
 Heap variables’ lifetimes have no particular pattern.
3-28
Allocators and deallocators
 An allocator is an operation that creates a heap variable,
yielding a pointer to that heap variable.
• Ada and Java’s allocator is an expression of the form “new …”.
• C’s allocator is a library function, malloc.
 A deallocator is an operation that explicitly destroys a
designated heap variable.
• Ada’s deallocator is a library (generic) procedure,
unchecked_deallocation.
• C’s deallocator is a library function, free.
• Java has no deallocator at all.
3-29
Reachability
 A heap variable remains reachable as long as it can be
accessed by following pointers from a global or local
variable.
 A heap variable’s lifetime extends from its creation until:
• it is destroyed by a deallocator, or
• it becomes unreachable, or
• the program stops.
3-30
Pointers (1)
 A pointer is a reference to a particular variable. (In fact,
pointers are sometimes called references.)
 A pointer’s referent is the variable to which it refers.
 A null pointer is a special pointer value that has no
referent.
 In terms of our abstract model of storage, a pointer is
essentially the address of its referent in the store. However,
each pointer also has a type, and the type of a pointer
allows us to infer the type of its referent.
3-31
Pointers (2)
 Pointers and heap variables can be used to represent
recursive values such as lists and trees.
 But the pointer itself is a low-level concept. Manipulation
of pointers is notoriously error-prone and hard to
understand.
 For example, the assignment “p.succ := q;” appears to
manipulate a list, but which list? Also:
• Does it delete nodes from the list?
• Does it stitch together parts of two different lists?
• Does it introduce a cycle?
3-32
Dangling pointers (1)
 A dangling pointer is a pointer to a variable that has been
destroyed.
 Dangling pointers arise from the following situations:
• where a pointer to a heap variable still exists after the heap variable
is destroyed by a deallocator
• where a pointer to a local variable still exists at exit from the block
in which the local variable was declared.
 A deallocator immediately destroys a heap variable; all
existing pointers to that heap variable then become
dangling pointers. Thus deallocators are inherently unsafe.
3-33
Dangling pointers (2)
 C is highly unsafe:
• After a heap variable is destroyed, pointers to it might still exist.
• At exit from a block, pointers to its local variables might still exist
(e.g., stored in global variables).
 Ada is safer:
• After a heap variable is destroyed, pointers to it might still exist.
• But pointers to local variables may not be stored in global
variables.
 Java is very safe:
• It has no deallocator.
• Pointers to local variables cannot be obtained.
3-34
Example: C dangling pointers
 Consider this C code:
struct Date {int y, m, d;};
struct Date *dateP, *dateQ;
dateP = (struct Date*)malloc(sizeof(struct Date));  // allocates a new heap variable
dateP->y = 2004; dateP->m = 1; dateP->d = 1;
dateQ = dateP;  // makes dateQ point to the same heap variable as dateP
free(dateQ);  // deallocates that heap variable (dateP and dateQ are now dangling pointers)
printf("%4d", dateP->y);  // fails
dateP->y = 2005;  // fails
3-35
Commands
 A command (or statement) is a PL construct that will be
executed to update variables.
 Commands are characteristic of imperative and OO (but
not functional) PLs.
 Forms of commands:
• skips
• assignments
• procedure calls
• sequential commands
• conditional commands
• iterative commands.
3-36
Skips
 A skip is a command with no effect.
 Typical forms:
• “;” in C and Java
• “null;” in Ada.
 Skips are useful mainly within conditional commands.
3-37
Assignments
 An assignment stores a value in a variable.
 Single assignment:
• “V = E;” in C and Java
• “V := E;” in Ada
– the value of expression E is stored in variable V.
 Multiple assignment:
• “V1 = … = Vn = E;” in C and Java
– the value of E is stored in each of V1, …, Vn.
 Assignment combined with binary operator:
• “V ⊕= E;” in C and Java means the same as “V = V ⊕ E;”.
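A small Java sketch of these assignment forms follows. The class and variable names are illustrative only.

```java
import java.util.Arrays;

class AssignmentDemo {
    static int[] run() {
        int a, b, c;
        // Multiple assignment: "c = 7" yields 7, which is stored in b, then in a.
        a = b = c = 7;

        int v = 10;
        v += 5;   // same effect as v = v + 5
        v *= 2;   // same effect as v = v * 2
        return new int[] {a, b, c, v};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [7, 7, 7, 30]
    }
}
```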
3-38
Procedure calls
 A procedure call achieves its effect by applying a
procedure to some arguments.
 Typical form:
P(E1, …, En);
Here P determines the procedure to be applied, and E1, …,
En are evaluated to determine the arguments. Each
argument may be either a value or (sometimes) a reference
to a variable.
 The net effect of the procedure call is to update variables.
The procedure achieves this effect by updating variables
passed by reference, and/or by updating global variables.
(But updating its local variables has no net effect.)
3-39
Sequential commands
 Sequential, conditional, and iterative commands (found in
all imperative/OO PLs) are ways of composing commands
to achieve different control flows. Control flow matters
because commands update variables, so the order in which
they are executed makes a difference.
 A sequential command specifies that two (or more)
commands are to be executed in sequence. Typical form:
C1 C2
– command C1 is executed before command C2.
3-40
Conditional commands
 A conditional command chooses one of its subcommands
to execute, depending on a condition.
 An if-command chooses from two subcommands, using a
boolean condition.
 A case-command chooses from several subcommands.
3-41
If-commands (1)
 Typical forms (Ada and C/Java, respectively):
if E then
C1
else
C2
end if;
if (E)
C1
else
C2
(E must be of type Boolean.)
– if E yields true, C1 is executed; otherwise C2 is executed.
 Common abbreviation (Ada):
if E then
C1
end if;
≡
if E then
C1
else
null;
end if;
3-42
If-commands (2)
 Generalisation to multiple conditions (in Ada):
if E1 then
C1
elsif E2 then
C2
…
elsif En then
Cn
else
C0
end if;
E1, …, En must be
of type Boolean
– if E1, …, Ei-1 all yield false but Ei yields true, then Ci is
executed; otherwise C0 is executed.
3-43
Case-commands (1)
 In Ada:
case E is
when v1 =>
C1
…
when vn =>
Cn
when others =>
C0
end case;
E must be of some primitive
type other than Float
v1, …, vn must be distinct
values of that type
– if the value of E equals some vi, then Ci is executed;
otherwise C0 is executed.
3-44
Case-commands (2)
 In C and Java:
switch (E) {
case v1:
C1
…
case vn:
Cn
default:
C0
}
E must be of integer type
v1, …, vn must be distinct
integer constants
– if the value of E equals some vi, then Ci, …, Cn, C0 are
all executed; otherwise only C0 is executed.
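The fall-through behaviour described above can be observed with a short Java sketch. The method names and the "???" default text are made up for illustration.

```java
class SwitchDemo {
    // Each case ends with break, so exactly one subcommand is executed.
    static String withBreaks(int m) {
        StringBuilder out = new StringBuilder();
        switch (m) {
            case 1: out.append("JAN"); break;
            case 2: out.append("FEB"); break;
            default: out.append("???");
        }
        return out.toString();
    }

    // No breaks: execution starts at the matching case and falls through
    // all the following subcommands, including the default.
    static String withoutBreaks(int m) {
        StringBuilder out = new StringBuilder();
        switch (m) {
            case 1: out.append("JAN");
            case 2: out.append("FEB");
            default: out.append("???");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBreaks(1));     // JAN
        System.out.println(withoutBreaks(1));  // JANFEB???
        System.out.println(withoutBreaks(2));  // FEB???
        System.out.println(withoutBreaks(3));  // ???
    }
}
```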
3-45
Example: Ada case-command
 Code:
today: Date;
…
case today.m is
when jan => put("JAN");
when feb => put("FEB");
…
when nov => put("NOV");
when dec => put("DEC");
end case;
3-46
Example: Java switch-command
 Code:
Date today;
…
switch (today.m) {
case 1: System.out.print("JAN"); break;
case 2: System.out.print("FEB"); break;
…
case 11: System.out.print("NOV"); break;
case 12: System.out.print("DEC");
}
// The breaks are essential.
3-47
Iterative commands
 An iterative command (or loop) repeatedly executes a
subcommand, which is called the loop body.
 Each execution of the loop body is called an iteration.
 Classification of iterative commands:
• Indefinite iteration: the number of iterations is not predetermined.
• Definite iteration: the number of iterations is predetermined.
3-48
Indefinite iteration (1)
 Indefinite iteration is most commonly supported by the
while-command. Typical forms (Ada and C/Java):
while E loop
C
end loop;
while (E)
C
 Meaning (defined recursively):
while E loop
C
end loop;
≡
if E then
C
while E loop
C
end loop;
end if;
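The recursive reading of the while-command can be tried out in Java. In this sketch the helper whileLoop and both sum methods are hypothetical names; the condition and body are passed as lambdas, and one-element arrays hold the variables the lambdas update.

```java
import java.util.function.BooleanSupplier;

class WhileMeaning {
    // "while (e) c" unfolded as: if (e) { c; while (e) c }
    static void whileLoop(BooleanSupplier e, Runnable c) {
        if (e.getAsBoolean()) {
            c.run();
            whileLoop(e, c);
        }
    }

    // Sums 1..n using the recursive unfolding.
    static int sumWithRecursiveWhile(int n) {
        int[] i = {1};
        int[] sum = {0};
        whileLoop(() -> i[0] <= n, () -> { sum[0] += i[0]; i[0]++; });
        return sum[0];
    }

    // Sums 1..n using the built-in while-command.
    static int sumWithBuiltInWhile(int n) {
        int i = 1, sum = 0;
        while (i <= n) {
            sum += i;
            i++;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithRecursiveWhile(4)); // 10
        System.out.println(sumWithBuiltInWhile(4));   // 10
    }
}
```

The recursive version is meant only to mirror the definition; each iteration uses a stack frame, so it is suitable for small iteration counts only.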
3-49
Indefinite iteration (2)
 Indefinite iteration is also supported in some PLs by the
do-while-command. Typical form (C/Java):
do
C
while (E);
 Meaning:
do
C
while (E);
≡
C
if (E) {
do
C
while (E);
}
≡
C
while (E)
C
3-50
Definite iteration (1)
 Definite iteration is characterized by a control sequence, a
predetermined sequence of values that are successively
assigned (or bound) to a control variable.
 Ada for-command:
for V in R loop
C
end loop;
R must be of some primitive
type other than Float
– the control sequence consists of all values in the range R,
in ascending order.
3-51
Definite iteration (2)
 Java 1.5’s new-style for-command can iterate over an
array, list, or set:
for (T V : E)
C
– the control sequence consists of all component values of
the array/list/set yielded by E.
 NB: Java’s old-style for-command is just an abbreviation
for a while-command (indefinite iteration):
for (C1; E; C2)
C3
≡
C1
while (E)
{ C3 C2 }
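As a rough illustration of this abbreviation, the sketch below sums an array three ways: with an old-style for-command, with the equivalent while-command, and with a new-style for-command. The class and method names are illustrative.

```java
class ForAsWhile {
    // Old-style for-command: for (C1; E; C2) C3
    static int oldStyleFor(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++)
            sum += a[i];
        return sum;
    }

    // The same loop written out as: C1 while (E) { C3 C2 }
    static int asWhile(int[] a) {
        int sum = 0;
        int i = 0;                 // C1
        while (i < a.length) {     // E
            sum += a[i];           // C3
            i++;                   // C2
        }
        return sum;
    }

    // New-style for-command: the control sequence is the array's components.
    static int newStyleFor(int[] a) {
        int sum = 0;
        for (int x : a)
            sum += x;
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4};
        System.out.println(oldStyleFor(a)); // 8
        System.out.println(asWhile(a));     // 8
        System.out.println(newStyleFor(a)); // 8
    }
}
```

One caveat: the rewriting is exact only when C3 contains no continue, because in a for-command continue still executes C2, whereas in the while form it would skip it.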
3-52
Example: definite iteration over arrays
 In Ada:
dates: array (…) of Date;
…
for i in dates'range loop
put(dates(i));
end loop;
 In Java:
Date[] dates;
…
for (int i = 0; i < dates.length; i++)  // old style
System.out.println(dates[i]);
for (Date dat : dates)  // new style
System.out.println(dat);
3-53
Expressions with side effects
 The primary purpose of evaluating an expression is to
yield a value.
 But in many imperative/OO PLs, evaluating an expression
can also update variables – side effects.
 In Ada, C, and Java, the body of a function is a command.
If that command updates global variables, calling the
function has side effects.
 In C and Java, assignments are in fact expressions with
side effects: “V = E” stores the value of E in V as well as
yielding that value. Similarly “V ⊕= E”.
3-54
Example: side effects in C
 The C function getc(f) reads a character and
updates the file variable that f points to.
 The following code is correct and concise:
int ch;
while ((ch = getc(f)) != EOF)
putchar(ch);
 The following code is incorrect (why?):
enum Gender {female, male};
enum Gender g;
if (getc(f) == 'F') g = female;
else if (getc(f) == 'M') g = male;
else …
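The same pitfall can be reproduced in Java, where Reader.read() also consumes a character each time it is called. The sketch below is one possible illustration: the class, the enum and both method names are hypothetical, and the input is supplied as a string for convenience.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SideEffectDemo {
    enum Gender { FEMALE, MALE, UNKNOWN }

    // Incorrect: read() has a side effect, so the second test inspects the
    // character AFTER the one that the first test rejected.
    static Gender buggy(String input) {
        Reader in = new StringReader(input);
        try {
            if (in.read() == 'F') return Gender.FEMALE;
            else if (in.read() == 'M') return Gender.MALE;
            else return Gender.UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Correct: read once, then test the stored value as often as needed.
    static Gender fixed(String input) {
        Reader in = new StringReader(input);
        try {
            int ch = in.read();
            if (ch == 'F') return Gender.FEMALE;
            else if (ch == 'M') return Gender.MALE;
            else return Gender.UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buggy("M"));  // UNKNOWN: the 'M' was consumed by the first test
        System.out.println(fixed("M"));  // MALE
    }
}
```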
3-55
Implementation notes
 Each variable occupies storage space throughout its
lifetime. That storage space must be allocated at the start of
the variable’s lifetime (or before), and deallocated at the
end of the variable’s lifetime (or later).
 The amount of storage space occupied by each variable
depends on its type.
 Assume that the PL is statically typed: all variables’ types
are declared explicitly, or the compiler can infer them.
3-56
Storage for global and local variables (1)
 A global variable’s lifetime is the program’s entire runtime. So the compiler can allocate a fixed storage space to
each global variable.
 A local variable’s lifetime is an activation of the block in
which the variable is declared. The lifetimes of local
variables are nested. So the compiler allocates storage
space to local variables on a stack.
3-57
Storage for global and local variables (2)
 At any given time, the stack contains several activation
frames.
 Each activation frame contains
enough space for the local variables
of a particular procedure, together with housekeeping data.
 An activation frame is:
• pushed on to the stack when a procedure is called
• popped off the stack when the procedure returns.
 Storage can be allocated to local variables of recursive
procedures in exactly the same way.
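The push-on-call, pop-on-return discipline can be simulated with an explicit stack. The sketch below is only a model: frames are represented by procedure names, and it traces a program shaped like the earlier outline, in which main calls P (which calls Q) and then calls Q directly. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class FrameStackDemo {
    private final Deque<String> stack = new ArrayDeque<>();
    private final List<String> trace = new ArrayList<>();

    // Models a procedure call: push a frame, run the body, pop the frame.
    private void call(String procedure, Runnable body) {
        stack.push(procedure);   // frame pushed when the procedure is called
        record();
        body.run();
        stack.pop();             // frame popped when the procedure returns
        record();
    }

    // Records the current stack contents, bottom frame first.
    private void record() {
        List<String> frames = new ArrayList<>(stack); // top of stack first
        Collections.reverse(frames);
        trace.add(frames.toString());
    }

    static List<String> run() {
        FrameStackDemo d = new FrameStackDemo();
        d.call("P", () -> d.call("Q", () -> { }));  // main calls P, which calls Q
        d.call("Q", () -> { });                     // main then calls Q directly
        return d.trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [[P], [P, Q], [P], [], [Q], []]
    }
}
```

Because frames are always removed in the reverse of the order in which they were added, the same mechanism handles recursive procedures without change.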
3-58
Example: storage for
global and local variables (1)
 Outline of Ada program:
procedure main is
g1: Integer; g2: Float;
begin … P; … Q; … end;
procedure P is
p1: Float; p2: Integer;
begin … Q; … end;
procedure Q is
q: Integer;
begin … end;
3-59
Example: storage for
global and local variables (2)
 Storage layout as the program runs:
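[Figure: six snapshots of storage. g1 and g2 are present throughout. After the call of P the stack holds p1 and p2; after the call of Q from P it holds p1, p2 and q; after the return from Q it holds p1 and p2 again; after the return from P it is empty; after the later call of Q it holds q alone.]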
3-60
Storage for heap variables (1)
 A heap variable’s lifetime starts when the heap variable is
created and ends when it is destroyed or becomes
unreachable. There is no pattern in their lifetimes.
 Heap variables occupy a storage region called the heap. At
any given time, the heap contains all currently-live heap
variables, interspersed with unallocated storage space.
• When a new heap variable is to be created, some unallocated
storage space is allocated to it.
• When a heap variable is to be destroyed, its storage space reverts
to being unallocated.
3-61
Storage for heap variables (2)
 A heap manager (part of the run-time system) keeps track
of allocated and unallocated storage space.
 If the programming language has no explicit deallocator,
the heap manager must be able to find any unreachable
heap variables. (Otherwise heap storage will eventually be
exhausted.) This is called garbage collection.
 A garbage collector must visit all heap variables in order to
find the unreachable ones. This is time-consuming.
 But garbage collection eliminates some common errors:
• omitting to destroy unreachable heap variables
• destroying heap variables that are still reachable.
3-62
Example: storage for heap variables (1)
 Consider the Ada program on slides 3-24 and 3-25.
 Storage layout as the program runs:
3-63
Example: storage for heap variables (2)
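[Figure: four snapshots of storage, each showing odds, primes and the heap. Initially the heap is unallocated. After the call and return from A it holds the 2-, 3-, 5- and 7-nodes. After the call and return from B the same four nodes are still allocated, although the 5-node is unreachable. After garbage collection only the 2-, 3- and 7-nodes remain.]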
3-64
Mark-scan garbage collection algorithm
 To collect garbage:
1. For each variable v in the heap:
1.1. Mark v as unreachable.
2. For each pointer p in the stack:
2.1. Scan all variables that can be reached from p.
3. For each variable v in the heap:
3.1. If v is marked as unreachable:
3.1.1. Deallocate v.
 To scan all variables that can be reached from p:
1. Let variable v be the referent of p.
2. If v is marked as unreachable:
2.1. Mark v as reachable.
2.2. For each pointer q in v:
2.2.1. Scan all variables that can be reached from q.
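The algorithm can be sketched in Java as a simulation over an explicit list of heap nodes. Everything here is an assumption made for illustration: the heap is modelled as a list, the root pointers are passed in explicitly, "deallocating" a node simply means leaving it out of the returned list, and the node structure follows the IntNode list of the earlier Ada example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkScanDemo {
    static class Node {
        final int elem;
        Node succ;            // the only pointer contained in a node
        boolean reachable;    // the mark

        Node(int elem, Node succ) {
            this.elem = elem;
            this.succ = succ;
        }
    }

    // Returns the heap variables that survive collection.
    static List<Node> collect(List<Node> heap, List<Node> roots) {
        for (Node v : heap) v.reachable = false;   // 1. mark all as unreachable
        for (Node p : roots) scan(p);              // 2. scan from each root pointer
        List<Node> live = new ArrayList<>();
        for (Node v : heap)                        // 3. drop whatever is still unmarked
            if (v.reachable) live.add(v);
        return live;
    }

    // Marks every variable reachable from pointer p.
    static void scan(Node p) {
        if (p == null) return;         // a null pointer has no referent
        if (!p.reachable) {
            p.reachable = true;
            scan(p.succ);
        }
    }

    // Rebuilds the odds/primes lists, applies procedure B, then collects.
    static List<Integer> liveElemsAfterB() {
        Node n7 = new Node(7, null);
        Node n5 = new Node(5, n7);
        Node n3 = new Node(3, n5);
        Node n2 = new Node(2, n3);
        Node odds = n3, primes = n2;
        odds.succ = odds.succ.succ;    // procedure B: the 5-node becomes unreachable
        List<Node> heap = Arrays.asList(n2, n3, n5, n7);
        List<Integer> elems = new ArrayList<>();
        for (Node v : collect(heap, Arrays.asList(odds, primes)))
            elems.add(v.elem);
        return elems;
    }

    public static void main(String[] args) {
        System.out.println(liveElemsAfterB()); // [2, 3, 7]
    }
}
```

Checking the mark before recursing is what keeps the scan from looping forever on cyclic structures and from revisiting shared nodes such as the 3-node.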
3-65
Representation of
dynamic/flexible arrays (1)
 The array indexing operation will behave unpredictably if
the index value is out-of-range. To avoid this, in general,
we need a run-time range check on the index value.
 A static array’s index range is known at compile-time. So
the compiler can easily generate object code to perform the
necessary range check.
 However, a dynamic/flexible array’s index range is known
only at run-time. So it must be stored as part of the array’s
representation:
• If the lower bound is fixed, only the length need be stored.
• Otherwise, both lower and upper bounds must be stored.
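A minimal sketch of such a representation follows, assuming a hypothetical class BoundedVector that stores both bounds alongside the components and performs the range check on every indexing operation. It is written in Java, which already checks its own array indices, so the explicit check here only models what a compiler would generate.

```java
class BoundedVector {
    private final int lower;      // stored lower bound
    private final int upper;      // stored upper bound
    private final float[] cells;  // the component variables

    // Assumes upper >= lower - 1 (an empty range is allowed).
    BoundedVector(int lower, int upper) {
        this.lower = lower;
        this.upper = upper;
        this.cells = new float[upper - lower + 1];
    }

    // Run-time range check, followed by translation to a zero-based offset.
    private int offset(int i) {
        if (i < lower || i > upper)
            throw new IndexOutOfBoundsException(
                "index " + i + " not in " + lower + " .. " + upper);
        return i - lower;
    }

    float get(int i) {
        return cells[offset(i)];
    }

    void set(int i, float x) {
        cells[offset(i)] = x;
    }

    public static void main(String[] args) {
        BoundedVector v1 = new BoundedVector(1, 4);  // like Vector(1 .. 4)
        v1.set(1, 1.0f);
        v1.set(4, 3.5f);
        System.out.println(v1.get(4));               // 3.5
        try {
            v1.get(0);                               // out of range
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());      // index 0 not in 1 .. 4
        }
    }
}
```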
3-66
Representation of
dynamic/flexible arrays (2)
 Example (Ada):
type Vector is
array (Integer range <>) of Float;
v1: Vector(1 .. 4);
v2: Vector(0 .. 2);
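[Figure: v1 is represented by its lower bound 1 and upper bound 4, followed by components 1.0, 0.5, 5.0, 3.5 at indices 1 to 4; v2 by its lower bound 0 and upper bound 2, followed by components 0.0, 0.0, 0.0 at indices 0 to 2.]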
3-67
Representation of
dynamic/flexible arrays (3)
 Example (Java):
float[] v1 = new float[4];
float[] v2 = new float[3];
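[Figure: v1 is represented by a tag (float[]), its length 4, and components 1.0, 0.5, 5.0, 3.5 at indices 0 to 3; v2 by a tag (float[]), its length 3, and components 0.0, 0.0, 0.0 at indices 0 to 2.]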
3-68