Languages and Compilers
(SProg og Oversættere)
Bent Thomsen
Department of Computer Science
Aalborg University
With acknowledgement to Elsa Gunter, whose slides this lecture is based on.
1
Type Checking
• When is op(arg1,…,argn) allowed?
• Type checking assures that operations are
applied to the right number of arguments
of the right types
– Right type may mean same type as was
specified, or may mean that there is a
predefined implicit coercion that will be
applied
• Used to resolve overloaded operations
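The question "when is op(arg1,…,argn) allowed?" can be sketched as a small checker. This is only an illustration in Python, not the algorithm of any particular compiler; the tables `SIGNATURES` and `COERCIONS` and the function `check_op` are hypothetical names introduced for this sketch.

```python
# Hypothetical operator table: each operator maps to its overloaded
# signatures, written as (parameter types, result type).
SIGNATURES = {
    "+": [(("int", "int"), "int"), (("real", "real"), "real")],
    "not": [(("bool",), "bool")],
}

# Assumed predefined implicit coercions: (from type, to type).
COERCIONS = {("int", "real")}


def fits(actual, expected):
    """An argument fits if the types are equal or an implicit coercion exists."""
    return actual == expected or (actual, expected) in COERCIONS


def check_op(op, arg_types):
    """Return the result type of op applied to arg_types, or raise TypeError."""
    candidates = SIGNATURES.get(op, [])
    # Prefer an exact match, then fall back to a match that needs coercions.
    for exact in (True, False):
        for params, result in candidates:
            if len(params) != len(arg_types):
                continue  # wrong number of arguments
            if exact and tuple(arg_types) == params:
                return result
            if not exact and all(fits(a, p) for a, p in zip(arg_types, params)):
                return result
    raise TypeError(f"{op} is not allowed on {tuple(arg_types)}")
```

With these assumed tables, `check_op("+", ["int", "real"])` selects the real version of `+` by coercing the first argument, while `check_op("+", ["bool", "int"])` is refused.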
2
Type Checking
• Type checking may be done statically
at compile time or dynamically at run
time
• Untyped languages (e.g. LISP, Prolog)
do only dynamic type checking
• Typed languages can do most type
checking statically
3
Dynamic Type Checking
• Performed at run-time before each
operation is applied
• Types of variables and operations left
unspecified until run-time
– Same variable may be used at different
types
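A minimal sketch of a check performed at run time, just before the operation is applied. Python is itself dynamically typed, so the explicit check in the hypothetical function `apply_plus` only makes visible what the language already does internally.

```python
def apply_plus(a, b):
    """Check operand types at run time, just before the operation is applied."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    raise TypeError(
        f"cannot apply + to {type(a).__name__} and {type(b).__name__}"
    )


x = 3                        # x currently holds an integer
first = apply_plus(x, 4)
x = "three"                  # the same variable is now used at a different type
second = apply_plus(x, "!")
```

A call such as `apply_plus(1, "a")` passes no compile-time check at all and fails only when it is executed.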
4
Static Type Checking
• Performed after parsing, before code
generation
• Type of every variable and signature
of every operator must be known at
compile time
5
Static Type Checking
• Can eliminate need to store type
information in data object if no
dynamic type checking is needed
• Catches many programming errors at
earliest point
6
Strongly Typed Language
• When no application of an operator
to arguments can lead to a run-time
type error, language is strongly typed
• Depends on definition of “type”
7
Strongly Typed Language
• C is “strongly typed” but type
coercions may cause unexpected
(undesirable) effects; no array
bounds check (in fact, no runtime
checks at all)
• SML is “strongly typed” but still must do
dynamic array bounds checks,
arithmetic overflow checks
8
How to Handle Type Mismatches
• Type checking to refuse them
• Apply implicit function to change
type of data
–Coerce int into real
–Coerce char into int
9
Conversion Between Types:
• Explicit: all conversions between
different types must be specified
• Implicit: some conversions between
different types implied by language
definition
– Implicit conversions called coercions
10
Coercion Examples
Example in Pascal:
var A: real;
B: integer;
A := B
–Implicit coercion - an automatic
conversion from one type to
another
11
Coercions Versus Conversions
• When A has type real and B has type int,
many languages allow the coercion implicit in
A := B
• In the other direction (A of type int, B of type
real), often no coercion is allowed; an explicit
conversion must be used:
– A := round(B); Go to integer nearest B
– A := trunc(B); Delete fractional part of B
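The difference between the two explicit conversions can be tried out in Python, used here only as a stand-in for the Pascal functions. One caveat: Python's `round` rounds exact halves to the nearest even integer, which may differ from other languages, so the sketch avoids values ending in .5.

```python
import math

b = 2.7

nearest = round(b)           # like round(B): the integer nearest to b
truncated = math.trunc(b)    # like trunc(B): fractional part deleted

# For negative values the two conversions differ as well.
nearest_neg = round(-2.7)
truncated_neg = math.trunc(-2.7)

# The safe direction needs no explicit call: the int operand is coerced.
widened = 3 + 0.5
```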
12
Type Equality (aka Type Compatibility)
• When are two types “the same”?
• Name equivalence: two types equal
only if they have the same name
– Simple but restrictive
– Usually loosened to allow two types to be
equal when one is defined with the name
of the other (declaration equivalence)
13
Type Equality
• Structure equivalence: Two types
are equivalent if the underlying
data structures for each type are
the same
–Problem: how far to go – are two
records with the same number of
fields of same type, but different
labels equivalent?
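The two notions of equality can be compared on a small model of record types. This is a sketch under assumptions: `RecordType` and both functions are hypothetical, and the flag `compare_labels` simply exposes the "how far to go" question as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    name: str      # declared type name
    fields: tuple  # ((label, type name), ...) in declaration order


def name_equivalent(t1, t2):
    """Name equivalence: equal only if the declared names match."""
    return t1.name == t2.name


def structurally_equivalent(t1, t2, compare_labels=True):
    """Structure equivalence: same field types in the same order.

    compare_labels decides whether differing labels make two
    records different types.
    """
    if len(t1.fields) != len(t2.fields):
        return False
    for (label1, type1), (label2, type2) in zip(t1.fields, t2.fields):
        if type1 != type2:
            return False
        if compare_labels and label1 != label2:
            return False
    return True


point = RecordType("Point", (("x", "real"), ("y", "real")))
complex_number = RecordType("Complex", (("re", "real"), ("im", "real")))
```

Here `point` and `complex_number` are never name equivalent, and they are structurally equivalent only if labels are ignored.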
14
Elementary Data Types
• Data objects contain single data
value with no components
• Standard elementary types include:
integers, reals, characters,
booleans, enumerations, pointers
(references in SML)
15
Specification of Elementary Data Types
• Basic attributes of type usually used by
compiler and then discarded
• Some partial type information may occur
in data object
• Values usually match with hardware
types: 8 bits, 16 bits, 32 bits, 64 bits
• Operations: primitive operations with
hardware support, and user-defined
operations built from primitive ones
16
Integers – Specification
• Range of integers for some fixed
minint to some fixed maxint, typically
-2^31 through 2^31 – 1 or –2^30
through 2^30 - 1
• Standard collection of operators:
+, -, *, /, mod, ~ (negation)
• Standard relational operations:
=, <, >, <=, >=, =/=
17
Integers - Implementation
• Implementation:
– Binary representation in 2’s
complement arithmetic
– Three different standard
representations:
[Figure: machine word with fields S (sign bit: 0 for +, 1 for -) and Data (binary integer)]
18
Integers - Implementation
• First kind:
[Figure: word with fields S (sign bit: 0 for +, 1 for -) and Data (binary integer)]
19
Integers – Implementation
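• Second kind
[Figure: one word with fields T (type descriptor) and Address; the address refers to a separate word with fields S (sign bit) and Data]
• Third kind
[Figure: a single word with fields T (type descriptor), S (sign bit) and Data]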
20
Integer Numeric Data
• Positive values
0 1 0 0 1 1 0 0 = 64 + 8 + 4 = 76
(leftmost bit is the sign bit)
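The reading of a bit pattern as a 2's complement integer can be sketched as follows. The function name `decode_twos_complement` is an assumption of this sketch, and the bits are passed as a string purely for readability.

```python
def decode_twos_complement(bits):
    """Interpret a string of '0'/'1' characters as a 2's complement integer."""
    width = len(bits)
    value = int(bits, 2)      # unsigned reading of the bit pattern
    if bits[0] == "1":        # sign bit set: the value is negative
        value -= 1 << width
    return value
```

For the pattern on the slide, `decode_twos_complement("01001100")` gives 76; flipping all bits and adding one gives `"10110100"`, which decodes to -76.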
21
Subranges
• Example (Ada):
A:integer range 10..20
• Subtype of integers (implicit
coercion into integer)
22
Subranges
• Data may require fewer bits than
integer type
–Data in example above require
only 4 bits
• Range checking usually requires
some run-time information and
dynamic type checking
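A rough model of a subrange with a dynamic range check. The class `Subrange` is hypothetical, and the bit count assumes that values are stored as offsets from the lower bound, which is one way to arrive at 4 bits for the range 10..20.

```python
class Subrange:
    """Hypothetical run-time model of a declaration like 'integer range 10..20'."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        # Bits needed if values are stored as offsets from the lower bound.
        self.bits = (high - low).bit_length()

    def check(self, value):
        """Dynamic range check performed on assignment."""
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


a_range = Subrange(10, 20)
```

Here `a_range.check(15)` succeeds, while `a_range.check(21)` is rejected at run time.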
23
IEEE Floating Point Format
• IEEE standard 754 specifies both a
32- and 64-bit standard
• At least one supported by most
hardware
• Numbers consist of three fields:
– S (sign), E (exponent), M (mantissa)
[Layout: | S | E | M |]
24
Floating Point Numbers: Theory
• Every non-zero number may be
uniquely written as
$(-1)^S \cdot 2^e \cdot m$
where $1 \le m < 2$ and S is either 0 or 1
25
Floating Point Numbers: Theory
• Every non-zero number may be
uniquely written as
$(-1)^S \cdot 2^{E - \text{bias}} \cdot (1 + M/2^N)$
where $0 \le M/2^N < 1$
• N is number of bits for M (23 or 52)
• Bias is 127 for 32-bit floats
• Bias is 1023 for 64-bit floats
26
IEEE Floating Point Format (32 Bits)
• S: a one-bit sign field. 0 is positive.
• E: an exponent in excess-127
notation. Values (8 bits) range from 0
to 255, corresponding to exponents
of 2 that range from -127 to 128.
27
IEEE Floating Point Format (32 Bits)
• M: a mantissa of 23 bits. Since the
first bit of the mantissa in a
normalized number is always 1, it
can be omitted and inserted
automatically by the hardware,
yielding an extra 24th bit of precision.
28
Exponent Bias
• With 8 bits (256 values), 127 is added to
the true exponent to get E
• If E = 127 then 127-127 = 0 is true
exponent
• If E = 129 then 129-127 = 2 is true
exponent
• If E = 120 then 120-127 = -7 is true
exponent
29
Floating Point Number Range
• In 32-bit format, the exponent has 8
bits giving a range from –127 to 128
for exponent
• This gives a number range from $10^{-38}$
to $10^{38}$, roughly speaking
30
Floating Point Number Range
• In 64-bit format, the exponent is
extended to 11 bits giving a range
from -1023 to +1024 for the
exponent
• This gives a range from $10^{-308}$ to
$10^{308}$, roughly speaking
31
Decoding IEEE format
• Given E and M, the value of the
representation is:
Parameters            Value
E = 255 and M ≠ 0     An invalid number (NaN)
E = 255 and M = 0     ∞
0 < E < 255           $2^{E-127} (1 + M/2^{23})$
E = 0 and M ≠ 0       $2^{-126} (M/2^{23})$
E = 0 and M = 0       0
32
Example Floating Point Numbers
• +1 = $2^0 \cdot 1 = 2^{127-127} \cdot (1 + .0)$
0 01111111 000000…
• +1.5 = $2^0 \cdot 1.5 = 2^{127-127} \cdot (1 + 2^{22}/2^{23})$
0 01111111 100000…
• -5 = $-2^2 \cdot 1.25 = -2^{129-127} \cdot (1 + 2^{21}/2^{23})$
1 10000001 010000…
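The decoding rules for the 32-bit format can be checked against these examples with a short sketch. It uses Python's standard `struct` module to obtain the bit pattern; the function names `fields` and `decode` are assumptions of this sketch.

```python
import struct


def fields(x):
    """Split a number, stored as a 32-bit IEEE float, into (S, E, M)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    return sign, exponent, mantissa


def decode(sign, exponent, mantissa):
    """Compute the value from S, E and M for the 32-bit format."""
    if exponent == 255:
        if mantissa != 0:
            return float("nan")      # invalid number
        magnitude = float("inf")
    elif exponent == 0:
        # Denormalized numbers and zero: no implicit leading 1.
        magnitude = 2.0 ** -126 * (mantissa / 2 ** 23)
    else:
        magnitude = 2.0 ** (exponent - 127) * (1 + mantissa / 2 ** 23)
    return -magnitude if sign else magnitude
```

For instance, `fields(-5.0)` yields S = 1, E = 129 and M = 2^21, and `decode` maps these three fields back to -5.0.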
33
Other Numeric Data
• Short integers (C) - 16 bit, 8 bit
• Long integers (C) - 64 bit
• Boolean or logical - 1 bit with value
true or false (often stored as bytes)
• Byte - 8 bits
34
Other Numeric Data
• Character - Single 8-bit byte - 256
characters
• ASCII is a 7 bit 128 character code
• Unicode characters are stored as 16-bit
code units in Java (UTF-16)
• In C, a char variable is simply a small
(typically 8-bit) integer
35
Enumerations
• Motivation: Type for case analysis over a
small number of symbolic values
• Example: (Ada)
type DAYS is (Mon, Tues, Wed, Thu, Fri,
Sat, Sun);
• Implementation: Mon → 0; … Sun → 6
• Treated as ordered type (Mon < Wed)
• In C, always implicitly coerced to integers
36
Pointers
• A pointer type is a type in which the range
of values consists of memory addresses
and a special value, nil (or null)
• Use of pointers to create arbitrary
data structures
37
Pointer Data
• Each pointer can point to an object of
another data structure
– Its l-value is its address; its r-value is
the address of another object
• Accessing r-value of r-value of
pointer called dereferencing
38
Pointer Aliasing
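• A := B
– Numeric assignment
Before: A: 7.2, B: 0.4
After: A: 0.4, B: 0.4 (the value is copied)
– Pointer assignment
Before: A points to a cell holding 7.2, B points to a cell holding 0.4
After: A and B both point to the cell holding 0.4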
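The two kinds of assignment can be imitated in Python. This is only an analogy: one-element lists stand in for heap cells, and the names `cell_a`, `cell_b`, `p` and `q` are introduced for this sketch.

```python
# Numeric assignment copies the value.
a = 7.2
b = 0.4
a = b
b = 1.0            # changing b afterwards does not affect a

# "Pointer" assignment: one-element lists stand in for heap cells.
cell_a = [7.2]
cell_b = [0.4]
p = cell_a         # p points to the cell holding 7.2
q = cell_b         # q points to the cell holding 0.4
p = q              # p and q are now aliases for the same cell
q[0] = 9.9         # an update through q is visible through p
```

After the last line `p[0]` is 9.9 as well, and the cell holding 7.2 can no longer be reached through `p`.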
39
Problems with Pointers
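• Dangling Pointer
[Figure: A and B both point to a cell holding 0.4; after "Delete A" the cell is freed, but B still points to it]
• Garbage (lost heap-dynamic variables)
[Figure: A points to a cell holding 7.2 and B to a cell holding 0.4; after the pointer assignment nothing points to the cell holding 7.2]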
40
Ways to Create Dangling Pointers
int *A, *B;
A = new int;
*A = 5;
B = A;
delete A;
/* B is still pointing to the address of
the object that was returned to the heap */
41
Ways to Create Dangling Pointers
int * A;
int * sub () { int B;
B = 5;
return &B;}
int main () { A = sub(); . . . }
/* A has been assigned the address of
an object that is out of scope */
42
SML references
• An alternative to allowing pointers directly
• References in SML can be typed
• … but they introduce some abnormalities
43
SML imperative constructs
• SML reference cells
– Different types for location and contents
x : int        non-assignable integer value
y : int ref    location whose contents must be integer
!y             the contents of location y
ref x          expression creating new cell initialized to x
– SML assignment
operator := applied to memory cell and new contents
– Examples
y := x+3 place value of x+3 in cell y; requires x:int
y := !y + 3 add 3 to contents of y and store in location y
44
SML examples
• Create cell and change contents
val x = ref "Bob";
x := "Bill";
• Create cell and increment
val y = ref 0;
y := !y + 1;
• While loop
val i = ref 0;
while !i < 10 do i := !i +1;
!i;
45
Composite Data Types
• Composite data types are sets of
data objects built from data objects of
other types
• Elements called data structures
• Some created by users, e.g. an array
of integers
• Some created internally by compiler,
e.g. symbol table, or subroutine
activation record
46
Specification of Structured Data Types
• Number of components
– Fixed or varying over life of data
structure
• Arrays and records have fixed
number
• Lists have variable number
– If variable number of components, is
there a max number possible?
47
Specification of Structured Data Types
• Type of each component
–Homogeneous: all components
have same type
• Arrays
–Heterogeneous: components have
varying types
• Records (also lists in some
languages, but not SML)
48
Specification of Structured Data Types
• Method of accessing components
–Array subscripting
–Record labels
–SML datatype pattern matching
49
Operations on Data Structures
• Creation and deletion of
structures
• Whole-structure operations
–Assigning to variable
–Iterating a function over the
structure
–Computing its length or size
50
Operations on Data Structures
• Component selection operations
– Direct access (aka random selection)
• Takes constant time
– Sequential selection
• Usually proportional to some
dimension of the structure (like the
number of components)
– May allow component update, or may
only allow access to value
51
Operations on Data Structures
• Component insertion and deletion
– Applies to structures with variable
number of components
– Causes major effects on possible data
layouts
• Example seen in the layouts for
strings
52
General Layout of Data Structures
• Descriptor
– Contains type information and other
attributes of data structure
– May only exist in symbol table at
compile time, or may be a direct part of
data object, or split between two
– Usually several words long
53
General Layout of Data Structures
• Layout of component data
–Sequential: arrays and records
• Uses least storage for structure if
number of components fixed
• Least flexible for overall storage
management
54
General Layout of Data Structures
• Layout of component data
–Linked: lists, trees
• Uses more space per structure
since each component must also
have a pointer to it
• Maximum flexibility for overall
storage management, put pieces
where they fit
55
Strings
• Character string is a data object
composed of a sequence of
characters
• Main kinds:
– Fixed declared length
– Variable length with declared maximum
length
– Unbounded length
56
String operations
• String concatenation
• Length of string
• Substring selection by position
• Lexicographical ordering (based on
underlying codes such as ASCII)
• Substring by pattern matching
57
String Interface
• Can be implemented as primitive
type (as in SML or Java) or an array
of characters (as in C and C++)
• If primitive, operations are built in
• If array of characters, string
operations provided through a library
58
String Implementations
• Fixed declared length (aka static
length)
–Packed array padded with blanks
[Figure: descriptor (String, Length=12, Pointer to data) pointing to the data "All aboard" followed by two blank pad characters]
59
String Implementations
• May need runtime descriptor
for type and length if
substring operations include
runtime checks
• Update pads with blanks or
truncates as necessary
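Update of a string with fixed declared length can be sketched in one function. The name `assign_fixed` and the choice of a space as the pad character are assumptions of this sketch.

```python
def assign_fixed(text, declared_length, pad=" "):
    """Store text in a fixed-length string: pad with blanks or truncate."""
    return text[:declared_length].ljust(declared_length, pad)
```

With a declared length of 12, "All aboard" is stored with two trailing blanks, while a longer value is cut off after the 12th character.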
60
String Implementations
• Variable length with declared
maximum (aka limited dynamic
length)
– Packed array with runtime descriptor
[Figure: descriptor (String, Max Length=12, Cur Length=10, Pointer to data) pointing to the data "All aboard" in a 12-character array]
61
String Implementations
• Descriptor may occur as initial
block of data object for array
62
String Implementations
• Unbounded length (aka dynamic length)
– Two standard implementations
– First: Linked list
[Figure: descriptor (String, Curr Length = 10, Pointer to data) pointing to a linked list of blocks that together hold "All aboard"]
63
String Implementations
• Unbounded length
– Second implementation: null terminated
contiguous array
[Figure: descriptor (String, Pointer to data) pointing to a contiguous array holding "All aboard"]
– Must reallocate and copy when string
grows
64
Arrays
• Ordered sequence of fixed number of
objects all of the same type
• Indexed by integer, subrange, or
enumeration type, called subscript
• Multidimensional arrays have one
subscript per dimension
• L-value for array element given by
accessing formula
65
Type Checking Arrays
• Basic type – array
• Number of dimensions
• Type of components
• Type of subscript
• Range of subscript (must be done at
runtime, if at all)
66
Array Layout
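• Assume one dimension
[Figure: descriptor (1 dim array, Virtual Origin (VO), Lower Bound (LB), Upper Bound (UB), Comp type, Comp size (E)) followed by the data A[LB], A[LB+1], …, A[UB]; α marks the address of A[LB], and VO is the address where A[0] would be]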
67
Array Component Access
• Component access through
subscripting, both for lookup (r-value)
and for update (l-value)
• Component access should take
constant time (i.e. looking up the 5th
element takes same time as looking
up 100th element)
68
Array Access Function
• L-value of A[i] = VO + (E * i)
= α + (E * (i – LB)), where α is the address of A[LB]
• Computed at compile time
• VO = α - (E * LB)
• More complicated for multiple
dimensions
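The access function for one dimension can be written out directly. The sketch uses `alpha` for the address of the first component A[LB]; the concrete addresses and bounds below are made-up example values.

```python
def virtual_origin(alpha, lower_bound, elem_size):
    """VO = alpha - E * LB: the address A[0] would have, even if LB != 0."""
    return alpha - elem_size * lower_bound


def element_address(vo, elem_size, i):
    """L-value of A[i] = VO + E * i: one multiply and one add per access."""
    return vo + elem_size * i


# Assumed example: A[5..9] with 4-byte components, first component at 1000.
ALPHA, LB, UB, E = 1000, 5, 9, 4
VO = virtual_origin(ALPHA, LB, E)
```

Because VO is fixed once the bounds and component size are known, each access costs the same small amount of arithmetic regardless of the index, which is why lookup takes constant time.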
69
Records
• Ordered sequence of fixed number of
objects of differing types
• Indexed by fixed identifiers called
labels or fields
• L-value for record element given by
more complex accessing formula
than for arrays
70
Typical Record Layout
[Figure: descriptor (Record type; Num. of components; Comp 1 label, Comp 1 type, Comp 1 location = α; …; Comp n label, Comp n type, Comp n location) followed by the data R.1, R.2, …, R.n]
71
Type Checking Record
• Basic type – record
• Number, name (label) of
components
• Possibly order of labels
– If order matters, labels must be
unique
– If order doesn’t matter, layout must
give a canonical ordering
• Type of components per label
72
Record Layout
• Most of descriptor exists only at compile
time
• Access function:
• Comp i location given by
• L-value of R.i = $\alpha + \sum_{j=1}^{i-1} (\text{size of } R.j)$
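The record access function amounts to a running sum of component sizes. The sketch below ignores alignment padding, which real compilers often insert; `field_offsets` and `field_address` are hypothetical names, and the sizes are example values.

```python
from itertools import accumulate


def field_offsets(sizes):
    """Offset of component i is the sum of the sizes of components 1..i-1."""
    return [0] + list(accumulate(sizes))[:-1]


def field_address(alpha, sizes, i):
    """L-value of R.i for 1-based i, with alpha the address of R.1."""
    return alpha + sum(sizes[: i - 1])
```

Since the sizes are known at compile time, the offsets can be computed once by the compiler, so access to a field costs no more at run time than access to an array element.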
73
Lists
• Ordered collection of variable
number of elements
–Many languages (LISP, Scheme,
Prolog) allow heterogeneous list
–SML has only homogeneous lists
74
Lists
• Layout: linked series of cells
(called cons cells) with descriptor,
data and pointers
–Data in first cell of list called head
of list
–R-value of pointer in first cell called
tail of list
75
Lists
• Sequential access of data by
following pointers
–Access is linear in position in
list
• Takes twice as long to look up
10th element as to look up 5th
element
76
Lists
• Adding a new element to list
done only at head, called
consing
• Creates new cell with element
to be added and pointer to old
list (i.e. creates new list)
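Cons cells, consing and sequential access can be sketched in a few lines. The class `Cons` and the helpers `cons` and `nth` are names chosen for this sketch; `None` plays the role of the empty list.

```python
class Cons:
    """A cons cell: one data item (head) and a pointer to the rest (tail)."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail     # None plays the role of nil


def cons(element, lst):
    """Add an element at the head; the old list is shared, not copied."""
    return Cons(element, lst)


def nth(lst, position):
    """Sequential access: follow `position` tail pointers (0-based)."""
    cell = lst
    for _ in range(position):
        cell = cell.tail
    return cell.head


old = cons(2, cons(3, None))
new = cons(1, old)
```

After consing, `new` is a new list whose tail is the unchanged `old` list, and `nth` has to follow one pointer per position.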
77
List Layout
• Example: [1,2.5,’a’]
[Figure: three linked cons cells, each with a list descriptor; their data are int 1, real 2.5 and char ‘a’]
78
List Layout
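• Example: [[1,2.5],[’a’]]
[Figure: an outer list of two cons cells whose data are themselves lists; the first points to the cells int 1 and real 2.5, the second to the cell char ‘a’]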
79
Union Types
• Set-wise the (discriminated) union of
the component types
• Interchangeable with variant records
as primitive type construct
• Elements chosen from one of
component types
80
Union Types
• Problem: if int occurs as two
different components of union
type, can we tell which
component an int is for?
81
Union Types
• Two kinds of union types:
–Free union - Ans: no
–Discriminated union – Ans: yes
• If each component is tagged to
separate occurrences of same type,
discriminated union, otherwise not
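The difference between the two kinds of union shows up as soon as int occurs twice. In this sketch the tags "age" and "height_cm" and the class `Tagged` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    tag: str      # component tag stored with the data
    value: int


def describe(item):
    """Case analysis on the tag tells the two int components apart."""
    if item.tag == "age":
        return f"{item.value} years"
    if item.tag == "height_cm":
        return f"{item.value} cm"
    raise ValueError(f"unknown tag {item.tag}")


free_a, free_b = 42, 42              # free union: only the ints are stored
disc_a = Tagged("age", 42)           # discriminated union: tag plus int
disc_b = Tagged("height_cm", 42)
```

The two free-union values are indistinguishable, whereas the tagged values compare as different and can be handled by case analysis.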
82
Union Layout
[Figure: descriptor (Union type, Component type, Component tag, Component location) followed by a data area of fixed length L holding the actual data and unused space]
• No tag if free union
• L is fixed length of biggest component
83
Combining Data Structures
• Possible to have any of the
above structures as components
of others
• Since lists are of variable size,
but arrays must store fixed size
element, how to store lists in an
array?
84
Combining Data Structures
• Answer: cons cells have uniform
size, store just the leading cons
cell
85
Example:
• Data in 4-element array of lists
[Figure: a 4-element array of lists of ints built from cons cells; the cells shown hold the values 5, 6, 3, 1, 2 and 7]
86
Type summary
• Static type checking takes place after syntax
check and before code generation
• Some type checking can be necessary at run
time
• Types vs. Syntax
• Simply typed values and composite values
• User defined types
• Equivalence on types
87