CS337 Comparative Programming Languages

CI346
Programming Languages
John English (W425) [email protected]
[away 2008-9]
Mike Smith (W424) [email protected]
Course content
• History
• Imperative (procedural, O-O) languages
– comparison of common features
– unique or ‘interesting’ features
• Declarative languages
– functional languages
– logic (constraint) languages
– others...
Resources
• Recommended text
– "Comparative Programming Languages",
Wilson & Clark (Addison Wesley 2001)
• BURKS 6
– http://burks.bton.ac.uk/burks
– many compilers, tutorials, manuals
– "Introduction to Programming Languages"
by Anthony A. Aaby
– Free Online Dictionary of Computing
Assessment
• Coursework
– learn two new languages (sufficient to write
a small program)
– short presentation of significant features
• Exam
– analyse some aspect of your chosen
languages (e.g. the range of data types
they provide)
Why study languages?
• Discover alternative forms of expression
– ‘if you only have a hammer, everything
looks like a nail...’
• Make it easier to learn new languages
– can relate features across languages
• Cross-fertilisation
– inventing new languages
Language goals
• Constructing reliable software
– error detection (early!), correctness
• Maintainability
– readability, modularity
• Expressive power
– reuse, extensibility, genericity
• Ease of use
– simplicity, orthogonality (e.g. you can
declare arrays and functions, but can you
return them as the result of a function?)
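As an added illustration (not from the original slides), here is a minimal C++ sketch of that orthogonality test: C cannot return an array by value, but C++ can return a function object, so functions come closer to being orthogonal with other values (make_adder is a hypothetical name):

#include <functional>
#include <iostream>

// Can we return what we can declare? A function that builds and
// returns another function.
std::function<int(int)> make_adder(int n) {
    return [n](int x) { return x + n; };   // returned like any value
}

int main() {
    auto add3 = make_adder(3);
    std::cout << add3(4) << "\n";          // prints 7
}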
Implementation
• Reliable implementations
– clear language standards
– compiler bugs and conformity tests
• Interpret or compile?
– interpreting: easier, more portable
– compiling: more efficient, earlier error
detection
– a mixture of approaches is normal
Non-linguistic aspects
• Compiler quality
– error messages, code size and speed
• Debugging
– step, run, set breakpoints, inspect
variables
• Development environment
– visual design tools, syntax colouring,
prettyprinting, wizards
Language evolution
• Languages have always adopted good
ideas from their predecessors
• Languages have (almost) always learnt
from the mistakes of their predecessors
– mistakes are usually only discovered after
the language has been in use for a while
– sometimes mistakes are discovered when
someone tries to implement the bright
ideas of the language designers...
Language evolution
[Diagram (flattened in transcription): family tree of the imperative
languages Fortran, Cobol, Algol 60, Basic, PL/1, Algol W, Algol 68,
Simula, CPL, Pascal, Modula 2, Oberon, Ada, BCPL, B, C, Objective C,
Smalltalk, C++, Java and C#.]
Language evolution
[Diagram (flattened in transcription): family tree of other language
families: Lisp, COMIT, Snobol, Icon, Forth, PostScript, Prolog, ML,
Pop-2, Haskell, and the Algol / C line leading to Unix (sh), Perl,
PHP and Ruby.]
Some important ideas
• Algol 60: formal grammars, block structure,
stack-based implementation
• Algol 68: orthogonality, concurrency, operator
overloading
• Lisp: garbage collection, recursion
• CLU: generics, exception handling
• Smalltalk: OOP
• Pascal: data types, strong typing
• Modula-2: modularity
• Eiffel: design by contract
Orthogonality
• How do language features affect one
another?
– independent features are ‘orthogonal’, i.e.
at ‘right angles’ to each other
• Examples:
– arithmetic expressions and arrays
– arrays and procedure parameters
– exception handling and concurrency
Formalism
• Fortran used an English description
– ambiguous: a correct Fortran program is a
program which a particular compiler
accepts
• Algol 60 introduced formal syntax (BNF)
– compilers could use syntax rules to
determine correctness
– semantics were described in English, so
some ambiguities remained
Formalism
• Algol 68 used van Wijngaarden (‘two
level’) grammars
– could specify syntax and semantics
precisely
– could not construct automated parsers
– general opinion: elegant, but too
complicated for mere mortals
Foot-shootery
• Fortran DO-loop
• Algol 60 & Coral comments
• PL/1 type conversions
• C & C++ equality tests
• Algol 60, C: null statements using ‘;’
Awkward design
• Dangling ELSE
– C, C++, Java, Pascal, ...
• Inconsistent syntax
– semicolons as separators in Algol, Pascal
– declarations in Pascal
• Ad-hoc restrictions
– statement layout, subscripts in Fortran
– order of declarations in Pascal
Language issues
• Program structure
– control structures
– procedures, functions, subroutines
– parameter passing
– nesting, recursion
– classes, methods, polymorphism
– concurrency
Language issues
• Variables
– lifetime and scope
– ‘referential transparency’
• Data types
– scalars, arrays, structures, unions...
– range checking
• Error handling
– throwing and catching exceptions
– abort or resume?
Program structure
• Compilation units:
– Algol 60/68, Pascal: monolithic
– Fortran: subroutine or function
– Ada: procedure, function or package
– C, C++: source file (usually one or more
functions) plus ‘include files’
– Java: class
Control structures
• In the beginning was the GOTO
statement...
– reflects the underlying assembly language
– can lead to ‘spaghetti code’
• Common requirements:
– decisions
– loops
– special language constructs invented to
support these requirements
Decisions
• Fortran: IF (n) a,b,c
– a 3-way GOTO: jump to label a, b or c as n
is negative, zero or positive
• Algol 60: if a > b then c else d;
– c cannot be another IF statement
– use begin ... end blocks to enclose
sequences of statements
• C, C++, Java, C#: if (a > b) c; else d;
– ambiguity if c is another IF statement
– use { ... } to enclose sequences of
statements
Loops
• Fortran: GOTO, or DO loop
– DO is like modern FOR loops
– ‘DO 10 I=1,100,2’
• Algol 60: for & while loops
– while loops could replace use of GOTO
– ‘for i := 1 step 1 until 10 while a < b do ...’
– floating-point loops also possible: ‘step 0.1’
– where is control variable of ‘for’ loop
declared and what is its scope?
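For comparison (an added sketch, not in the original slides), the Fortran loop ‘DO 10 I=1,100,2’ corresponds roughly to a modern C++ for loop, which also answers the scope question for that language:

// Rough C++ counterpart of 'DO 10 I=1,100,2' (illustrative only):
// start at 1, step by 2, continue while i <= 100.
for (int i = 1; i <= 100; i += 2) {
    // loop body (the statements up to label 10 in the Fortran)
}
// In C++ the control variable i is declared in the loop header
// and its scope ends with the loop.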
Loops
• Snobol:
– a string-processing language
– each statement can match a string against
a pattern
– each statement can have two labels to
jump to on completion (match succeeds or
fails)
Loops
• Icon:
– ‘generators’ either succeed and produce a
result, or fail
– generators which succeed can be called
again to produce the ‘next’ result (if any)
– loops can iterate over every result of a
generator
– ‘every i := (1 to 10) | (91 to 100) do write i’
– ‘every write( (1 to 10) | (91 to 100) )’
Block structure
• Algol 60 introduced the notion of
‘blocks’
– sequence of statements (and local variable
declarations) enclosed by begin ... end
– used where more than one statement
needed inside an if or loop statement
– this is usually the case!
– C and its descendants use { ... } instead of
begin ... end
Block structure
• C, C++, Java, C#: the ‘dangling else’ problem
– if (a) if (b) c; else d;
• Algol 60: no dangling ‘else’
– if X then Y: Y cannot be another if statement (but
can be a block which contains one)
• Algol 68, Ada: all control structures fully
bracketed (so no need to use blocks):
– Algol 68: if a then b else c fi;
– Ada: if a then b else c end if;
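A small C/C++ sketch of the dangling-else problem (added example; a(), b(), c() and d() are hypothetical functions):

// The 'else' binds to the nearest unmatched 'if', whatever the
// indentation suggests:
if (a())
    if (b()) c();
    else d();        // this else belongs to 'if (b())'

// Braces make the intended pairing explicit:
if (a()) {
    if (b()) c();
} else {
    d();
}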
Procedures
• Most languages have procedures (no
result) and functions (which return a
result)
– function calls can be used in arithmetic
expressions
– Fortran called procedures ‘subroutines’
• Algol 68, C, C++, Java, C#: functions
can be used as procedures (result is
discarded)
Procedures
• Procedures and functions can have
parameters
– limitations on what types are allowed?
– limitations on what types can be returned
by function?
– limitations on what the procedure or
function can do with the parameter (e.g.
read only, read/write)
Procedures
• Are procedures and functions first-class
objects?
– e.g. integers are first-class objects: can be
stored in variables, passed as parameters,
returned as results
– can procedures be stored, passed,
returned in this way?
• A related idea: closures
– a closure is a function call together with its
current scope
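A minimal C++ sketch of both ideas (an added example, assuming C++11 lambdas): a function stored and returned as a value, and a closure carrying its captured scope:

#include <functional>

// make_counter returns a closure: code plus a captured copy of count.
std::function<int()> make_counter() {
    int count = 0;
    return [count]() mutable { return ++count; };
}

// Each counter is independent:
//   auto c1 = make_counter();
//   c1();  // 1
//   c1();  // 2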
Parameter passing
• By value: a copy of the parameter is
given to the function (C, C++, Java, Ada)
– changing the parameter only changes the
copy
• By reference: a reference to a variable is
given to the function (C++, Ada, Fortran)
– changing the parameter changes the original
variable
– can be done by passing a pointer by value
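A short C++ sketch of these modes (added example, with hypothetical names):

void by_value(int x)      { x = 99; }    // changes only the local copy
void by_reference(int& x) { x = 99; }    // changes the caller's variable
void via_pointer(int* x)  { *x = 99; }   // 'by reference' simulated by
                                         // passing a pointer by value
int main() {
    int a = 1;
    by_value(a);       // a is still 1
    by_reference(a);   // a is now 99
    int b = 1;
    via_pointer(&b);   // b is now 99
}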
Parameter passing
• By value/return: a copy of the parameter
is given to the function and the updated
value is written back to the corresponding
variable at the end (Ada)
• By name: the parameter is re-evaluated
each time it is referenced (Algol 60)
– ‘Jensen’s device’: much too complex for
practical use
Nesting
• Can functions be defined inside
functions?
– Ada, Pascal, Algol: yes
– C, C++, Java: no
• Nested functions can access variables
defined in the surrounding functions
– makes life slightly more difficult for the
compiler writer
Recursion
• Can functions call themselves?
– yes, except in languages like Fortran
where variables are static and the return
address is stored in a static variable
– a recursive call in Fortran would overwrite
the previous return address
• Recursion requires dynamic variables
– each time you call the function a new set of
local variables needs to be created
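A tiny C++ illustration (added sketch) of why recursion needs dynamic variables: each activation gets its own n on the stack, where Fortran-style static allocation would have one shared n and one shared return address:

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);  // the recursive call gets a fresh n;
                                  // the outer n is untouched
}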
Classes
• A feature of object-oriented languages
– Java, C++, Ada 95, ...
– contains data and methods (functions)
– supports inheritance
• Multiple inheritance?
– allowed in Eiffel, C++
– not allowed in Smalltalk, Ada 95
– ‘interfaces’ in Java and C#
Classes
• Polymorphism: overriding inherited
methods in a derived class
– standard behaviour in most languages
(e.g. Smalltalk, Java)
– only ‘virtual’ methods in C++
• Classes also cross the divide into data
types...
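Picking up the ‘virtual’ bullet above, an added C++ sketch: overriding takes effect through a base-class pointer only when the method is declared virtual:

#include <iostream>

struct Base {
    void hello()         { std::cout << "Base::hello\n"; }
    virtual void greet() { std::cout << "Base::greet\n"; }
};

struct Derived : Base {
    void hello()          { std::cout << "Derived::hello\n"; } // hides
    void greet() override { std::cout << "Derived::greet\n"; } // overrides
};

int main() {
    Derived d;
    Base* p = &d;
    p->hello();   // Base::hello (bound statically)
    p->greet();   // Derived::greet (bound dynamically)
}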
Concurrency
• How fine-grained is the concurrency?
– Ada, PL/1: task blocks (coarse grained)
– Algol 68, occam: individual statements
(fine grained)
• How do tasks synchronize their
activities?
– PL/1: no general mechanism
– Algol 68: semaphores (low-level)
Concurrency
• Higher-level synchronization
mechanisms introduced later:
– monitors (Concurrent Pascal, Ada, Java)
– rendezvous (Ada)
– channels (occam)
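A monitor-style object sketched in C++ (an added example using std::mutex; the slides name Concurrent Pascal, Ada and Java as languages with built-in monitors):

#include <mutex>

// Data and operations packaged together; every operation takes the
// same lock, so at most one thread is 'inside the monitor' at a time.
class Counter {
    std::mutex m;
    int value = 0;
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m);
        ++value;
    }   // lock released automatically on exit
    int get() {
        std::lock_guard<std::mutex> lock(m);
        return value;
    }
};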
Variables
• Lifetime: how long does the variable
exist for?
– static (same as lifetime of program)
– dynamic (same as lifetime of block where
the declaration occurs)
• Scope: where can the variable name be
used?
– inside block where it’s declared
– anywhere, or only after declaration?
Variable lifetime
• Fortran: all variables are static
• C/C++: global and static variables are
static, all others are dynamic
• Java: static variables are static, instance
variables live as long as the object they
belong to, all others are local
• Ada: variables in packages are static, all
others are dynamic
Variable scope
• Variables can usually only be accessed
from the point of declaration to the end
of the block containing the declaration
– Fortran: COMMON blocks allow several
modules to access the same variables
– C/C++: extern allows access to variables
defined externally (also C++ friends)
– Ada: packages allow variables to be
shared
Data types
• Fundamental dichotomy for elementary
data:
– numeric types
– text types
• Pascal, Ada etc. also allow user-defined
‘enumeration’ types
– type Day is (Mon, Tue, Wed, Thu, Fri, Sat,
Sun);
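The same idea in C++ (an added sketch, not from the slides): enum class gives a user-defined type whose values are names, not bare integers:

enum class Day { Mon, Tue, Wed, Thu, Fri, Sat, Sun };
Day today = Day::Wed;   // a Day, not an int; 'Day d = 3;' would not compile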
Data types
• Numeric types:
– integer
– real (floating point; fixed point)
– decimal or binary?
– range and precision?
• Text types:
– fixed or variable length?
– bounded or unbounded?
Data types
• Composite types:
– arrays
– records/structures
• Pointer types:
– linked lists, trees etc.
– can pointers be converted to other types?
– can pointers be manipulated?
Composite types
• Arrays:
– fixed bounds? (e.g. Ada)
– static bounds? (e.g. C)
– flexible bounds? (e.g. Algol 68)
– strings?
• Records:
– structural variants?
Composite types
• When are two composite types the
same?
– same structure? (Algol 68)
– same type name? (Ada)
– same array bounds? (Pascal)
Pointers
• Pointers allow dynamic data structures
– point anywhere (C) or just to objects which
are heap-allocated (Pascal)?
– typed (C) or untyped (PL/1)?
– interconvertible (C) or not (Ada)?
• Reclamation issues:
– manual or automatic garbage collection?
Pointers
• Algol 68 uses reference types as a
general mechanism
– an int is an integer (constant, e.g. 123)
– a ref int is a reference to an integer (an
integer variable, which can refer to a
particular int value like 123)
– a ref ref int is a reference (pointer) to a ref
int object (an int variable)...
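A very rough C++ analogy for these reference levels (added sketch; Algol 68's model does not map exactly onto C++):

const int c = 123;   // an int value            (roughly Algol 68's int)
int       v = 123;   // a place holding an int  (roughly ref int)
int*      p = &v;    // a place holding a reference to an int
                     //                         (roughly ref ref int)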
Pointers
• Java ‘references’ are really pointers
– but they can’t be manipulated as in C/C++
• PHP has references too
– more like links in a Unix file system
• Languages like Lisp use pointers to
implement lists
– these are not accessible at the language
level
Names and types
• Names (for e.g. variables) must be
defined (implicitly or explicitly)
• Implicit declaration (e.g. Fortran, Perl)
– Fortran has implicit typing
– everything in Perl is a string, array or hash
(identified by first character: $, @ or %)
• Problems:
– lexical errors (typing errors!) will create
new variables
Names and types
• Typeless declaration (e.g. Lisp,
Smalltalk)
– names must be defined before use
– everything in Lisp is a list
– everything in Smalltalk is an object
• Problems:
– type mismatches not detected until runtime
Names and types
• Typed declarations (most languages)
– strong typing or weak typing?
• Strong typing:
– no automatic type conversion (e.g. Ada)
– some escape mechanism is usually
required (e.g. Unchecked_Conversion)
Names and types
• Weak typing
– automatic type conversions provided (e.g.
PL/1)
– can disguise problems due to use of a
wrong variable
• Most languages are a compromise
– some automatic conversions (e.g. Algol 68)
Names and types
• Type inferencing (e.g. Haskell)
– types inferred from operations used
– most general available type used (e.g.
Num for numeric types)
– use of specific types (e.g. Int where Num is
required) will force use of Int throughout
Higher-order types
• Functions as types
– e.g. Lisp, Haskell
– functions can create new functions and
return them as their result
• Types as types?
– e.g. Smalltalk has a class called Class
– so does Java (reflection API)
Fortran
• Fortran ("FORmula TRANslation")
– developed 1954 by John Backus, IBM
– designed for scientific calculations
• Implementation
– based on IBM 704 systems
– language features designed ad-hoc to suit
IBM 704
– buggy hand-coded compilers (pre-1958)
Fortran
• No formal language specification
– compiler was the acid test of correctness
• Line-oriented format
– based on 80-column punched cards
– columns 1-5: statement label (C in column 1
marks a comment line)
– column 6: continuation of previous line if not blank
– columns 7-72: statement
– columns 73-80: sequence number
Fortran
• Implicit declarations (typed by the first
letter of the name)
– I to N: integer
– A to H, O to Z: real (floating point)
– static (so no recursion)
• Control constructs:
– GOTO n, GOTO (a,b,c) n
– IF (n) a,b,c
– DO n I=a,b
Cobol
• COmmon Business Oriented Language
– descendant of Flow-matic (1955), Univac I
– English-like vocabulary and syntax
• Four "divisions" in a program:
– identification division
– environment division
– data division
– procedure division
Cobol
• Business oriented features:
– file processing
– report formatting
• Data description via "picture" clauses
– PIC 9999: 4-digit decimal number
– PIC X(20): 20-character string
– PIC 999V99: 5-digit number with 2 decimal
places
Algol 60
• The most influential language ever
– nearly all imperative languages are
descendants of Algol 60
• Descended from IAL (International
Algebraic Language), aka Algol 58
– initial Algol 60 report: 1959/60
– revised report: 1963
Algol 60
• Innovations:
– formal syntax specification
– reference language, publication language,
hardware representation
– block structure, local variables, recursion
– structured control statements
– free format
Algol 60
• Problems:
– ambitious design ignored compilation
issues
– compilers did not appear until mid-1960s
– only caught on in Europe (USA continued
to use Fortran)
– no standard input/output
– no string manipulation
Algol 60
• Main descendants:
– JOVIAL (Jules' Own Version of the IAL)
– Simula 67 (ancestor of OOP concepts in
C++ and Java)
– Algol W (Wirth), hence Pascal, Modula-2,
Oberon, Ada
– CPL, hence BCPL, B, C, C++, Java
– Algol 68
Algol 68
• Successor to Algol 60
– supported by IFIP
– "Report on the Algorithmic Language Algol
68", van Wijngaarden et al., 1969
– revised report, 1975
• Committee split by disagreements
– Wirth et al. produced minority report
– went on to develop Pascal instead
Algol 68
• Main features
– highly orthogonal language
– expression-based
– references used as a way of uniting
names, values, pointers etc.
– concurrency support
• Descendants:
– RTL/2, Bourne shell, SPL
Algol 68
• Syntax described using two-level "van
Wijngaarden" grammar
– context-free upper level grammar
generated infinite set of "hypernotions"
– these generated an infinite number of rules
by substituting hypernotions into lower
level rules
– described semantic properties (name
bindings, coercions, etc.)
Algol 68
• Problems
– grammar seen as too complex (and
unparsable)
– language seen as too complex
• Implementations were slow to appear
– only subsets were usually implemented
(Algol 68R, Algol 68C, etc.)
PL/1
• IBM's attempt at a general-purpose
language
– part of System/360 project (1964)
– amalgam of Fortran, Cobol, Algol
– originally called NPL (New Programming
Language), then MPPL (Multi-Purpose
Programming Language)
PL/1
• Problems
– large, complex, difficult to implement
– IBM's personal property (they also
copyrighted PL/2 to PL/100)
– descendants include PL/360, PL/M
Basic
• "Beginner's All-purpose Symbolic
Instruction Code"
– Kemeny & Kurtz, early 1960s
• Developed as teaching language at
Dartmouth College (USA)
– simplified Fortran-like syntax
– interactive!
APL
• "A Programming Language" (K. Iverson)
– array processing using powerful array
operators
– designed for expressing numerical
algorithms
– proverbially cryptic, requires extensive
character set
– ancestor of J
String processing
• Application areas:
– natural language processing
– symbolic mathematics
– general text processing
• COMIT
– MIT, late 1950s
String processing
• SNOBOL
– descendant of COMIT from Bell Labs
– Snobol 4 (Ralph Griswold, 1967)
– ancestor of Icon
• Uses success/failure of string matching
to control program flow
– generalised into Icon's "generators"
String processing
• Scripting languages
– Perl, Python, Tcl, Ruby, ...
– also shell command languages
– used to "script" actions of programs in
other languages
• Macro languages
– GPM, Trac, M4, ML/1, PHP, ...
– embedded text processing
List processing
• Application areas
– artificial intelligence, natural language
processing
• IPL-V
– Carnegie Institute, late 1950s
• LISP
– developed by John McCarthy (MIT, later
Stanford), co-author of the Algol 60
revised report
List processing
• LISP development:
– LISP 1, LISP 1.5 (late 1950s)
– Common Lisp (ANSI, early 1980s)
• Innovations:
– programs are lists too
– garbage collection
– functional programming style (recursion
instead of iteration)
Systems programming
• For implementing "system software"
– also real-time or embedded applications
– operating systems, compilers, etc.
• Requirements:
– efficiency is crucial
– low-level access to hardware facilities
– bit-twiddling operations
Systems programming
• BCPL
– "Basic CPL"
– untyped language (or monotyped)
• C
– descended from BCPL via B
– used to implement Unix
Systems programming
• Bliss
– used by DEC for writing operating systems
and utilities
• Coral 66
– MOD standard based on Algol 60
– direct access to memory
• RTL/2
– real-time language supporting concurrency
Systems programming
• JOVIAL
– descendant of IAL (Algol 58)
– used extensively by US DoD
• Forth
– embedded language developed for radio
telescope control
– ancestor of PostScript (printer page
description language)
Object-oriented languages
• Simula 67
– simulation language descended from Algol
60 (Dahl & Nygaard)
– introduced the concept of a "class"
• Smalltalk
– developed by Alan Kay at Xerox PARC,
1972 (revised to produce Smalltalk-80)
– interpreted language, message-passing
Object-oriented languages
• Objective-C
– mixture of C and Smalltalk by Brad Cox
– primary language for NeXT computers
• C++
– mixture of C and Simula by Bjarne
Stroustrup
– Bell Labs: telephone routing simulation etc.
Object-oriented languages
• Java
– originally called Oak (James Gosling, early
1990s)
– redeveloped at Sun for Internet use, 1995
• Eiffel
– developed by Bertrand Meyer, early 1980s
– assertion-based with compiler-checked
preconditions and postconditions
Other languages
• Functional languages
– Lisp, Scheme
– FP, ML, Miranda, Haskell
• Rationale
– break away from von Neumann tradition
(mutable variables, iteration etc.)
– mathematical purity will mean program
correctness can be proved formally
Other languages
• Constraint-based languages
– Prolog (Colmerauer and Roussel, 1971)
and many derivative languages
– unification-based, goal-oriented
theorem-proving approach
• Applications:
– natural language, artificial intelligence
– basis for Japan's 5th generation project
Other languages
• Other AI languages:
– POP 2 (Robin Popplestone, Edinburgh,
late 1960s)
– Pop-11 (mid 1970s)
– POPLOG (Sussex, late 1980s) combined
Pop-11, ML, Common Lisp
Other languages
• Visual languages
– programs constructed using 2D graphics
instead of a 1D text stream
• Examples
– spreadsheets?
– ProGraph
– Lego Mindstorms
– Alice
The von Neumann legacy
• Computers all based on von Neumann's
design principles
– single memory consisting of identical
numbered cells
– memory used for instructions and data
– values in memory are mutable
– processor executes one instruction at a time
– instruction sequence given explicitly
The von Neumann legacy
• Most languages mirror the von
Neumann design
– variables: mutable objects in memory
– execution sequence given by sequence of
instructions, (and loops, ifs, etc.)
– single thread of control
– pointers: memory cells can reference
("point to") other memory cells
– no type information in memory cells
The von Neumann legacy
• Problems:
– dangling pointers
– uninitialised variables
– variables going out of range
– can't reason about values of mutating
variables
– non-optimal sequencing
The von Neumann legacy
• Dangling pointers:
– pointer to ‘thin air’?
– pointer to a valid (but wrong) address?
– pointer to right address, but wrong type of
data
• Related problems:
– memory leaks
– garbage collection
The von Neumann legacy
• Uninitialised variables:
– undetectable, since any value in a memory
cell might be valid
– value is undefined (note: technical term!)
– uninitialised pointers
• Could automatically initialise variables
– e.g. Java (partly), Basic
– loss of efficiency?
The von Neumann legacy
• Out-of-range variables
– arithmetic overflow (wrong sign for result)
– no hardware support for range checking
• Software range checking (like Ada)
– added overhead, so reduced efficiency
– but: manual checks can't be optimised by
compiler
The von Neumann legacy
• Mutating variables
– can't apply mathematical proof techniques
– values can change arbitrarily
• Global variables
– can be affected non-locally
– hard to debug
– rogue pointers + single memory means all
memory is effectively global!
The von Neumann legacy
• Explicit sequencing
– might be a better sequence?
– optimal sequence might depend on
external events
• Multithreading
– synchronization issues (mutable data
again)
The von Neumann legacy
• Data-flow sequencing
– instructions can be executable at any time
as long as all dependencies are satisfied
• Example: making tea
– tasks: boil water, put tea in pot, put water in
pot, pour milk, pour tea
– tasks 1, 2, 4 possibly concurrent
– task 3 depends on 1, 5 depends on 3
Functional programming
• Influenced by Church's lambda calculus
– mathematical theory of functions
– if x and y are lambda expressions, so is
(x y) (application)
– if x is a variable and y is a lambda
expression, λx.y is a lambda expression
• Can describe rules of arithmetic using
nothing but lambda expressions
Lambda calculus
• Evaluating:
– ((λx.y) z) reduces to y, with all occurrences
of x in y replaced by copies of z
– like function application (x is formal
parameter, y is function body, z is actual
parameter)
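A worked example (added, in the slide's notation): applying λx . (+ x 1) to 2,
((λx . (+ x 1)) 2) ⇒ (+ 2 1) ⇒ 3
Here x is the formal parameter, (+ x 1) the body, and 2 the actual parameter.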
Lisp
• Original functional language (John
McCarthy, ~1960)
• A list is a pair (car . cdr)
– car = head of list, cdr = tail of list
– longer lists possible, e.g. (a . (b . (c . nil)))
– abbreviation: write this as (a b c)
• Programs are represented by lists, too
Lisp
• Items are atoms or lists
– list components are either atoms or other
lists
– the empty list: (), or "nil"
• Basic functions:
– (car (a . b)) = a, (cdr (a . b)) = b
– (cons a b) = (a . b)
– (atom x): true if x is an atom
Lisp
• Booleans:
– true represented by atom T, false by () or NIL
• Arithmetic:
– prefix notation used, e.g. (+ 1 2) = 3
• Control:
– COND function for choosing alternatives
– recursion instead of loops
Lisp
• Example: a program to sum the numbers
in a list
(def sum (lambda (x)
  (cond
    ((null x) 0)
    (t (+ (car x) (sum (cdr x))))
  )
))
– LISP = "Lots of Irritating Superfluous
Parentheses"
Lisp
• Important ideas
– use of lists to represent data structures
– use of lists to represent programs, too
– use of recursive functions
– garbage collection
– functions as first-class objects:
(def name lambda-expression)
Lisp
• Problems with Lisp:
– inefficient interpreted implementations
– too different!
– untyped, so no error-checking at compile
time
• Non-functional features added later
– prog, setq, goto
FP
• John Backus' Turing Award paper
– ‘Can programming be liberated from the
von Neumann style?’ (Communications of
the ACM, 1978)
– explicit discussion using a language called
FP
Functional languages
• Later developments:
– efficient implementations: combinators, tail
recursion, better garbage collection
– pattern matching notation:
fun sum [] = 0
| sum (x :: xs) = x + sum xs
– lazy evaluation (operations on infinite lists)
Functional languages
• Other concepts:
– ‘curried’ functions (named after Haskell
Curry): partial application
– higher-order functions: functions that
operate on functions
– strong typing, type inferencing
– monads: sequencing in a pure functional
style, I/O using streams
Haskell
• Named after Haskell Curry (again)
• An example of a modern functional
language
– lazy evaluation
– strong typing
– type inferencing
– polymorphism
Haskell
• Data types
– basic types (Int, Char etc)
• 123    'x'
– lists: [t] is a list of type t
• [1, 2, 3] [1..5] [2..] [ x*x | x <- [1..5]]
• [1,2] same as (1:(2:[]))
– tuples: (t1,t2,t3) is a tuple of different types
• (1, "this", 5.0) :: (Int, [Char], Float)
• (1, 2) :: (Int, Int)
Prolog
• Uses a theorem-proving approach to
programming
• Programs consist of:
– a list of starting points (facts)
– a list of rules giving relationships between
facts
– a goal
Prolog
• No explicit sequencing is needed
– the interpreter attempts to construct a path
to the goal using the rules and facts
provided
– if it reaches a dead end, it will backtrack
and try a different sequence
• No variables are needed
– values are bound using unification
Prolog
• Example facts:
male(albert).
male(edward).
female(alice).
female(victoria).
parents(edward,victoria,albert).
parents(alice,victoria,albert).
Prolog
• Example rules:
sibling(X,Y) :- parents(X,A,B), parents(Y,A,B).
sister(X,Y) :- sibling(X,Y), female(Y).
• Example goal:
?- sister(edward,S).
– evaluation proceeds by trying to match this
goal using the available rules and facts
– variables are unified with matching values
Prolog
• Example evaluation:
– sister(edward,S) matches sister(X,Y)
where X=edward and Y=S, so...
– new goals: sibling(edward,S) and
female(S)
– sibling(edward,S) creates new goals:
parents(edward,X,Y) and parents(S,X,Y)
Prolog
• Example evaluation:
– parents(edward,X,Y) matches
parents(edward,victoria,albert), so
X=victoria and Y=albert
– parents(S,victoria,albert) matches
parents(edward,victoria,albert), so S=edward
– female(edward) fails, so backtrack: undo
the last match (S=edward) and try S=alice
Prolog
• Pattern matching used for lists:
append([],X,X).
append([A|B],C,[A|D]) :- append(B,C,D).
• Example: append([1,2],[3,4],X)
– X = [1|D] where append([2],[3,4],D)
– X = [1,2|D'] where append([],[3,4],D')
– X = [1,2,3,4]
• Also: append(X,Y,[1,2,3,4]).
Prolog
• Data structures:
– built-in support for integers and integer
arithmetic
– lists (similar to Haskell)
– facts as tree structures, e.g.
parent(edward,victoria,albert)
Prolog
• Evaluation sequence:
– try to match rules sequentially
– bind variables to values as part of this
process
– if a rule fails, backtrack (undo the last
successful match) and try the next rule
Prolog
• Once a variable has been unified, it
keeps the same value throughout the
current application of the current rule
– the same name may be bound to different
values in different rules (names are local)
– the same name may have a different value
in a different instance of the same rule
• Exception: the anonymous variable ‘_’
Prolog
• Prevent backtracking using a ‘cut’
– symbolised by ‘!’
– if X is matched successfully in ‘X, !, Y’,
failures in Y will not backtrack past the cut
to search for another value for X
parents(adam,0) :- !.
parents(eve,0) :- !.
parents(X,2).
Prolog
• More cut examples:
factorial(N,F) :- N < 0, !, fail.
factorial(0,1) :- !.
factorial(N,F) :-
N2 is N-1, factorial(N2,F2), F is N*F2.
– ‘cut and fail’ in first rule prevents negative
arguments
– cut in second rule prevents attempts to use
rule 3 to evaluate factorial 0
Prolog
• Input & output:
– special predicates (e.g. write/1) which
always succeed
– I/O operation performed as a hidden side
effect
• Storing state information
– use asserta/1 to add a rule
– use retract/1 to retract a rule
Prolog
• State example:
can_go(Place) :- here(X), connect(X, Place).
can_go(Place) :-
write('You can''t get there from here.'),
nl, fail.
move(Place) :-
retract(here(X)), asserta(here(Place)).
here(kitchen).
connect(kitchen,hall).
?- move(hall).
Complexity
• Languages have evolved to be able to
cope with problems of increasing
complexity
– systems of upwards of 100KLOC are
common in real life
– early languages had problems with
systems of this size
Complexity
• The coupling problem
– the more one piece of software knows
about another, the higher the coupling
– can't make changes without affecting other
parts of the system that are coupled to it
– primary goal is to minimise or abolish
coupling between components
– analogy: standard connectors in hardware
Complexity
• Spaghetti programming
– unstructured, use of goto statements
– makes it hard to partition software
• Can’t make assumptions about the
properties of any block of code
– can jump into the middle of any block from
somewhere else
Complexity
• Structured programming
– program composed of structured blocks
– one entry point, one exit point
– can reason about code within a block
• Structured blocks decouple flow-of-control
issues
– but not flow-of-data issues
Complexity
• Global variables
– can be accessed from anywhere, so hard
to reason about values of variables
• Local variables
– only accessible within a single block, so
can reason about them
– static variables retain their values from one
execution of a block to the next
Complexity
• Subroutines (procedures, functions)
– allows program to be decomposed into
functional modules
– can pass parameters as a way of avoiding
using global variables for communication
– reduces data coupling if global variables
avoided
Complexity
• Separate compilation
– allows code reuse through subroutine
libraries
– can compiler check that an external
subroutine is being called correctly?
– requires cooperation from linker...
Complexity
• Fortran
– separately compiled subroutines
– no link-time checking
– static & global variables only (so no
recursion)
– unstructured coding (e.g. no structured if
statement)
Complexity
• Algol 60/68, Pascal
– no separate compilation (non-standard
language extensions needed)
– procedures are nested within main
program, so call checking is possible
– structured programming possible
– local / global variables, no static variables
– recursion possible
Complexity
• C
– local, global, static variables
– no linkage checking
– use of common ‘header files’ to allow
compiler checking (but not foolproof)
Complexity
• Modules
– related groups of procedures, variables
etc.
– can selectively export some of the contents
but can hide others
– can create abstract data types
– examples: Modula-2, Oberon, Ada (and
also O-O languages)
Complexity
• Modules have a public interface
– allows compile-time checking of calls
– implementation is hidden (and can be
changed)
– interface specification forms a contract
between a module and its users
– inter-module coupling is reduced
Complexity
• Object oriented languages
– modules become data types!
– inheritance, polymorphism
• Problems with inheritance errors
– incorrect signature when overriding
operations (only C# fixes this one so far)
Complexity
• Modules/classes should be client-neutral
– should not make any assumptions about
their users
– should allow code reuse
• Exception handling
– decouples error detection from error
recovery
Complexity
• The goal: ‘software components’
– use like hardware components
– buy off the shelf, ready tested, and
assemble into a complete system
• Generic programming
– Ada generics, C++ templates
– allows algorithms to be decoupled from the
data types they operate on
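A minimal C++ template sketch (added example) of this decoupling:

// The algorithm is written once, independent of the element type:
template <typename T>
T maximum(T a, T b) { return a < b ? b : a; }

// maximum(1, 2)      instantiates maximum<int>
// maximum(1.5, 2.5)  instantiates maximum<double>
// T only needs to support '<': the type is plugged in at
// instantiation time (see 'instantiation time errors' later).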
Errors
• Error detection
– at compile time (errors within a module)
– at link time (errors resulting from
mismatched calls)
– at run time
– at instantiation time (e.g. C++ templates)
– at derivation time (errors in inheritance,
inability to override/implement operations)
Errors
• Compile time errors
– well known how to detect errors, but what
errors does the language allow to be
detected?
– strong typing allows type mismatches to be
detected
– preconditions and postconditions (e.g.
Eiffel) allow for better static verification
– variable-length parameter lists (e.g. C)
reduce checkability
Errors
• Link-time errors
– do separately-compiled modules include
parameter information?
– C doesn’t do this
– C++ used ‘name mangling’ to add type
information to function names for use with
standard linkers
Errors
• Run-time errors:
– e.g. array bounds checking
– use of exceptions to report errors and
recover from them
– run-time checks decrease program
efficiency (extra code means more space
and time)
– leaving checks out in production code is
like leaving your parachute behind after
you get your pilot’s licence
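A short C++ sketch (added example) of checked access reporting an error through an exception:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};
    try {
        std::cout << v.at(10) << "\n";   // bounds-checked: throws
    } catch (const std::out_of_range& e) {
        std::cout << "recovered: " << e.what() << "\n";
    }
    // v[10] is the unchecked (faster) access: out of range it is
    // undefined behaviour, the 'parachute left behind'.
}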
Errors
• Instantiation time errors:
– happens when trying to reuse code written much
earlier
– code is generated automatically with generic
parameters plugged in
– if parameters don’t match the way the code uses
them, the compilation fails
– error messages usually relate to generic code, not
what you wrote
– Ada generics have precise contracts to avoid this
problem
Errors
• Derivation time errors:
– happens when deriving new classes in
object-oriented languages
– happens long after original class was
written
– you override an operation but the old one
gets called (possible in C++, Ada)
– you think you’re overriding but you get it
wrong and add a new function (only C#
handles this one)
Errors
• Concurrency
– allows the use of ‘active objects’ executing
independently
– can partition program across multiple
processors and/or machines
– use RPC, CORBA, Ada partitions, ...
– occam and transputers
Errors
• Imperative languages have problems
with concurrency
– problem must be partitioned by hand (the
programmer must identify concurrency)
– lack of referential transparency means
synchronisation needed when updating
shared variables
– synchronisation easy to omit or get wrong
Errors
• Other problems with concurrency:
– errors can be time-dependent
– hard to reproduce reliably
– hard to diagnose
– new failure modes (e.g. deadlock)
Summary
• Languages evolve at the whim of their
designers
– usually in the direction of increased
expressiveness and safety, but not always
– some features turn out to be bad, but only
in hindsight
– there is a tension between ease of use
and safety (e.g. strong vs. weak typing)
– programmers can be resistant to new ideas