Introduction. Syntax.
Introduction and Syntax
Course objectives
• Discuss features of programming
languages.
• Discuss how the features are implemented
in a simple computer architecture (what is
happening under the hood).
• Introduce a few specific programming
languages: C++, Java, Ada, Scheme, ML.
Desiderata
Ease of human use: Writing, debugging,
reading, maintaining.
Automated/formal analysis
Efficient implementation
Portability: Machine independence
Categories of languages
• Imperative (Assembly, FORTRAN, Algol,
C, Ada …). Statements that modify memory
locations execute in sequence.
• Object-oriented (Smalltalk, C++, Java):
Computation via interaction of objects.
• Functional (LISP, Scheme, ML):
Computation as definition and application
of functions.
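As a rough illustration of the imperative and functional styles, the sketch below sums a list both ways. Python is used only for convenience; the function names are made up for this example.

```python
from functools import reduce

def total_imperative(values):
    total = 0                 # a memory location...
    for v in values:
        total = total + v     # ...modified by statements executed in sequence
    return total

def total_functional(values):
    # No variable is reassigned; the result comes from applying a function.
    return reduce(lambda acc, v: acc + v, values, 0)
```

Both functions return the same value for the same input; only the way the computation is expressed differs.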
Categories of languages (cntd)
• Logic-based (Prolog): Computation as
inference from statements and rules.
• Special purpose (PostScript, JavaScript,
database languages): PLs geared toward
a specific application.
Components of a PL
• Syntax: What constitutes a well-formed
program?
• Semantics: What computational activity
constitutes a proper execution of a given
program on a given input?
• Implementation.
Compilers and interpreters
An interpreter preserves the text of the
program, and executes the program by
constant referral to the text.
A compiler translates the program from the
source language into a target language.
This may be either machine language or
(more commonly) a lower level language
(e.g. assembly or C).
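A minimal sketch of the difference, assuming a toy language of arithmetic expressions that have already been parsed into nested tuples (so the "program text" here is the tuple, not raw characters). The stack-machine instruction names PUSH and APPLY are invented for the example.

```python
# An expression is a number, or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def interpret(expr):
    """Interpreter: execute by referring to the program representation itself."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](interpret(left), interpret(right))

def compile_expr(expr):
    """Compiler: translate into instructions for a hypothetical stack machine."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    return compile_expr(left) + compile_expr(right) + [("APPLY", op)]

def run(code):
    """Execute target-language instructions; the source tree is not consulted."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:  # "APPLY"
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

program = ("+", 1, ("*", 2, 3))   # 1 + 2 * 3
```

Here interpret(program) and run(compile_expr(program)) give the same result. Note that run is itself a small interpreter for the target language.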
Compilers and interpreters (cntd)
The distinction between compiled and interpreted
languages is not clear cut:
• Most interpreters begin by doing some degree of
preprocessing (e.g. comment removal).
• The target language may then be interpreted
(e.g. Java compiles into bytecode, which is
interpreted by the JVM).
• Machine language is itself often interpreted by microcode.
Advantages of interpreters
• Some language features are difficult or
impossible to compile (e.g. writing and
executing code on the fly).
• Portability and machine independence.
(This is the main reason that Java bytecode
and JavaScript are interpreted.)
• Interpreters are easier to write than
compilers.
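To make "writing and executing code on the fly" concrete, here is a small sketch using Python's built-in exec. The helper make_polynomial is hypothetical; it builds the source text of a function at run time and then executes that text.

```python
def make_polynomial(coefficients):
    """Build and execute source text for p(x) at run time."""
    terms = [f"{c} * x ** {i}" for i, c in enumerate(coefficients)]
    source = "def p(x):\n    return " + " + ".join(terms) + "\n"
    namespace = {}
    exec(source, namespace)   # this program text did not exist until now
    return namespace["p"], source

p, source = make_polynomial([1, 0, 2])   # 1 + 0*x + 2*x^2
```

A conventional ahead-of-time compiler has no chance to translate p, because its source is only produced while the program is running.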
Advantages of interpreters (cntd)
Interpreters also make it easier to provide:
• Interactive environments.
• Source code that is available for debugging.
• Light-weight coding (e.g. no declarations).
It is not impossible, though, to get these
features in compiled languages.
Advantages of compilers
• Speed of compiled code: often one or two
orders of magnitude faster than interpreted code.
• Can distribute object code without
publishing source code.
Compile-time and run-time features
A compile-time feature of a program can be
determined before execution begins (and
is therefore independent of the input data).
A run-time feature of a program can only be
detected during execution and is generally
dependent on the input.
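A small sketch of the distinction, assuming CPython, where a function's code object records its parameter and local variable names when the function is compiled. The function classify is made up for the example.

```python
def classify(n):
    label = "even" if n % 2 == 0 else "odd"
    return label

# Determined before any call is made: the names of parameters and locals.
static_names = classify.__code__.co_varnames

# Determined only during execution: which branch is taken for a given input.
dynamic_result = classify(int("7"))
```

The set of names is the same for every run, while the value of label depends on the input data.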
Syntax
The syntax of a program is its formal
structure.
The line separating syntactic from nonsyntactic features is somewhat arbitrary.
All syntactic features are detectable at
compile time, but not vice versa.
E.g. a compiler can detect that “x=1/0” is
an error, but this is not (generally) a
syntactic error.
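The sketch below uses Python's own parser to show that "x = 1/0" is syntactically well formed even though executing it is an error, whereas a missing operand is rejected before anything runs. Python, as used here, reports the division only at run time; other language implementations may flag it earlier. The two helper names are invented for the example.

```python
import ast

def is_well_formed(source):
    """True if the text parses, i.e. it has no syntactic error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def fails_at_run_time(source):
    """True if executing the text raises a division-by-zero error."""
    try:
        exec(source, {})
        return False
    except ZeroDivisionError:
        return True
```

With these helpers, "x = 1/0" is well formed and fails at run time, while "x = 1/" is not well formed at all.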
Tokens
In almost all programming languages, a
program is a sequence of tokens.
Types of tokens:
• Special symbols: “;”, “+”, “++”, “{” …
• Reserved words: “if”, “then”, “function” …
• Numbers: “5”, “4123”, “1.2E+08”, “0x4AC”
• Identifiers: “x”, “i”, “append”, “employee” …
Tokens (cntd)
Language-specific rules for:
• What forms are in each of these
categories.
• How tokens are delimited (usually by
white space or special symbols, but not
in FORTRAN)
• When two tokens are the same (e.g.
case sensitivity)
Tokens (cntd)
A lexer (or tokenizer) divides the source code
into tokens and categorizes the tokens.
Generally, each token category is described by a
regular language, which a finite automaton can recognize.
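A minimal sketch of such a lexer, using Python's re module, with one regular expression per token category. The category names, the particular patterns and the RESERVED set are assumptions chosen to match the token examples on the earlier slide, not the rules of any specific language.

```python
import re

# One regular expression per token category; order matters (numbers first).
TOKEN_SPEC = [
    ("NUMBER", r"0[xX][0-9A-Fa-f]+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("SYMBOL", r"\+\+|[;+{}()=*/-]"),
    ("SKIP",   r"\s+"),
]
RESERVED = {"if", "then", "function", "return", "int"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into (category, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "IDENT" and lexeme in RESERVED:
            kind = "RESERVED"   # reserved words look like identifiers
        tokens.append((kind, lexeme))
    return tokens
```

For example, tokenize("i++;") yields an identifier followed by the two special symbols "++" and ";", because the longer symbol is tried first.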
Syntax tree
int increment (int i)
{ return (1+i); }
Backus-Naur Form (BNF)
FunDefn ::= FunDecl Block
Term ::= num | var | Term arithOp Term |
funCall
ArgList ::= () | (VarDecl [, VarDecl]*)
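A BNF rule can be turned fairly directly into a parser. The sketch below handles a simplified version of the Term rule: funCall is left out, and the left-recursive alternative "Term arithOp Term" is rewritten as Term ::= Atom (arithOp Atom)* with Atom ::= num | var, since a recursive-descent parser cannot follow left recursion directly. Operators are grouped left to right with no precedence; these are choices made for the example, not part of the grammar above.

```python
import re

ARITH_OPS = ("+", "-", "*", "/")

def parse_term(text):
    """Parse text against Term ::= Atom (arithOp Atom)*, Atom ::= num | var."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected num or var, found end of input")
        tok = tokens[pos]
        pos += 1
        if tok.isdigit():
            return ("num", tok)
        if tok.isidentifier():
            return ("var", tok)
        raise SyntaxError(f"expected num or var, found {tok!r}")

    node = atom()
    while pos < len(tokens) and tokens[pos] in ARITH_OPS:
        op = tokens[pos]
        pos += 1
        node = (op, node, atom())   # left-nested: (a - b) - c
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node
```

A well-formed input such as "1 + i" produces a tree; an ill-formed one such as "1 +" raises SyntaxError.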
Context-Free Language
A BNF definition defines a context-free
language.
The syntax of programming languages is
“almost” context free. A few syntactic
constraints are not (e.g. identifiers must be
declared before they are used).
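The declared-before-use constraint is typically checked in a separate pass that keeps a table of the names seen so far, which is exactly the kind of memory a context-free grammar lacks. A minimal sketch, assuming the program has already been reduced to a list of ("decl", name) and ("use", name) events (a made-up representation for this example, ignoring scopes):

```python
def undeclared_uses(events):
    """Return (index, name) for every use that precedes its declaration."""
    declared = set()
    errors = []
    for index, (kind, name) in enumerate(events):
        if kind == "decl":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append((index, name))
    return errors

events = [("decl", "i"), ("use", "i"), ("use", "j"), ("decl", "j")]
```

Here the use of j at position 2 is reported, because its declaration only comes afterwards.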