15-04-24_Bjarnex - Columbia University

Download Report

Transcript 15-04-24_Bjarnex - Columbia University

Alfred V. Aho
[email protected]
The Evolution of AWK:
Computational Thinking in
Programming Language Design
COMS W4995 Design - Bjarne Stroustrup
April 24, 2015
1
Al Aho
Computational Thinking
Computational thinking is a fundamental
skill for everyone, not just for computer
scientists. To reading, writing, and
arithmetic, we should add computational
thinking to every child’s analytical ability.
Just as the printing press facilitated the
spread of the three Rs, what is
appropriately incestuous about this vision
is that computing and computers
facilitate the spread of computational
thinking.
Jeannette M. Wing
Computational Thinking
CACM, vol. 49, no. 3, pp. 33-35, 2006
2
Al Aho
What is Computational Thinking?
The thought processes
involved in formulating
problems so their solutions
can be represented as
computation steps and
algorithms.
Alfred V. Aho
Computation and Computational Thinking
The Computer Journal, vol. 55, no. 7, pp. 832- 835, 2012
3
Al Aho
Software in Our World Today
How much software does the world use today?
Guesstimate: over one trillion lines of source code
What is the sunk cost of the legacy base?
$100 per line of finished, tested source code
How many bugs are there in the legacy base?
10 to 10,000 defects per million lines of source code
A. V. Aho
Software and the Future of Programming Languages
Science, February 27, 2004, pp. 1131-1133
4
Al Aho
Programming Languages Today
Today there are thousands of programming languages.
The website http://www.99-bottles-of-beer.net
has programs in over 1,500 different
programming languages and variations to
generate the lyrics to the song “99 Bottles
of Beer.”
5
Al Aho
“99 Bottles of Beer”
99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.
98 bottles of beer on the wall, 98 bottles of beer.
Take one down and pass it around, 97 bottles of beer on the wall.
.
.
.
2 bottles of beer on the wall, 2 bottles of beer.
Take one down and pass it around, 1 bottle of beer on the wall.
1 bottle of beer on the wall, 1 bottle of beer.
Take one down and pass it around, no more bottles of beer on the wall.
No more bottles of beer on the wall, no more bottles of beer.
Go to the store and buy some more, 99 bottles of beer on the wall.
[Traditional]
6
Al Aho
“99 Bottles of Beer” in C++
#include <iostream>
using namespace std;
int main()
{
int bottles = 99;
while ( bottles > 0 )
{
cout << bottles << " bottle(s) of beer on the wall," << endl;
cout << bottles << " bottle(s) of beer." << endl;
cout << "Take one down, pass it around," << endl;
cout << --bottles << " bottle(s) of beer on the wall." << endl;
}
return 0;
}
[Tim Robinson, http://www.99-bottles-of-beer.net/language-c++-109.html]
7
Al Aho
Stroustrup’s Version
#include <iostream>
using namespace std;
int main()
{
for (int bottles = 99; bottles>0; --bottles)
cout << bottles << " bottle(s) of beer on the wall,\n"
<< bottles << " bottle(s) of beer.\n"
<< "Take one down, pass it around,\n"
<< " bottle(s) of beer on the wall.\n\n";
}
[Bjarne Stroustrup, personal communication, April 17, 2015]
8
Al Aho
“99 Bottles of Beer” in the Whitespace language
[Andrew Kemp, http://compsoc.dur.ac.uk/whitespace/]
9
Al Aho
Why Are There So Many Languages?
• One language cannot serve all application areas well
– e.g., programming web pages (JavaScript)
– e.g., electronic design automation (VHDL)
– e.g., parser generation (YACC)
• Programmers often have strongly held opinions about
– what makes a good language
– how programming should be done
• There is no universally accepted metric for a good
language!
10
Al Aho
Evolution of Programming Languages
11
Al Aho
1970
2015
2015
Fortran
Java
Java
Lisp
C
PHP
Cobol
C++
Python
Algol 60
Objective-C
C#
APL
C#
C++
Snobol 4
JavaScript
C
Simula 67
PHP
JavaScript
Basic
Python
Objective-C
PL/1
Visual Basic
MATLAB
Pascal
Visual Basic .NET
R
TIOBE Index
April 2015
PYPL Index
April 2015
Evolutionary Forces on Languages
Increasing diversity of applications
Stress on increasing programmer productivity
and shortening time to market
Need to improve software security, reliability
and maintainability
Emphasis on mobility and distribution
Support for parallelism and concurrency
New mechanisms for modularity
Trend toward multi-paradigm programming
12
Al Aho
Models of Computation in Languages
Early programming languages usually had only one model of
computation:
Fortran (1957): Procedural
Lisp (1958): Functional
Simula (1967): Object oriented
Prolog (1972): Logic
SQL (1974): Relational algebra
13
Al Aho
Models of Computation in Languages
New programming languages are often designed around
several models of computation
And legacy languages are incorporating additional models
of computation to support multiple programming
paradigms
14
Al Aho
Example: Elm
• Elm is a functional programming language for declaratively
creating web browser based graphical user interfaces.
• It uses functional reactive programming and purely
functional graphical layout to build user interfaces
without any destructive updates.
• Elm was designed in 2012 by Evan Czaplicki.
• The key features in Elm are signals, immutability, static
types, and interoperability with HTML, CSS, and
JavaScript.
elm-lang.org
15
Al Aho
Example: Rust
• Rust is a general-purpose, multi-paradigm, compiled
programming language.
• It is designed to be a safe, concurrent, practical language.
• First pre-alpha release of the Rust compiler was in 2012.
• It supports pure-functional, concurrent-actor,
imperative-procedural, and object-oriented programming
styles.
• Rust was originally designed by Graydon Hoare and is
supported by Mozilla Research.
• It advertises itself as “a systems programming language
that runs blazingly fast, prevents almost all crashes, and
eliminates data races.”
www.rust-lang.org
16
Al Aho
Example: Swift
• Swift is Apple’s new programming language for iOS and
OS X whose code is designed to work with Objective-C
• It was designed with code safety and performance in
mind.
• Some of the features of Swift include
– named parameters
– inferred types
– modules
– automatic memory management
– closures with unified function pointers
– functional programming patterns like map and filter
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/
17
Al Aho
The Birth of AWK
• AWK is a scripting language for routine dataprocessing tasks designed by Al Aho, Brian
Kernighan, Peter Weinberger at Bell Labs around
1977
• Each of the co-designers had slightly different
motivations
– Aho wanted a generalized grep
– Kernighan wanted a programmable editor
– Weinberger wanted a database query tool
• All co-designers wanted a simple, easy-to-use
language
18
Al Aho
Kleene Regular Expressions
Regular Expression
c
r1 | r2
Matches
the character c itself except when it is (, ), or *
r1 or r2
r1 r2
r1 followed by r2
r*
zero or more instances of r
(r)
r
• ‘|’ has lowest precedence, then concatenation, then *
• ‘|’ and concatentation are left associative, * is right associative
• For example, a | b*c = (a) | ((b*) c)
19
Al Aho
Kleene Regular Expressions and
Finite Automata are Equivalent
• The set of strings denoted by any Kleene regular
expression can be recognized by a deterministic finite
automaton.
• The set of strings recognized by any finite automaton
can be denoted by a regular expression.
20
Al Aho
Grep Regular Expressions
Regular Expression
c
r1 followed by r2
r*
zero or more instances of r
.
any character
^
beginning of line when ^ is first character in regexp
$
end of line when $ is last character in regexp
[abc]
an a, b, or c
[a-z]
any lower-case letter
[^abc]
any character except an a, b, or c
[^0-9]
any character that is not a digit
\( r \)
Al Aho
the character c itself except when c is . [ ] ^ $ * \
r1r2
\c
21
Matches
c unless c is ( ) or a digit
tagged regular expression that matches r
the matched strings are available as \1, \2, etc.
Back Referencing in Grep Regular
Expressions
Grep regular expressions can match non-regular languages:
^\([ab]*\)\1$ matches strings of the form xx where x is any string of a’s and b’s
Back referencing makes the string pattern matching problem NP-complete:
Theorem: Let r be a grep regular expression with back referencing and s
an input string. Determining whether s contains a substring matched by r is
NP-complete.
Proof. Reduction from vertex-cover. See A. V. Aho, “Algorithms for finding
patterns in strings”, Handbook of Theoretical Computer Science, MIT Press
1990, pp. 255-300.
22
Al Aho
Egrep Regular Expressions
• Started with grep regular expressions except for back
referencing
• Added ‘|’ for union as in Kleene regular expressions
• Added parentheses for grouping as in Kleene regular
expressions
• Current egrep uses POSIX regular expressions.
23
Al Aho
Egrep Regular Expression PatternMatching Algorithm
• Constructs the transitions for a deterministic finite
automaton on demand from the regular expression
using an LR(0)-like algorithm
• Uses a fixed size cache to store the transitions of the
DFA
• Adds a transition in a given state on a given input
character only when it is needed
• When the cache becomes full, it flushes it and adds
transitions to the empty cache as needed
• Observed time complexity given a regular expression r
and an input string s is O(|r| + |s|). It is an open
question if this can be achieved in the worst case.
24
Al Aho
The Evolution of AWK
• Prototypical use cases
– selection: “print all lines containing the word AWK in
the first field”
$1 ~ /AWK/
– transformation: “print the second and first field of
every line”
{ print $2, $1 }
– report generation: “sum the values in the first field
of every line and then print the sum and average”
{ sum += $1 }
END { print "sum = " sum, "avg = " sum/NR }
25
Al Aho
Structure and Invocation of an AWK
Program
• An AWK program is a sequence of pattern-action
statements
pattern { action }
pattern { action }
. . .
• Each pattern is a boolean combination of regular,
numeric, and string expressions
• An action is a C-like program
– If there is no { action }, the default is to print the
line
• Invocation
awk ‘program’ [file1 file2 . . . ]
awk –f progfile [file1 file2 . . . ]
26
Al Aho
AWK’s Model of Computation:
Pattern-Action Programming
for each file
for each line of the current file
for each pattern in the AWK program
if the pattern matches the input line then
execute the associated action
27
Al Aho
AWK in a Nutshell - I
• Input is read automatically across multiple files
– lines are split into fields $1, $2, . . ., $NF
– whole line is $0
• Variables are dynamic and can contain string or numeric
values or both
– no declarations: types determined by context and use
– initialized to 0 and empty string
– built-in variables for frequently used values
• Operators work on strings or numbers
– coerce type/value according to context
– what does $1 == $2 mean?
• Associative arrays take arbitrary subscripts
• Regular expressions as in egrep
28
Al Aho
AWK in a Nutshell - II
• Control-flow operators similar to C
– if-else, while, for, do
• Built-in functions for arithmetic, string processing,
regular expressions, editing text, . . .
• Supports user-defined functions
•printf for formatted output
•getline for input from files or processors
29
Al Aho
Some Useful AWK “One-liners”
1. Print the total number of input lines
END { print NR }
2. Print the last field of every line
{ print $NF }
3. Print each line preceded by its line number
{ print NR, $0 }
4. Print all non-empty lines
NF > 0
5. What does this AWK program do?
!x[$0]++
30
Al Aho
AWK Summary
AWK is a scripting language designed for routine
data-processing tasks on strings and numbers
E.g.: given a list of name-value pairs, print the total value
associated with each name.
alice 10
eve 20
bob 15
alice 30
An AWK program
is a sequence of
pattern-action statements
{ total[$1] += $2 }
END { for (x in total) print x, total[x] }
eve 20
bob 15
alice 40
31
Al Aho
Comparison: Regular Expression Pattern Matching
in Perl, Python, Ruby vs. AWK
Time to check whether a?nan matches an
regular expression and text size n
Russ Cox, Regular expression matching can be simple and fast (but is slow in Java, Perl,
PHP, Python, Ruby, ...) [http://swtch.com/~rsc/regexp/regexp1.html, 2007]
32
Al Aho
“99 Bottles of Beer” in AWK (bottled
version)
BEGIN{
split( \
"no mo"\
"rexxN"\
"o mor"\
"exsxx"\
"Take "\
"one dow"\
"n and pas"\
"s it around"\
", xGo to the "\
"store and buy s"\
"ome more, x bot"\
"tlex of beerx o"\
"n the wall" , s,\
"x"); for( i=99 ;\
i>=0; i--){ s[0]=\
s[2] = i ; print \
s[2 + !(i) ] s[8]\
s[4+ !(i-1)] s[9]\
s[10]", " s[!(i)]\
s[8] s[4+ !(i-1)]\
s[9]".";i?s[0]--:\
s[0] = 99; print \
s[6+!i]s[!(s[0])]\
s[8] s[4 +!(i-2)]\
s[9]s[10] ".\n";}}
33
Al Aho
[Wilhem Weske, http://www.99-bottles-of-beer.net/language-awk-1910.html