Lazy Evaluation in Numeric Computing

Download Report

Transcript Lazy Evaluation in Numeric Computing

Lazy Evaluation in Numeric
Computing
20 Октября 2006
Игорь Николаевич Скопин
Agenda
• Elementary introduction to functional programming: reduction,
composition and mapping functions
• Lazy Evaluation:
–
–
–
–
–
Scheme
Principles
Imperative Examples
Analysis
Necessity in Procedure Calls
• Dependencies of Lazy Evaluation on the Programming Style
• Newton-Raphson Square Roots
– Functional Program
• Numerical Differentiation
• FC++ library
– “Lazy list” data structure in FC++
• Memoization
• Lazy Evaluation in Boost/uBLAS: Vectors/Matrices Expressions
– Vectors/matrices expressions and memoization
– style of programming
• The Myths about Lazy Evaluations
• Q&A
Gluing Functions Together: reduce
list of x = nil | cons x (list of x)
[]
 nil
specific to sum [1]
 cons 1 nil
sum nil = 0
sum (cons num list) = num + sum list
[1,2,3]  cons 1 (cons 2 (cons 3
nil))
sum = reduce add 0
add x y = x + y
(reduce f x) l
reduce f x nil = x
reduce sum 0 ( 1 , 2 , 3 , [ ] )
⇒
reduce f x (cons a l) = f a (reduce f x l)
1+2+3+0
reduce multiply 1( 1 , 2 , 3 , [ ] ) ⇒
product = reduce multiply 1
1*2*3*1
(reduce cons nil) α ⇒ α
//copy of α
reduce (:) [ ]
// — Haskell
reduce — is a function of 3 arguments, but it applies to 2 only ⇒ the result is a function!
append a b = reduce cons b a
append [1,2] [3,4] = reduce cons [3,4] [1,2]
= (reduce cons [3,4]) (cons 1 (cons 2 nil))
f
x
a
l
= cons 1 (reduce cons (cons 3 (cons 4 (cons nil)))) (cons 2 nil))
f
x
a l
= cons 1 (cons 2 (reduce cons (cons 3 (cons 4 (cons nil))) nil))
= cons 1 (cons 2 ((cons 3 (cons 4 (cons nil))))
Gluing Functions Together: Composition and Map
A function to double all the elements of a list
reduce f nil gives expansion f to list
doubleall = reduce doubleandcons nil
where
doubleandcons num list = cons (2*num) list
specific to double
Further:
doubleandcons = fandcons double
where
An arbitrary function
double n = 2*n
fandcons f el list = cons (f el) list
Function composition — standard operator “.”:
(f . g) h = f (g h)
This definition is correct:
So
fandcons f el
= (cons . f) el
fandcons f = cons . f
= cons (f el)
Next version of doubleall:
fandcons f el list = cons (f el) list
doubleall = reduce (cons . double) nil
Function map (for all the elements of list):
specific to double
map f = reduce (cons. f) nil
One more example:
Final version of doubleall :
summatrix = sum . map sum
doubleall = map double
Lazy Evaluation: Scheme
F and G — programs:
Using a temporary file
( G.F ) input  G ( F input )
It is possible: F input → tF; G tF, but it is not good!
The attractive approach is to make requests for computation:
More precisely:
G:
Hold up
Hold up
G:
Needed data
produced by F
Hold up
F:
Data are ready
Hold up
Needed data
F:
…
Resume F
Data are ready
Lazy Evaluation: Principles
Postulates:
•
•
•
Any computation is activated if and only if it is necessary for
one or more other computations. This situation is named as
necessity of computation.
An active computation is stopped if and only if its necessity
vanishes (for example it has been satisfied).
The computation, as a whole, is activated forcibly as a request
(necessity) to obtain the results of computational system
execution.
Consequences:
•
•
The activation of all computational units is driven by a
dataflow started when a necessity of computation as a whole
arises.
A control flow of the computation is not considered as a priori
defined process. It is formed up dynamically by the
necessities of computations only.
Lazy Evaluation: Imperative Examples
Boolean expression
ℜ ≡ α&β&γ : (α == false) ⇒ ℜ == false
(α == true) & (β == false) ⇒ ℜ == false
if ( (precond) &&
(init) &&
(run) &&
(close) )
{ printf (“OK!”); }
else …
Arithmetic expression
x = ( … ) * 0; ⇒ x = 0;
without computation of (…)!
Necessity of
computation
may not
appear!
Lazy and eager evaluation of files handling
cat File_F | grep WWW | head -1
When we write
Subject of next slide
Vector/matrix expressions
The compiler does the following:
vector a(n), b(n), c(n);
a = b + c + d;
The same for matrices
Not everything is so good!
We’ll discuss this later
Vector* _t1 = new Vector(n);
for(int i=0; i < n; i++) _t1(i) = b(i) + c(i);
Vector* _t2 = new Vector(n);
for(int i=0; i < n; i++) _t2(i) = _t1(i) + d(i);
for(int i=0; i < n; i++) a(i) = _t2;
delete _t2;
delete _t1;
So we have created and deleted two temporaries!
for(int i=0; i < n; i++)
a(i) = b(i) + c(i) + d(i);
Analysis of cat File_F | grep WWW | head -1 Lazy and Eager Evaluation
Eager variant of execution:
stdin
stdout
grep WWW
stdin
stdout
stdin
stdout
grep WWW
We omit that the
Suppose
needed string
intermediate
was not
steps
here
detected
at first
stdout
head -1
One string
For
F strings of File_F
A nice question:
is this version
more correct?
Note: It is the same object in both cases!
One string with ‘WWW’
cat File_F
One string
All strings of File_F
Lazy variant of execution:
stdout
head -1
Strings with ‘WWW’
cat File_F
stdin
stdout
UNIX pipeline may be considered as optimization of lazy evaluation (in this case)!
Analysis of Vector/matrix Example
Consider the example more closely:
a = b + c + d;
t1 = b + c;
t2 = t1 + d;
a = t2;
Necessity of
Order of calculations
Traditional scheme
Lazy evaluation
a
b
c
d
a
b
c
d
computation (↓)
appears only
when a [i] is
I
needed
+
t1
II
+
=
=
t2
III
I ⇒ II ⇒ III (or I ⇒ II U III)
+
b[i]
a[i]
+
a[i]
d[i]
c[i]
+
b[i] c[i]
d[i]
Necessity in Procedure Calls
procedure P (in a, out b);
…
P (6*8, x); … r = x*5; …
P ( in a, out b);
6*8
{ … a…
X
… b = …; … }
When does the necessity of computations appear?
in parameters
Q ( in a );
• Real and forced necessity
{
… a = 7; …
• There are many details
… r = 9; …
• Ingerman’s thunks (algol 60)
… t = 2 + a; …
out parameters
…}
• Only forced necessity is possible in imperative languages
…
What hampers the real necessity?
r = 1;
• Dependences on context
Q ( r + 5);
• Possibility of reassignment for variables
SISAL and others languages with a single assignment — this
palliative seems to be not good
Dependencies of Lazy Evaluation on the
Programming Style
Functional programming
• Exactly needed necessities
• Automatic dataflow driven
necessities
Operational programming
• Forcibly arising necessities
• Manual control flow and
agreement driven
necessities
• Control flow and data
• Combining functions and
composition oriented
transforming oriented
approach
approach
• Declarative programming
• Imperative programming
Nevertheless, there are useful possibilities to apply lazy
evaluation in both cases!
The key notion is the definition of necessities.
Functional style may be characterized as a style that allows
automatic dataflow driven necessities.
Newton-Raphson Square Roots
Functional programs are inefficient. Is it true?
Algorithm:
– starting from an initial approximation a0
– computing better approximation by the rule
a(n+1) = (a(n) + N/a(n)) / 2
If the approximations converge to some limit a, then a = (a + N/a) / 2
so 2a = a + N/a, a = N/a, a*a = N ⇒ a = squareroot(N)
Imperative program (monolithic):
X = A0
Y = A0 + 2.*EPS
100 IF (ABS(X-Y).LE.EPS) GOTO 200
Y=X
X = (X + N/X) / 2.
GOTO 100
200 CONTINUE
This program is indivisible in
conventional languages.
We want to show that it is
possible to obtain:
• simple functional program
• technique of its improving
•The result is a very
expressive program!
Newton-Raphson Square Roots: Functional Program
• First version:
next N x = (x + N/x) / 2
[a0, f a0, f(f a0), f(f(f a0)), ..]
repeat f a = cons a (repeat f (f a))
repeat (next N) a0
within eps (cons a (cons b rest))
= b,
if abs(a-b) <= eps
= within eps (cons b rest),
otherwise
sqrt a0 eps N = within eps (repeat (next N) a0)
• Improvement:
relative eps (cons a (cons b rest))
= b,
if abs(a-b) <= eps*abs(b)
= relative eps (cons b rest),
otherwise
relativesqrt a0 eps N = relative eps (repeat (next N) a0)
Numerical Differentiation
easydiff f x h = (f (x + h) - f (x)) / h
A problem: small h ⇒ small (f ( x + h ) - f (x)) ⇒ error
differentiate h0 f x = map (easydiff f x) (repeat halve h0)
halve x = x/2
within eps ( differentiate h0 f x )
(1)
But the sequence of approximations converges fairly slowly
elimerror n (cons a (cons b rest)) =
= cons ((b*(2**n)-a)/(2**n-1)) (elimerror n (cons b rest))
But n is unknown
order (cons a (cons b (cons c rest))) = round(log2((a-c)/(b-c)-1))
So a general function to improve a sequence of approximations is
improve s = elimerror (order s) s
More efficient variants:
within eps (improve (differentiate h0 f x))
(2)
n
≈ of
log
(ai+2 – and
ai)we
/ is
(athe
ai) term
– 1 B*h
) order
A isnthe
right
answer
B
error
. method
Using halve Let
property
the
obtain
the
fourth
2( sequence
i+1 –
n*hn and a(i+1) = A + B*(h**n).
Then a(i)
= A + B*2
within eps (improve
(improve
(improve
(differentiate h0 f x))))
(3)
Using the following A = (a(i+1)*(2**n) — a(i)) / 2**n – 1
super s = map second (repeat improve s)
second (cons a (cons b rest)) = b
Here repeat improve is used to get a sequence of more and more improved
sequences. So we obtain very difficult algorithm in easy manner
within eps (super (differentiate h0 f x))
(4)
FC++ library
• High order functions — functions with functional arguments
• FC++ library is a general framework for functional programming
• Polymorphic functions—passing them as arguments to other
functions and returning them as results.
• Support higher-order polymorphic operators like compose(): a
function that takes two functions as arguments and returns a
(possibly polymorphic) result
• Large part of the Haskell
• Support for lazy evaluation
• transforming FC++ data structures ⇔ data structures of the
C++ Standard Template Library (STL)
• operators for promoting normal functions into FC++ functoids.
Finally, the library supplies
• “indirect functoids”: run-time variables that can refer to
any functoid with a given monomorphic type signature.
“Lazy list” data structure in FC++
List<int> integers = enumFrom(1);
// infinite list of all the integers 1, 2,
List<int> evens = filter(even, integers); // infinite list of all the integers 2, 4,
bool prime( int x ) { ... }
// simple ordinary algorithm
filter( ptr_to_fun(&prime), integers );
// ptr_to_fun transform normal function to functoid
plus ( x, y )
⇒
x+y
plus ( 2 )
⇒
2+x
Limitations
– Lambda functions
– Dependences on context
(?)
– Possibility of reassignment for variables
(?)
– Template technique is insufficient (blitz++)
Memoization
Fibonacci example: F (n) = F (n-1) + F (n-2)
It is a classical bad case for imperative computations. Why?
Previous functional programs are easier for development and
understanding. Why?
Imperative scheme
Functional scheme
1. We need to call expressions explicitly
only
2. Procedure calls depend on the context
3. Strong sequence of computation units
(hard for parallelization)
4. If we want to memoize previous results
we should do this explicitly
5. Memoization process is controlled by
programmer
6. Only manual transforming to a suitable
scheme of data representation is
possible
7. Circle head and circle body are joined
monolithicly
8. Controlflow centric approach
9. Require a difficult technique for defuse chains analysis
1. Expressions do not depend on context
2. Context independent procedure calls
3. The sequence of computation units is
chosen by execution system (more
flexible for parallelization)
4. Automatic memoization
5. Memoization process is not controlled,
but filters are allowed (indirect control)
6. Stack technique of program
representation and execution is not
appropriate
7. Constructions like reduce and
composition of functions allow
considering circle head and body
independently
8. Dataflow centric approach
9. Suitable for def-use chains analysis
Lazy Evaluation in Boost/uBLAS:
Vectors/Matrices Expressions
Example: A + prod (B,V)
– Assignment activate evaluation
– Indexes define evaluation of expression tree:
if we write the example, we initiate the following computation
for all i { A [i] + ∑k (B [i,k] *V[k]) }
(1)
• “for all” means a compatibility of computations (order of computations is
chosen by the execution system)
• It is possible (but not necessary) to have the following representation:
A + [ ∑k (B [i,k] *V[k])] (”[ ]“denotes a vector constructor)
• Necessity of computation is defined by this fragment for each i (may be
dynamically)
• Postulate that “=“ operator always leads to appearance of necessity of
computations: its left hand side should be computed and assigned to the
right hand side:
D = A + prod (B,V);
• Common expression tree is used for computations for each i in (1)
– Types coordination is hold:
• Correct vectors/matrices expression includes constituents with types allowed
by operators of expression (including prod and others)
Boost/uBLAS: vectors/matrices expressions:
temporary and memoization problems
• Let us consider x = A*x expression (this is an error from the
functional style viewpoint, but correct for operational C++)
– Naive implementation is for all i { x [i] = ∑k (A [i,k] * x[k]); }
It is not correct!
– The suitable implementation should use a temporary t :
for all i { t [i] = ∑k (A [i,k] *x[k]); } x = t;
The last assignment should not be a copy of the value, but a reference
coping and deleting the previous value of x.
• Let us consider A * ( B * x).
– If we don’t use the lazy evaluation, we obtain an n2 complexity
(C1*n2 for B * x plus C2*n2 for other multiplication).
– But in a straightforward lazy case the obtained complexity would be n3.
• It is not a problem for a real functional language implementation
because of a value propagation technique (automatic memoization)
• Instead of the value propagation technique in C++ implementation,
we can provide a temporary. Its assignment breaks the expression
• We use temporaries in both cases. But what part of information
should be really saved?
Boost/uBLAS: style of programming
• Object-oriented style
– Standard technique and patterns should be used
• C++ style (as addition to the previous)
– Standard template technique
• Vectors/matrices expressions (as addition to the
previous)
– Tendency to use matrix and vector objects instead of
variables with indexes
– Tendency to write expressions instead of simple statements
– Use uBLAS primitives as specializations of general
templates
– Don’t use direct classes extensions by multilevel inheritance
Boost/uBLAS. Example task: Jacobi method
• Let us consider the Jacobi method of solving the linear
system
n
a
j 1
i, j
x j  bi
– We are able to write it using such formulas as
xi( k )  (bi   ai , j xi( k 1) ) / ai ,i
j i
– Instead of this, we should use it in matrix terms as
x(k )  D1 (L  U ) x(k 1) )  D1b
– As the result, we obtain the following program (next slide)
Boost/uBLAS. Example program Jacobi method
x(k )  D1 (L  U ) x(k 1) )  D1b
11
11
matrix<double> A (n, n);
vector<double> B (n);
U
U(A)
11
11
D
A..
For input data
*X=B
..
vector<double> X (n);
For output data
matrix<double> D (n, n);
Diagonal
L
L(A)
..
11
Example task in matrix terms
matrix<double> D_1 (n, n); Diagonal-1
triangular_adaptor<matrix<double>, unit_lower> L (A);
triangular_adaptor<matrix<double>, unit_upper> U (A);
identity_matrix<double> I(n,n);
D-1
Element of uBLAS data structure
…
D = A - L - U + 2*I;
Diagonal receiving
D_1 = inverse_matrix(D);
We need to write a special function!
for (i = 0; i < count; ++ i)
X = -prod(prod( D_1,L+U-2*I),X) + prod (D_1,B);
Boost/uBLAS style of programming in
comparison of indexes using
for ( k = 0; k < count; ++ k ) {
for (i = 0; i < A.size1 (); ++ i) {
T (i) = 0;
We force a temporary using!
for (j = 0; j < i; ++ j)
We force to divide the process off
T (i) += A (i, j) * Y (j);
for selecting an diagonal
for (j = i+1; j < A.size1 (); ++ j) activities with diagonal
T (i) += A (i, j) * Y (j);
-1 (This case may be better
Using
D
T (i) = ( B (i) – T (i) ) / A (i, i);
than vector/matrix expression)
}
Y = T;
}
• Obviousness is lost
• Sequence of computations is stated hard (losses of possibilities for
compiler’s optimization)
• It is harder for development of programs than in the alternative case
• Possibilities of indexes using are not lost in vectors/matrices
expressions
• Vectors/matrices expressions are more suitable to finding patters
needed for using special optimized external libraries
Boost/uBLAS: Gauss-Seidel method
Jacoby method:
x
(k )
~ ( k 1) ~
 Ax
b
x( k )  F ( x( k 1) , x( k ) )
xi( k )  (bi   ai , j x(jk )   ai , j x(jk 1) ) / ai ,i
j i
j i
X(k-1)
X(k)
X(k-1)
xi(k)
x( k )  F ( x( k 1) )
Gauss-Seidel method:
X(k)
xi(k)
x(k )  (D  L)1 (Ux(k 1)  b)
Gauss-Seidel method:
…
D_1 = inverse_matrix(D+L);
Jacoby method:
…
D_1 = inverse_matrix(D);
for (i = 0; i < count; ++ i)
for (i = 0; i < count; ++ i)
X = prod( D_1, B - prod (U, X));
X = - prod(prod( D_1,L + U-2*I),X) +
prod (D_1,B);
Recall x=Ax problem. It solves by
temporary: t=Ax; x=t. But Seidel method
may be consider as Jacoby method
when the temporary is avoided!
It is a new approach
to organizing of
computations!
Boost/uBLAS: style of programming (limitations)
• Frequently a rejection of the vectors/matrices style is required
• A problem of styles compatibility
• Not closed set of operators with matrices and vectors are presented
– Instead of vectors/matrices operator “*” one should use a generic
function prod (mv1, mv2) provided for all needed cases
– uBLAS Vectors and Matrices operators are not presented as an
algebraic system (for instance “-1”, “|| ||” are not offered because of the
problems of vectors/matrices expression lazy evaluation)
An acceptable approach may be proposed as follow:
– The results of “-1”, “|| ||” and so on computation are considered as
attributes of matrix and vector classes
– These attributes are computed out of the expression computation by
outside control if it is necessity
– Expression’s constituents with these operators are replaced by extraction
needed items from corresponded attributes
The direction to Vectors and Matrices expression is very promising
The myths about lazy evaluations
1. Lazy evaluations is possible
only in functional languages
Examples above indicate
that it is not right
2. Lazy evaluations and
functional languages may be
applied only in artificial
intelligence area
As we have seen it is not
right: high order functions
using is very prospective in
many cases
3. We are able to realize lazy
evaluations using an arbitrary
programming language
Language states a lot of
limitation for lazy
evaluations using*
4. Using lazy evaluations
decreases performance
This statement depends on
quality of algorithms
programming only
* C++
is subjected to criticism from this point of view by blitz++ project developers
Q&A