
Markov chain model of machine code program execution and halting
Riccardo Poli and Bill Langdon
Department of Computer Science
University of Essex
May 2006
Halting problem
 Logic states that whether or not programs halt is an undecidable problem (Turing)
 Probability gives an answer: with probability 1, programs without a halt instruction do not terminate (Langdon & Poli)
Overview
 Memory and loops make linear GP Turing complete, but what is the effect on the search space and fitness?
 T7 computer
 Experiments
 Markov chain model
 Implications
Introduction
 Without memory and loops, the distribution of functionality of GP programs tends to a limit as programs get bigger
 Is this also true for Turing complete programs?
T7 Minimal Turing Complete CPU
 7 instructions
 Arithmetic unit is ADD. From + all other operations can be obtained, e.g.
 Boolean logic
 SUB, by adding the complement
 Multiply, by repeated addition (subroutines)
 Conditional branch (Branch if oVerflow flag Set)
 Move data in memory
 Save and restore Program Counter (i.e. Jump)
 Stop if the end of the program is reached
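
To make the instruction set concrete, here is a minimal sketch of a T7-style interpreter in Python. The opcode names, operand format, address arithmetic and memory size are illustrative assumptions rather than the published T7 encoding; the one behaviour taken from the slides is that a program halts only by executing its last instruction and running off the end.

    import random

    N_MEM = 12                                   # small memory, as in the experiments (12 bytes)
    OPS = ["ADD", "BVS", "COPY", "SAVE_PC", "LOAD_PC"]   # assumed names/subset, not the real T7 encoding

    def run(program, max_steps=10_000):
        """Run a list of (op, a, b, c) tuples; the program halts only by
        running off its end, as in T7."""
        mem = [random.randrange(256) for _ in range(N_MEM)]  # random input/memory contents
        overflow = False
        pc = random.randrange(len(program))      # random start point, as in the experiments
        steps = 0
        while 0 <= pc < len(program):
            if steps >= max_steps:               # practical cut-off for (probable) infinite loops
                return "aborted", steps
            op, a, b, c = program[pc]
            steps += 1
            if op == "ADD":                      # the only arithmetic instruction
                s = mem[a] + mem[b]
                overflow = s > 255               # sets the overflow flag tested by BVS
                mem[c] = s % 256
            elif op == "BVS" and overflow:       # Branch if oVerflow flag Set
                pc = (mem[a] * 256 + mem[b]) % len(program)
                continue
            elif op == "COPY":                   # move data in memory
                mem[b] = mem[a]
            elif op == "SAVE_PC":                # save the program counter into memory
                mem[a], mem[b] = pc // 256, pc % 256
            elif op == "LOAD_PC":                # restore the PC from memory, i.e. jump
                pc = (mem[a] * 256 + mem[b]) % len(program)
                continue
            pc += 1
        return "halted", steps                   # obeyed the last instruction and went past it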
T7 Architecture
Experiments
 There are too many programs to test them all. Instead we gather statistics on random samples.
 Choose a set of program lengths from 30 to 16,777,215
 Generate 1000 programs of each length
 Run them from a random start point with random input
 A program terminates if it obeys the last instruction (which must not be a jump)
 How many stop?
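
A sketch of this sampling procedure, building on the run() interpreter sketched earlier (the lengths, sample size and step cut-off below are illustrative, not the settings used in the talk):

    import random

    def random_program(length):
        """A uniformly random program: random opcodes and random operands."""
        return [(random.choice(OPS),
                 random.randrange(N_MEM),
                 random.randrange(N_MEM),
                 random.randrange(N_MEM))
                for _ in range(length)]

    def halting_fraction(length, samples=1000, max_steps=10_000):
        """Estimate the proportion of random programs of a given length that halt."""
        halted = sum(run(random_program(length), max_steps)[0] == "halted"
                     for _ in range(samples))
        return halted / samples

    for L in (30, 100, 300, 1000):               # a small subset of the lengths used in the talk
        print(L, halting_fraction(L))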
Almost all T7 Programs Loop
Model of Random Programs
 Before any repeated instructions:
 random sequence of instructions, and
 random contents of memory
 1 in 7 instructions is a jump to a random location
Model of Random Programs
 T7 instruction set chosen to have little bias, i.e. every state is ≈ equally likely
 Overflow flag is set half the time, so 50% of conditional jumps (BVS) are taken
 (1+0.5)/7 of instructions take the program counter to a random location
 Implies that, for long programs, the length of runs of contiguous instructions (i.e. without jumps) follows a geometric distribution with mean 7/1.5 ≈ 4.67
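
A quick check of this claim (a sketch; the per-instruction jump probability 1.5/7 is taken directly from the slide):

    import random

    P_JUMP = 1.5 / 7        # probability that an instruction sends the PC to a random location

    def run_length():
        """Instructions executed up to and including the first effective jump."""
        n = 1
        while random.random() >= P_JUMP:
            n += 1
        return n

    samples = [run_length() for _ in range(100_000)]
    print(sum(samples) / len(samples))    # ≈ 7/1.5 ≈ 4.67, the mean of the geometric distribution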
Program segment = random code ending with a random jump
Forming Loops: Segments Model
 The segments model assumes the whole program is broken into N = L/4.67 segments of equal length of contiguous instructions
 The last instruction of each segment is a random jump
 By the end of each segment, memory is re-randomised
 A jump into any part of a segment that has already (partly) been run will form a loop
 A jump into any part of the last segment will halt the program
Probability of Halting
 i segments run so far. Chance that the next segment will
 form the first loop = i/N
 halt the program = 1/N
 (so the program continues with probability 1-(i+1)/N)
 Chance of halting immediately after segment i = 1/N × (1-2/N)(1-3/N)(1-4/N)…(1-i/N)
 Total halting probability, obtained by summing these over i, ≈ sqrt(π/(2N)) = O(N^-1/2)
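
The sum can be evaluated directly. The sketch below computes the exact segments-model halting probability, compares it with sqrt(π/(2N)), and also estimates the expected number of segments run by halting programs, which is the quantity behind the run-time bound a few slides below.

    import math

    def segments_model(N):
        """Halting probability and expected segments run by halting programs."""
        p_halt = 0.0
        e_segs = 0.0
        survive = 1.0                            # prob. of neither looping nor halting so far
        for i in range(1, N + 1):
            p_i = survive / N                    # halt immediately after segment i
            p_halt += p_i
            e_segs += i * p_i
            survive *= max(0.0, 1.0 - (i + 1) / N)   # continue past segment i
        return p_halt, e_segs / p_halt

    for N in (10, 100, 1000, 10_000):
        p, t = segments_model(N)
        print(N, round(p, 5), round(math.sqrt(math.pi / (2 * N)), 5), round(t, 1))
    # p(halt) tracks sqrt(pi/(2N)); expected segments run by halting programs grows as O(sqrt(N))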
Proportion of programs without loops falls as 1/sqrt(length)
[Plot: the segments model is only approximate, but gives the 1/√x scaling]
Number of halting programs rises exponentially with length
Average run time (non-looping)
 The segments model allows us to compute a bound on the runtime
 Expected run time grows as O(N^1/2)
Run time on terminating programs
[Plot: run time of non-looping programs fits the Markov prediction. Mean run time of all terminating programs grows like length^(3/4). Max run time is limited by the small (12 byte) memory becoming non-random.]

Markov chain model
States
 State 0 = no instructions executed yet
 State i = i instructions executed, but no loops
 Sink state = at least one loop was executed
 Halt state = the last instruction has been successfully executed and the program counter has gone beyond it
Event diagram for program execution 1/2
Event diagram for program execution 2/2
p1 = probability of being the last instruction
 Program execution starts from a random position
 Memory is randomly initialised and, so, any jumps land at random locations
 Then, the probability of being at the last instruction of the program is independent of how many (new) instructions have been executed so far
 So, p1 = 1/L
p2 = probability of an instruction causing a jump
 We assume there are two types of jumps:
 unconditional jumps (prob. puj), where the PC is given a value retrieved from memory or from a register
 conditional jumps (prob. pcj)
 The flag bit (which triggers conditional jumps) is set with probability pf
 The total probability that the current instruction will cause a jump is p2 = puj + pf × pcj
p3 = probability of a new instruction after a jump
 The program counter after a jump is a random number between 1 and L
 So, with i instructions visited so far, the probability of landing on a new instruction is p3 = (L - i)/L
p4 = probability of a new instruction after a non-jump
 The more jumps we have executed, the more fragmented the map of visited instructions will look
 So, we should expect p4 to decrease as a function of the number of jumps/fragments
 Expected number of fragments (jumps) in a program having reached state i
 Each block will be preceded by at least one unvisited instruction
 So, the probability of a previously executed instruction after a non-jump, and hence p4, follows from the expected number of blocks
 A more precise model considers the probability of blocks being contiguous
 This gives the expected number of actual (merged) blocks, and hence a refined p4
State transition probabilities
 These are obtained by adding up "paths" in the program-execution event diagram
 E.g. the looping probability is the sum of the probabilities of the paths ending in the sink state
Less than L-1 instructions visited
L-1 instructions visited
Transition matrix
For example, for T7 and L = 7 we obtain a matrix whose rows and columns are indexed by the states 0, 1, …, 6 instructions executed, loop and halt
[Matrix of transition probabilities between these states]
Computing future state probabilities
 All that is required is to take appropriate powers of the Markov matrix M
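
A sketch of this computation with numpy. The chain below has states 0…L-1 (new instructions executed), loop and halt; the transition probabilities passed in are placeholders for illustration only; in the actual model they are derived from p1–p4 as on the previous slides.

    import numpy as np

    def state_probs(L, steps, p_loop, p_halt):
        """Probability of being in the loop/halt states after `steps` transitions."""
        n = L + 2                                    # states 0..L-1, loop (L), halt (L+1)
        M = np.zeros((n, n))
        for i in range(L):
            pl, ph = p_loop(i), p_halt(i)
            M[i, L] = pl                             # fall into a loop (sink)
            M[i, L + 1] = ph                         # halt
            if i + 1 < L:
                M[i, i + 1] = 1.0 - pl - ph          # execute one more new instruction
            else:
                M[i, L + 1] += 1.0 - pl - ph         # simplification: no new instructions left
        M[L, L] = M[L + 1, L + 1] = 1.0              # loop and halt are absorbing
        start = np.zeros(n)
        start[0] = 1.0                               # state 0: nothing executed yet
        dist = start @ np.linalg.matrix_power(M, steps)
        return dist[L], dist[L + 1]                  # (prob. looping, prob. halting)

    # Toy example for L = 7 with made-up probabilities (NOT the T7 values):
    loop3, halt3 = state_probs(7, 3, p_loop=lambda i: 0.05 + 0.01 * i, p_halt=lambda i: 0.03)
    loopL, haltL = state_probs(7, 7, p_loop=lambda i: 0.05 + 0.01 * i, p_halt=lambda i: 0.03)
    print(loop3, halt3)      # probabilities of looping / halting within 3 instructions
    print(haltL)             # total halting probability (all mass is absorbed after L steps)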
Examples
 For T7, L=7 and i=3, M^3 gives the probability of looping within 3 instructions and the probability of halting within 3 instructions
 For T7, L=7 and i=L, M^L gives the total halting probability
Efficiency
 Computing halting probabilities naively requires a potentially exponentially explosive computation (M^L)
 We reordered the calculations to obtain very efficient models which allow us to compute
 halting probabilities, and
 the expected number of instructions executed by halting programs
 for L = 10,000,000 or more (see paper for details)
A good model?
Halting probability
Instructions executed by halting programs
Improved model accounting for
memory correlation
Search space characterisation
 From earlier work we know that, for halting programs, as the number of instructions executed grows, functionality approaches a limiting distribution
 The expected number of instructions actually executed by halting Turing complete programs indicates how close the distribution is to the limit
 E.g. for T7, very long programs have a tiny subset of their instructions executed (e.g., 1,000 instructions in programs of L = 1,000,000)
Effective population size
 Programs that do not terminate are often wasted fitness evaluations and are given zero fitness
 The initial population is composed of random programs, of which only a fraction p(halt) are expected to halt and so have fitness > 0
 We can use the Markov model to predict the effective population size: Popsize × p(halt)
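 For instance, with a hypothetical Popsize = 1,000 and p(halt) = 0.05 at the chosen program length, only about 50 individuals of the initial generation would be expected to receive a non-zero fitness.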
Controlling p(halt) by varying jump probability or program length
[Plot: p(halt) against jump probability for L = 10, 100, 1000, 10000]
Aborting non-terminating programs
 The model can also be used to decide after how many instructions to abort an evaluation: time limit = m × expected instructions in halting programs
 From this, the GP runtime (at generation 0) can be estimated
Conclusions
 Experiments show that the halting probability scales as 1/sqrt(length)
 The Markov chain model of program execution (and halting) is practical and accurate
 The halting probability → 0 with length, so… with probability 1, a program does not halt
 However, Turing complete GP is possible if appropriate parameter settings and/or fitness functions are used