Transcript N-1

What is the correct number
of break points
hidden in a climate record?
DipDocSem – 12. December 2011
Defining breaks
Consider the differences of one station compared to a reference (a kriged
ensemble of the surrounding stations).
Breaks are defined as abrupt changes in the station-minus-reference time series.
Internal variance: within the subperiods
External variance: between the means of different subperiods
Criterion: maximum external variance attained by a minimum number of breaks
Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: years within subperiod i
The sum of external and internal variance is constant.
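Written out (a standard way to express this decomposition; the exact notation of the original slide is assumed), with $\bar{x}$ the overall mean and $\bar{x}_j$ the mean of subperiod $j$:
$$\underbrace{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}_{\text{total}}
= \underbrace{\frac{1}{n}\sum_{j=1}^{N} n_j\,(\bar{x}_j-\bar{x})^2}_{\text{external}}
+ \underbrace{\frac{1}{n}\sum_{j=1}^{N}\sum_{i\in j}(x_i-\bar{x}_j)^2}_{\text{internal}}$$
so that, for a fixed series, the sum of the external and the internal part is indeed constant.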
Two questions
The title of this talk asks: How many breaks?
Where are they situated?
Testing all permutations is not feasible:
$$\binom{n-1}{k} = \binom{99}{10} = \frac{99\cdot 98\cdot 97\cdots 90}{1\cdot 2\cdot 3\cdots 10} \approx 10^{13}$$
The best solution for a fixed number of breaks
can be found by Dynamical Programming
Dynamical Programming (1)
Find the optimum positions for a
fixed number of breaks.
Consider not only the complete
time series, but all possible
truncated variants.
Dynamical Programming (2)
Find the optimum positions for a
fixed number of breaks.
Consider not only the complete
time series, but all possible
truncated variants.
Find the first break by simply
testing all permutations.
Dynamical Programming (3)
Find the optimum positions for a
fixed number of breaks.
Consider not only the complete
time series, but all possible
truncated variants.
Find the first break by simply
testing all permutations.
Fill up all truncated variants. The
internal variance consists now
of two parts: that of the
truncated variant plus that of
the rest.
Dynamical Programming (4)
Find the optimum positions for a
fixed number of breaks.
Consider not only the complete
time series, but all possible
truncated variants.
Find the first break by simply
testing all permutations.
Fill up all truncated variants. The
internal variance consists of
two parts: that of the truncated
variant plus that of the rest.
Search the minimum out of n.
Dynamical Programming (5)
The 2-breaks optimum for the full length is found.
To begin the search for 3 breaks, we need, as before, the previous solutions
for all lengths, including the shorter ones.
This needs n²/2 searches, which for larger numbers of breaks k is much less
than the number of all permutations, $\binom{n}{k}$.
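For illustration only, here is a minimal Python sketch of such a dynamic programme (not the original code of the talk; the function and variable names are invented). It finds, for every number of breaks up to k_max, the segmentation with minimal internal sum of squares, which is equivalent to maximal external variance:

```python
import numpy as np

def best_breaks(x, k_max):
    """Optimal break positions for 0..k_max breaks via dynamic programming.

    Minimising the internal sum of squares over all truncated series x[0:j]
    is equivalent to maximising the external variance of the segmentation.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def seg_cost(a, b):
        """Sum of squared deviations of the segment x[a:b] from its own mean."""
        m = b - a
        return csum2[b] - csum2[a] - (csum[b] - csum[a]) ** 2 / m

    # dp[k, j]: minimal internal sum of squares of x[0:j] split by k breaks
    dp = np.full((k_max + 1, n + 1), np.inf)
    cut = np.zeros((k_max + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        dp[0, j] = seg_cost(0, j)
    for k in range(1, k_max + 1):
        for j in range(k + 1, n + 1):
            # s is the start of the last segment (the k-th break lies just before s)
            costs = [dp[k - 1, s] + seg_cost(s, j) for s in range(k, j)]
            best = int(np.argmin(costs))
            dp[k, j] = costs[best]
            cut[k, j] = best + k
    # backtrack the break positions of the full series for k_max breaks
    breaks, j = [], n
    for k in range(k_max, 0, -1):
        j = cut[k, j]
        breaks.append(j)
    return dp[:, n], sorted(breaks)
```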
Position & Number
Solved:
The optimum positions for a fixed number of breaks are known
by Dynamical Programming.
Left:
Find the optimum number of breaks.
The external variance increases in any case with an increasing
number of breaks.
Use the behaviour of a random time series as a benchmark.
Segment averages
For a random series with standard deviation 1, the segment averages $\bar{x}_i$ scatter randomly with
mean: 0
stddev: $1/\sqrt{n_i}$
because any deviation from zero can be seen as inaccuracy due to the limited
number of members.
External Variance
Weighted measure for the variability of the subperiods' means.
The external variance is equal to the mean square sum of a random,
normally distributed variable.
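Written out (an added clarification under the assumptions above, i.e. segment means $\bar{x}_j$ with mean 0 and standard deviation $1/\sqrt{n_j}$):
$$n\,\mathrm{var}_{\mathrm{ext}} = \sum_{j=1}^{N} n_j\,\bar{x}_j^{\,2} = \sum_{j=1}^{N}\left(\sqrt{n_j}\,\bar{x}_j\right)^2,$$
i.e. a sum of squares of (approximately) standard normal variables, which links it to the χ² distribution of the next slide.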
χ²-distribution
n: length of the time series (number of years)
k: number of breaks
N = k + 1: number of subsegments
[ · ]: mean over several break-position permutations
$$[\,\mathrm{var}_{\mathrm{ext}}\,] = \frac{N-1}{n} = \frac{k}{n}$$
On average, the external variance increases linearly with k. However, we consider the best member as found by DP.
$$\mathrm{var}_{\mathrm{ext}} \sim \chi^2_N \quad\text{(up to normalisation)}$$
The external variance is χ²-distributed.
Definition: take N values out of N(0,1), square them and add them up; repeating this yields a $\chi^2_N$ distribution.
21-years random data (1)
1000 random time series are created, only 21 years long, so that explicit
tests of all permutations are possible.
The mean increases linearly. However, the maximum is relevant
(the best solution as found by DP).
Can we describe this function? First guess:
$$1 - v = (1 - k^*)^4 \qquad\text{with } k^* = k/(n-1)$$
21-years random data (2)
Above, we expected the data for a fixed number of breaks
to be χ²-distributed.
From χ² to β distribution
The random data (n = 21 years, k = 7 breaks) does not fit a χ²-distribution exactly.
The reason is that χ² has no upper bound, but var_ext cannot exceed 1.
A kind of confined χ² is the beta distribution.
[Figure: histogram of the data with fitted χ² curve]
From χ² to β distribution
$X \sim \chi^2(a)$ and $Y \sim \chi^2(b)$ $\;\Rightarrow\;$ $X/(X+Y) \sim \beta(a/2,\, b/2)$
If we normalize a χ²-distributed variable by the sum of itself and another
χ²-distributed variable, the result is β-distributed.
[Figure: data histogram (n = 21 years, k = 7 breaks) with fitted β and χ² curves]
The β-distribution fits the data well and is the theoretical distribution
for the external variance of all break-position permutations.
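A small Monte-Carlo cross-check in Python (illustrative only, not the original experiment; the random seed and trial count are arbitrary and SciPy is assumed to be available): the external-variance fraction of randomly placed breaks in white noise should follow β(k/2, (n−1−k)/2).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, trials = 21, 7, 20000
v = np.empty(trials)
for t in range(trials):
    x = rng.standard_normal(n)
    x -= x.mean()                                # remove the overall mean
    cuts = np.sort(rng.choice(np.arange(1, n), size=k, replace=False))
    segments = np.split(x, cuts)                 # k breaks -> k+1 subperiods
    ext = sum(len(s) * s.mean() ** 2 for s in segments)
    v[t] = ext / np.sum(x ** 2)                  # external / total variance
print(v.mean())                                  # mean ~ k/(n-1), close to the k/n quoted above
print(stats.kstest(v, stats.beta(k / 2, (n - 1 - k) / 2).cdf))
```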
From χ² to β distribution
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\!\left(\frac{k}{2},\ \frac{n-1-k}{2}\right)}
\qquad\text{with}\qquad
B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$
[Figure: β and χ² distributions]
We are interested in the best solution,
with the highest external variance,
as provided by DP.
We need the exceeding probability for high values of var_ext.
Incomplete Beta Function
External variance v is β-distributed and depends on n (years) and k (breaks):
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\!\left(\frac{k}{2},\ \frac{n-1-k}{2}\right)}$$
The exceeding probability P gives the best (maximum) solution for v:
$$P(v) = 1 - \int_0^{v} p(v')\,dv'$$
This integral is an incomplete Beta function. It is solvable in closed form
for even k and odd n; with $i = k/2$ and $m = \frac{n-3}{2}$:
$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$
Example 21 years, 4 breaks
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l},
\qquad i = \frac{k}{2},\quad m = \frac{n-3}{2}$$
k = 4 → i = 2 and n = 21 → m = 9, hence
$$P(v) = (1-v)^9 + 9\,v\,(1-v)^8$$
Theory and Data
Theory (curve): $P(v) = (1-v)^9 + 9\,v\,(1-v)^8$
The random data (hatched) fits well.
Nominal Combination Number
For n = 21 and k = 4 there are
$$\binom{n-1}{k} = \binom{20}{4} = 4845$$
break combinations.
If they were all independent, we could read off the maximum external variance
at an exceeding probability of $1/4845 \approx 0.0002$, giving 0.7350.
However, we suspect that the
break combinations are not
independent. And we know the
correct value of varext.
Effective and Nominal
Remember:
$\mathrm{var}_{\mathrm{ext}} = 0.5876$ for k = 4.
The reverse reading leads to a 23 times higher exceeding probability.
This shows that the break permutations are strongly dependent and that the
effective number of combinations is smaller than the nominal one.
However, the theoretical function is correct.
From 21 years to 101 years
As we now know the theoretical function, we drop the explicit check with
random data and move from unrealistically short time series (n = 21) to more
realistic ones (n = 101).
Again, the numerical values of the external variance are known, so we can
deduce the effective combination numbers.
Can we give a formula for dv/dk in order to derive v(k)?
dv/dk sketch
Increasing the break number from k to k+1 has two consequences:
1. The probability function changes.
2. The number of combinations increases.
Both increase the external variance.
Using the Slope
P(v) is a complicated function and hard to invert into v(P):
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}$$
Thus, dv is obtained from dP divided by the slope. We just derived P(v) by
integrating p(v), so the slope p(v) is known.
The Slope
$$\frac{d}{dv}\ln P(v) = -\frac{p(v)}{P(v)}$$
Insert the known functions:
$$\frac{d}{dv}\ln P(v) = -\,\frac{(m-i+1)\,\binom{m}{i-1}\, v^{\,i-1}\,(1-v)^{m-i}}{\sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}}$$
The last summand dominates:
$$\frac{d}{dv}\ln P(v) \approx -\,\frac{(m-i+1)\,\binom{m}{i-1}\, v^{\,i-1}\,(1-v)^{m-i}}{\binom{m}{i-1}\, v^{\,i-1}\,(1-v)^{m-i+1}} = -\,\frac{m-i+1}{1-v}$$
Reduce and replace m and i:
$$\frac{d}{dv}\ln P(v) \approx -\,\frac{n-1-k}{2\,(1-v)}$$
Distance between the Curves
ln Pi 1   ln Pi 
 i  m l
    v 1  v m l
 l 0  l 
 ln  i 1
m
    v l 1  v m l
 l
 l 0  







The last summand dominates:
Reduce and replace m and i:
ln Pk 1   ln Pk  
1  n  1  k  v
ln 
2  k 1  v 
ln Pi 1   ln Pi 
  m i

   v 1  v mi 
i


 ln   

m  i 1
m i 1
 




v
1

v
  i  1





 m  i  1 v
ln Pi 1   ln Pi   ln 
 i 1  v 






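For illustration with assumed numbers (not taken from the talk): with n = 101, k = 4 and v = 0.2,
$$\ln P_{k+1} - \ln P_{k} \;\approx\; \frac{1}{2}\,\ln\!\left(\frac{96\cdot 0.2}{4\cdot 0.8}\right) = \frac{1}{2}\,\ln 6 \approx 0.9.$$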
Effective combination growth
The nominal number of break combinations grows from $\binom{n-1}{k}$ to $\binom{n-1}{k+1}$, i.e. by a factor of roughly $\frac{n-1-k}{k}$.
Nominal growth rate (logarithmic sketch): $-2\,\ln\!\left(\frac{n-1-k}{k}\right)$
ln: logarithmic sketch
minus: the number of combinations is the reciprocal of the exceeding probability
2: the exceeding probability is only known for even break numbers
However, break combinations are not independent, and we know the effective number of combinations.
Ratio: nominal / effective
k1   k2   k    nominal   effective   c = nom/eff
 2    4    3    -2.552    -7.784      0.328
 4    6    5    -2.186    -6.952      0.315
 6    8    7    -1.963    -6.356      0.309
 8   10    9    -1.765    -5.889      0.300
10   12   11    -1.645    -5.503      0.299
12   14   13    -1.514    -5.173      0.293
14   16   15    -1.435    -4.885      0.294
16   18   17    -1.363    -4.627      0.295
18   20   19    -1.292    -4.394      0.295
The ratio nominal / effective is approximately constant at c ≈ 0.3.
Very Rough Solution
$$\frac{dv}{dk} = \frac{2\,(1-v)}{n-1-k}\left[\,c\,\ln\!\left(\frac{n-1-k}{k}\right) + \frac{1}{2}\,\ln\!\left(\frac{(n-1-k)\, v}{k\,(1-v)}\right)\right]$$
Normalisation $k^* = \frac{k}{n-1}$:
$$\frac{1-k^*}{1-v}\,\frac{dv}{dk^*} = 2c\,\ln\!\left(\frac{1-k^*}{k^*}\right) + \ln\!\left(\frac{(1-k^*)\, v}{k^*\,(1-v)}\right)$$
For small $k^*$ and n = 100 this is approximately
$$2\cdot 0.3\cdot\ln(100) + \ln(4) \approx 2.76 + 1.39 = 4.15$$
The Two Contributions
The same relation combines the two contributions named in the dv/dk sketch above, the change of the probability function and the growth of the number of combinations:
$$\frac{1-k^*}{1-v}\,\frac{dv}{dk^*} = 2c\,\ln\!\left(\frac{1-k^*}{k^*}\right) + \ln\!\left(\frac{(1-k^*)\, v}{k^*\,(1-v)}\right)$$
[Figure: the two contributions as functions of k*]
Exact Solution
Separating the variables of
$$\frac{1-k^*}{1-v}\,\frac{dv}{dk^*} = 2c\,\ln\!\left(\frac{1-k^*}{k^*}\right) + \ln\!\left(\frac{(1-k^*)\, v}{k^*\,(1-v)}\right)$$
gives
$$\frac{dv}{1-v} = \left[\,2c\,\ln\!\left(\frac{1-k^*}{k^*}\right) + \ln\!\left(\frac{(1-k^*)\, v}{k^*\,(1-v)}\right)\right]\frac{dk^*}{1-k^*},$$
and integrating yields the exponent of a power-law solution of the form
$$1 - v = (1-k^*)^{\alpha},$$
discussed on the next slide.
Constancy of the Solution
The solution for the exponent α is constant for different lengths of the time
series (21 and 101 years).
[Figure: exponent curves for the 21-year and 101-year cases]
The existing algorithm Prodige
Original formulation of Caussinus & Lyazrhi for the penalty term, as adopted by Mestre for Prodige (k breaks and l outliers):
$$C_k(Y) = \ln\!\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2\,(k+l)}{n-1}\,\ln(n) \;\to\; \min$$
Translation into the terms used by us:
$$\ln(1-v) + \frac{2k}{n-1}\,\ln(n) \;\to\; \min$$
Normalisation by $k^* = k/(n-1)$:
$$\ln(1-v) + 2k^*\ln(n) \;\to\; \min$$
Differentiation to find the minimum:
$$-\frac{1}{1-v}\,\frac{dv}{dk^*} + 2\ln(n) = 0
\qquad\Longleftrightarrow\qquad
\frac{1}{1-v}\,\frac{dv}{dk^*} = 2\ln(n)$$
In Prodige it is thus postulated that the relative gain of external variance
is a constant for given n.
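For illustration, a minimal Python sketch of this criterion (my own rendering of the formula above, not code from Prodige itself; in practice the break positions tested would be the DP-optimal ones for each k, and the optimum break number is the k minimising the criterion):

```python
import numpy as np

def caussinus_lyazrhi(y, breaks):
    """Penalized criterion ln(1 - v) + 2k/(n-1) * ln(n); smaller is better."""
    y = np.asarray(y, dtype=float)
    n, k = len(y), len(breaks)
    z = y - y.mean()
    segments = np.split(z, sorted(breaks))
    ext = sum(len(s) * s.mean() ** 2 for s in segments)   # external sum of squares
    tot = np.sum(z ** 2)                                   # total sum of squares
    return np.log(1.0 - ext / tot) + 2.0 * k / (n - 1) * np.log(n)
```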
Our Results vs Prodige
We know the function for the relative gain of external variance. Its
uncertainty, given by isolines of the exceeding probability for $2^{-i}$
(1/4, 1/8, ..., 1/128), is characterised by constant distances between the isolines.
Caussinus and Lyazrhi (adopted by Mestre) propose just a constant of 2 ln(n) ≈ 9.
[Figure: relative gain of external variance with exceeding-probability isolines]
Wrong Direction
[Figure: comparison for n = 101 years and n = 21 years]
Conclusion
We have found a general mathematical formulation of how the external variance
of a random time series increases when more and more breaks, as given by
Dynamical Programming, are inserted.
This is much more accurate than existing estimates and can be used in the
future as a benchmark to define the optimum number of breaks.
Integrated result
What does the resulting function look like after integration?
Crosses: test data
Line: theory
Error bars: 90th and 95th percentiles
Appendix (1)
Consider the individual summands of the sum defined in
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}.$$
The factor of change f between a summand and its predecessor is
$$f = \frac{\binom{m}{l_i}\, v^{\,l_i}\,(1-v)^{m-l_i}}{\binom{m}{l_i-1}\, v^{\,l_i-1}\,(1-v)^{m-l_i+1}},$$
where $l_i$ runs from zero to $i-1$. The ratio of consecutive binomial
coefficients can be replaced, and it follows that
$$f = \frac{(m-l_i+1)\, v}{l_i\,(1-v)}.$$
m and i can be replaced by n and k:
$$f = \frac{(n-1-l_k)\, v}{l_k\,(1-v)}$$
Inserting k instead of $l_k$ gives a lower limit for f, because
$(n-1-l_k)/l_k$, the rate of change of the binomial coefficients, decreases
monotonically with k:
$$f \;\geq\; \frac{(n-1-k)\, v}{k\,(1-v)}$$
Normalised by $1/(n-1)$:
$$f \;\geq\; \frac{(1-k^*)\, v}{k^*\,(1-v)}$$
Appendix (2)
The approximate solution is known: $1-v = (1-k^*)^4$. Inserting it into the
bound for f gives
$$f \;\geq\; \frac{(1-k^*)\left[1-(1-k^*)^4\right]}{k^*\,(1-k^*)^4}
= \frac{1-(1-k^*)^4}{k^*\,(1-k^*)^3}
\;\approx\; \frac{1-(1-4k^*)}{k^*\,(1-3k^*)}
= \frac{4}{1-3k^*},$$
so that
$$f\big|_{k^*\to 0} \;\geq\; 4
\qquad\text{and}\qquad
f\big|_{k^*\to 1} \;\to\; \infty.$$
We can conclude that each element of the sum given above is larger than the
prior element by a factor f. For small k* this factor is greater than about 4
and grows to infinity for large k*. Consequently, we can approximate the sum
by its last summand:
$$P(v) = \sum_{l=0}^{i-1}\binom{m}{l}\, v^{l}\,(1-v)^{m-l}
\;\approx\; \binom{m}{i-1}\, v^{\,i-1}\,(1-v)^{m-i+1}$$
Application (1)
Insert 5 breaks of variance 1 into each of 1000 random time series.
The gain of external variance for low break numbers (1, 2, 3, up to about 10)
increases, lying above the theoretical function for random time series
without any break (arrow).
The variances for break numbers higher than 5 still increase, because the 5
inserted breaks are not always the biggest ones.
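One plausible way to generate such test series in Python (the talk does not give the exact recipe, so the details here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_breaks = 101, 5
noise = rng.standard_normal(n)                        # white noise, variance 1
positions = np.sort(rng.choice(np.arange(1, n), size=n_breaks, replace=False))
offsets = rng.standard_normal(n_breaks + 1)           # segment levels, variance 1
lengths = np.diff(np.concatenate(([0], positions, [n])))
series = noise + np.repeat(offsets, lengths)          # noisy series with 5 breaks
```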
Application (2)
Stop the break search when the growth rate of the external variance first
drops below the theoretical one for zero breaks.
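A minimal sketch of this stopping rule in Python (illustrative; v_ext[k] stands for the DP-optimal external variance with k breaks, and reference_gain for the theoretical zero-break gain derived above, assumed to be supplied as a function of k):

```python
def choose_break_number(v_ext, reference_gain):
    """Return the first k at which the observed gain of external variance
    drops below the theoretical gain of a break-free random series."""
    for k in range(len(v_ext) - 1):
        if v_ext[k + 1] - v_ext[k] < reference_gain(k):
            return k
    return len(v_ext) - 1
```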
One example out of the 1000 test time series:
Crosses: observations
Thin line: inserted breaks
Fat line: detected breaks
On average over the 1000 samples:
Added variance: 86% (theoretically 5/6)
Remaining after correction: 27%
Average detected break number: 5.48