Validating a Random Number Generator

Download Report

Transcript Validating a Random Number Generator

Validating a Random Number
Generator
Based on:
A Test of Randomness Based on the Consecutive
Distance Between Random Number Pairs
By:
Matthew J. Duggan, John H. Drew, Lawrence M. Leemis
Presented By:
Sarah Daugherty
MSIM 852
Fall 2007
Introduction

Random numbers are critical to Monte
Carlo simulation, discrete event simulation,
and bootstrapping

There is a need for RNG with good
statistical properties.

One of the most popular methods for
generating random numbers in a computer
program is a Lehmer RNG.
Daugherty MSIM 852 Fall 2007
Lehmer Random Number Generators

Lehmer’s algorithm: an iterative equation
produces a stream of random numbers.


Requires 3 inputs: m, a, and x0.




xi 1  axi mod m
m = modulus, a large fixed prime number
a = multiplier, a fixed positive integer < m
x0 = initial seed, a positive integer < m
Produces integers in the range (1, m-1)
Daugherty MSIM 852 Fall 2007
Problem

Lehmer RNG are not truly random

With carefully chosen m and a, it’s possible to
generate output that is “random enough” from a
statistical point of view.

However, still considered good generators
because their output can be replicated,
they’re portable, efficient, and thoroughly
documented.

Marsaglia (1968) discovered too much
regularity in Lehmer RNG’s.
Daugherty MSIM 852 Fall 2007
Marsaglia’s Discovery

He observed a lattice structure when consecutive
random numbers were plotted as overlapping ordered
pairs.

((x0, x1, x2,…, xn),
(x1, x2,…, xn+1))

Lattice created using
m = 401, a = 23.

Does not appear to be
random at all; BUT a
degree of randomness
MAY be hidden in it.
Daugherty MSIM 852 Fall 2007
Solution

Find the hidden randomness in the order in
which the points are generated.

The observed distribution of the distance
between consecutive RN’s should be close
to the theoretical distance.

Develop a test based on these distances.

Hoping to observe that points generally are not
generated in order along a plane or in a regular
pattern between planes.
Daugherty MSIM 852 Fall 2007
Overlapping vs. Non-overlapping Pairs

Considering distance between consecutive
pairs of random numbers, points can be
overlapping or non-overlapping.


Overlapping: (xi, xi+1), (xi+1, xi+2)
Non-overlapping: (xi, xi+1), (xi+2, xi+3)

Both approaches are valid.

The non-overlapping case is mathematically
easier in that the 4 numbers represented
are independent therefore the 2 points they
represent are also independent.
Daugherty MSIM 852 Fall 2007
Non-overlapping Theoretical Distribution

If we assume X1, X2, X3, X4 are IID U(0,1)
random variables, we can find the
distance between (X1, X2) and (X3, X4) by:
D  ( X1  X 3 )  ( X 2  X 4 )
2
2
Daugherty MSIM 852 Fall 2007
Non-overlapping Theoretical Distribution

The cumulative
distribution,
F(x), of D.
Daugherty MSIM 852 Fall 2007
Goodness-of-Fit Test

Now we can compare our theoretical distribution
against the Lehmer generator.

Convert the distances between points into an
^
empirical distribution, F(x), which will allow us to
perform a hypothesis test.
N ( x)
F ( x) 
n
^
N(x) = # of values that do
not exceed x
^
H 0 : F ( x)  F0 ( x)
^
H1 : F ( x)  F0 ( x)
n = # of distances collected
Daugherty MSIM 852 Fall 2007
Classification of Results

Based on results of 3 hypothesis tests (KS,
CVM, and AD tests), each RNG can be
classified as:

Good – the null hypothesis was not rejected in
any test.

Suspect – the null hypothesis was rejected in 1
or 2 of the tests.

Bad – the null hypothesis was rejected in all 3
tests.
Daugherty MSIM 852 Fall 2007
Results

Interesting cases are when a multiplier is
rejected by only 1 or 2 of the 3 tests. See
a = 3 in table.
Daugherty MSIM 852 Fall 2007
Random
number
pairs
Distances
connecting
pairs
F(x)
(solid) vs.
^
Good
Suspect
Bad
F(x)
(dotted)
Daugherty MSIM 852 Fall 2007
Summary

A test of randomness was developed for
Lehmer RNG’s based on distance between
consecutive pairs of random numbers.

Since some multipliers are rejected by
only one or two of the 3 hypothesis tests,
the distance between parallel hyperplanes
should not be used as the only basis for a
test of randomness. The order in which
pairs are generated is a second factor to
consider.
Daugherty MSIM 852 Fall 2007
Critique

Potential – limited. Many other tests exist for
validating RNG’s.

Impact – minimal. Frequently used RNG’s use a
modulus much larger than the m=401 used here.

Overall – paper is well written; in it’s current state,
this test is a justified addition to collection of tests
for RNG’s.

Future – use larger modulus; improve theoretical
distribution by improving numerical calculations of
integral for cdf; test other non-Lehmer generators
such as additive linear, composite, or quadratic.
Daugherty MSIM 852 Fall 2007