FPGA-Optimised High-Quality Uniform Random Number Generators

Download Report

Transcript FPGA-Optimised High-Quality Uniform Random Number Generators

School of Engineering
University of Guelph
FPGA-Optimised High-Quality
Uniform Random Number
Generators
By David Barrie Thomas and Wayne Luk of
the Imperial College, London, England
-Paper ReviewCédric De Brito
Table of Contents
• Introduction
• Background
• The paper
• Explanations
• Conclusion
Introduction
• This paper describes a new method to create
Random Number Generators (RNG) using only
the most basic primitives of FPGAs: Flip-Flops
(FF), Lookup Tables (LUT), Shift Registers (SR)
and Random Access Memory (RAM)
What is a RNG ?
• Two types:
i. True RNG (TRNG):
of physical
phenomena,
ii. Use
Pseudo
RNG
(PRNG):e.g the jitter between oscillators
Background
 depends on initial seed

fundamentallyfunctions
unpredictable
 characteristic
are deterministic : a given
always
give rise to the
same output
 seed
greatmust
interest
in cryptographic
applications
sequence
X
cannot produce
high abit-rate
streams
X advances
state using
recurrence
function
X
state-space
must ahave
a fixed with
cardinality
-> must
X the
impossible
to repeat
simulation
the same
stimuli
eventually repeat
-> Quality ?
Background
Obstacles to
FPGA implementation
• Limited resources
• Most algorithms are developed for CPUs
->usually optimized for 32 bits
->mostly sequential
• FPGAs are fundamentally binary
• Parallel -> huge increase in bit rate
Background
What is currently
available
Paper
published in
2008
• Most commonly used in software: MersenneTwister RNG (MT19937)
– adapted to FPGA
– runs on an Opteron at 2.2GHz for 4Gb/s
• Most efficient on an FPGA: LUT recurrence
– MAXIMUM area efficiency
– no software equivalent
Motivations
1
2
3
Poor
Good
Best
Area
Throughput
Ease of
implementation
1
2
3
2
3
2
LUT recurrence
3
1
1
LUT - FIFO
3
4
2
Software
Software-based on
Reconfigurables
The paper
The paper
Motivations
“ Many scientific and industrial problems have no tractable
closed-form solution, and can only be solved through MonteCarlo simulations. However, such simulations require huge
amounts of computational power, and the power and size
limits of conventional compute-farms have led to significant
interest in the use of FPGAs in such applications.”
The paper
Claims
Compared to the
MT19937 at
2.2GHz for a
output of 4Gb/s
• Less area utilisation
– > less power
• Low clock frequency : 550 Mhz
• High speed generation : 48 Gb/s
On a
Virtex-5
The paper
Claims
• Extremely long periods: ~211213 -1
,758,426 e+3345
• VHDL description is platform independent
• By nature, its critical path is obvious
-> can easily be adapted/optimized for
any platform
Basics
• A RNG relies on a polynomial of degree n
to determine the values of the bits of the
next generated number.
n - a zn-1 +…+a z0
Pz = zExplanation
1
n
• This characteristic polynomial is translated
into a characteristic recurrence function
Si+1T = [si,5 XOR si,6 , si,6 , si,4 , si,5 , si,2 , si,0 , si,1 ]
Active bit
FIFO bits
Explanation
A simple example
Binary Linear Recurrence (BLR)
1) Choose a polynomial of the form
Px = x6 + x5 + x0
2) Find its equivalent recurrence function
Si+1T = [si,5 XOR si,6 , si,6 , si,1 , si,2 , si,3 , si,4 , si,5 ]
3) Create a matrix to save these coefficients
Explanation
A simple example
Binary Linear Recurrence (BLR)
• Only produces 1 new value per cycle
• For a w-length word, w SRs are needed.
• Maximum period of 2n-1 instead of 2nw-1
Explanation
How is the period
calculated ?
• Why? Because the state-space defining
the characteristic polynomial must be
finite/bounded -> must eventually repeat
• Without too much math, maximum
period is 2n-1 if (special but common case)
2n-1 is a Mersenne prime
Prime if n in 2n-1 is prime
Explanation
How is the quality
measured?
• The quality is a measure of “how random”
the numbers produced are.
• Test of quality :
• Run Generator twice with several
"batteries" (Diehard, SmallCrush, Alphabit)
and both runs must produce enough values
in [10-6 .. 1-10-6] and one in [0.01..0.99]
-> Empirical randomness
Conclusion
• Critical path : I -> FF -> RAM -> FF -> O
• Limitation : speed of RAM – NOT logic/routing
Personal Opinion
• VERY interesting paper
• Condensed but complete
• Clear
• Can be reproduced
• About a rather unusual topic
Questions ?