Presentation - Georgia Tech Savannah

Checking Computation of Numerical Functions by the Use of Functional Equations
REC 2006
NSF Workshop on Reliable Engineering Computing
F. Vainstein and C. Jones
Presentation Summary
• Background
  – Fault tolerance
  – Computing
  – Numerical functions
• Theory
• Finding checking polynomials
  – The general method
  – A program developed by this research
  – Some examples
  – Considerations for deployment
• Future directions
Fault Tolerance
Grace in response to the unexpected:
• Withstands failures
• Exhibits desirable behavior
• Does not endanger life (military, transportation, medical)
• Preserves scientific investment (space, supercomputing)
• Meets consumer expectations
Fault Tolerance Can Be Critical
Military: Global Hawk
Science: Gravity Probe B
Exploration: Mars Opportunity Rover
Civilian: Airbus A380
Methods for Fault Tolerance

• Modular redundancy
  – Back up systems in the event the primary unit fails
• Replication with voting
  – Duplicate function blocks and compare; the "majority" wins
• Error-correcting codes
  – Reed-Solomon, parity checks, …
• Algorithm-based fault tolerance (ABFT)
  – Encodes data and augments the algorithm to detect errors
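Two of these methods can be sketched in a few lines. The Python below is a minimal illustration, not the presenters' code: `majority_vote` models replication with voting, and `abft_matvec` models the classic checksum-row encoding used in ABFT for matrix-vector products. The function names and the `inject_fault` hook for simulating an error are hypothetical additions for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Replication with voting: return the value a strict majority agrees on, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def abft_matvec(M: List[List[float]], v: List[float], tol: float = 1e-9,
                inject_fault: Optional[int] = None) -> Tuple[List[float], bool]:
    """Checked matrix-vector product using an ABFT-style checksum row.

    The data are encoded by appending a row equal to the column sums of M.
    In an error-free product the last output equals the sum of the others,
    so a mismatch signals a fault. `inject_fault` (hypothetical test hook)
    corrupts one output entry to simulate a hardware error.
    """
    checksum_row = [sum(col) for col in zip(*M)]   # encode the data
    encoded = M + [checksum_row]
    y = [sum(m * x for m, x in zip(row, v)) for row in encoded]
    if inject_fault is not None:
        y[inject_fault] += 1.0                     # simulated error
    result, check = y[:-1], y[-1]
    ok = abs(sum(result) - check) <= tol * max(1.0, abs(check))
    return result, ok
```

For example, `majority_vote([5, 5, 7])` returns `5`, and `abft_matvec([[1, 2], [3, 4]], [1, 1])` returns `([3, 7], True)`; corrupting one output makes the checksum test fail.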
A Complex System: The Space Shuttle
Total number of parts > 600,000
Total Weight = 4,500,000 pounds
Cost to move one pound of cargo = $20,000
Budget = $3.3 billion / year
Modular Redundancy: Space Shuttle
Space Shuttle avionics, from Sklaroff, "Redundancy Management Techniques for Space Shuttle Computers," IBM Journal of Research and Development, 1976.
Replication With Voting: Space Shuttle
Complex System: The Microprocessor
Intel Pentium 4 Prescott Core
Number of transistors > 125 million
Transistor size = 90nm
Pipeline = 31 stages
Development Budget = $4.2 billion/year
“Never in the history of mankind has it been possible to produce
so many wrong answers so quickly.” Carl-Erik Froeberg
What Does a Microprocessor Do?
• ALU: the arithmetic logic unit performs math and logic functions.
• Math coprocessors were big business for Intel and others in the 1980s. Today, most processors incorporate a math coprocessor or emulator for numerical calculations.
• Move data from one memory location to another
• Make decisions and jump to a new set of instructions
IBM FPU Core
Image Legend:
Dark Blue: Interface, Decode and Issue
Pink: Pipe Management and Data Forwarding
Yellow: Arithmetic Pipe
Aqua: Load/Store Pipe
Scientific codes typically spend much of their time in common numerical subroutines; about 70% of a phase retrieval application, for example, is spent in the Fast Fourier Transform alone.
M. Turmon, Annual Report for FY 2001, Final Report, Algorithm-Based Fault Tolerance, NASA-JPL, Remote Exploration and Experimentation Project.
Numerical Functions
Numbers from numbers
Absolute value
Minimum
Maximum
Round to next integer
Return the fractional part of a value
Clip in a saturation fashion
Wrapping for integers
Log
Fast Fourier Transform (FFT)
Numerical Differentiation
Kalman Filtering
Degrees to radians
Cosine
Hyperbolic Sine
ArcSine
SINC function
Next positive power of 2
Linear interpolation
Root finding
Gaussian
Mod
Greatest Common Divisor
Numerical Functions in Action: 1
IMAGE PROCESSING
The FIDO Mars Exploration Rover (MER)
relies on detailed panoramic views in its
operation for near real-time tasks:
• Determination of exact location
• Navigation
• Science target identification
• Mapping
WEATHER MODELING
Roe, K., et al., High Resolution Weather Monitoring
for Improved Fire Management, 2001, Maui HPCC
• Real-time analysis of environmental information
for prediction of fire behavior
Numerical Functions in Action: 2
NON-LINEAR CONTROL SYSTEMS
Brennan, S., Integrated Chassis
Control for Vehicles, 2000
SCIENTIFIC SUPERCOMPUTING
U. Landman, et al., Large-scale classical
molecular dynamics, 2001, Georgia Tech
Background Summary:

• Computing is at the heart of most modern systems.
• Fault tolerance is a concern, especially for mission-critical and safety-critical systems.
• The computation of numerical functions is a critical area of computing.
Notable Work in Numerical Result Checking
M. Blum, R. Rubinfeld
- Self-Testing/Correcting with Application to Numerical
Problems, 1990
M. Blum, H. Wasserman
- Reflections on the Pentium Division Bug, 1995
- Software Reliability Via Runtime Result Checking, 1997
• Promoted numerical checking
• A motivation for result checking
These works used functional equations, but no general method for finding them existed.
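To make checking with a functional equation concrete, the sketch below tests an exponential routine with the identity exp(x + y) = exp(x)·exp(y) at randomly chosen y. It is a simplified illustration in the spirit of randomized result checking, not a reproduction of the Blum and Rubinfeld or Blum and Wasserman checkers; `check_exp`, its trial count and its tolerance are hypothetical choices.

```python
import math
import random
from typing import Callable, Optional


def check_exp(exp_impl: Callable[[float], float], x: float, trials: int = 3,
              rel_tol: float = 1e-9, rng: Optional[random.Random] = None) -> bool:
    """Randomized check of exp_impl(x) via exp(x + y) == exp(x) * exp(y).

    For each random y the routine is also evaluated at y and x + y; a fault
    in any of the three evaluations will usually break the equation.
    Returns True if every trial satisfies the equation within rel_tol.
    """
    rng = rng or random.Random()
    fx = exp_impl(x)
    for _ in range(trials):
        y = rng.uniform(-1.0, 1.0)
        if not math.isclose(exp_impl(x + y), fx * exp_impl(y), rel_tol=rel_tol):
            return False
    return True
```

A correct `math.exp` passes, while an implementation with a small systematic scale error, or one that is wrong only at the point being checked, fails.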
An Algebraic Method for Fault Tolerance
1991 – Feodor Vainstein, Georgia Tech
Error Detection and Correction in Numerical
Computations by Algebraic Methods
Developed a general theory for generating functional equations.
Showed that many numerical functions have functional
equations and that computations of such numerical functions
could be verified by checking polynomials – a novel technique
based upon algebraic concepts such as the transcendence degree
of field extensions.
Contribution of This Work:
A Method for Practical Numerical Checking
• Developed software method for finding checking
polynomials.
• Treated the case of functions that are not polynomially
checkable.
• User-friendly program for hardware/software engineering
• Design considerations
Polynomial Numerical Checking Example: 1
Polynomial Numerical Checking Example: 2
Polynomial Numerical Checking Example: 3
Polynomial Numerical Checking Example: 4
Algebra*: Fields
*S. Lang, Algebra, Addison-Wesley, 1965
Algebra: Algebraically Dependent
Algebra: Transcendence Degree of a Field Extension
Algebra: Algebraically Closed and Algebraic Closure
Algebra: Linear Independence
Theory: Polynomially Checkable
Theorem:
Theory: Example and Generality
Theory: Linearly Checkable
Theory: Other Cases
We also considered:
• Functions over various fields
• PC and LC functions of several variables
• Partially polynomially checkable functions
The focus of the present work is on finding a practical method
for determining approximate checking polynomials for real-valued
PC and non-PC functions of a single variable.
Least Squares Estimation
Least squares estimation is a standard technique for estimating
parameters and fitting data.
Because some functions are not PC, it lets the method generalize to
approximate checking polynomials for non-PC functions.
Other methods exist, but this one was chosen to:
• Add robustness
• Develop a practical process
• Treat all polynomially checkable functions
Application of Least Squares Estimation: 1
The problem of finding a checking polynomial can be reduced to the
following optimization problem.
Let
$$\delta(\beta_0, \alpha_1, \dots, \alpha_k) = \int_A^B \bigl(f(x) - \alpha_1 f(x + a_1) - \cdots - \alpha_k f(x + a_k) - \beta_0\bigr)^2 \, dx,$$
which is minimized over β0, α1, …, αk.
Application of Least Squares Estimation: 2
Application of Least Squares Estimation: 3
Application of Least Squares Estimation: 4
Software Implementation of Least Squares Estimation: 1
Solve the matrix equation:
$AX = B$
Software Implementation of Least Squares Estimation: 2
The coefficients of the checking polynomial are then the entries of vector X.
Substituting those values into δ gives the value of the delta function.
This deviation shows how good the approximation is.
The Matlab Function:
• Solves the least squares estimation problem
• Finds the delta function value for a range of k
• Returns the checking polynomial coefficients
for the best (smallest error) delta function
• Plots the error over the function domain for
the best delta
• Plots deviation for a range of k
• Simulink and DSP Builder generate VHDL and deploy it to an Altera FPGA (the Xilinx flow is similar)
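The core of this behaviour can be sketched in plain Python. This is a hypothetical re-implementation, not LSEFUNRUN itself: it assumes equally spaced shifts a_i = i·s, replaces the integrals by Riemann sums with step `h`, and solves the normal equations by Gaussian elimination. The names `fit_checking_poly`, `best_checking_poly`, `shift` and `h` are our own assumptions.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Fit = Tuple[float, float, List[float]]  # (delta, beta0, [alpha_1, ..., alpha_k])


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            raise ValueError("singular normal equations")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_checking_poly(f: Callable[[float], float], lo: float, hi: float,
                      k: int, shift: float, h: float) -> Fit:
    """Fit f(x) ~ beta0 + sum_i alpha_i * f(x + i*shift) on [lo, hi] by least squares."""
    xs = [lo + n * h for n in range(int((hi - lo) / h) + 1)]
    # Basis: g_0(x) = 1 and g_i(x) = f(x + a_i), assuming a_i = i * shift.
    basis = [[1.0] * len(xs)] + [[f(x + i * shift) for x in xs] for i in range(1, k + 1)]
    fx = [f(x) for x in xs]
    # Normal equations A X = B, with integrals replaced by Riemann sums of step h.
    A = [[h * sum(p * q for p, q in zip(gi, gj)) for gj in basis] for gi in basis]
    B = [h * sum(p * q for p, q in zip(gi, fx)) for gi in basis]
    X = _solve(A, B)
    residual = [fx[n] - sum(c * g[n] for c, g in zip(X, basis)) for n in range(len(xs))]
    delta = h * sum(r * r for r in residual)
    return delta, X[0], X[1:]


def best_checking_poly(f: Callable[[float], float], lo: float, hi: float,
                       ks: Iterable[int], shift: float = 0.5,
                       h: float = 1e-3) -> Tuple[int, Fit, Dict[int, float]]:
    """Sweep k, keep the fit with the smallest delta, and report delta for every k."""
    fits = {k: fit_checking_poly(f, lo, hi, k, shift, h) for k in ks}
    best_k = min(fits, key=lambda k: fits[k][0])
    return best_k, fits[best_k], {k: fit[0] for k, fit in fits.items()}
```

For `math.sin` on [0, π] with k in (1, 2), the sweep should select k = 2 with α1 ≈ 2cos(s), α2 ≈ −1 and β0 ≈ 0, matching sin(x) = 2cos(s)·sin(x + s) − sin(x + 2s). Since δ can only shrink as k grows, a practical tool would weigh δ against the cost of a higher-order polynomial.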
Function Input
Function Output
$$\delta(\beta_0, \alpha_1, \dots, \alpha_k) = \int_A^B \bigl(f(x) - \alpha_1 f(x + a_1) - \cdots - \alpha_k f(x + a_k) - \beta_0\bigr)^2 \, dx$$
Example: SINE Function Output
Example: SINE Function Plots
The sine function is linearly checkable (LC).
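One way to see why sine is LC: the sum-to-product identity gives an exact linear relation between three equally spaced samples. Written in the form f(x) = β0 + α1 f(x + a1) + α2 f(x + a2) with a1 = a and a2 = 2a, a least-squares fit can therefore reach δ = 0. This is a standard identity shown as a worked check, not the slide's own derivation.

$$
\sin(x + 2a) + \sin(x) = 2\sin(x + a)\cos(a)
\;\Longrightarrow\;
\sin(x) = 2\cos(a)\,\sin(x + a) - \sin(x + 2a),
$$
so $\beta_0 = 0$, $\alpha_1 = 2\cos(a)$, $\alpha_2 = -1$.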
The Logarithm Function: Output
The Logarithm Function: Plots
The Logarithm Function: k = [1…40]
Checking Polynomials: Simple Functions
Checking Polynomials: Compound Functions
Why Matlab
Matlab (MATrix LABoratory)
• Matrix-oriented programming environment
• Code can compile to C/C++
• Built-in routines for data analysis and visualization
• GUI/Web publishing support
• A popular environment for technical computing
http://www.gtrep.gatech.edu/undergradlabs/labman/CheckingPolynomial
Deployment: Considerations
• Hardware or software
• Pipeline or parallel
• If a non-LC function returns a high-order checking polynomial:
  – Break up the function domain
  – Generate a separate checking polynomial for each sub-interval
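A minimal sketch of the domain-splitting idea, under stated assumptions: it fits only the simplest relation f(x) ≈ β0 + α1·f(x + a) (k = 1) by discrete least squares on each sub-interval, and reuses the same sample points for every split so that totals are comparable. The names `fit_k1`, `piecewise_fit` and `lookup` are hypothetical. Because each sub-interval gets its own optimal fit, in exact arithmetic the total squared residual can never exceed that of a single fit over the whole domain.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float, float, float, float]  # (left, right, beta0, alpha1, sse)


def fit_k1(f: Callable[[float], float], xs: Sequence[float], a: float):
    """Discrete least squares for f(x) ~ beta0 + alpha1 * f(x + a); returns (beta0, alpha1, sse)."""
    g = [f(x + a) for x in xs]
    y = [f(x) for x in xs]
    mg, my = sum(g) / len(g), sum(y) / len(y)
    alpha1 = (sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
              / sum((gi - mg) ** 2 for gi in g))
    beta0 = my - alpha1 * mg
    sse = sum((yi - beta0 - alpha1 * gi) ** 2 for gi, yi in zip(g, y))
    return beta0, alpha1, sse


def piecewise_fit(f: Callable[[float], float], lo: float, hi: float,
                  pieces: int, a: float, n: int = 800) -> List[Row]:
    """Split [lo, hi] into equal sub-intervals and fit a separate k = 1 relation on each."""
    xs = [lo + (hi - lo) * j / (n - 1) for j in range(n)]  # shared sample points
    width = (hi - lo) / pieces
    table = []
    for p in range(pieces):
        left, right = lo + p * width, lo + (p + 1) * width
        last = p == pieces - 1
        sub = [x for x in xs if x >= left and (x < right or last)]
        table.append((left, right) + fit_k1(f, sub, a))
    return table


def lookup(table: List[Row], x: float) -> Tuple[float, float]:
    """Select (beta0, alpha1) for the sub-interval containing x."""
    for left, right, beta0, alpha1, _ in table:
        if left <= x <= right:
            return beta0, alpha1
    raise ValueError("x outside the checked domain")
```

At run time, a checker would call `lookup` with the argument being checked and apply that sub-interval's coefficients.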
Simulink Design
[k,delta,alphas,betao,stepsize,A,B]=LSEFUNRUN('exp(x).*sin(x)',10^-4,0,3.1415,(1:2))
Output panels: k, delta, alphas, beta
• We show a Simulink example
• Simulink is an extension of Matlab
• Modeling and simulation
• GUI environment
• Toolboxes for DSP, etc.
• Toolboxes for targeting FPGA devices
Simulink Implementation of Checking Algorithm
$f(x) = e^{x}\sin(x)$
Space Complexity
For a ROM implementation that stores b-bit numbers and has m address lines.
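For reference, the storage such a ROM needs follows from a standard count (the slide's own comparison is not recoverable from this transcript): m address lines select one of $2^m$ words, each b bits wide.

$$
\text{ROM size} = b \cdot 2^{m}\ \text{bits}, \qquad \text{e.g. } b = 32,\ m = 16:\quad 32 \cdot 2^{16} = 2^{21}\ \text{bits} = 256\ \text{KiB}.
$$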
Error Coverage
Error Coverage Example
This is the percentage of all errors covered.
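One way to estimate coverage empirically is fault injection. The sketch below is an illustration under assumptions, not the presenters' method. It flips each of the 64 bits of each of the three IEEE-754 double inputs to the sine check sin(x) − 2cos(a)·sin(x + a) + sin(x + 2a) = 0. It then reports the fraction of single-bit faults whose residual exceeds an absolute tolerance `tol`, a hypothetical choice justified by |sin| ≤ 1. Non-finite residuals fail the comparison, so they count as detected.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant mantissa bit, 63 = sign) of a float64."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (out,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return out


def sine_check(y0: float, y1: float, y2: float, a: float, tol: float = 1e-9) -> bool:
    """True if sin(x) - 2cos(a) sin(x+a) + sin(x+2a) ~ 0 holds for the given samples.

    An absolute tolerance is used because |sin| <= 1. NaN or infinite
    residuals fail the comparison and therefore count as detected faults.
    """
    residual = y0 - 2.0 * math.cos(a) * y1 + y2
    return abs(residual) <= tol


def coverage(x: float, a: float, tol: float = 1e-9) -> float:
    """Fraction of single-bit faults (64 bits x 3 operands) that the check detects."""
    clean = [math.sin(x), math.sin(x + a), math.sin(x + 2 * a)]
    detected = total = 0
    for which in range(3):
        for bit in range(64):
            ys = list(clean)
            ys[which] = flip_bit(ys[which], bit)
            total += 1
            if not sine_check(ys[0], ys[1], ys[2], a, tol):
                detected += 1
    return detected / total
```

Flips in the low-order mantissa bits change a value by less than `tol` and slip through, which is why the estimate stays below 100% and can only grow as `tol` shrinks.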
Design Flow
• Define the numerical function
• Define the domain of the function, based on system bit size, accuracy of instrumentation, etc.
• Use the LSE function to find the checking polynomial coefficients
• Based on the LSE results, choose the appropriate number of shifted functions
• Choose a hardware or software implementation, parallel or pipeline
Target Markets
Numerically intense, safety-critical, or mission-critical systems:
• Supercomputing
• Moletronics and nanosystems
• Space or remote systems
• Control systems using COTS components
Example: NASA Seeks COTS Remote Supercomputing
Space Radiation
S. Kayali, Space Radiation Effects on Microelectronics, Radiation Effects Group,
JPL, Section 514.
Traditional Fault-Tolerant Devices Are Costly in Terms of Design Space, Time, and Money
Perry COTS initiative
• Buy more commercial products
• Use industrial specifications
• Reduce costs
William J. Perry, Specifications and Standards – A New Way of Doing
Business, Memorandum, 1994
Radiation-Hardened Half-Micron CMOS 16K
SRAM, Sandia National Laboratories
Moletronics, CMOL, and Nanodevices Will Require Minimizing Fault Tolerant Strategies
Low Yield and structural defects will be considerable (in moletronic devices). Hence, the target architecture has to be inherently fault-tolerant/configurable. If you want to compensate for the errors then you have to use error-correcting codes and fault-tolerant circuits.
V. Roychowdhury, A Quest for Information, Frontiers in Nanocomputing Seminar, 2004
Single molecular implementation of single-electron transistor, K. Likharev, Electronics Below 10nm, 2003
Demands for Numerical Fault-Tolerant Computing
(Diagram: Shrinking Devices, Remote/Autonomous, and COTS (Cost) around Numerical Fault Tolerance)
Numerical Checking Is Only Part of the Solution: Complex Systems Require Multiple Fault-Tolerant Strategies
Conclusions and Future Directions

• Remaining tasks
  – Tame functional discontinuities
  – Deploy to a hardware/software testbed
  – Investigate the impact of single and multiple checking polynomial strategies
  – Investigate the best interface strategies
• Develop a Numerical Checking Toolbox
  – Functions of several variables
  – Partially polynomially checkable functions
Thank You!