Data Mining and Gated Expert Neural Networks for Prognostics of Systems Health Monitoring
Mo Jamshidi, Ph.D., DEgr., Dr. H.C.
F-IEEE, F-ASME, F-AAAS, F-NYAS, F-HAE, F-TWAS
Regents Professor, Electrical and Computer Engr. Department &
Director, Autonomous Control Engineering (ACE) Center
University of New Mexico, Albuquerque, NM, USA
Advisor, NASA JPL (1991-93), Headquarters (1996-2003)
Sr. Research Advisor, US AF Research Lab. (1984-90,2001-present)
Consultant, US DOE Oak Ridge NL (1988-92), Office of Renewable
Energy (2001-2003)
Vice President, IEEE Systems, Man and Cybernetics Society
[email protected]
Fairbanks, Alaska, USA, May 24, 2005
http://ace.unm.edu www.vlab.unm.edu
OUTLINE
Definition of Prognostics
History of Prognostics
Approaches of Prognostics
Principal Component Analysis – PCA
PCA via Neural Network Architecture
Prognostics via Neural Networks
Gated Approach to Hardware Prognostics
Applications – Health and Industry
Conclusion and Future Efforts
Prognostics vs. Diagnostics vs. Health
Monitoring – Are They the Same?
• Health Monitor: “v: to keep track of [current status] systematically with a view to collect information.”
• Diagnosis: “n: identifying the nature or cause of some phenomenon.”
• Prognosis: “n: a prediction about how something (as the weather) will develop; forecasting.”
Conclusion: they are not the same…
(Source: Webster’s New World Dictionary)
So How Are They Related?
Health monitoring uses instrumentation to
collect information about the subject system.
Diagnostics uses the information in real time
to detect abnormal operation or outright faults.
Prognostics uses the information to predict the onset of abnormal conditions and faults prior to the actual failure, allowing the operators to gracefully plan for a shutdown or, if required, to operate the system in a degraded but safe-to-use mode until shutdown and maintenance can be accomplished.
A Brief History of Automated Diagnostics
and Prognostics
• Before the advent of inexpensive computing, diagnosis was ad hoc, manual, and depended on human experts.
• With the advent of accessible digital computers, early expert systems attempted diesel locomotive engine diagnostics based on oil analysis. Humans were still required for prognostics.
• The 1970s saw the start of equipment health monitoring for high-value systems (e.g., nuclear power plants) and on-line diagnostics using minicomputers. Human interpretation was still required.
• The 1980s saw the use of personal computers and digital analyzers to do equipment health monitoring. Some automatic shut-down on extreme exceptions was included, but human involvement was still required.
A Brief History (Contd.)
The 1990s saw built-in test and real-time diagnostics added to military electronics and high-value civilian systems. Health monitoring/diagnostics at this point were evolving into decision support systems for the operator.
NOW – diagnostics are pervasive:
• Automobiles (OnStar™, OBD II), heavy equipment, trucks, etc.
• Electronics/electro-mechanical devices (copiers, complex manufacturing equipment, etc.)
A Brief History (Contd.)
• Aviation (Boeing 777, Airbus, etc.)
Prognostics at the component/
subsystem level start to appear for
the first time.
Still no system-wide prognostics! By
and large, prognostics are still done by
the human operators deciding how
much further they can go before
stopping.
Literature Survey …
Diagnostics are well developed.
Prognostics are not!
• Logical next step: intelligent system-level prognostics
Approaches to Diagnostics
and Prognostics
• Data-driven methods
• Analytical methods
• Knowledge-based methods
Data Signatures
A library of predictive algorithms based on a number of advanced pattern recognition techniques, such as multivariate statistics, neural networks, and signal analysis.
Identify the partitions that separate the early signatures of functioning systems from the signatures of malfunctioning systems.
Predictive indicators of
failures
• A viable prognostic system should be able to provide an accurate picture of faults, component degradation, and predictive indicators of failures, allowing operators to take preventive maintenance actions to avoid costly damage to critical parts and to maintain availability/readiness rates for the system.
Data Driven Methods
The huge amount of data has to be reduced intelligently for any careful fault diagnosis.
Reduce the superficial dimensionality of the data to its intrinsic dimensionality (i.e., the number of independent variables with significant contributions to non-random variations in the observations).
Data Driven Methods
Feature extraction:
Partial Least Squares (PLS)
Fisher Discriminant Analysis
Canonical Variate Analysis
Principal Component Analysis
We will only focus on PCA and its
non-linear relative (NLPCA).
Principal Component Analysis
What is PCA?
It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences.
Patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available.
Principal Component Analysis
PCA is a powerful tool for analyzing data.
The other main advantage of PCA is that once you have found these patterns, you can compress the data (i.e., reduce the number of dimensions) without much loss of information.
PCA …
The feature variables in PCA (also referred to as factors) are linear combinations of the original problem variables.
PCA – Classical Statistics-Based Steps
1. Get data.
2. Subtract the mean.
3. Calculate the covariance matrix.
4. Calculate the eigenvalues and eigenvectors of the covariance matrix.
5. Choose the feature vector (data compression begins here).
6. Derive the new (reduced) data set.
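As a concrete illustration, here is a minimal numpy sketch of these six steps; the random matrix standing in for sensor data and the name pca_reduce are illustrative, not from the original talk:

```python
import numpy as np

def pca_reduce(Y, f):
    """Classical PCA: reduce an (n x m) data matrix Y to f dimensions."""
    Y_centered = Y - Y.mean(axis=0)        # step 2: subtract the mean
    C = np.cov(Y_centered, rowvar=False)   # step 3: covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(C)   # step 4: eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]      # step 5: feature vector = the f
    P = eigvecs[:, order[:f]]              #   eigenvectors with largest eigenvalues
    T = Y_centered @ P                     # step 6: the new, reduced data set
    return T, P

Y = np.random.randn(1000, 6)               # step 1: get data (stand-in here)
T, P = pca_reduce(Y, f=2)
print(T.shape)                             # (1000, 2)
```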
Principal Component
Analysis (PCA)
Assuming a data set Y containing n observations and m variables (i.e., an n × m matrix), PCA divides Y into two matrices: T, the scores matrix of dimension n × f, and P, the loading matrix of dimension m × f, plus a matrix of residuals E of dimension n × m.
Principal Component
Analysis (PCA)
It is known that PCA optimizes the process by minimizing the Euclidean norm of the residual matrix E.
To satisfy this condition, the columns of P are the eigenvectors corresponding to the f largest eigenvalues of the covariance matrix of Y.
Principal Component
Analysis (PCA)
In other words, PCA transforms our data from m to f dimensions by providing a linear mapping:
t = y P
where y represents a row of the original data set Y and t represents the corresponding row of T.
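Continuing the hypothetical numpy sketch above, the decomposition and the mapping can be checked directly; E here is the residual left after keeping f components:

```python
Yc = Y - Y.mean(axis=0)          # mean-centered data, as before
T = Yc @ P                       # scores: each row t = y P
E = Yc - T @ P.T                 # residual matrix (n x m)
print(np.linalg.norm(E))         # the Euclidean norm PCA minimizes for given f
```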
Non-Linear PCA (NLPCA)
In Kramer’s NLPCA, the linear transformation in PCA is generalized to any nonlinear function such that
T = G(Y)
where G is a nonlinear vector function composed of f individual nonlinear functions analogous to the columns of P.
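Kramer’s NLPCA is realized by an autoassociative (bottleneck) neural network. A minimal PyTorch sketch, with illustrative layer sizes and random stand-in data, might look like this:

```python
import torch
import torch.nn as nn

m, f = 6, 2                          # input dimension, bottleneck (feature) size
# mapping layers -> bottleneck realize the nonlinear function G
encoder = nn.Sequential(nn.Linear(m, 10), nn.Tanh(), nn.Linear(10, f))
decoder = nn.Sequential(nn.Linear(f, 10), nn.Tanh(), nn.Linear(10, m))
model = nn.Sequential(encoder, decoder)

Y = torch.randn(1000, m)             # stand-in for mean-centered sensor data
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                 # train the net to reproduce its own input
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(Y), Y)
    loss.backward()
    opt.step()

with torch.no_grad():
    T = encoder(Y)                   # nonlinear scores T = G(Y), shape (1000, f)
```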
Analytical Methods
The analytical methods generate features using detailed mathematical models.
Based on the measured input u and output y, it is common to generate residuals r, parameter estimates p̂, and state estimates x̂.
The residuals are the outcomes of consistency checks between the plant observations and a mathematical model.
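A toy sketch of residual generation; the model here is a hypothetical stand-in, not one of the talk’s plant models:

```python
import numpy as np

def residuals(u, y, model):
    """Consistency check: difference between the measured output y and the
    output the mathematical model predicts from the measured input u."""
    return y - model(u)

# hypothetical static plant model y = 2u
r = residuals(np.array([1.0, 2.0]), np.array([2.1, 3.8]), lambda u: 2 * u)
print(r)   # stays near zero while the plant and the model agree
```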
Integrated Method for Fault
Diagnostics and Prognostics (IFDP)
Based on:
• NLPCA for dimensionality reduction
• A society of experts (E-AANN, KSOM, RBFC)
• Gated Experts
All developed in Matlab, with Simulink for model simulations.
Extended Auto-Associative Neural Networks (E-AANN)
Kohonen Self-Organizing
Maps (KSOM)
KSOM defines a mapping from the input data space ℝⁿ onto a regular two-dimensional array of nodes.
In the system, a KSOM input is a vector combining both the inputs and outputs of a certain system component.
Every node i is defined by a prototype vector mᵢ ∈ ℝⁿ. An input vector x ∈ ℝⁿ is compared with every mᵢ, and the best match m_b is selected.
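A minimal sketch of the best-match step in numpy; the grid size and input dimension are illustrative:

```python
import numpy as np

n = 4                                  # e.g. inputs + outputs of one component
grid = np.random.rand(10, 10, n)       # 10 x 10 array of prototype vectors m_i

def best_match(x, grid):
    """Return the grid index of m_b, the prototype closest to x."""
    d = np.linalg.norm(grid - x, axis=-1)           # distance to every node
    return np.unravel_index(np.argmin(d), d.shape)

b = best_match(np.random.rand(n), grid)
print(b)   # during training, m_b and its grid neighbors are pulled toward x
```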
Kohonen Self-Organizing Maps (KSOM)
Example: three-dimensional input data in which each sample vector x consists of the RGB (red-green-blue) values of a color.
Radial Basis Function
based Clustering (RBFC)
The RBF rulebase is identified by our
clustering algorithm.
We will consider a specific case of a
rulebase with n inputs and a single
output. The inputs to the rulebase are
assumed to be normalized to fall within
the range [0,1].
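As an illustration, one common form of such a rulebase is a normalized weighted sum of Gaussian basis functions; the centers, width, and consequents below are hypothetical stand-ins for the values the clustering algorithm would identify:

```python
import numpy as np

centers = np.array([[0.2, 0.3], [0.7, 0.8]])   # rule centers from clustering
weights = np.array([1.0, -0.5])                # rule consequents
sigma = 0.2                                    # basis-function width

def rbf_output(x):
    """Single-output RBF rulebase: Gaussian firing strengths, then a
    normalized weighted sum of the rule consequents."""
    phi = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma ** 2))
    return np.dot(weights, phi) / np.sum(phi)

print(rbf_output(np.array([0.25, 0.35])))      # dominated by the first rule
```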
Gated Experts for Combining
Predictions of Different Methods
The Gated Experts (GE) architecture [Weigend et al., 1995] was developed as a method for adaptively combining the predictions of multiple experts operating in an environment with changing hidden regimes.
The predictions are combined using a gate block,
which dynamically assigns probabilities to the
forecast of each expert being correct based on
how close the current regime in the data fits the
area of expertise for that expert.
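In its simplest form, the combination step is just a probability-weighted sum of the experts’ forecasts; the numbers below are illustrative:

```python
import numpy as np

def gate_combine(expert_preds, gate_probs):
    """Combined GE prediction: each expert's forecast weighted by the
    probability the gate assigns to that expert for the current regime."""
    return np.dot(gate_probs, expert_preds)

preds = np.array([1.2, 0.9, 2.1])   # forecasts of three experts
probs = np.array([0.7, 0.2, 0.1])   # gate probabilities for the current regime
print(gate_combine(preds, probs))   # 1.23
```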
Gated Experts for Combining
Predictions of Different Methods
The training process for the GE architecture uses
the expectation-maximization (EM) algorithm,
which combines both supervised and unsupervised
learning.
The supervised component in the experts learns to predict the conditional mean of the next observed value, and the unsupervised component in the gate learns to discover hidden regimes and assign probabilities to the experts’ forecasts accordingly.
Gated Experts for Combining
Predictions of Different Methods
The unsupervised component is also present in the experts in the form of a variance parameter, which each expert adjusts to match the variance of the data for which the gate found it most responsible.
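A sketch of the EM E-step under a Gaussian assumption (a simplified reading of the GE training rule, with illustrative numbers): each expert’s responsibility for an observation combines the gate’s probability with how well the expert’s predicted mean and variance explain it.

```python
import numpy as np

def responsibilities(y, mu, sigma2, g):
    """Posterior probability that each expert generated observation y,
    given its predicted mean mu, its variance parameter sigma2, and the
    gate's probability g; the M-step re-fits the experts (and their
    variances) weighted by these responsibilities."""
    lik = np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    h = g * lik
    return h / h.sum()

print(responsibilities(1.0, mu=np.array([0.9, 2.0]),
                       sigma2=np.array([0.1, 0.1]), g=np.array([0.5, 0.5])))
```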
Prototype Hardware
Implementations
• A chiller at Texas A&M University (Langari and his team)
• A laser pointing system prototype at the University of New Mexico (Jamshidi and the ACE team)
• A COIL laser at AFRL – USAF (Jamshidi & Stone)
• A flash memory line at Intel Corp. (Jamshidi & Stone)
Chiller Model at Texas
A&M University
[Figure: chiller model schematic showing the system boundary, inputs 1–3, and Vs]
Training Data and Test Data
Whole data with 1000 samples
Training Data and Test Data
Normalized training data with 2% noise (sorted)
Training Data and Test Data
Normalized test data with 2% noise (sorted)
One Sensor with Drift Error
Test data with 2% noise, sensor 3 has drift error
One Sensor with Drift Error
Drift error and sensor 3 data
One Sensor with Shift Error
Test data with 2% noise, sensor 3 has shift error
One Sensor with Shift Error
E-AANN output, the input data had 2% noise and shift error
One Sensor with Shift Error
Shift error and sensor 3 data
One Sensor with Shift Error
The difference between E-AANN input and output, the input
data had 2% noise and shift error
PCA Application to Cardiac
Output
Cardiac output is defined by two factors:
• Stroke volume
• Heart rate
Cardiac Output (ml/min) = Heart Rate (beats/min) × Stroke Volume (ml/beat)
CO at the basal metabolic rate is about 5.5 L/min.
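As a quick illustrative check (round numbers, not patient data): a resting heart rate of 70 beats/min with a stroke volume of about 79 ml/beat gives 70 × 79 ≈ 5,530 ml/min, i.e., roughly the 5.5 L/min basal figure above.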
[Figure: the human heart]
Prognostics of CO using PCA
Analysis
PCA is used to identify patterns in data and to express the data in such a way as to highlight their similarities and differences.
PCA assists us in making an accurate prognostic analysis of a patient’s cardiac output performance and hence in predicting possible heart failures.
Good data representation
By taking several measurements of CO, one is able to predict the possibility of heart failure, which makes PCA very useful in the prognostics of cardiac output.
PCA takes these millions of output measurements and compresses them into a graphical representation from which we can easily visualize CO defects.
Why prognostics ?
In medicine, the cheapest way to cure disease is to prevent it. This is done with early diagnostics, medicines, vaccines, etc.
Moreover, with an accurate prognostics approach, conditions like heart attack and heart failure can be greatly reduced.
PCA enables us to arrive at prognostics.
Parkinson’s Disease Tremors
a) No medication and no brain stimulation
b) Brain stimulation and no medication
c) No brain stimulation and medication
d) Brain stimulation and medication
Test 1: Tests made on the differences and similarities in
patients that have both medication and brain stimulation on
vs. medication off and brain stimulation on.
Test 2: Tests made on the differences and similarities in
patients that have both medication and brain stimulation on
vs. medication on and brain stimulation off.
Test 3: Tests made on the differences and similarities in
patients that have both medication and brain stimulation on
vs. medication off and brain stimulation off.
Test 4: Tests made on the differences and similarities in
patients that have medication on and brain stimulation off
vs. medication off and brain stimulation on.
PCA Image Processing – Original & Reduced
[Figure: original image vs. PCA reconstruction with 10 eigenvectors]
[Figure: original image vs. PCA reconstruction with 20 eigenvectors]
[Figure: original image vs. PCA reconstruction with 30 eigenvectors]
[Figure: original image vs. PCA reconstruction with 40 eigenvectors]
[Figure: original image vs. PCA reconstruction with 54 eigenvectors]
[Figure: PCA reconstruction using all 325 eigenvectors]
With all 325 eigenvectors, we can see that the image looks the same as the image reconstructed with only 54 eigenvectors.
PCA PERCENTAGES
Eigenvectors    % of Eigenvectors Used
10              5.20%
20              10.42%
30              15.63%
40              20.83%
54              28.10%
325             100%
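A hypothetical numpy sketch of the experiment behind these figures: reconstruct an image from its top-k eigenvectors and watch the reconstruction error shrink as k grows (the random matrix stands in for the actual image):

```python
import numpy as np

def compress_image(img, k):
    """Reconstruct an image from its top-k principal components."""
    mean = img.mean(axis=0)
    X = img - mean
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    P = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k eigenvectors
    return X @ P @ P.T + mean                       # project, then map back

img = np.random.rand(300, 400)      # stand-in for the image in the slides
for k in (10, 20, 30, 40, 54):
    print(k, np.linalg.norm(img - compress_image(img, k)))  # error shrinks
```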
Laser Pointing System at UNM
[Figure: block diagram – a LabVIEW controller algorithm with ADC/DAC interfaces drives X/Y motors steering a mirror; the laser beam passes through a filter to a quadrant detector]
Prognostics – Possible Test Beds
• Chemical laser system
• ATL – Advanced Tactical Laser
• Large gimbal hardware system – the NOP (North Oscura Peak) system
HARDWARE Prognostic System
[Figure: original data from the NOP subsystem is screened against an expert-system knowledge base (NOP senior engineers) to yield relevant data; PCA-based data reduction produces the reduced dominant data, which feeds the NOP Diagnostic – Prognostic System, where the RBFC, KSOM, and E-AANN experts are combined by a GE-NN architecture to produce the outputs]
[Figure: Intel fab data flow – a process tool communicates via SECS/GEM and an iUSC SECS link with a data repository (Unix), exchanging data/templates (FAB template data) with an Office NT workstation (Domain's PDE)]
The Intel Flash Memory
Assembly Line
The Intel flash memory assembly line is a state-of-the-art system that uses many sensors to monitor operating conditions.
PCA
• Hundreds of sensors produce thousands of signal inputs per minute on the assembly line. Most of the incoming data is irrelevant. Principal component analysis finds the relevant information among the explosion of data and provides it to a computer for analysis.
Feature Extraction
PCA is used to reduce the dimensionality of the sensor data and extract ‘features’ (or characteristic attributes). The features are fed to the computer for analysis.
Alternate Method
Alternatively, the data can be fuzzified and similarities found through this process. A neural network is then trained on the different data sets to determine a good data “signature” against which to judge all incoming streams of data.
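A tiny sketch of what fuzzifying a normalized reading could look like, using illustrative triangular membership functions (“low”, “medium”, “high”); the resulting membership vectors would then be the training inputs for the signature network:

```python
import numpy as np

def fuzzify(x, centers=(0.0, 0.5, 1.0), width=0.5):
    """Membership degrees of reading x in 'low', 'medium', 'high'."""
    return np.array([max(0.0, 1.0 - abs(x - c) / width) for c in centers])

print(fuzzify(0.3))   # [0.4, 0.6, 0.0] -- mostly 'medium', some 'low'
```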
Decision Making
Distilled signal information is
handed to a computer for
analysis. The computer can
quickly recognize changing
trends leading to a failure and
alert an operator before the
failure actually occurs.
Conclusions
Due to the huge number of sensors on many systems, our approach to fault diagnostics and prognostics must be capable of intelligent data reduction (PCA) in such a way that no important data are lost and all the crucial data are used for smart prognosis with minimum false alarms.
In its final configuration, a library of these strong methods, which is under development, is expected to benefit the System program, ATL, the Intel system, biomedical cases, etc.
THANK YOU!