Bayes Scorpion Example - Statistical Design Methods

Download Report

Transcript Bayes Scorpion Example - Statistical Design Methods

Bayesian Statistics :Applied to
Reliability: Part 1 Rev. 1
1
What is Bayesian Statistics?
• It is the application of a particular probability
rule, or theorem, for understanding the
variability of random variables i.e. statistics.
• The theorem of interest is called Bayes’
Theorem and will be discussed in detail in this
presentation.
• Bayes’ Theorem has applicability to all
statistical analyses not just reliability.
2
Blame It All on This Man
Rev. Sir Thomas Bayes (born London
1701, died 1761) had his works that
include the Theorem named after him
read into the British Royal Society
proceedings (posthumously) by a
colleague in 1763.
References.
The two major texts in this area are
“Bayesian Reliability Analysis,” by Martz
& Waller [2] and more recently
“Bayesian Reliability,” by Hamada,
Wilson, Reese and Martz [1]. It is worth
noting that much of this early work was
done at Los Alamos National Lab on
missile reliability and all the above
authors work or have worked at the
LANL [1].
There are also chapters in traditional
reliability texts e.g. “Statistical Methods
for Reliability Data,” Chapter 14, by
Meeker and Escobar
3
Bayesian Helps
Why Bayesian
• Bayesian methods make use of well known, statistically
accurate, and logically sensible techniques to combine different
types of data, test modes, and flight phases.
• Bayesian results include all possible usable information base
on data and expert opinion.
• Results apply to any missile selected and not just for “average
sets of missiles”.
• Bayesian methods are widely accepted and have a long track
record:
 FAA/USAF in estimating probability of success of launch vehicles
 Delphi Automotive for new fuel injection systems
 Science-based Stockpile Stewardship program at LANL for nuclear
warheads
 Army for estimating reliability of new anti-aircraft systems
 FDA for approval of new medical devices and pharmaceuticals
Bayesian methods have only recently caught on due to the increased computational
speed of modern computers. It was seldom taught to engineers because there were
few ways to complete calculations for any realistic systems.
4
Background: Probability
• To perform statistics one must understand some
basic probability
– Multiplication rule: Two events A,B the probability of
both events occurring is given by
• P(A and B)=P(A|B)P(B) = P(B and A)=P(B|A)P(A)=P(AB)
– There is no implication of time series of these events
– P{A|B) is called a conditional probability
• P(A and B)= P(A)P(B) if A and B are independent,
• P(A and B)=0 if A and B are mutually exclusive
• Example: Have 3 subsystems in series the probability the
systems does not fail by time T is RSYS=R1(T)*R2(T)*R3(T) if
failures are not correlated
These rules also work for probability distributions
5
A
AB
B
Background
• To perform statistics one must understand some
basic probability
– Addition rule: Two events A,B the probability of either
event A or B or both occurring is given by
• P(A or B)=P(A)+P(B) –P(A and B) = P(AB)
– There is no implication of time series of these events
• P(A or B)=P(A)+P(B) –P(A)P(B) if A and B are independent,
• P(A or B)=P(A)+P(B) if A and B are mutually exclusive
• Example: Have 2 subsystems in parallel the probability the
systems does not fail by time T is RSYS=R1(T)+R2(T)-R1(T)R2(T)
if failures are not correlated
These rules also work for probability distributions
6
Background
• To perform statistics one must understand
some basic probability
– Bayes’ Rule: Two events A,B the probability of
event A occurring given event B occurs is given by
• P(A | B)=P(B|A) P(A) / P(B)
– There is no implication of time series of these events.
• P(A|B)= posterior probability.
• P(A) = prior probability.
• P(B)=P(B|A)P(A)+P(B|~A) P(~A), called the marginal
probability of event B also called the “Rule of Total
Probability.”
These rules also work for probability distributions
7
Problem #1
• A=event that missile thruster #1 will not work=0.15
• B=event that missile thruster #2 will not work= 0.15
• P(A or B or both A and B) = P(A)+P(B)-P(A)P(B)
= 0.15+0.15-(0.15)(0.15)
= 0.30-0.0225 = 0.2775
• P(A or B but not both A and B)= P(A)+P(B)-2P(A)P(B)
= 0.15 + 0.15 – 0.045 = 0.255
8
Problem #2
• Let A = man age > 50 has prostate cancer,
B = PSA score >5.0, Say on the demographical region of interest P(A) = 0.3
that is the male population over 50 has a 30% chance of having prostate
cancer.
• Now I take a blood test called a PSA test and that test supposedly helps
make a decision about the presence of cancer in the prostate. A score of
5.5 is considered in the medium to high zone as an indicator (actually
doctors look at rate of increase of PSA over time).
• The PSA test has the following probabilities associated with the test result
P(B|A)=0.9, P(B|~A) = 0.2 i.e. even if you do not have cancer the test
registers PSA>5.0 about 20% of the time (false positive), Thus the
probability of getting a PSA score > 5.0 is given by
P(B)=P(B|A)P(A)+P(B|~A)P(~A) = (0.9)(0.3)+(0.2)(0.7)=0.41
• Using Bayes’ Theorem P(A|B)= P(B|A)P(A)/P(B)=(0.9)(0.3)/0.41=0.66
• Your probability of having prostate cancer has gone from 30% with no
knowledge of test to 66% given your knowledge that your PSA > 5.0
9
Bayesian “in a nutshell”
•
•
•
Bayesian models have two parts: the likelihood function and the prior distribution.
We construct the likelihood function from the sampling distribution of the data, which describes
the probability of observing the data before the experiment is performed, e.g. binomial distribution
P(s|n,R). This sampling distribution is called the observational model. After we perform the
experiment and observe the data, we can consider the sampling distribution as a function of the
unknown parameter(s), e.g. (R|n,s) ~ Rs(1-R)n-s. This function is called the likelihood function. The
parameters in the observational model (R) are themselves modeled by what is called a structural
model e.g..
The prior distribution describes the uncertainty about the parameters of the likelihood function
(R| Nm, p) ~ RNm p(1-R)Nm (1-p). The parameters Nm and p are called hyperparameters (the parameter
in the original problem is R, the reliability itself)
•
HIERARCHICAL MODELS: Sometimes the parameters, e.g. Nm and p, are not known and have their
own distributions known as hyperhyperdistributions or hyperpriors which have their own
parameters e.g. Nm~ GAM( a, k). This is more complex but can lead to better answers.
•
We update the prior distribution to the posterior distribution after observing data. We use Bayes'
Theorem to perform the update, which shows that the posterior distribution is computed (up to a
proportionality constant) by multiplying the likelihood function by the prior distribution.
(Posterior of parameters) ≈ (Likelihood function) X (prior of parameters)
•
Now we need to study each part of this equation starting with the likelihood
10
Binomial Distribution for the Data
• Probability of an event occurring = p
• Probability of event not occurring = 1-p
• Probability that event occurs x=2 times in n=5 tests
– 1st instantiation: pp(1-p)(1-p)(1-p)= p2(1-p)3
(multiplication rule)
– 2nd instantiation: p(1-p)p(1-p)(1-p)= p2(1-p)3
–…
– 10th instantiation: (1-p)(1-p)(1-p)pp = p2(1-p)3
• In general the number of ways of having 2 successes in
5 trials is given by 5!/(3!2!) = 10 = number of
combinations of selecting two items out of ten.
n
• nCr =   = n!/((n-r)!r!)
r
11
Binomial distribution
• Since any of the 10 instantiations would give
the same outcome (i.e. 2 successes in 5 trials
each with probability p2(1-p)(5-2)) we can
determine the total probability using the
addition rule for mutually exclusive events.
• P(1st Instantiation) + P(2nd instantiation) + P(…)
+P(10th instantiation) =5C2 p2(1-p)(5-2)
• P(x|n, p)= nCx px(1-p)(n-x)
– The random variable is x, the number of successful events
12
binomial distribution
random variable is x, the number of successes
P(x|n=10,p=0.2),
binomial distribution
Probability of exactly X successes
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0
1
2
3
4
5
6
7
8
9
10
X, # successes in n tests
13
Likelihood Function
• If we have pass/fail data the binomial distribution is
called the sampling distribution however, after the
experiments or trials are performed and the outcomes
are known we can look upon this sampling distribution
as a continuous function of the variable p. There can
be many values of p that can lead to the same set of
outcomes (e.g. x events in n trials). The core of the
binomial distribution -- that part which includes p -- is
called a “likelihood function.”
• L(p|x, n)= likelihood that the probability of a successful
event is p given the data x and n.
• L(p|x, n) ≡ px (1-p)(n-x)
– The random variable is now p with x and n known
The Likelihood function is put to great use in Bayesian statistics
14
Likelihood Function, L(p|s,n)
The random variable is p, the probability of a success
L(p|s=7,n=10)
3.5
Likelihood of p given s and n
3
2.5
2
1.5
1
0.5
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
P, probability of successful test
Sometimes we label p as R, the reliability
15
Likelihood Functions*: What they look like!
L(p|s=0,n=1)
2.5
1 Test and 0 Successes
1.5
1
0.5
0
0
0.2
0.4
0.6
P, probabilitry of success
0.8
1
L(p|s=1,n=1)
2.5
2
1 Test and 1 Success
Likelihood
Likelihood
2
1.5
1
0.5
*Likelihood function normalized to
area under function = 1. This is not
necessary but allows for a better
interpretation
0
0
0.2
0.4
0.6
P, probability of success
0.8
1
16
Likelihood Functions for n=2 tests
L(p|s=2,n=2)
0 successes
3.5
3
3
2.5
2.5
2
1.5
L(p|s=0,n=1)
1
0.5
Likelihood
3.5
2 successes
2
1.5
1
0.5
0
0
0
0.2
0.4
0.6
0.8
P, probabilitry of success
1
0
L(p|s=1,n=2)
0.2
0.4
0.6
P, probability of success
0.8
1
1 success
1.6
1.4
1.2
Likelihood
Likelihood
L(p|s=0,n=2)
1
0.8
0.6
0.4
0.2
0
0
0.2
0.4
0.6
P, probability of success
0.8
1
17
Bayesian Approach
• Use Bayes’ Theorem with probability density functions e.g.
– P(A|B) => fposterior(parameter|data)
P(B|A) => Likelihood(data| parameter)
•
P(A) => fprior(parameter before take data)
– fposterior  Likelihood X fprior
Steps:
–
–
–
•
1. determine a prior distribution for parameters of interest based on all relevant information before
(prior to) taking data from tests.
2. perform tests and insert results into appropriate Likelihood function.
3. “multiply” prior distribution times Likelihood function to find posterior distribution.
Remember we are determining the distribution function of one or more
population parameters of interest.
–
–
In pass/fail tests the parameter of interest is R, the reliability itself.
In time dependent (Weibull) reliability analysis it is a and b, the scale and shape parameters ,for
which we need to generate posterior distributions.
18
Summary on Probability
• Learned 3 rules of probability.
Multiplication, addition and Bayes.
• Derived the binomial distribution for
probability of x events in n
(independent) trials.
• Defined a Likelihood function for
pass/fail data by changing the random
variable in the binomial distribution to
the probability p with known data (s,n).
• Illustrated Bayes’ Theorem with
probability density functions.
19
Reliability
• General Bayesian Approach
– Reliability is NOT a constant parameter. It has
uncertainty that must be taken into account. The
language of variability is statistics.
• Types of reliability tests
– Pass/Fail tests
– Time dependent tests (exponential & Weibull)
– Counting experiments (Poisson statistics)
20
Bayes’ Theorem for Distributions
when using pass/fail data
• fposterior(R|n, s, info)=L(s, n|R) X fprior(R| info)
• Example: For pass/fail data
– fprior(R)  Ra-1(1-R)b-1 (beta)
– L(s,n|R)  Rs(1-R)n-s (binomial)
– fposterior(R| s, n, a, b)  Rs+a-1(1-R)n-s+b-1
(beta-binomial)
Much more will be said of this formulation later
21
Prior, Likelihood, Posterior
What they look like!
Posterior Distribution combining
Prior and Data
Probability Density Function
Posterior distribution of
System Reliabilityafter 4
successes in 5 tests
Likelihood of System Reliability
using 4 successes in 5 tests only
Initial distribution of
System Reliability
assuming the most
probable value is 0.90
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
System Reliability
•
•
•
Blue curve is our prior distribution
Red curve is distribution assuming only information is 4 successes in 5 tests
Green curve is the Bayesian estimate adding the 4 out of 5 to the prior
– Estimate is between evidence (data) and prior
– Distribution is tighter than prior or data only  narrower confidence bounds
You want the information that comes from knowing the green curve!
9
22
Bayesian Calculator
Input Information
3/23/2011
Prior
0.900
Mode =
0.900
3
Median =
0.774
0.644958
5
Mean =
0.740
3
0.585027
Likelihood
# tests = n =
# successes = s =
successs/tests =
2.5
2
Mode =
1.000
2
Median =
0.794
0.685502
Likelihood
Mean =
0.750
0.60206
Posterior
1.000
29
Prior
1.5
1
Posterior
Mode =
0.940
Median =
0.845
Mean =
0.814
Prediction (probability of J successes in next m tests)
# future tests, m =
1
3.5
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN) 4
6
m<=10
1.221849
0.5
0.808887
0
0.731155
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
In the following charts I want to take you through a series of “What if”
cases using priors of different strengths and show how the data
overpowers the prior and leads to the classical result when the data is
sufficiently plentiful.
You will see that, in general, the Bayesian analysis keeps you from being
over optimistic or over pessimistic when the amount of data is small.
23
Bayesian gives lower probabilities for high
success and higher values for low success.
Comparison: Classical vs. Bayesian
0.45
0.4
0.35
0.3
0.25
P(s|6,<R>)
0.2
Bayesian keeps
one from being
too pessimistic or
too optimistic
Bayesian
0.15
0.1
0.05
0
0
1
2
3
4
5
6
24
Formulas Used In Beta-binomial model
for pass/fail data
• Prior Distribution:
R Nmp (1 - R) Nm (1-p )
f ( R | Nm, p ) =
B ( N mp  1, N m (1 - p )  1)
E[ R] =
N mp  1
,
Nm  2
Mode = p , N mis called the importance
• Likelihood:
• Posterior:
n-s
R (1 - R)
B( s  1, n - s  1)
s 1
s
E[ R ] =
, Mode =
n2
n
f ( R | n, s ) =
s
The prior uses what I call
the Los Alamos
parameterization of the
beta distribution. See
Appendix for more detail
R Nmp s (1 - R) Nm (1-p )n-s
f posterior ( R | n, s, Nm, p ) =
B( N mp  s  1, N m (1 - p )  n - s  1)
E[ R ] =
N mp  s  1
N p  s s,n l arg e s
, Mode = m

Nm  n  2
Nm  n
n
25
p=0.9, Nm=10
Take # tests n=4 and resulting successes s=3
BC1 Input Information
3/23/2011
Prior
0.900
Mode =
0.900
4.51
10
Median =
0.852
0.829846
4
12
Mean =
0.833
0.778151
3.5
Likelihood
# tests = n =
# successes = s =
successs/tests =
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN) 5
3
4
Mode =
0.750
0.60206
2.5
3
Median =
0.686
0.503333
2
Likelihood
Mean =
0.667
0.477121
1.5
Posterior
Mode =
0.857
Median =
0.826
Mean =
0.813
Prediction (probability of J successes in next m tests)
0.845098
0.5
0.758651
0
0.726999
0.750
Posterior
# future tests, m =
Prior
1
10
m<=10
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
Note how strong prior distribution (Nm =10 which is large) governs the posterior
distribution when there are a small number of data points. The posterior has a
maximum value at R=0.857 where a classical calculation would give
<R>=3/4=0.75.
26
p=0.9, Nm=2
Take # tests n=4 and resulting successes s=3
BC1 Input Information
3/23/2011
Prior
0.900
Mode =
0.900
1
2
Median =
0.736
2.5
0.577682
4
Mean =
0.700
0.522879
2
Likelihood
# tests = n =
# successes = s =
successs/tests =
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN) 3
4
Mode =
0.750
0.60206
1.5
3
Median =
0.686
0.503333
Likelihood
Mean =
0.667
1
0.477121
Posterior
Mode =
0.800
Median =
0.744
Mean =
0.725
Prediction (probability of J successes in next m tests)
0.69897
0.592489
0
0.560667
0.750
Posterior
# future tests, m =
10
m<=10
Prior
0.5
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
Note how weak prior distribution (Nm=2) will still pull the posterior up that its
net effect is much smaller i.e. the posterior has a maximum value at R=0.800
where a classical calculation would give <R>=3/4=0.75. So the posterior and
likelihood are more closely aligned.
27
p=0.9, Nm=10
Increase tests: Take # tests n=20 and resulting successes s=15
BC1 Input Information
3/23/2011
Prior
0.900
Mode =
0.900
1
10
Median =
0.852
5
0.829846
12
Mean =
0.833
0.778151
4
Likelihood
# tests = n =
# successes = s =
successs/tests =
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN)6
20
Mode =
0.750
0.60206
3
15
Median =
0.734
0.575544
Likelihood
0.727
2
0.564271
Posterior
Mode =
0.800
Median =
0.787
Mean =
0.781
Prediction (probability of J successes in next m tests)
0.69897
0.671945
0
0.660052
0.750
Mean =
Posterior
# future tests, m =
10
m<=10
Prior
1
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
Note how strong prior distribution still pulls the posterior up but due to the
larger amount of data (n=20) the posterior distribution still shows a maximum
value at R=0.800 so the prior is beginning to be overpowered by the data.
28
p=0.9, Nm=2
Take # tests n=20 and resulting successes s=15
BC1 Input Information
3/23/2011
Prior
0.900
Mode =
0.900
4.51
2
Median =
0.736
0.577682
4
4
Mean =
0.700
0.522879
3.5
Likelihood
# tests = n =
# successes = s =
successs/tests =
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN) 5
3
20
Mode =
0.750
0.60206
2.5
15
Median =
0.734
0.575544
2
Likelihood
Mean =
0.727
0.564271
1.5
Posterior
Mode =
0.764
Median =
0.748
Mean =
0.742
Prediction (probability of J successes in next m tests)
0.626419
0.5
0.599404
0
0.58782
0.750
Posterior
# future tests, m =
Prior
1
10
m<=10
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
Note how weak prior distribution still has some effect but due to the larger
amount of data (n=20) the posterior distribution is almost coincident with the
likelihood (i.e. the data) and shows a maximum value at R=0.764 which is close
to the data modal value of 0.75. The data is overpowering the prior.
29
Nm=0 uniform prior
Take # tests n=20 and resulting successes s=15
BC1 Input Information
3/23/2011
Prior
0.900
Mode =
0.900
0
Median =
0.500
0.30103
2
Mean =
0.500
0.30103
2.5
20
Mode =
0.750
0.60206
15
Median =
0.734
2
0.575544
Likelihood
Mean =
0.727
1.5
0.564271
Posterior
0.750
Posterior
Prior
1
Mode =
0.750
Median =
0.734
Mean =
0.727
Prediction (probability of J successes in next m tests)
# future tests, m =
3.5
3
Likelihood
# tests = n =
# successes = s =
successs/tests =
1
4
f(R ), pdf
p=
Nm =
Neff =
Bayesian Reliability
A.T. Mense R(NN) 4.5
10
m<=10
0.60206
0.5
0.575544
0
0.564271
0
0.1
0.2
0.3
0.4
0.5
0.6
R, Reliability
0.7
0.8
0.9
1
Note how uniform prior distribution has no effect on the posterior which is now
exactly coincident with the likelihood function and is essentially the result you
obtain using classical analyses. Posterior mean R=0.750 . The data is all we have.
Note that p plays no role because all values of R are equally likely when Nm=0.
This is NEVER a useful prior to use because it brings no information to the
analysis and it is this additional information that drove us to use Bayesian
analyses in the first place.
30
Look at Predictions i.e. # successes in the next 10 tests
p=0.9, Nm=10, n=20, s=15, <R>=15/20=0.75
Comparison: Classical vs. Bayesian
3.00E-01
Probability
2.50E-01
2.00E-01
1.50E-01
1.00E-01
5.00E-02
0.00E+00
0
1
2
3
4
5
6
7
8
9
10
P(s|6,<R>)
9.54E-07
2.86E-05
3.86E-04
3.09E-03
1.62E-02
5.84E-02
1.46E-01
2.50E-01
2.82E-01
1.88E-01
5.63E-02
Bayesian
7.14E-06
1.12E-04
8.71E-04
4.48E-03
1.69E-02
4.89E-02
1.11E-01
1.97E-01
2.63E-01
2.41E-01
1.17E-01
p=0.9, Nm=10, n=4, s=3, <R>=3/4=0.75
Comparison: Classical vs. Bayesian
3.00E-01
Probability
2.50E-01
2.00E-01
1.50E-01
1.00E-01
The Bayesian
calculation
shows that
due to the
strong prior
(Nm=10) the
prediction of
success is
higher for
large numbers
of successes
as compared
to the use of
the average
reliability in
the binomial
calculation
5.00E-02
0.00E+00
0
1
2
3
4
5
6
7
8
9
10
P(s|6,<R>)
9.54E-07
2.86E-05
3.86E-04
3.09E-03
1.62E-02
5.84E-02
1.46E-01
2.50E-01
2.82E-01
1.88E-01
5.63E-02
Bayesian
2.02E-05
2.19E-04
1.25E-03
5.01E-03
1.56E-02
3.98E-02
8.52E-02
1.54E-01
2.31E-01
2.70E-01
1.98E-01
31
Given these Bayesian results how does
one choose a prior distribution?
•
•
This is the most often asked question both by new practitioners and by customers.
The answer is actually “it depends.”
– Depends on how you wish to represent previous knowledge
– Depends on your estimation technique for determining the hyperparameters (p,Nm) e.g. what
degree of confidence, Nm ,you have in the estimation of p.
– What are some possible methods for “guessing” a p value and setting an Nm value?
– Are there other “shapes” of priors that one could use? YES
•
What if we have uncertainty in Nm?
– Set up a hyperprior distribution for Nm – say a gamma distribution.
h (h N m )
f (Nm ) =
(d )
d -1
exp ( -h N m ) , N m  0
– The parameters of f(Nm) i.e. (h,d) are called hyperhyperparameters.
– How do you choose the parameters d and h for the gamma distribution? Pick mode for Nm =
(d-1)/h and stdev of Nm = d1/2 / h.
•
Example: for a weak prior we might choose the mode = 3 for Nm, and a stdev =2, which gives h=1, d=4
32
Now it gets complicated!
• Up until now we have been able to solve for fposterior(R)
analytically but once we have a reasonable prior for Nm
we no longer can do this. Now the joint prior for R and
Nm given p, h, and d is shown below
d -1
R N p  s (1 - R ) N (1-p ) n - s h (h N m )
f joint ( R, N m ) =
exp ( -h N m )
m
m
B
(d )
• There is no analytical solution for fposterior(R) since one
now must integrate over Nm.
• This is one of the reasons that Bayesian has not been
widely accepted in a community that is looking for
simple recipes to calculate reliability.
Will apply MCMC to find fposterior(R), learn this in part-2 of lectures
33
Summary
• We see how Bayesian analysis for
pass/fail data can be analyzed rather
easily for specific forms of prior
distributions --- called conjugate priors.
• We see how data overcomes the prior
distribution and the amount of data
depends on the “strength” of the prior.
• We have ended with the formulation of
a more complex Bayesian analysis
problem that will require some form of
numerical technique for its solution
Numerical Techniques will be the subject of Part – 2 of these lectures
34
Time Dependent Reliability Analysis
• Consider the case of a constant failure rate “time-tofirst-failure” distribution i.e. f(t)=lexp(-lt)
• In classical analysis we look up values for l for all the
parts then perform a weighted sum of the l values
over all the components in the system (series system)
to arrive at a total “constant” failure rate, lT, for the
system. We can then use lT to find the reliability at a
given time t using R(t)=exp(-lTt).
• What if there is uncertainty in the l values that go into
finding lT? How do we handle that variability?
Applying a prior distribution to the parameter l is the Bayesian process
35
Time Dependent Reliability Analysis
•
•
•
•
•
Another technique applies when we have time-to-first-failure data i.e. say we have n units under
test and the test is scheduled to last for time = tR. When conducting these tests say r units fail and
we do not replace the failed units. The failure times are designated by ti, i=1,2,…,r and (n-r) units do
not fail by time tR.
The likelihood function for this set of n tests is given by
L =[lexp(-lt1) lexp(-lt2)… lexp(-ltr)] X [exp(-ltr+1) exp(-ltr+2)… exp(-ltn)]
= lr exp(-l(t1+t2+…+tr + (n-r)tR))= lr exp(-l(TTT)), TTT=total time on test.
In classical analysis we differentiate the above equation w.r.t. the parameter l and set the
derivative = 0 and solve for the l value that maximizes L (or more easily maximize ln(L)). The value
so obtained,
lMLE = r/TTT,
is called the Maximum Likelihood Estimate for the population parameter l.
Note: For this distribution ONLY the estimator for the failure rate does not depent on the number of
units on test except through TTT.
Again this technique assumes there is a fixed but unknown value for l and it is estimated using the
MLE method.
But in real life there is uncertainty in l so we need to discuss some distribution of possible l values
i.e. find a prior and posterior distribution for l and let the data tell us about the variability. The
variability in l shows up as a variability in R since R=exp(-lt). So for any given time say t=t1 we will
have a distribution of R values and that will be the same distribution but with lower mean value for
later times. If l itself changed with time then we have added as yet another complication.
Applying a prior distribution to the parameter l is the Bayesian process
36
Exponential distribution
• Start with an assumed form for the prior
distribution for l.
– One possible choice that allows for l to vary quite
a bit is the gamma distribution. Gamma(a,b)
b(bl ) a -1 e - bl
g prior (l ) =
, a, b, l  0
 (a)
– With reasonable choices for a and b this
distribution allows for l to range over a wide
range of values.
• Likelihood is given by
n
L(l | data ) =  l e - lti (assumes no censored data)
i =1
37
Exponential time-to-failure Distribution
Model variation of l with Gamma distribution
• Multiplying Prior X Likelihood gives
 - l  ti 
- l ( b   ti )
b(bl )
- bl  n
n  a -1
i =1 
i =1
f posterior (l | data, a, b) 
e
l e
l
e


(a )


a -1
n
n
n
= Gamma (n  a, b   ti )
i =1
• So the prior is a Gamma ( a, b) distribution and the
Likelihood is the product of exponential reliabilities for n
tests run for long enough to get n failures and so the we
know all n failure times. The posterior turns out to also be a
Gamma(n+a, b+TTT), TTT=(t1+t2+…+tn)=total time on test.
• The problem that may occur here is in the evaluation of the
Gamma posterior distribution for large arguments. One
may need to use MatLab instead of excel.
38
R(t)
• To find the reliability as a function of time one
must integrate over l from 0 to ∞, i.e.
n


( n  a ) -1

- b   ti  l 
n
n


 i =1  
  b   t    b   t  l 
e
i 
i

l =


i
=
1
i
=
1



 

- lt
R(t ) =  e 
 dl
 (n  a)
l =0






• Again this can be integrated only for special
values of the parameters but evaluation for large
arguments of the Gamma distribution may
require top of the line software.
This will be evaluated later.
39
Weibull distribution
(See Hamada, et al. Chapter 4, section 4)
• Once we have dealt with the exponential distribution
then the next logical step is to look at the Weibull
distribution that has two parameters (a, b) instead of
the single parameter (l) for the exponential
distribution. Now let’s address a counting problem
which is very typical of logistics analysis. With two
parameters we will need a two variable or joint prior
distribution fprior(a, b) which in some cases can be
modeled by the product fa,prior(a)fb,prior(b) if the
parameters can be shown to be independent.
• Even if independence cannot be proven one almost
always uses the product to make the priors useable for
modeling purposes.
40
Summary of time dependent Bayesian
Reliability Modeling
• It has been shown how to set of a typical time-to-firstfailure model.
• The principals are the same as for any Bayesian
reliability model and there will be a distribution of
probability of failure vs time for each time.
• These distributions in parameters lead to distributions
in reliabilities and in principal one can graph say 90%
credible intervals for each time of interest and these
intervals should be smaller than classical intervals that
use approximate normal or even nonparametric
bounds.
41
Discussion of Hierarchal Models
• This follows closely the paper by Allyson Wilson [3]
• Hierarchical models are one of the central tools of Bayesian
analysis.
• Denote the sampling distribution as f(y|q) e.g. f(s|R) =
binomial distribution, and the prior distribution as g(q|a)
where a represents the parameters of the prior distribution
(often called hyperparameters) e.g. g(R|Nm,p) = beta
distribution.
• It may be the case that we know g(q|a) completely, including
a specific value for a. However, suppose that we do not know
a, and that we choose to quantify our uncertainty about a
using a distribution h(a) (often called the hyperprior), e.g.
h(Nm) since we know p but do not know Nm.
42
General Form for Bayesian Problem
• 1. The observational model for the data.
• (Yi |q) ̴ f(yi |q); i = 1, …, k:
• 2. The structural model for the parameters of
the likelihood.
• (q|a) ̴ g(q|a)
• 3. The hyperparameter model for the
parameters of the structural model.
• a ̴ h(a)
43
Observational Model
• In the pass/fail example used early in this presentation
the observational model was the binomial distribution
f(s| R) ~ Rs (1-R)n-s where the random variable is the
number of successes, s.
• In this particular model the parameter of interest, R,
also happens to be the result we are seeking. In other
problems e.g. time to failure problem using a Weibull
distribution as the observational model we have a and
b as parameters and we construct R(t;a,b) = exp((t/a)b), which is the result we are seeking.
44
Structural Model
• The structural model addresses the variability of
the parameter q to a set of hyperparameters a.
• In our pass/fail example we have a beta
distribution for our structural model or prior, e.g.
f(R|p,Nm) ~ RNmp(1-R)Nm(1-p)
• This parameterization of the beta distribution is not the
standard form which is typically written as
Ra-1(1-R)b-1 but I have chosen to use the Los Alamos
parameterization for its convenience of interpretation (a =
Nmp+1, b = Nm(1-p)+1), because p = mode of the distribution
and Nm is a weighting factor indicating the confidence we
have in the assumed p value.
45
Hyperparameter Model
• This is the distribution(s) used to model the
breadth of allowable values for the
hyperparameters themselves.
• In our pass/fail example the hyperparameters
were p and Nm. But I only used a hyperprior
for Nm just for illustration purposes, e.g.
gamma distribution Nm ~ GAM( h, d)
• Solution requires numerical techniques that
will be discussed in Part-2 of these lectures.
46
References
1. Bayesian Reliability, by Hamada, Wilson,
Reese and Martz, Wiley (2007)
2. Bayesian Reliability Analysis, by Martz &
Waller, Wiley (1995)
3. “Hierarchical MCMC for Bayesian System
Reliability,” Los Alamos Report eqr094, June
2006.
4. “Statistical Methods for Reliability Data” by
Meeker & Escobar, Wiley, (1998), Chapter 3
47
Appendices
48
Definitions etc.
49
Gamma and Beta Functions
• The complete beta function is defined by
1
complete beta function,B(a ,b ) =  pa -1 (1 - p) b -1 dp = (a )( b ) (a  b )
0
• The gamma function is defined by
(a ) =

x
e - x dx , a  0, complete Gamma function
a -1
0
 (a ; y ) =
y
x
e - x dx , a  0, incomplete Gamma function
a -1
0
For α = n, integer

(n) =  x n -1e - x dx = (n - 1)!, n = 1,2,
0

(a ) = 2  z 2a -1e - z dz
2
0
(a ) = (a - 1) (a - 1)
p  5 3 p
1
 3 1 1
  = p ,  =   =
,  =
2
4
2
2 2 2
2
 ( 0) = 
50
CLASSICAL RELIABILITY
51
Bayes’ Theorem Applied to Reliability
as measured by Pass/Fail tests.
• Assume we are to perform a battery of tests the result of each test
is either a pass (x=0) or a fail (x=1).
• Traditionally we would run say n tests and record how many passed,
s, and then calculate the proportion that passed s/n and label this
as the reliability. i.e. <R>=s/n. This is called a point estimate of the
reliability.
• Then we would use this value of R to calculate the probability of say
s* future successes in n* future tests, using the binomial probability
distribution p(s*|n*,<R>)=n*Cs* (<R>)s*(1-<R>)(n*-s*)
• This formulation assumes R is some fixed but unknown constant
that is estimated by the most recent pass/fail data e.g. <R>.
• <R> is bounded (Clopper-Pearson Bounds) see next set of charts.
52
Classical Reliability analysis
• Classical approach to Pass/Fail reliability analysis
• To find a confidence interval for R using any
estimator e.g. <R>, one needs to know the statistical
distribution for the estimator. In this case it is the
binomial distribution.
53
Classical Reliability
• For example we know that if the number of tests was fairly large that the
binomial distribution (the sample distribution for <R>) can be
approximated by a normal distribution whose mean = <R> and whose
standard deviation, called the standard error of the mean SEM= (<R>(1<R>)/n)1/2 and therefore the 1-sided confidence interval for R is written
out in terms of a probability statement as follows:
•


 R  (1-  R )
Pr  R  - Z1-a
 R  1 = 1 - a
n


As an example suppose there were n=6 tests (almost
too small a number
to use a normal approximation, and s=5 successes, <R>=5/6, take a=0.05
for a 95% lower reliability bound, Z1-a = 1.645, SEM = 0.152 so one has
Pr 0.833 - 1.645* 0.152  R  1 = 0.95
Pr 0.583  R  1 = 0.95
• This is a very wide interval as there were very few tests.
54
Classical Reliability
•
•
•
•
•
This is a fairly wide confidence interval which is to be expected with so few tests.
The interval can only be made narrower by performing more tests (increase n and
hopefully s) or reducing the confidence level from say 95% to say 80%. Running
the above calculation at a=0.2 gives Pr 0.705  R  1 = 0.80
This is the standard (frequentist) reliability approach. Usually folks leave out the
confidence interval because it looks so bad when the number of tests is low.
Actually n=6 tests does not qualify as a very large sample for a binomial
distribution. In fact of we perform an exact calculation using the cumulative
binomial distribution (using the RMS tool SSE.xls) one finds Pr 0.418  R  1 = 0.95
for a confidence level of 95%. At a confidence level of 80% Pr 0.578  R  1 = 0.80
•
•
These non parametric exact values give conservatively large (but more accurate)
answers when the number of tests is small.
The expression for the nonparametric confidence interval can be found using the
RMS tools that are used to compute sample size (e.g. SSE.xlsm whose snapshot is
shown below). Tools available for download from eRoom at
http://ds.rms.ray.com/ds/dsweb/View/Collection-102393 look in excel files for
SSE.xlsm.
55
“"if all you have is a
hammer, everything looks
like a nail.”
Abraham Maslow, 1966
RAYTHEON TOOLS
56
Sample Size Estimator, SSE.xlsm
Raytheon Sample Size Estimator
R(lower) = 0.4182
Trials
n
6.0
2-sided CI => <p>=x/n= 0.166666667
Maximum # Failures
x
1.0
p(lower, a/2) =
Prob of Test Failure
p (upper, a)
0.5818
0.0042
Confidence
Level (1-a)
0.9500
Cumulative Probability
Prob #Fail <=x
0.0500
0.05
p(upper, a/2) =
0.6412
p = 1 - baINV (n - x, x  1)
In SSE the gold bar can be shifted under any of the 4 boxes (Trials, Max #
Failures, Prob of Test Failure, Confidence Level) by double clicking mouse button
in cell below the number you are trying to find. The gold bar indicates what will
be calculated. You will need to allow macro operation in excel in order for this
calculator to work. Another choice would be the “Exact Confidence Interval for
Classical Reliability calculation from binary data” excel spreadsheet which is
available from the author by email request. The actual equations that are
calculated can be found in reference by Meeker & Escobar [4] or an alternative
form can be copied from the formulas below.
The two sided bounds on the reliability confidence interval for a confidence
level = 1-a are given by RLower(a,n,s) = BETAINV(1-a/2,s,n-s+1) and RUpper(a,n,s)=
BETAINV(a/2,s+1,n-s) where n = # tests, s=#successes in n tests, and
CL=confidence level=1-a. The function BETAINV is in excel. For a one sided
calculation which is applicable for n=s (x=0) calculations one can use a instead of
a/2 in RLower equation.
57
Exact 2-sided Confidence Interval
Exact Confidence Interval for Classical Reliability calculation from binary data.
Input values entered into light yellow cells
N, # trials
30
s,# survived r=N-s, # failed
25
5
check on values using beta distribution
7/2/2012
pf(upper)
0.3472
pf(lower)
0.0564
0.3472
0.0564
a
CL
0.050
0.95
pf =(N-s)/N = 0.167
<R>= 0.833
In terms of reliability R= 1- pf, the confidence interval is given below.
≤ R ≤
0.9436
}
=
0.950
1-sided Pexact{
2-sided
≤ R ≤
Step 1: Insert values for # trials,N, and # failures,r, in those N trials
Step 2. Insert the desired confidence level for interval under CL
2-sided
}
=
0.950
2-sided Pexact{
Pr{S  s | R = RLower } =
 1-
a
2
a
n
Comparisons
i
i=s
=  n Ci ( RLower ) (1 - RLower )
i
i =0
n -i
= Pr{S  s | R = RLower }
What is smallest reliability that
will produce s survivors or more
with a probability of a/2.
RLower = b aInv
(n - r , r  1)
/2
Pr{S  s | R = RUpper } =
RUpper = b
Inv
1-a / 2
normal approx. CI
0.7000
0.9667
Score conf. Interval
0.6644
0.9266
two sided exact interval
0.6528
1.0000
=  n Ci ( RLower ) (1 - RLower ) n -i
2
s -1
0.6528
a
2
(n - r  1, r )
=  n Ci ( RUpper ) (1 - RUpper ) n -i
s
i =0
i
What is the largest reliability that
will produce s survivors or fewer
with a probability of a/2.
Reference: "Statistical Methods
for Reliability Data" by Meeker
and Escobar, Wiley Pub., 1998
also "Reliability Engineering
Handbook" by B. Dodson
and D. Nolan, Quality Publishing
1999
A.T. Mense
The yellow inserts show the binomial calculation needed to evaluate the
confidence bounds but these binomial sums can be rewritten either in terms
of Inverse Beta distributions or Inverse F distributions. Stated as a probability
form one has Pr{baInv/2 ( s, n - s  1)  R  b1Inv
-a /2 ( s  1, n - s )} = 1 - a
58
An example of how Bayesian
techniques aided in the search
for a lost submarine
View Backup Charts on Search for Submarine
BAYESIAN SEARCH ALGORITHM
59
Backup Charts
Bayes Scorpion Example
From Cressie and Wikle (2011)
Statistics for Spatio-Temporal Data
60
61
62
fY |Z (Y | Z ) =
f Z |Y ( Z | Y ) fY (Y )
fZ (Z )
63
64
65
66
67
68
69
70