Continuous Reliability Improvement


The Reliability of Reliability Data
Reference: GREEN and BOURNE, RELIABILITY TECHNOLOGY, John Wiley & Sons Ltd, 1972 (p 244, p 273)
Why Learn Any Theory? To know:
- When it is/is not Valid
- Potential Benefits
- Limitations
Good Reliability Data Analysis ≠ Lots of Math/Stats & Statistical Distributions
Good Reliability Data Analysis = Lots of:
• Reliability Culture
• Training
• Equipment Knowledge
• Collaboration
• Investigation & Thinking
The Reliability of Reliability Data - 1
– In the beginning - Statistics Theory - circa 1800
Theory of Gambling & Love of Math
– Early Days - Reliability Theory - circa 1970
– Probabilistic Risk Assessment – circa 1975
– Reliability Data - RCM - circa 1990
– Reliability Data – Now - As always
– The Future
Statistics Theory - circa 1800
Random Process (Memory-less Process)
F(t) = Unreliability (Probability of Failure at or before this Age)
Exponential Time to Failure Distribution:
$F(t) = 1 - e^{-\lambda t}$
$R(t) = 1 - F(t) = e^{-\lambda t}$
Solving for the time to failure: $t = -\ln(1 - F(t)) / \lambda$, so with $\lambda = 1$, $t = -\ln(1 - F(t))$
[Figure: F(t) against Time for 50 Sorted, Computer Generated Random Fail Times, λ = 1, MTBF marked. Generated in Excel with =-LN(RAND()). Same Shape Wherever it Starts.]
$\text{MTBF} = 1 / \lambda$ (hr, yr or other time unit)
Failure Rate: λ = Constant (No Change over Time)
No Infant Mortality, No WearOut: Just Boring Flat Failure Pattern
Conditional Failure Density, Given Item OK at time = t:
$\lim_{\Delta t \to 0} \dfrac{R(t) - R(t + \Delta t)}{R(t) \cdot \Delta t}$
Carl Friedrich Gauß, 1777–1855

Confidence Limits – circa 1875
Take 2 Failures, Get an Estimate:
$\hat{\lambda} = 2 / (t_1 + t_2)$
ChiSquared Stats Fn gives Confidence Limits of Population:
Lower (5% Confident): $\lambda_l = \hat{\lambda} \times \chi^2_{l,\,df} / df$
Upper (95% Confident): $\lambda_u = \hat{\lambda} \times \chi^2_{u,\,df} / df$
Confidence Interval $\alpha = 0.9$ (90%):
$l = (1 - \alpha)/2 = 0.05$
$u = 1 - (1 - \alpha)/2 = 0.95$
df = Degrees of Freedom = 2 x Number of Failures (Sample Size, i.e. Number of Fails)
Generated in Excel with =CHISQ.INV(Probability, df)/df
[Figure: Estimated MTBF (0.2 to 3.4) for samples of 2, 4 and 8 Failures, λ = 1, with 5% Confident Lower and 95% Confident Upper limits]
More Failures = Better Estimates
Friedrich Robert Helmert - 1875
http://en.wikipedia.org/wiki/Exponential_distribution
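The chi-squared limits above can be sketched without Excel as follows. This is an illustration under stated assumptions: the function names are hypothetical, every item is assumed to have run to failure, and because df = 2 x Number of Failures is always even, the chi-squared CDF is evaluated from its closed form and inverted by bisection rather than by a statistics library.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-squared CDF for even df: 1 - exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    if df < 2 or df % 2:
        raise ValueError("df must be an even integer >= 2")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_inv_even(p, df, tol=1e-10):
    """Invert the CDF by bisection (the role played by CHISQ.INV on the slide)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:  # grow the bracket until it contains p
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rate_confidence_limits(failure_times, confidence=0.9):
    """Return (estimate, lower, upper) for lambda from observed times to failure."""
    n = len(failure_times)
    df = 2 * n
    lam_hat = n / sum(failure_times)
    l = (1.0 - confidence) / 2.0
    u = 1.0 - l
    return (lam_hat,
            lam_hat * chi2_inv_even(l, df) / df,
            lam_hat * chi2_inv_even(u, df) / df)


for n in (2, 4, 8):
    est, low, high = rate_confidence_limits([1.0] * n)
    print(f"{n} failures: {low:.2f} < lambda < {high:.2f} (estimate {est:.2f})")
```

Running the loop shows the interval narrowing as the number of failures grows, which is the "More Failures = Better Estimates" point.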
The Reliability of Reliability Data - 2
– In the beginning - Statistics Theory - circa 1800
– Early Days - Reliability Theory - circa 1970
Lots of Math – but let's face it - in practice assume λ constant
– Probabilistic Risk Assessment – circa 1975
– Reliability Data - RCM - circa 1990
– Reliability Data – Now - As always
– The Future
Reliability Theory - circa 1970
Once upon a time
Typical Failure-Rate Characteristics for Engineering Devices (p 535)
"Experience Suggests that this Failure Rate Function follows fairly standard Pattern with respect to time for most technological devices."
Phase 1 (Early Failure): Represents a pattern of failure events which typically arises from Initial Production, Test or Assembly Faults. Sorted by Burn-In & Commissioning.
Phase 2 (Useful Life): Useful Life with failure rate sensibly Constant or with relatively slow change.
Phase 3 (Wear Out): Represents the effects of Ageing. Returned To As Good As New Before Here.
[Figure: Death-rate characteristics for males living in England & Wales for years 1960 to 1962]
The use of data (after spending 2 chapters on distribution theory):
"... the precise form of the relevant statistical distribution … may be difficult to acquire."
Assume: Early Failure Removed in Commissioning, Replaced Before Wear Out
Reliability Data (Generic) – MTBF - circa 1970
Typical MTBF Ranges (Green & Bourne: Reliability Technology, p 538)
Circuit Breaker:
0.3 < λ < 10 Failures / 10^6 hr
11 yr < MTBF < 380 yr
Pump, Circulators:
2.5 < λ < 1,000 Failures / 10^6 hr
0.022 < λ < 8.8 Failures / yr
6 wk < MTBF < 46 yr
MTBF > 1,140 yr: Need a major technological breakthrough to get this
Only a Rough Guide to state of the technology – “Reader referred to specialised Literature”
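The unit conversions behind these ranges can be checked with a short sketch. The function names are hypothetical, and 8,760 hr per year (continuous operation) is an assumption; the failure-rate ranges are the ones quoted on the slide.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hr of continuous operation


def rate_per_year(failures_per_million_hours):
    """Convert a failure rate in Failures / 10^6 hr to Failures / yr."""
    return failures_per_million_hours * HOURS_PER_YEAR / 1e6


def mtbf_years(failures_per_million_hours):
    """MTBF = 1 / lambda, expressed in years."""
    return 1.0 / rate_per_year(failures_per_million_hours)


ranges = [("Circuit Breaker", 0.3, 10.0), ("Pump, Circulators", 2.5, 1000.0)]
for name, low_rate, high_rate in ranges:
    # The highest failure rate gives the shortest MTBF, and vice versa.
    print(f"{name}: {mtbf_years(high_rate):.2f} yr < MTBF < {mtbf_years(low_rate):.0f} yr")
```

The upper pump rate of 1,000 Failures / 10^6 hr works out to an MTBF of roughly 0.11 yr, i.e. about 6 weeks.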
Reliability Data (Generic) – MTBF - circa 1997
Bloch & Geitner: Machinery Failure Analysis & Troubleshooting Vol 2, p 641
Gulf Professional Publishing; 3rd edition (September 11, 1997)
Generic Data - Present & Select Approach
A Wide variety of Sources are Referenced and Compared
Report Supervised by yours truly - 1992
Estimate Ranges on Subject Equipment, Based on:
• the Description of Subject Equipment Type and Application
• Compared to the Equipment Type and Application relating to the Generic Data
e.g. If Subject Item is to be Offshore & exposed but of high quality, emphasis would be given to OREDA data
Current Sources of λ for Synthesised System Reliability Estimation:
• OREDA: http://www.oreda.com/ (A new revision of the OREDA handbook out by early in 2015. 6,000.00 NOK = 999.00 AUD)
• MIL 217: For Electronic Equipment Generic Data
• IEEE 500: For Electrical Equipment Generic Data
Recall:
$F(t) = 1 - e^{-\lambda t} = 1 - e^{-t/\text{MTBF}}$
Linear Approximation: $F(t) \approx \lambda t$
Reasonable Approximation Provided t < 0.1 MTBF
About 5% Error at t = 0.1 MTBF (less for smaller t)
About 0.5% Error at t = 0.01 MTBF
[Figure: F(t) against Time (in multiples of MTBF) for λ = 1, with the linear approximation λt and the Test Interval Ti marked]
Most Equipment: MTBF > 10 yr, Test Interval <= 1 yr, i.e. Operate Here
Thus Average failure probability is often taken as
$F_{Ave} \approx T_i / (2 \times \text{MTBF}) = \lambda T_i / 2$, where Ti = Test Interval
This Assumes:
- Perfect Testing
- The Probability of Failure is strictly Time dependent
For example, it is possible that there is also a Demand Related Failure Process, i.e. item fails immediately after a successful test due to the Demand Stress.
Occasionally this will be accommodated by taking
$F_{Ave} \approx \lambda_d + \lambda_t T_i / 2$
It is generally hard to estimate the value of λd
(Per Demand Failure Rate)
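The size of the linear-approximation error, and the average failure probability over a test interval, can be checked numerically. This is a sketch with hypothetical function names; the "exact" average is the time average of F(t) over one test interval, assuming perfect testing and a purely time-dependent failure process as stated on the slide.

```python
import math


def unreliability(t, mtbf):
    """Exact F(t) = 1 - exp(-t / MTBF)."""
    return 1.0 - math.exp(-t / mtbf)


def linear_relative_error(t, mtbf):
    """Relative error of the approximation F(t) ~ t / MTBF."""
    exact = unreliability(t, mtbf)
    return (t / mtbf - exact) / exact


def average_failure_probability(test_interval, mtbf, per_demand=0.0):
    """F_ave ~ lambda_d + Ti / (2 x MTBF); per_demand is lambda_d (default 0)."""
    return per_demand + test_interval / (2.0 * mtbf)


def exact_average_failure_probability(test_interval, mtbf):
    """Time average of F(t) over [0, Ti]: 1 - (1 - exp(-x)) / x with x = Ti / MTBF."""
    x = test_interval / mtbf
    return 1.0 - (1.0 - math.exp(-x)) / x


for fraction in (0.1, 0.01):
    print(f"t = {fraction} MTBF: error = {linear_relative_error(fraction, 1.0):.2%}")

# Typical case from the slide: MTBF of 10 yr, tested once a year.
print(average_failure_probability(1.0, 10.0))
print(round(exact_average_failure_probability(1.0, 10.0), 4))
```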
The Reliability of Reliability Data - 3
– In the beginning - Statistics Theory - circa 1800
– Early Days - Reliability Theory - circa 1970
– Probabilistic Risk Assessment – circa 1975
A Key Driver for Better Reliability Data & Context
– Reliability Data - RCM - circa 1990
– Reliability Data – Now - As always
– The Future
Probabilistic Risk Assessment – circa 1975
WASH-1400, 'The Reactor Safety Study' (Rasmussen Report)
Start Here: For Each PIE (Postulated Initiating Event), identify the Accident Scenarios
Estimate Likelihood and Consequence of Each Scenario
Estimate Probability of System Failure Under Conditions Expected at these Points in Sequence
Use Fault Tree:
- Top Event: System Fails
- AND & OR Logic
- Basic Events: Item Failures
- Tree Developed Elsewhere
Monte-Carlo Methods allow Uncertainty Propagation: from the Failure Estimate Probability Distribution of each Basic Event to a Range of System Fail Probability Estimates
In the 1990s, all U.S. nuclear power plants submitted PRAs to the NRC under the Individual Plant Examination program
Fukushima-plant Identified Unresolved Areas:
- External Events (tsunami, etc)
- Common Mode Failure
- Human Error
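A minimal sketch of Monte-Carlo uncertainty propagation through AND & OR logic follows. Everything specific in it is an assumption made for illustration: the three-event tree (A AND B) OR C, the lognormal uncertainty on each basic event, the medians and error factors, and the function names. None of it is taken from WASH-1400 itself.

```python
import math
import random
import statistics


def and_gate(*probs):
    """All independent inputs must fail."""
    return math.prod(probs)


def or_gate(*probs):
    """Any one of the independent inputs failing is enough."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def top_event(p_a, p_b, p_c):
    """Hypothetical tree: System Fails = (A AND B) OR C."""
    return or_gate(and_gate(p_a, p_b), p_c)


def sample_probability(rng, median, error_factor):
    """Lognormal draw; error_factor is taken as 95th percentile / median."""
    sigma = math.log(error_factor) / 1.645
    return min(1.0, rng.lognormvariate(math.log(median), sigma))


def propagate(n_trials=10000, seed=1):
    """Return the 5th percentile, median and 95th percentile of the top event."""
    rng = random.Random(seed)
    results = sorted(
        top_event(sample_probability(rng, 1e-2, 3.0),    # item A (assumed)
                  sample_probability(rng, 1e-2, 3.0),    # item B (assumed)
                  sample_probability(rng, 1e-4, 10.0))   # item C (assumed)
        for _ in range(n_trials)
    )
    return (results[int(0.05 * n_trials)],
            statistics.median(results),
            results[int(0.95 * n_trials)])


low, mid, high = propagate()
print(f"P(System Fails): 5% = {low:.1e}, median = {mid:.1e}, 95% = {high:.1e}")
```

The output is a range rather than a single number, which is the point of the slide: the spread in the basic-event data carries through to the system estimate.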
System Failure Probability Estimate - Dependent Failure
Let P{A} = Probability Item A failing
System Fails = A Fails And B Fails
$P\{S\} = P\{A \,\&\, B\} = P\{A\} \cdot P\{B|A\} = P\{A\}^2$ (if P{A} = P{B} and events independent)
In practice failure of such items are not 100% independent, so this gives an Unrealistically low estimate of system failure probability for a pair of similar items in a system.
P{S} (revised) to More Likely using ß factor, ß = .1
[Figure: P{S} against P{A} on log axes (1.E-06 to 1.E-01), independent estimate and revised estimate with ß = .1]
Assumptions About ß Most Critical
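The slide does not state the formula used for the revised estimate. The sketch below assumes the commonly used ß-factor model, in which a fraction ß of each item's failure probability is treated as common to both items and the remainder as independent; the function names are hypothetical.

```python
def independent_pair(p_a):
    """P{S} = P{A}^2 for two identical, fully independent items."""
    return p_a ** 2


def beta_factor_pair(p_a, beta=0.1):
    """Assumed beta-factor model: P{S} ~ beta * P{A} + ((1 - beta) * P{A})^2."""
    common_cause = beta * p_a                   # both items fail from a shared cause
    independent = ((1.0 - beta) * p_a) ** 2     # both fail independently
    return common_cause + independent


for p_a in (1e-1, 1e-2, 1e-3):
    print(f"P{{A}} = {p_a:.0e}: independent {independent_pair(p_a):.1e}, "
          f"beta = 0.1 gives {beta_factor_pair(p_a):.1e}")
```

For small P{A} the common-cause term ß x P{A} dominates, so the system estimate is driven almost entirely by the value chosen for ß.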
The Focus on Humans & Human Factors - 1979
The Three Mile Island accident - a partial nuclear meltdown
(March 28, 1979)
the worst accident in U.S. commercial
nuclear power plant history.
… a stuck-open pilot-operated relief valve … allowed large amounts
of nuclear reactor coolant to escape.
The Mechanical Failures were Compounded by the initial
Failure of plant Operators to Recognize the situation as a
Loss-Of-Coolant Accident - Due to
Inadequate Training and Human Factors, such as:-
“human-computer interaction design oversights relating to ambiguous control room indicators in the power plant's user interface”.
… a hidden indicator light led to an operator Manually Overriding the Automatic Emergency Cooling System of the reactor because the operator mistakenly believed ….
TMI-2 – A write-off.
Poorly Trained & a Messy Confusing Control Panel
Reliability Engineers had a whole new area to specialise in: Job Security & New toys to play with e.g.
Technique for Human Error Rate Prediction (THERP)
Cognitive Reliability and Error Analysis Method (CREAM).
August 2005, (US NRC) SPAR-H (Standardized Plant
Analysis Risk - Human Reliability Analysis)
Oct. 2009: TMI-1 license
extended to 2034.
Equipment Reliability - f (Operator & Maintainer Training & Ergonomics)
Focus on - Safety Culture - 1986
The term ‘Safety Culture’ was first introduced in INSAG’s# Summary Report on the Post-Accident Review Meeting on the Chernobyl Accident (1986):
Good safety Attitudes in Staff AND Effective Organisational Safety Management Systems and Practices.
# INSAG: International Nuclear Safety Advisory Group - IAEA
Just doing a run-down test
Did not know - that they did not know
Just in case we could not work it out ourselves we have had a Write-off of 2 reactors to remind us:
If people doing the job are Not well Trained or have Poor Equipment, &/Or the Organisation has a toxic Culture, i.e.
Maintenance is: A Cost Area we must Reduce (rather than A Valuable part of our Business)
Its Function is: To Repair (rather than Asset Management & Reliability Champion)
Its Product is: Nothing (rather than Plant Capacity)
Its Role is: Keep Plant Operating (rather than Provide Highly Reliable Plant - Cost Effectively)
Our Reliability Data will Show – We Have Problems
The Reliability of Reliability Data - 4
– In the beginning - Statistics Theory - circa 1800
– Early Days - Reliability Theory - circa 1970
– Probabilistic Risk Assessment – circa 1975
– Reliability Data - RCM - circa 1990
“The presence of a well-defined wear-out region is far from Universal”
– Reliability Data – Now - As always
– The Future
Reliability-Centered Maintenance- circa 1990
F. Stanley Nowlan and Howard F. Heap. Report AD-A066579.
www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA066579
MSG-3. Maintenance Program Development Document. Air Transport Association, Washington, D.C. Revision 2, 1993.
11% MIGHT benefit from a limit on operating age (4% + 2% + 5%)
89% CanNOT Benefit from a Limit on Operating Age
“The presence of a well-defined wearout region is *far from Universal.”
*Nicely Understated
Mostly associated with simple items … in the case of aircraft, such items as tires, reciprocating-engine cylinders, brake pads, turbine-engine compressor blades, and all parts of the airplane structure
In most Items (More Complex) there are Competing Failure Modes
e.g. You will die from Cardiac Arrest, Unless a competing mode, e.g. Stroke or 1 of many Cancers, gets you before your faulty heart valve fails.
Have a Nice Day - Pay $120 on your way out
& Rate of Condition Degradation (Health) depends a lot on Installation, in-use Stress & TLC
Stop smoking, eat less junk & exercise more
The Dawn of the Age of CM
RCM put an emphasis on:
- Monitoring the Condition of Individual Items Rather than Statistics of Similar Items
- The Impact of Past Stress on Current Strength
- The Margin between Strength (Resistance to Stress) & Stress of Each Item Right Now
- & thus its Likely Future Change (Prognosis to Failure)
[Figure: Strength and Stress of items A and B against Operating Age (hundreds of hours), with Functional Failure marked]
But this could only “take-off” in Industry because: Powerful & Effective CM tools were becoming Commercially available off the shelf at reasonable prices
Continuing to get better & cheaper each year
The Reliability of Reliability Data - 5
– In the beginning - Statistics Theory - circa 1800
– Early Days - Reliability Theory - circa 1970
– Probabilistic Risk Assessment – circa 1975
– Reliability Data - RCM - circa 1990
– Reliability Data – Now - As always
You are being paid to Think
– The Future
What Does it All Mean
Easy: Computing the Mean (Average) of n Values
$\bar{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$
So for 10 Values (n=10): Add up the 10 values & Divide by 10
You earn your $ (give value to your Company): Knowing if the Answer (the Mean) Means anything much & how to Use it to Improve the Plant
10 Workers:
A: Average Height is Possibly Useful Information
B: Average Height, So What! Too much Variation to be Useful?
Similarly for Times to Failure
Population Statistics (e.g. the Mean) need Context to be meaningful
Standard Assumptions – Similar Items
Quoted Statistics, e.g. The Average Height of Australian Men, Have Stated or Implied Assumptions
Sample Selection:
You would feel misled if you learnt that for Sample Selection We went to All the Professional & Amateur Basketball teams in the country
Population of similar Items:
If there are distinct Sub-populations with quite different statistics
The result may be dominated by the Relative Fraction of each of the Sub-populations in the Sample
Sub Populations / Operating Conditions:
"We took all the Water (but no Slurry) Pumps"
"I used just Slurry Pump data"
OK If Study population Clearly Stated & results Meaningfully Applied
Starting Condition:
Starting from a similar condition?
Typically All items are Assumed to start from an “As Good As New” Condition
Is this Assumption Valid?
e.g. Do your Refurbished Gearboxes have a similar MTBF to your New Gearboxes?
If your data is well organised it should be easy to check
Time Lines
When we Have:
- 8 Failures
- 2 Years Observation Time
- 4 Positions (A, B, C, D at the Settling Pond)
- 4 x 2 = 8 yr Exposure Time
Simple: MTBF = 8/8 = 1 yr
Really Simple
What If Like This?
[Figure: failure timelines for Positions A, B, C, D over 2 Years]
Or Like This?
[Figure: failure timelines for Positions A, B, C, D over 2 Years]
Similar Failures/Units
Reaction Tank: 8 Stirrers Installed (Pos’n 1 to 8), Similar Operating Environment
[Figure: timeline of Failures (X) by Pos’n 1 to 8 against Time, 1 yr and 2 yr marked]
Observation time = 2 yr
Exposure time = 16 yr (2x8)
Failures = 10
*MTBF = 16/10 = 1.6 yr (83.2 wk)
What is the population? & How many Failures?
2 Very Early Failures: Same Mode? Same Definition of Failure?
Two seem to be due to Poor Installation: Review Practice
Failures = 8
*MTBF = 16/8 = 2 yr (104 wk)
Suggested that Positions 6, 7 Worse than Others: Check options
Positions 1, 2, 3, 4, 5, 8:
Exposure time = 12 yr (2x6)
Failures = 3
*MTBF = 12/3 = 4 yr (208 wk)
Positions 6, 7:
Exposure time = 4 yr (2x2)
Failures = 5
*MTBF = 4/5 = 0.8 yr (41.6 wk)
4 yr MTBF not ideal but if Achieved may not be Highest Priority
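The stirrer arithmetic can be reproduced with a few lines, using the exposure times and failure counts quoted on the slide. The function name and the 52 weeks per year conversion are assumptions (the conversion matches the week figures on the slide).

```python
WEEKS_PER_YEAR = 52  # matches the slide's conversions, e.g. 1.6 yr = 83.2 wk


def mtbf_years(positions, observation_years, failures):
    """MTBF = exposure time / failures, with exposure = positions x observation time."""
    exposure_years = positions * observation_years
    return exposure_years / failures


cases = [
    ("All 8 positions, all 10 failures", 8, 10),
    ("All 8 positions, 2 very early failures excluded", 8, 8),
    ("Positions 1, 2, 3, 4, 5, 8", 6, 3),
    ("Positions 6, 7", 2, 5),
]
for label, positions, failures in cases:
    mtbf = mtbf_years(positions, 2, failures)
    print(f"{label}: MTBF = {mtbf:.1f} yr ({mtbf * WEEKS_PER_YEAR:.1f} wk)")
```

The same data gives an MTBF anywhere from 0.8 yr to 4 yr depending on which population and which failures are counted.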
Standard Assumptions – Stationary Data
[Cartoon: Stats Man: "I thought this would be an office job". Non-Stationary Data plotted against Failure Time.]
In Mathematics and Statistics, a Stationary Process is a Stochastic Process
- whose Joint Probability Distribution does not change when Shifted in Time.
Consequently, parameters such as the mean and variance,
if they are present, also do NOT change over time
and do NOT follow any Trends.
http://en.wikipedia.org/wiki/Stationary_process
Timeline Identifies Changed Effects
[Figure: Outage Hours (0.0 to 4.0) against Date, 1/12/01 to 30/11/02. Each Line a ‘De-scale Outage’; Height = Duration of Outage. Non-Stationary Data.]
Chemical Engineer’s Comment:
“A Reducing-agent was added to Feed - Giving
Lower levels of Hexavalent Chrome in Precipitate.”
i.e. Changed Chemicals Used - On this
Date and Less Scale Problem
Looking at this simple plot - A fair person would accept that
the change has Reduced Scaling and
(considering this issue only) the Facility Capacity has Improved.
Such simple analysis is much more meaningful than entering
the times to failure into a Weibull Analysis and obtaining
parameters to feed into a Monte Carlo Simulation Model
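Where a plot is not conclusive, one simple numerical check for a trend in event times is the Laplace trend test, sketched below. It is offered as a complement to the timeline plot, not as the method used in this presentation, and the outage times in the example are hypothetical values invented for illustration, not the data behind the chart.

```python
import math


def laplace_trend_statistic(event_times, observation_period):
    """Laplace statistic for events observed over [0, observation_period].

    Near 0 suggests no trend (stationary); strongly negative means events are
    concentrated early (improving); strongly positive means they come late.
    """
    n = len(event_times)
    if n == 0:
        raise ValueError("need at least one event")
    mean_time = sum(event_times) / n
    return ((mean_time - observation_period / 2.0)
            / (observation_period * math.sqrt(1.0 / (12.0 * n))))


# Hypothetical outage times in weeks over a 52 week sample period:
# frequent outages early on, few after a process change.
hypothetical_outages = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 30, 45]
u = laplace_trend_statistic(hypothetical_outages, 52)
print(f"Laplace statistic = {u:.2f}")
if abs(u) > 1.96:  # roughly a 5% two-sided significance level
    print("Trend present: an average over the whole period would mislead")
```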
Stats Man: "I Used Average Outage Frequency & Duration over the Sample Period To predict Future Performance"
But: Process Change Over Sample Period
Good Failure Analysis Practice - Rotable Tracking
Most CMMS allow Tracking of Equipment History.
Most sites only focus on the Equipment Location & Not on the Identity of the Equipment in the Location.
The identification is usually done through serial numbers stamped or solidly attached onto the item.
The *OEM serial number is typically used or … e.g. Bar Code, DataDot ID tag (Magnified)
It is good practice to follow the Item’s repair history also
Because ??? In case a particular item has a Fundamental Defect/Problem & is being repaired too often, e.g.
- Distorted Casing causing poor Internal Alignment
- Faulty Laminations causing heating and winding failure
[Figure: Failure/Replacement Times (X) by Location P1, P2, P3, showing which Item (A, B, C, D, E) is installed in each Location over time]
Set of 3 Items Installed
Item B Failed & Replaced by Item D
Item B just been repaired & now Returned to service after D Failed
Item D Failed Early in Location P1 & P2, Now Reinstalled in P1
So what do we Watch Now?
*OEM: Original Equipment Manufacturer
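The difference between counting failures by Location and by Item identity can be sketched as follows. The failure records are hypothetical, loosely modelled on the diagram, and the field layout and function name are assumptions rather than any particular CMMS export format.

```python
from collections import Counter

# Hypothetical failure records: (location, item serial) at each failure.
failure_records = [
    ("P1", "B"),
    ("P1", "D"),
    ("P3", "A"),
    ("P2", "D"),
    ("P2", "C"),
]


def failures_by(records, field):
    """Count failures by 'location' or by 'item'."""
    index = {"location": 0, "item": 1}[field]
    return Counter(record[index] for record in records)


by_location = failures_by(failure_records, "location")
by_item = failures_by(failure_records, "item")

print("By Location:", dict(by_location))
print("By Item:", dict(by_item))
# The item with the most failures is the one to watch, wherever it is installed.
print("Watch item:", by_item.most_common(1)[0][0])
```

In this made-up history no Location stands out, but one Item does, which is only visible when the serial number is tracked.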
Abuse - A Crushing Issue
Here it Comes
Looks??? Too Big
Was Too Big
[Figure: Crusher Mouth]
Previously No automatic Feed-Stop if Crusher Jammed
This made problem worse – Was Fixed
Impacts:
- Lost Availability whilst Blockage Cleared
- Significant chance of Damage to Crusher
Crusher Says: “Please Don’t Damage Me with that Rock Breaker”
Issues to Fix Next:
- Root Cause (Such Large Rocks should Not be in Crusher Feed)
The Reliability of Reliability Data - 6
– In the beginning - Statistics Theory - circa 1800
– Early Days - Reliability Theory - circa 1970
– Probabilistic Risk Assessment – circa 1975
– Reliability Data - RCM - circa 1990
– Reliability Data – Now - As always
– The Future
Make It Last Longer
The Holy Grail
To make Wise/Cost Effective Decisions Based on ALL Significant Information
For each Significant/Critical Asset:
We would Record its Life story Womb to Tomb
In fact at any point in time we would know its Health
More than that, We would know Why (it was in Whatever Health Condition it is in)
& Options we have for Managing its Condition
So we Optimise our Design, Operating & Maintenance Practices To ..
- Get this (Condition) Higher
- Get this (Time) Longer
- All with More Certainty at Less Cost
[Figure: Condition against Time, showing Initial Identification of a Problem and the PF Interval]
Quite Reasonably There is a Focus on: Initial Identification of a Problem, which Gets This (PF Interval) Longer
In an Ongoing Asset Reliability Improvement Program
Make It Last Longer!
– The Future of Civilisation on Planet Earth
To Help Make it Last Longer
I am Changing Career to Volunteer Climate Lobbyist
For more info contact me on [email protected]
or 0401 590 701