Introduction to Risk Assessment in Engineering:
With Application to Heat Shield Reliability Modeling
Presented by:
Austin Howard
University of Idaho
Mechanical Engineering Dept.
Idaho Space Grant Consortium
Outline
Introduction
Failure Mode Effect Analysis
Fault Trees
Event Trees
Obtaining Component Reliability
Monte Carlo Method
Case Study: Heat Shield Reliability Modeling
Summary
Purpose of This Talk
Describe importance of risk assessment
Introduce key tools, processes, and concepts
related to risk analysis
Provide context with a case study based on a
summer 2006 internship at NASA Ames
Note: Risk assessment is its own discipline and therefore
it is outside the scope of this talk to show you how to
create/evaluate risk models
Definition: Risk
Risk:
“The combination of the frequency, or probability, of
occurrence and the consequence of a specified hazardous
event” -www.bees.unsw.edu.au/ohs/definitions.html
One of many ways to calculate risk:
Risk = (Probability of failure) × (Severity of the consequence)
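A minimal sketch of the formula above. The failure modes, probabilities, and dollar costs are hypothetical values chosen only for illustration; none of them come from the talk.

```python
def risk(probability_of_failure: float, severity: float) -> float:
    """Risk = (probability of failure) x (severity of the consequence)."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability_of_failure * severity


# Hypothetical failure modes: (probability per mission, consequence cost in dollars)
failure_modes = {
    "valve leak": (0.01, 50_000),
    "sensor dropout": (0.05, 2_000),
}

risks = {name: risk(p, cost) for name, (p, cost) in failure_modes.items()}
for name, value in risks.items():
    print(f"{name}: risk = ${value:,.0f} per mission")
```

With these made-up numbers the rarer failure carries the larger risk, because its consequence is far more severe.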
Risk
Risk is also a board game:
Risk vs. Unreliability
Risk is not the same as Unreliability
Reliability: Probability that a device will function without
failure over a specified period of time or amount of usage
Reliability is one of the (but not the only) factors that
contributes to system risk
The terms reliability analysis and risk analysis are often used interchangeably,
but they are two different concepts
Engineers often present reliability statistics rather than risk
values due to difficulty of measuring and comparing
consequence severity
Risk vs. Safety
Judging Risk
is a quantitative activity grounded in testing and physical
modeling
Judging Safety
is a qualitative, political activity
You must have a safety standard to judge system risk
against; otherwise, risk is a relatively meaningless
value in decision making and design assessment
Deterministic vs Non-Deterministic
Deterministic model: a model that behaves predictably
In other words, for a constant input, you will always get
the same output
Non-deterministic model: a model with one or more
choice points where different continuations are
possible
In other words, for a constant input, you will not always
get the same output
Requires input from one or more of: user, global variables,
hardware timer, random numbers, stored data…
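The distinction above can be sketched in a few lines. Both functions and the 10% scatter are hypothetical, chosen only to show that a random-number input makes the output vary for a constant input.

```python
import random


def deterministic_model(load: float) -> float:
    """Same input always gives the same output."""
    return 2.0 * load


def non_deterministic_model(load: float, rng: random.Random) -> float:
    """Output depends on a random draw, so repeated calls can differ."""
    return 2.0 * load * rng.uniform(0.9, 1.1)


rng = random.Random()  # unseeded: results vary from run to run
print(deterministic_model(10.0), deterministic_model(10.0))
print(non_deterministic_model(10.0, rng), non_deterministic_model(10.0, rng))
```

Seeding the generator (for example `random.Random(1)`) makes the sequence of draws repeatable, which is why such numbers are called pseudo-random.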
Purpose of Risk Assessment
Purpose of Risk Assessment: Answering and effectively
communicating the following questions/considerations:
Haimes, Yacov Y. Risk Modeling, Assessment, and Management.
Hoboken, NJ, USA: John Wiley & Sons, Incorporated, 2005. p 23.
http://site.ebrary.com/lib/uidaho/Doc?id=10114200&ppg=47
Importance of Risk Analysis
Reputation
Customer Satisfaction/Safety
Warranty Costs
Repeat Business
Cost Analysis
Customer Requirements
Competitive Advantage
Cont…
Reduce long-term cost
http://klabs.org/DEI/References/design_guidelines/analysis_series/1314msfc.pdf
Process
Figure: process flowchart. Labels recovered from the diagram:
System: Risk Assessment (ex. FMEA); System Tree(s) (Fault and/or Event)
Sub System: Sub System Tree (Fault and/or Event), shown twice
Component: Model and/or Test, shown four times
Risk Communication/Safety Check: Pass leads to Production; Fail leads to Risk Mitigation
Failure Mode Effect Analysis
(FMEA)
Otherwise known as:
Failure Mode Effect Criticality Analysis (FMECA)
Design Failure Mode Effect Analysis (DFMEA)
Process Failure Mode Effect Analysis (PFMEA)
Purpose
Define and guide a logical design process
Identify, quantify, and reduce design risk
Provide a traceable document for design and development
Justify design activities
Provide a means for continuous product improvement
Cont…
Rates each possible failure on:
Severity (rate 1-10)
Occurrence (rate 1-10)
Detectability (rate 1-10)
The product of these three ratings is called the Risk Priority Number (RPN); this value
describes the overall risk of each failure mechanism
High RPN = high risk
Focus on these failure mechanisms first in the risk mitigation
process
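The RPN ranking described above can be sketched as follows. The failure modes and their ratings are hypothetical, and the sketch assumes the common convention that a higher detectability rating means the failure is harder to detect.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int       # rated 1-10
    occurrence: int     # rated 1-10
    detectability: int  # rated 1-10 (assumed: 10 = hardest to detect)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: product of the three ratings."""
        return self.severity * self.occurrence * self.detectability


# Hypothetical failure modes and ratings
modes = [
    FailureMode("seal leak", severity=7, occurrence=4, detectability=3),
    FailureMode("bolt fatigue", severity=9, occurrence=2, detectability=8),
    FailureMode("paint chipping", severity=2, occurrence=6, detectability=1),
]

# Highest RPN first: these are the mechanisms to mitigate first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```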
FMEA Process
http://www.qualitytrainingportal.com/resources/fmea/fmea_process.htm
Example: FMEA
Fault Trees
At the top of a fault tree is a failure
Below it are all the possible faults that could
lead to the top failure
Fault trees are used for viewing a system and the
interactions between faults and possible paths to a
failure
Fault trees can be built with software and combined
with probabilities to produce reliability estimates
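As a minimal sketch of combining a fault tree with probabilities, the example below evaluates a small tree built from AND and OR gates. The tree, the fault names, and the probabilities are hypothetical, and the basic faults are assumed to be independent.

```python
from math import prod


def and_gate(probabilities):
    """All input faults must occur: P = product of the input probabilities."""
    return prod(probabilities)


def or_gate(probabilities):
    """Any one input fault is enough: P = 1 - product of (1 - p)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: top failure = power loss OR (pump A fails AND pump B fails)
p_power_loss = 0.001
p_pump_a = 0.02
p_pump_b = 0.02

p_both_pumps = and_gate([p_pump_a, p_pump_b])
p_top = or_gate([p_power_loss, p_both_pumps])
print(f"P(top failure) = {p_top:.6f}")
```

The redundant pumps contribute less to the top failure than the single power-loss fault, even though each pump is individually less reliable.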
Cont…
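Paths from the bottom to the top of the tree are termed cut sets: each is a set of
faults that together cause the top failure. A minimal cut set is one from which no
fault can be removed and still cause the top failure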
Symbols used:
Haimes, Yacov Y. Risk Modeling, Assessment, and Management.
Hoboken, NJ, USA: John Wiley & Sons, Incorporated, 2005. p 530.
http://site.ebrary.com/lib/uidaho/Doc?id=10114200&ppg=554
Example: Fault Tree
http://safety.transportation.org/htmlguides/implement/ProcAppJ.htm
Event Trees
Goal of an event tree:
to determine the probability of an event based on the
outcomes of each event in the chronological sequence of
events leading up to it
By analyzing all possible outcomes using event tree
analysis, you can determine the percentage of
outcomes which lead to the desired result
Event trees can be built with software to produce
reliability estimates
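A minimal sketch of event tree analysis: enumerate every success/failure branch of a chronological sequence and compute each branch's probability. The events and their success probabilities are hypothetical, and the events are assumed to be independent.

```python
from itertools import product

# Hypothetical chronological events and their probability of success
events = [("alarm sounds", 0.99), ("sprinkler activates", 0.95), ("door closes", 0.90)]


def outcome_probabilities(events):
    """Map each success/failure branch (tuple of bools) to its probability."""
    outcomes = {}
    for branch in product([True, False], repeat=len(events)):
        p = 1.0
        for (_, p_success), succeeded in zip(events, branch):
            p *= p_success if succeeded else 1.0 - p_success
        outcomes[branch] = p
    return outcomes


outcomes = outcome_probabilities(events)
p_all_succeed = outcomes[(True, True, True)]
print(f"P(all events succeed) = {p_all_succeed:.5f}")
print(f"Sum over all {len(outcomes)} branches = {sum(outcomes.values()):.5f}")
```

Summing the probabilities of the branches that end in the desired result gives the fraction of outcomes that lead to it.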
Example: Event Trees
http://www.ece.cmu.edu/~koopman/des_s99/safety_critical/
Obtaining Component Reliability
Testing
Advantages
Can illuminate overlooked failure mechanisms
Some situations cannot be modeled accurately with
current physical understanding (e.g., turbulence)
Limitations
Expensive
Time-consuming
Need lots of data to be meaningful
How Modeling Produces Unreliability
Figure labels: Load Probability Curve, Mean Load, Design Probability Curve, Mean Design Spec, Design Margin, Area = Probability of failure
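The idea behind the load and design curves can be sketched numerically. This sketch assumes, purely for illustration, that load and design strength are independent and normally distributed, and it treats the design margin as the difference between the two means; the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_failure(mean_load, sd_load, mean_strength, sd_strength):
    """P(load > strength) for independent, normally distributed load and strength."""
    margin_mean = mean_strength - mean_load  # distance between the two means
    margin_sd = sqrt(sd_load**2 + sd_strength**2)
    # The margin (strength - load) is itself normal; failure means margin < 0.
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


# Hypothetical values in arbitrary but consistent units
p_fail = probability_of_failure(mean_load=100.0, sd_load=10.0,
                                mean_strength=150.0, sd_strength=10.0)
print(f"P(failure) = {p_fail:.2e}")
```

Widening the margin or narrowing either curve reduces the computed probability of failure.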
Modeling
Advantages
Can be relatively inexpensive/fast
Limitations
Easy to make incorrect assumptions/mistakes
Some situations are difficult/impossible to model
accurately
System/Sub-System Reliability
Series Reliability (blocks A, B, C in series)
$R_{tot} = R_A \cdot R_B \cdot R_C$
Full Redundancy (blocks A, B, C in parallel)
$R_{tot} = 1 - (1 - R_A)(1 - R_B)(1 - R_C)$
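The two formulas above can be checked with a short sketch. The component reliabilities are hypothetical, and the components are assumed to fail independently.

```python
from math import prod


def series_reliability(reliabilities):
    """Every component must work: R_tot = product of R_i."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Fully redundant: the system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


components = [0.90, 0.95, 0.99]  # hypothetical R_A, R_B, R_C

r_series = series_reliability(components)
r_parallel = parallel_reliability(components)
print(f"Series:          R_tot = {r_series:.5f}")
print(f"Full redundancy: R_tot = {r_parallel:.5f}")
```

A series system is always less reliable than its weakest component, while a fully redundant system is more reliable than its strongest one.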
The Essence of Monte Carlo
Monte Carlo: a method of modeling that takes its
inputs from random or pseudo-random numbers
The output has characteristics similar
to those of data collected from an experiment*
Similar scattering of data
The more “runs” of the model, the more pronounced
the trends are
*If the input is correct: your model output is only as good
as the information you put into the model
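A minimal Monte Carlo sketch, assuming a hypothetical three-component series system with independent components. It estimates system reliability from pseudo-random draws and compares the estimate with the exact product as the number of runs grows.

```python
import random


def simulate_series_system(reliabilities, runs, seed=0):
    """Estimate series-system reliability by random sampling."""
    rng = random.Random(seed)  # seeded pseudo-random numbers for repeatability
    successes = 0
    for _ in range(runs):
        # The system works on this run only if every component works.
        if all(rng.random() < r for r in reliabilities):
            successes += 1
    return successes / runs


components = [0.90, 0.95, 0.99]  # hypothetical component reliabilities
exact = 0.90 * 0.95 * 0.99

for runs in (100, 10_000, 100_000):
    estimate = simulate_series_system(components, runs)
    print(f"{runs:>7} runs: estimate = {estimate:.4f} (exact = {exact:.4f})")
```

The estimates scatter around the exact value much like experimental data would, and the scatter shrinks as the number of runs increases.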
What Monte Carlo Looks Like
Vose, David. Quantitative Risk Analysis: A Guide to Monte Carlo Simulation Modeling. 1996.
Case Study: Heat Shield Reliability Modeling
Heat Shields 101
Kinetic Energy: $\frac{1}{2} m V^2$ + Potential Energy: $\int m g \, dy$ → Thermal Energy (hot)
Entry velocities range from 7 km/s (LEO) to 11 km/s (lunar return)
Altitude ~400 km (more for lunar return)
Blunt body advantage
Shuttle vs Apollo
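To give a sense of scale for the energy a heat shield must handle, the sketch below computes the kinetic energy per kilogram at the two entry velocities quoted on this slide and the potential energy per kilogram at 400 km. Treating gravity as constant at its sea-level value is a simplifying assumption made here, not something stated in the talk.

```python
G0 = 9.81  # m/s^2, sea-level gravity, treated as constant (an approximation)


def specific_kinetic_energy(velocity_m_s: float) -> float:
    """Kinetic energy per kilogram, (1/2) V^2, in J/kg."""
    return 0.5 * velocity_m_s**2


def specific_potential_energy(altitude_m: float) -> float:
    """Potential energy per kilogram, g * h, in J/kg (constant-g approximation)."""
    return G0 * altitude_m


ke_leo = specific_kinetic_energy(7_000.0)      # LEO entry, 7 km/s
ke_lunar = specific_kinetic_energy(11_000.0)   # lunar return, 11 km/s
pe_400km = specific_potential_energy(400_000.0)

print(f"KE at 7 km/s:  {ke_leo / 1e6:.1f} MJ/kg")
print(f"KE at 11 km/s: {ke_lunar / 1e6:.1f} MJ/kg")
print(f"PE at 400 km:  {pe_400km / 1e6:.1f} MJ/kg")
print(f"Lunar return / LEO kinetic energy ratio: {ke_lunar / ke_leo:.2f}")
```

Because kinetic energy scales with the square of velocity, a lunar return carries roughly two and a half times the kinetic energy per kilogram of a LEO entry.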
Cont…
Figure labels: Apollo, Shuttle, Before, After
Case Study Objectives
Risk Assessment Objectives For Orion Heat Shield:
Obtain an estimate of the overall system reliability
Identify components/events most likely to cause failure
Identify sub-systems that may be too conservative
Determine sensitivity of system reliability to design/modeling/testing/environmental
parameters
Determine where resources should be allocated in order to
reduce risk most efficiently
Failure Modes
TPS Failure Modes
Burnthrough of heat shield material
Crack
Damage
De-bonding
Hot spots
Flowthrough
Bondline overheat
Excessive conduction
Radiation absorption
System interface failure
e.g. electromagnetic interference, landing system interference
The Software Used
SAFE – Space Architecture Failure Evaluation
Code in development at NASA Ames
Monte Carlo Simulation method
Input
Assembly architecture
Nominal reliabilities of components and events
Consequences of failure
Mission outline (events and segments)
The software generates hundreds or thousands of semi-random repetitions of the given scenario
The output
Histograms and mission summaries that engineers can use to
determine when the system is likely to fail, what will cause failure,
and how often system failures are likely to occur…
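The general idea of repeating a mission many times and tallying failures can be sketched as below. This is not SAFE's actual algorithm or input format, which the talk does not describe; the mission outline, event names, and reliabilities are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical mission outline: (event name, nominal reliability)
mission = [
    ("launch", 0.99),
    ("orbit operations", 0.995),
    ("entry", 0.98),
    ("landing", 0.99),
]


def run_missions(mission, repetitions, seed=0):
    """Repeat the mission many times and tally which event fails first."""
    rng = random.Random(seed)
    first_failures = Counter()
    for _ in range(repetitions):
        for event, reliability in mission:
            if rng.random() >= reliability:  # this event failed
                first_failures[event] += 1
                break  # the mission ends at its first failure
    return first_failures


repetitions = 10_000
failures = run_missions(mission, repetitions)
total_failed = sum(failures.values())

print(f"Estimated mission reliability: {1 - total_failed / repetitions:.3f}")
for event, count in failures.most_common():
    print(f"  {event}: {count} failures")
```

The tally plays the role of a simple histogram: it shows which event most often causes failure and how often the mission fails overall.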
Simple Example
Risk Interaction Example
Micro-Meteoroid and Orbital Debris
(MMOD)
Risk of sufficiently large particles hitting the heat
shield with sufficient velocity to cause
damage
Risk of the MMOD damage
causing/contributing to TPS failure
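One way to read the interaction between these two risks is as a chain of conditional probabilities. The sketch below uses the law of total probability; every number in it is a hypothetical placeholder, not a value from the case study.

```python
# Hypothetical values for illustration only
p_damaging_impact = 0.002      # P(particle large and fast enough to damage the shield)
p_failure_given_damage = 0.10  # P(TPS failure during entry | MMOD damage)
p_failure_no_damage = 0.001    # P(TPS failure during entry | no MMOD damage)

# Contribution of MMOD to TPS failure: both risks must be realized.
p_from_mmod = p_damaging_impact * p_failure_given_damage

# Law of total probability over the two pre-entry states (damaged / undamaged)
p_tps_failure = (p_damaging_impact * p_failure_given_damage
                 + (1.0 - p_damaging_impact) * p_failure_no_damage)

print(f"P(TPS failure caused by MMOD damage) = {p_from_mmod:.6f}")
print(f"P(TPS failure overall)               = {p_tps_failure:.6f}")
```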
Another Example
Environment modeling
Accurately predicting entry environment
Recession modeling based on predicted
environment
Material selection/Thickness design based
on recession modeling
Organizing the Risks
Visualizing Risk Interaction
Calculating Risk Values
The Model
Predicting Reliability
Historical records
Apollo
Shuttle
Others
Physics-based simulation tools
Testing
Ground Tests
Flight Tests
Results of Summer Work
Reliability model:
Incorporates over 90 potential TPS risks
Each risk can lead to either a benign or a catastrophic failure
Multiple benign failures have the ability to contribute to a
catastrophic failure
All pre-entry factors influence risks during entry and landing
phases
Risk Analysis Document
Outline for detailed sub-system interaction
Can be used to track changes and understand model
Can be used to help understand risk dependence on material
choice and other design factors
Summary
Risk analysis is a large topic that describes an
entire discipline of engineering
Risk analysis is an iterative process
If used correctly, it can save money and lives!
It can aid the decision-making process and justify actions
There are lots of tools available for engineers
Cont…
The output of a risk assessment is only as good
as the input
The engineer must have plenty of test data or a
sound model before a valid risk model can be
produced
Model output is meaningless without bounds on
the solution
Questions?