Modeling_Uncertainty_2 - Analytica Wiki

Modeling Uncertainty:
Probability Distributions
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica User Group Webinar Series
Session 2: 6 May 2010
Copyright © 2010 Lumina Decision Systems, Inc.
Today’s Topics
• Review
• How can we characterize uncertainty
for continuous quantities?
• The Normal Distribution
Viewing & interpreting
• LogNormal Distribution
• Why include uncertainty
Course Syllabus
(tentative)
Over the coming weeks:
• What is uncertainty? Probability.
• Probability Distributions (today)
• Monte Carlo Sampling
• Measures of Risk and Utility
• Common parametric distributions
• Assessment of Uncertainty
• Risk analysis for portfolios
(risk management)
• Hypothesis testing
Review
What is Uncertainty?
• Uncertainty: the lack of perfect and
complete knowledge.
• Applies to:
Future outcomes
Existing states or quantities
Physical measurements
Unknowable (quantum mechanics)
• Exercise: State something that you have
perfect and complete knowledge of.
Related Concepts
• Randomness
Will my next coin toss be heads or tails?
• Variation
75% of the people in this room have type A blood.
• Vagueness
How many people worldwide live in warm climates?
• Risk
You could die during the operation.
• Statistical Confidence/Significance
The study confirmed the hypothesis at a 95% confidence
level.
Probability:
A language for uncertainty
Probability: A measure of how certain we are, on a
scale from 0 to 1, that a statement is true.
• P(A)=0 : Assertion A is certainly false.
• P(A)=1 : Assertion A is certainly true.
• P(A)=0.5: Equally likely to be true or false.
• P(A)=0.7: A is more likely true than false.
Assertions must be
Crisp and Unambiguous
Probability of what?
• Must be a true/false assertion.
• Vagueness not allowed.
✘ “Gas prices will increase substantially in the
short term.”
✔ “The average retail price for regular unleaded
gas in the state of California, as reported by the
U.S. Energy Information Administration, will
increase by more than 20% from 26 Apr 2010
to 30 Aug 2010.”
• Truth theoretically knowable
Boolean Chance Variables
in Analytica
• Characterized by a single probability –
P(B=true).
• Examples:
Component fails
Dow drops by >1000 points
Civil war breaks out in Nigeria
Subject is male
• Use Chance variable defined as
Bernoulli(p)
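For readers without Analytica at hand, the idea behind a Bernoulli(p) chance variable can be sketched in Python. This is only an illustration, not Analytica code: the helper name `bernoulli`, the seed, and the value p = 0.3 are assumptions made for the example.

```python
import random

def bernoulli(p, rng):
    """Return True with probability p, otherwise False."""
    return rng.random() < p

rng = random.Random(42)      # fixed seed so the run is repeatable
p_fail = 0.3                 # illustrative P(component fails), an assumption
draws = [bernoulli(p_fail, rng) for _ in range(100_000)]
frequency = sum(draws) / len(draws)
print(f"P(B=true) = {p_fail}, sampled frequency = {frequency:.3f}")
```

The single number p fully characterizes the variable; the sampled frequency should land close to it.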
“Subjective” Interpretation
of Probability
• Probabilities measure:
how much we know.
not frequency of occurrence.
• Calibration:
Over many probability assessments, the
frequency of true assertions should match
our subjective probabilities for the
assertions.
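One way to check calibration is to group past assessments by the probability that was stated and compare each group with how often the assertions turned out true. The Python sketch below does this on made-up data; the records and the helper name `calibration_table` are hypothetical.

```python
from collections import defaultdict

# (stated probability, whether the assertion turned out true) -- made-up data
assessments = [
    (0.9, True), (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    (0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, False),
    (0.5, True), (0.5, False), (0.5, False), (0.5, True),
]

def calibration_table(records):
    """Map each stated probability to the observed frequency of true outcomes."""
    outcomes = defaultdict(list)
    for stated, came_true in records:
        outcomes[stated].append(came_true)
    return {p: sum(v) / len(v) for p, v in sorted(outcomes.items())}

table = calibration_table(assessments)
for stated, observed in table.items():
    print(f"stated {stated:.1f} -> observed {observed:.2f}")
```

A well-calibrated assessor shows observed frequencies near the stated probabilities once there are enough assessments in each group; with only a handful per group, as here, the comparison is very noisy.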
Today’s New Topics
Continuous Quantities
• Most variables in quantitative models
represent real-valued quantities.
Examples:
Revenue
Infection rate
Oil well capacity
Megawatt power output
Unit sales (?)
• Saying “Probability of x”, or P(x), is
nonsensical.
• We need something more…
Real-valued uncertainty example
At this time (6 May 2010), at what rate (in
gallons per hour) is oil leaking into the
Gulf of Mexico from the well in Louisiana
that exploded on 22 Apr 2010?
• Does this pass the clarity test?
• How can we express our knowledge and
degree of uncertainty regarding the
true value?
Note: A CNN article gave an estimate of 8,300 gal/hr.
Ways to Express Uncertainty
(Attendees' ideas)
Rate of Oil leak:
• Minimum & maximum values
• Standard deviation
• Mean + Median (if different)
• Distribution, e.g., triangular with 10% +
90% percentiles.
Average Deviation
Suppose our “best guess” is:
E[ oil_leak_rate ] = 10K gal/hr
• What is the expected error in our estimate?
= E[ |10K – trueValue| ]
• Ave. dev. is a simple (intuitive?) one-number
measure of how uncertain we are.
Allows us to characterize our knowledge / uncertainty
with just two numbers:
Expected value + Expected deviation
Aka: Expected Deviation, (mean/average) Absolute deviation.
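As a small worked example of these two numbers, the Python sketch below computes the expected value and the average deviation for a handful of equally likely scenarios. The scenario values are hypothetical and were picked only so that the expected value comes out at 10K gal/hr.

```python
# Equally likely scenarios for the true leak rate in gal/hr (hypothetical values)
scenarios = [5_000, 8_000, 10_000, 12_000, 15_000]

expected_value = sum(scenarios) / len(scenarios)
# Average (absolute) deviation: E[ |best guess - true value| ]
average_deviation = sum(abs(expected_value - x) for x in scenarios) / len(scenarios)

print(f"expected value    = {expected_value:,.0f} gal/hr")
print(f"average deviation = {average_deviation:,.0f} gal/hr")
```

The average deviation is in the same units as the quantity itself, which is part of what makes it easy to interpret.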
Standard Deviation
• Other measures of uncertainty “dispersion”:
Variance (expected/average squared error):
$= E[(10\text{K} - \text{trueValue})^2]$
Standard Deviation
$= \sqrt{\text{Variance}} = \sqrt{E[(10\text{K} - \text{trueValue})^2]}$
• Standard deviation has the same intuitive
meaning as average (absolute) deviation.
Both are a type of best guess for how much error
our best guess has.
Nicer mathematical properties
More commonly used.
Standard Deviation vs.
Average Deviation
$\mathrm{sd} = \sqrt{E[(x - x^*)^2]}$
vs
$\mathrm{ad} = E[\,|x - x^*|\,]$
• Both are always non-negative.
• Zero indicates absolute certainty.
• Both are measured in the same units as x.
• Q: Which measure gets larger when extreme errors
are more likely?
• What is the typical ratio sd/ad?
Symmetric: sd ≈ 1.25 ad
One-sided tail: sd ≈ 1.35 ad
“Heavy” tails: (up to) 1.3 ad ≤ sd ≤ 2.5 ad
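The typical ratios can be checked by simulation. The Python sketch below estimates sd/ad for a Normal distribution (symmetric) and, as an assumed stand-in for a one-sided tail, an exponential distribution. The sample size, seed, distribution parameters, and the helper name `sd_ad_ratio` are choices made for this illustration.

```python
import math
import random

def sd_ad_ratio(samples):
    """Return (standard deviation) / (average absolute deviation) about the mean."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    ad = sum(abs(x - mean) for x in samples) / n
    return sd / ad

rng = random.Random(0)
n = 200_000
normal_samples = [rng.gauss(10_000, 3_800) for _ in range(n)]           # symmetric
exponential_samples = [rng.expovariate(1 / 10_000) for _ in range(n)]   # one-sided tail

ratio_normal = sd_ad_ratio(normal_samples)
ratio_exponential = sd_ad_ratio(exponential_samples)
print(f"Normal:      sd/ad = {ratio_normal:.3f} (theory {math.sqrt(math.pi / 2):.3f})")
print(f"Exponential: sd/ad = {ratio_exponential:.3f} (theory {math.e / 2:.3f})")
```

The theoretical values are sqrt(pi/2) for the Normal and e/2 for the exponential. Because squaring weights large errors more heavily, the ratio grows as extreme errors become more likely.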
Expressing uncertainty
for a real-valued quantity
• Expected value + dispersion measure, e.g.:
Expected value + average deviation
Expected value + standard deviation
• Exercise: Express your uncertainty for the oil
well leak example in the above forms.
• There are no probabilities here. Why?
Visualization
Normal Distribution
[Figure: PDF plot of a Normal distribution, with the expected value, average deviation, and standard deviation marked]
EV=10K
AD=3K
SD=3.8K
This is called a probability density function (PDF) plot.
Visualization
Normal Distribution
EV=10K
AD=3K
SD =3.8K
+/- Ave
Deviation
58% of area
within 1 average
deviation.
The connection
to probability.
Visualization
Normal Distribution
EV=10K
AD=3K
SD =3.8K
+/- Std
Deviation
68% of area
within 1 standard
deviation.
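The areas quoted on these two slides can be reproduced from the Normal CDF. The Python sketch below uses the standard library's `statistics.NormalDist`; the helper name `area_within` is hypothetical, and the average deviation is derived from the standard deviation using the Normal-distribution relation ad = sd · sqrt(2/pi).

```python
import math
from statistics import NormalDist

leak_rate = NormalDist(mu=10_000, sigma=3_800)   # EV=10K, SD=3.8K from the slide
ad = leak_rate.stdev * math.sqrt(2 / math.pi)    # average deviation of a Normal

def area_within(dist, half_width):
    """Probability that the quantity lies within mean +/- half_width."""
    return dist.cdf(dist.mean + half_width) - dist.cdf(dist.mean - half_width)

within_ad = area_within(leak_rate, ad)
within_sd = area_within(leak_rate, leak_rate.stdev)
print(f"AD = {ad:,.0f} gal/hr")
print(f"within 1 AD: {within_ad:.1%}, within 1 SD: {within_sd:.1%}")
```

The results should agree with the slide figures after rounding (AD of about 3K, roughly 58% and 68% of the area).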
Cumulative Probability Function
(CDF)
• Easier to read than PDF.
• P(rate≤x)
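Reading a CDF amounts to two kinds of lookup: the probability that the quantity is at or below a value, and the value at a given cumulative probability. A minimal Python sketch for the Normal(10K, 3.8K) leak-rate example, using `statistics.NormalDist`; the choice of the 10th and 90th percentiles is an assumption for illustration.

```python
from statistics import NormalDist

leak_rate = NormalDist(mu=10_000, sigma=3_800)   # gal/hr, from the earlier slides

# CDF: probability that the true rate is at or below a given value
p_below_cnn = leak_rate.cdf(8_300)               # 8,300 gal/hr is the CNN estimate
# Inverse CDF: the value at a given cumulative probability (a percentile)
p10 = leak_rate.inv_cdf(0.10)
p90 = leak_rate.inv_cdf(0.90)

print(f"P(rate <= 8,300) = {p_below_cnn:.2f}")
print(f"10th percentile = {p10:,.0f}, 90th percentile = {p90:,.0f}")
```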
Specifying the Normal
Distribution in Analytica
• Define your real-valued variable as:
Normal( mean, stddev )
Take note: Standard Deviation, not
expected/average deviation.
If you estimated an average deviation, remember to
increase it slightly (e.g., by 25%) to get the standard deviation.
Exercise
A toy company must decide how many
toys to manufacture for the Christmas
season three months in advance.
Demand is: Normal(100K,25K)
It costs $5 to manufacture a toy. The
company makes a $10 profit on each
toy sold.
They order 100K toys. What is their
expected profit?
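One possible way to work this exercise outside Analytica is a small Monte Carlo simulation in Python. The sketch rests on an interpretation that the slide does not spell out: each toy sold yields $10 of profit net of its manufacturing cost, each unsold toy loses its $5 manufacturing cost, and sales cannot exceed the order. The function names, sample size, and seed are hypothetical.

```python
from statistics import NormalDist

demand = NormalDist(mu=100_000, sigma=25_000)
PROFIT_PER_SOLD = 10     # $ per toy sold (assumed net of manufacturing cost)
COST_PER_UNSOLD = 5      # $ lost on each toy made but not sold (assumption)

def profit(units_ordered, units_demanded):
    """Profit for one known demand level; sales are capped by the order size."""
    sold = min(units_ordered, max(units_demanded, 0))
    unsold = units_ordered - sold
    return PROFIT_PER_SOLD * sold - COST_PER_UNSOLD * unsold

def expected_profit(units_ordered, n=200_000, seed=1):
    """Monte Carlo estimate of mean profit over uncertain demand."""
    draws = demand.samples(n, seed=seed)
    return sum(profit(units_ordered, d) for d in draws) / n

profit_ignoring_uncertainty = profit(100_000, demand.mean)
mean_profit = expected_profit(100_000)
print(f"profit at mean demand: ${profit_ignoring_uncertainty:,.0f}")
print(f"mean profit:           ${mean_profit:,.0f}")
```

Under these assumptions the mean profit falls well short of the profit computed at mean demand, because low demand leaves unsold toys while high demand cannot push sales past the order.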
Exercise (cont.)
Using the toy company example:
• Compare estimated profit when
uncertainty is ignored (based on Mean
demand) to mean profit.
• Examine how mean profit varies with
the number of toys ordered:
Units_ordered := Sequence(70K,130K,1K)
• What size order should they place?
• What improvement in value results from
including explicit uncertainty in the model?
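The sweep over order sizes can be sketched in Python as well. To keep it fast, this version uses the closed-form expected number of unsold units for Normal demand, σ·(z·Φ(z) + φ(z)) with z = (order − mean)/σ, instead of sampling. It assumes the same economics as above ($10 profit per toy sold, $5 lost per unsold toy) and does not clamp the negligible probability of negative demand; the names are hypothetical.

```python
from statistics import NormalDist

demand = NormalDist(mu=100_000, sigma=25_000)
std_normal = NormalDist()
PROFIT_PER_SOLD, COST_PER_UNSOLD = 10, 5   # assumed economics of the exercise

def mean_profit(units_ordered):
    """Closed-form mean profit for Normal demand (negative demand not clamped)."""
    z = (units_ordered - demand.mean) / demand.stdev
    expected_unsold = demand.stdev * (z * std_normal.cdf(z) + std_normal.pdf(z))
    expected_sold = units_ordered - expected_unsold
    return PROFIT_PER_SOLD * expected_sold - COST_PER_UNSOLD * expected_unsold

orders = range(70_000, 130_001, 1_000)     # mirrors Sequence(70K,130K,1K)
best_order = max(orders, key=mean_profit)
gain = mean_profit(best_order) - mean_profit(100_000)
print(f"best order: {best_order:,} toys, mean profit ${mean_profit(best_order):,.0f}")
print(f"gain over ordering the mean demand: ${gain:,.0f}")
```

With these assumptions the best order sits above mean demand, since a missed sale costs more than an unsold toy. The `gain` figure is one way to express the value of modeling the uncertainty explicitly.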
Positive real-valued quantities
• Many real-valued quantities are
positive-only, but have no hard upper limit:
Oil leak rate
Demand
Population counts
Stock prices
Multiplier for positive quantity
Capacities
• Normal distribution allows negative
values.
Nonsense negatives
Negative oil leak?
Nearly impossible?
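How much probability a Normal distribution places on negative values depends on how many standard deviations the mean lies above zero. The Python sketch below checks two cases taken from this session's examples: the oil leak rate Normal(10K, 3.8K), and a gold price modeled as Normal(1K, 500), as the later exercise on choice of distribution suggests trying.

```python
from statistics import NormalDist

leak_rate = NormalDist(mu=10_000, sigma=3_800)   # gal/hr
gold_price = NormalDist(mu=1_000, sigma=500)     # $ per oz, Normal stand-in

p_negative_leak = leak_rate.cdf(0)
p_negative_price = gold_price.cdf(0)
print(f"P(leak rate < 0)  = {p_negative_leak:.4f}")
print(f"P(gold price < 0) = {p_negative_price:.4f}")
```

The first probability is small enough to call nearly impossible; the second is a few percent, which is harder to ignore.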
LogNormal Distribution
[Figure: LogNormal PDF with the mode, median, and mean marked, in that order from left to right]
• Positive values only.
• Positive skew (most values to right of mode)
• Multiple possible “central” estimates.
Specifying a LogNormal
LogNormal(median,gsdev,mean,stddev)
• You specify any two of these:
Median: 50th percentile – “typical value”
Mean: Average value
Gsdev: geometric standard deviation
Stddev: (Arithmetic) standard deviation
• When using LogNormal, use named-parameter syntax, e.g.:
LogNormal(mean:10K,stddev:3.8K)
LogNormal(median:9350,mean:10K)
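Any two of the four parameters determine the other two. As a sketch of the standard log-normal relations (not of Analytica's internal implementation), the Python function below converts a mean and standard deviation into the median and geometric standard deviation; the function name is hypothetical.

```python
import math

def lognormal_from_mean_stddev(mean, stddev):
    """Return (median, gsdev) of the LogNormal with the given mean and stddev."""
    sigma_sq = math.log(1 + (stddev / mean) ** 2)   # variance of ln(X)
    mu = math.log(mean) - sigma_sq / 2              # mean of ln(X)
    return math.exp(mu), math.exp(math.sqrt(sigma_sq))

median, gsdev = lognormal_from_mean_stddev(10_000, 3_800)
print(f"median = {median:,.0f}, gsdev = {gsdev:.3f}")
```

For mean 10K and stddev 3.8K the median comes out near 9,350, which is why the two example calls on the slide describe approximately the same distribution.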
Exercise
A mining company obtains rights to extract a gold
deposit during a one-week window next year, before
a construction project starts on the site.
Extracting the deposit will cost $900K.
The size of the deposit:
LogNormal(Mean:1K,Stddev:300) oz.
The price of gold next year:
LogNormal(Mean:$1K, stddev:$500)
What is the expected value of these mining rights?
Compare to result ignoring uncertainty.
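A Monte Carlo sketch of this exercise in Python follows. It rests on assumptions the slide leaves open: deposit size and gold price are independent, and the company learns both before deciding whether to extract, so it can walk away when extraction would lose money. The "committed" figure shows what happens without that choice. Names, sample size, and seed are hypothetical.

```python
import math
import random

def lognormal_params(mean, stddev):
    """Convert mean/stddev of X to (mu, sigma) of ln(X)."""
    sigma_sq = math.log(1 + (stddev / mean) ** 2)
    return math.log(mean) - sigma_sq / 2, math.sqrt(sigma_sq)

EXTRACTION_COST = 900_000
size_mu, size_sigma = lognormal_params(1_000, 300)     # ounces
price_mu, price_sigma = lognormal_params(1_000, 500)   # $ per ounce

rng = random.Random(7)
n = 200_000
committed_total = 0.0   # always extract
optional_total = 0.0    # extract only when it pays (assumes values known first)
for _ in range(n):
    size = rng.lognormvariate(size_mu, size_sigma)
    price = rng.lognormvariate(price_mu, price_sigma)
    net = size * price - EXTRACTION_COST
    committed_total += net
    optional_total += max(net, 0.0)

value_ignoring_uncertainty = 1_000 * 1_000 - EXTRACTION_COST
value_if_committed = committed_total / n
value_of_rights = optional_total / n
print(f"ignoring uncertainty:  ${value_ignoring_uncertainty:,.0f}")
print(f"mean if committed:     ${value_if_committed:,.0f}")
print(f"mean with walk-away:   ${value_of_rights:,.0f}")
```

Under these assumptions, the committed case stays close to the deterministic figure, while the right to walk away makes the rights worth considerably more than the calculation ignoring uncertainty suggests.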
How important is choice of
distribution?
Exercise:
• Modify mining example to use Normal
instead of LogNormal, same mean &
stddev.
• How much does this change the result?
Compare Normal to LogNormal
These have the same mean and same standard deviation.
The Flaw of Averages
Who is this guy?
A: Sam Savage, author of The Flaw of Averages,
an entertaining account of the distortions
caused by average-case analysis.
Why model uncertainty explicitly?
• Misleading results otherwise… “Flaw of averages”
• Explicit “precision” of results.
• Some decisions are about uncertainty. E.g.,
to gather more information
contingency planning
• Improved combining of information sources.
• Productivity: Probabilities & distributions can often
be estimated more quickly than expected values (!)
• Sensitivity analyses
• Causal modeling & abduction (diagnostic reasoning)
What we covered
• Uncertainty about continuous quantities can
be largely characterized by:
Central value (e.g., mean or median)
Dispersion measure (expected deviation,
standard deviation, variance, geometric standard
deviation).
• Normal distribution – unbounded quantities
• LogNormal distribution – positive quantities