Random Variables & Distributions
Random Variables &
Probability Distributions
Outcomes of experiments are, in part,
random
E.g. Let X_7 be the gender of the 7th randomly
selected student.
In this case, the sample space is S={M,F}
Probability distributions are used to
understand, model, and predict outcomes
of random experiments.
Many useful distributions for describing
random processes in environmental
science & mgt.
Example: Hazardous Waste
Hazardous Waste Depository: test
wells monitor groundwater for leaks.
Aldicarb limit = 30 ppb
Aldicarb occurs naturally (but
concentration is variable).
What is probability of exceeding
limit even if no leak? (Prob
measuring > 30 even if no leak?)
Evidence and Data
“Natural” distribution of aldicarb:
500 readings from sites known to not
be contaminated:
Evidence Cont’d
Based on this distribution, we will
assume these data are normally
distributed with:
Mean = 20 ppb
Standard Deviation = 4 ppb
Definitions
Random Variable: the unknown outcome
of an experiment. The particular
outcome is a realization of the random
variable.
E.g. (1) rain Tues., (2) aldicarb measurement
r.v. takes diff. values each w/ diff. probs.
Histogram: plot of the frequency of
observation of a random variable over
discrete intervals.
Discrete vs. Continuous Random Variable
Frequency of Outcomes
Probability Density (Mass) Function:
Histogram of outcomes resulting from
infinite # samples: (Prob = area under)
For cont., bar width approaches 0
Cumulative Distribution Function:
Probability that the r.v. ≤ x.
Examples on board:
# Grizzly cubs per sow (1,2,1,2,2,2,2,3,1,2)
• Histogram vs. known prob. mass (.13, .70, .17)
Natural aldicarb concentration
• Histogram (of data) vs. pdf N(20,4)
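The grizzly example above can be reproduced in a few lines. This is a minimal sketch, not the board work itself: it tallies the ten litter sizes from the slide into relative frequencies and sets them beside the known probability mass (.13, .70, .17). The variable names are illustrative only.

```python
from collections import Counter

# Litter sizes from the slide: cubs per sow for 10 observed sows.
cubs = [1, 2, 1, 2, 2, 2, 2, 3, 1, 2]

# Probability mass quoted on the slide for 1, 2, and 3 cubs.
known_pmf = {1: 0.13, 2: 0.70, 3: 0.17}

counts = Counter(cubs)
n = len(cubs)

# Relative frequency: histogram bar heights rescaled to sum to 1.
empirical = {k: counts[k] / n for k in sorted(known_pmf)}

# Empirical CDF: running sum of the relative frequencies, P(X <= k).
running = 0.0
empirical_cdf = {}
for k in sorted(empirical):
    running += empirical[k]
    empirical_cdf[k] = running

for k in sorted(known_pmf):
    print(f"{k} cubs: observed {empirical[k]:.2f}, known {known_pmf[k]:.2f}")
```

With only 10 sows the observed frequencies differ noticeably from the known mass, which is the point of the histogram vs. pmf comparison.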
Known vs. Unknown
Distributions
True distribution may not be a known
distribution (e.g. dist’n of students’
heights in this classroom)
Often, knowing how a process works will
point us to a particular (known)
distribution
Advantages of known distributions:
Can usually be described by 1 or 2
parameters.
Well studied, so most properties known
• Easy to ask questions like the aldicarb question.
Discrete Random Variables
1. Bernoulli: 2 outcomes: “success” (prob. = p) or “failure” (prob. = 1-p)
2. Binomial: Number of successes in n independent Bernoulli trials.
3. Multinomial: Extends Binomial to more than 2 outcomes.
4. Geometric: Number of Bernoulli trials until first success.
5. Poisson: Counting r.v. (takes integer values). Number of events that occur in given time interval.
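To make the list above concrete, here is a minimal sketch of three of these probability mass functions using only the Python standard library. The function names and the values of p, n, and lam are hypothetical choices for illustration; they do not come from the slides.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(exactly k events in an interval whose mean count is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p = 0.1    # hypothetical success probability
n = 10     # hypothetical number of trials
lam = 2.0  # hypothetical mean number of events per interval

# A Bernoulli trial is the n = 1 special case of the binomial.
bernoulli_success = binomial_pmf(1, 1, p)

# The binomial probabilities over k = 0..n add up to 1.
binomial_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(bernoulli_success, binomial_total)
print(geometric_pmf(3, p), poisson_pmf(0, lam))
```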
Normal Random Variable
Normal: “Bell Shaped”, “Gaussian”.
Symmetric. + and – values.
1. Central Limit Theorem: Sum or avg. of many independent r.v.’s is approximately normal (often used as justification for Normal).
2. “Standard Normal”: N(0,1).
3. Convert X~N(μ,σ) (mean μ, std. dev. σ) to Standard Normal (Z):
Z=(X-μ)/σ
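A short sketch of the two ideas on this slide, under stated assumptions: the standardization uses the aldicarb numbers from the earlier slides (mean 20, standard deviation 4, limit 30), while the Central Limit Theorem part averages uniform(0,1) draws, which is an arbitrary choice made here for illustration. The sample sizes and the seed are also assumptions.

```python
import random
import statistics

def standardize(x, mean, sd):
    """Convert a value from N(mean, sd) to the standard normal scale."""
    return (x - mean) / sd

# Aldicarb limit on the standard normal scale.
z = standardize(30, 20, 4)
print(z)

# CLT sketch: each entry is the average of n uniform(0,1) draws.
random.seed(0)  # fixed seed so the run is repeatable
n = 30
averages = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(5000)
]

mean_avg = statistics.fmean(averages)
sd_avg = statistics.stdev(averages)

# For a normal distribution about 68% of values lie within 1 sd of the mean.
within_one_sd = sum(abs(a - mean_avg) <= sd_avg for a in averages) / len(averages)
print(mean_avg, sd_avg, within_one_sd)
```

A single uniform draw is flat, not bell shaped, yet the averages cluster around 0.5 with roughly the 68% coverage expected of a normal.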
Continuous Random
Variables
1. Uniform: every possible outcome equally likely (also a discrete r.v.)
2. Log-Normal: r.v. whose logarithm is normally distributed.
3. Gamma: Non-negative values.
4. Extreme Value: Maximum or minimum of many draws from some other distribution.
5. Exponential: Inter-arrival times, “memoryless”.
6. χ²: Closely related to Normal. Non-negative. Skewed.
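Two of the properties listed above can be checked numerically. This is a minimal sketch, assuming a hypothetical rate for the exponential and hypothetical log-scale parameters for the log-normal; none of these numbers come from the slides. "Memoryless" means P(T > s + t | T > s) = P(T > t), and the log-normal check confirms that the logarithms of the draws behave like a normal sample.

```python
import math
import random
import statistics

def exponential_survival(t, rate):
    """P(T > t) for an exponential waiting time with the given rate."""
    return math.exp(-rate * t)

rate = 0.5       # hypothetical: 0.5 arrivals per unit time
s, t = 2.0, 3.0  # hypothetical elapsed time and additional wait

# Memoryless: having already waited s does not change the chance of
# waiting t more.
conditional = exponential_survival(s + t, rate) / exponential_survival(s, rate)
unconditional = exponential_survival(t, rate)
print(conditional, unconditional)

# Log-normal: take logs of the draws and recover the normal parameters.
random.seed(1)        # fixed seed so the run is repeatable
mu, sigma = 3.0, 0.5  # hypothetical mean and sd on the log scale
draws = [random.lognormvariate(mu, sigma) for _ in range(10000)]
logs = [math.log(x) for x in draws]
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)
print(log_mean, log_sd)
```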
Answer
Question: What is the probability that
measured aldicarb level ≥ 30 ppb, if
no leak?
Let X be a random variable
describing the aldicarb level of a
given test.
P(X ≥ 30) = area under N(20,4)
above 30 ppb.
Integrate Under N(20,4)
Normal pdf:
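$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$
with mean μ = 20 ppb and standard deviation σ = 4 ppb.
Draw on board…Ouch!
Isn’t there another way?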
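For comparison, the integral can also be approximated numerically. This is a rough sketch, not the method used in the lecture: it applies the trapezoid rule to the normal pdf with mean 20 and standard deviation 4, and it assumes the tail above 60 ppb (ten standard deviations above the mean) is small enough to ignore. The step count is an arbitrary choice.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def trapezoid(f, a, b, steps):
    """Approximate the integral of f over [a, b] with equal-width trapezoids."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Area under N(20, 4) above the 30 ppb limit, truncated at 60 ppb.
tail = trapezoid(lambda x: normal_pdf(x, 20, 4), 30, 60, 10000)
print(round(tail, 3))
```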
2 Ways to Answer
1. Ask S-Plus (nicely): P(X<30)=0.994, so P(X>30)=0.006.
2. Convert to N(0,1). Standard Normal Z=(30-20)/4=2.5.
Table gives Pr(0<Z<z):

z     .00    .01    .02    .03
0.0   .000   .004   .008   .012
0.3   .118   .122   .126   .129
1.8   .464   .465   .466   .466
2.5   .494   .494   .494   .494
Answer
Pr(X>30) when X~N(20,4) =
Pr(Z>2.5) when Z~N(0,1)
Pr(0<Z<2.5)=.494
Pr(-∞<Z<0)=.500
So, Pr(Z>2.5)=1-.494-.5 = .006
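Both routes to the answer can be checked in software. The slides use S-Plus; the sketch below swaps in Python's standard library instead, so it illustrates the same calculation and is not the lecture's own code. The first route asks for the normal CDF directly; the second standardizes to Z and mirrors the table lookup of Pr(0<Z<z).

```python
import math
from statistics import NormalDist

mean, sd, limit = 20, 4, 30  # values from the slides, in ppb

# Way 1: ask the software for the CDF of N(20, 4) directly.
p_below = NormalDist(mu=mean, sigma=sd).cdf(limit)
p_above = 1 - p_below

# Way 2: convert to the standard normal and mimic the table,
# which lists Pr(0 < Z < z).
z = (limit - mean) / sd
table_value = 0.5 * math.erf(z / math.sqrt(2))  # Pr(0 < Z < z)
p_above_from_table = 1 - table_value - 0.5      # remove Pr(Z < 0) = 0.5

print(round(p_below, 3), round(p_above, 3))
print(round(table_value, 3), round(p_above_from_table, 3))
```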