EAS31116/B9036: Statistics in Earth & Atmospheric Sciences
Lecture 3: Probability Distributions (cont’d)
Instructor: Prof. Johnny Luo
www.sci.ccny.cuny.edu/~luo
Project Dates
Oct 2: 1-page abstract due
Nov 6: Progress report due
Dec 4: presentation of the project
Dec 11: Project Report due
Outline
1. Definition of Terms
2. Some Empirical & Exploratory Data Analysis
3. Parametric Distribution I: Discrete Distributions
4. Parametric Distribution II: Continuous Distributions
5. Assessments of the Goodness of Fit
- Random Variable: when the value of a variable is the
outcome of a statistical experiment (i.e., uncertain and
dependent on chance), it is called a random variable.
- Probability Distribution: Probability (remember: it
means a ratio) assigned to values of a random variable.
- Empirical Probability Distribution: just describes what’s
been observed – an exploratory approach.
- Parametric Probability Distribution: summarizes the
observed probability distribution using particular
mathematical forms.
Jan 1987 Ithaca Precip/T
Describe data in n-quantiles
2-quantiles: Median
3-quantiles: Terciles
4-quantiles: Quartiles
…
100-quantiles: Percentiles
Step 1: rank the data in ascending
(or descending) order
Step 2: find cutoff values for equal
size subgroups
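The two steps above can be sketched in Python with the standard library. The sample values are hypothetical (they are not the Ithaca data), and `statistics.quantiles` uses one of several common interpolation conventions, so other software may give slightly different cutoffs.

```python
import statistics

# Hypothetical sample (illustrative values only, not the Ithaca data).
data = [12, 7, 3, 9, 15, 1, 6, 10, 4, 8, 11]

# Step 1: rank the data in ascending order.
ranked = sorted(data)

# Step 2: find cutoff values that split the ranked data into equal-size groups.
median = statistics.median(ranked)             # 2-quantiles: one cutoff
terciles = statistics.quantiles(ranked, n=3)   # 3-quantiles: two cutoffs
quartiles = statistics.quantiles(ranked, n=4)  # 4-quantiles: three cutoffs

print("ranked:   ", ranked)
print("median:   ", median)
print("terciles: ", terciles)
print("quartiles:", quartiles)
```

Note that the middle quartile is the same cutoff as the median.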
Box-and-whisker plots
Histograms
Histograms of the Jan Max
temperature in Ithaca.
In Matlab, the function hist
plots a histogram of a data array.
Discrete Distribution I: Binomial Distribution
Definition: A sequence of N independent yes/no (or head/tail)
experiments. Usually we use 1 to represent yes and 0 for no.
Random Variable (X): number of yeses (or heads) in a sequence
of N trials.
If you flip a coin 3 times, what are the possible values for X?
X = 0, 1, 2, 3. Think-Pair-Share: What are the probabilities
(remember: probability is a ratio) of these four possible
outcomes?
For N = 3, the probabilities of the four possible outcomes are:
X = 0 (all 3 tails): 1/8
X = 1 (1 head/2 tails): 3/8
X = 2 (2 heads/1 tail): 3/8
X = 3 (all 3 heads): 1/8
Binomial distribution is a discrete parametric distribution
with two parameters:
1) N (total # of experiments)
2) p (probability for yes at each trial)
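More generally,
$\Pr\{X = x\} = \binom{N}{x}\, p^x (1-p)^{N-x}, \quad x = 0, 1, \ldots, N$
$\binom{N}{x} = \dfrac{N!}{x!\,(N-x)!}$ is “N choose x”: the number of
different ways of distributing x successes in a
sequence of N trials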
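As a check on the coin-flip probabilities, here is a minimal Python sketch of the binomial probability, Pr{X = x} = (N choose x) p^x (1-p)^(N-x). The function name `binomial_pmf` is a label chosen for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr{X = x}: probability of x successes in n independent yes/no trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Fair coin flipped 3 times: probabilities of 0, 1, 2, 3 heads.
coin = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(coin)       # probabilities for X = 0, 1, 2, 3
print(sum(coin))  # the probabilities of all outcomes add up to 1
```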
Application in Earth Sciences
220-yr record
Q1: Compute the
probability of the lake
freezing next winter
(or in any single
winter in the future)
Step 1: Find the two
parameters in the binomial
distribution
x = 1 (one freeze)
N = 1 (only one future year)
p = 10/220 = 0.045
Step 2:
This is trivial!
$\Pr\{X=1\} = \dfrac{1!}{1!\,0!}\,(0.045)^{1}\,(1-0.045)^{1-1} = 0.045$
Q2: Compute the
probability of the lake
freezing once in 10
years.
Step 1: Find the two
parameters in the binomial
distribution
x = 1 (one freeze)
N = 10
p = 10/220 = 0.045
Step 2:
$\Pr\{X=1\} = \dfrac{10!}{1!\,9!}\,(0.045)^{1}\,(1-0.045)^{10-1} = 0.30$
Q3: Compute the
probability of the lake
freezing at least once
in 10 years.
Step 1: Look for the
complement event x = 0 (no
freezing at all in 10 years)
N = 10; p = 10/220 = 0.045
Step 2:
$\Pr\{X=0\} = \dfrac{10!}{0!\,10!}\,(0.045)^{0}\,(1-0.045)^{10} = 0.63$
$\Pr\{X \ge 1\} = 1 - \Pr\{X=0\} = 1 - 0.63 = 0.37$
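The three lake-freezing questions can be reproduced with a short Python sketch. It assumes, as the slides do, that p = 10/220 and that winters are independent; the helper name `binomial_pmf` is chosen for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr{X = x} for n independent yes/no trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = 10 / 220  # freezing probability per winter, estimated from the 220-yr record

q1 = binomial_pmf(1, 1, p)       # Q1: freeze in one given winter
q2 = binomial_pmf(1, 10, p)      # Q2: exactly one freeze in 10 years
q3 = 1 - binomial_pmf(0, 10, p)  # Q3: at least one freeze = 1 - Pr{no freeze}

print(f"Q1: {q1:.3f}")
print(f"Q2: {q2:.2f}")
print(f"Q3: {q3:.2f}")
```

Using the complement event for Q3 avoids summing Pr{X = 1} through Pr{X = 10} by hand.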
[Figure: binomial distributions for a sequence of N yes/no experiments: flipping a coin 20 times, flipping a cheating coin 20 times, and flipping a coin 40 times. Source: Wikipedia]
Discrete Distribution II: Poisson Distribution
The Poisson Distribution
describes the probability of
a given number of events
occurring in a fixed interval
of time.
For example, the number of
emails you receive each day,
or the number of tornadoes
reported in New York State
each year. (This is no
longer a yes/no
experiment.)
The Poisson Distribution has only one
parameter: μ (called the intensity; it
happens to be the mean value). X is
the random variable:
$\Pr\{X = x\} = \dfrac{\mu^{x} e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$
Consider the annual tornado counts in NYS for 1959–1988, in
Table 4.3. During the 30 years covered by these data, 138
tornadoes were reported in New York State. The average, or
mean, rate of tornado occurrence is 138/30 = 4.6 per year.
The Poisson
distribution fits
the data fairly well (we
will learn how to
do the fitting later
in class).
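To see what a Poisson distribution with μ = 4.6 tornadoes per year predicts, here is a minimal Python sketch using the standard Poisson formula Pr{X = x} = μ^x e^(-μ) / x!. The function name `poisson_pmf` is chosen for this illustration, and the sketch only evaluates the formula; it does not compare against the observed counts in Table 4.3.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Pr{X = x}: probability of x events in one interval when the mean rate is mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 138 / 30  # mean tornado rate for NYS, 1959-1988 (tornadoes per year)

# Probability of seeing 0, 1, ..., 9 tornadoes in a given year.
for x in range(10):
    print(f"Pr{{X = {x}}} = {poisson_pmf(x, mu):.3f}")
```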
Expected Value of a Random Variable
The expected value of a random variable or function of a
random variable is simply the probability-weighted average
of that variable or function.
For example, flip a coin 3 times: N = 3, p = 0.5, E[X] = 1.5 (in
between one head and two heads)
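Expected value: $E[X] = \sum_x x \,\Pr\{X = x\}$
Variance: $\mathrm{Var}[X] = E\big[(X - E[X])^2\big] = \sum_x (x - E[X])^2 \,\Pr\{X = x\}$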
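The probability-weighted average can be computed directly for the three-flip coin example. This is a minimal sketch: it uses the general definitions (sum of x times its probability, and sum of squared deviations times probability) rather than any distribution-specific shortcut.

```python
from math import comb

n, p = 3, 0.5  # three flips of a fair coin

# Binomial probabilities for X = 0, 1, 2, 3 heads.
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# Expected value: probability-weighted average of X.
mean = sum(x * prob for x, prob in pmf.items())

# Variance: probability-weighted average of the squared deviation from the mean.
var = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print("E[X]   =", mean)
print("Var[X] =", var)
```

For the binomial distribution these agree with the closed forms Np and Np(1-p).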
Parametric Distribution II: Continuous Distributions
Probability Density Function (PDF): f(x)
Analogous to a
histogram. Probability
is represented by the
area under the curve.
Cumulative Distribution Function (CDF): F(x)
Continuous Distribution I: Gaussian Distribution
(aka, Normal distribution)
Two parameters: μ and σ
Why is Gaussian distribution so popular?
Central Limit Theorem: as the sample size gets large, the
sum (or average) of a set of independent observations will
follow a Gaussian distribution.
A lot of quantities in natural science are the result of many
factors superimposed (resembling the sum or average of
these factors)
Histograms of the Jan Max Temp in Ithaca.
They already look somewhat Gaussian-like, although not exactly. If you plot
the distribution of mean max temp. in
Jan (i.e., use multiple years of data), it
will become more Gaussian.
Standard Normal Distribution
Mean: 0, standard deviation: 1
Z-score (random variable): $z = (x - \mu)/\sigma$
Quantiles
[Figure: PDF and CDF of a Normal Distribution]
Q1: The mean Jan temperature in Ithaca is 22.2°F and σ is 4.4°F. In Jan
1987, the mean Jan temp. is 21.4°F. Assume it follows a Gaussian
distribution. What is the probability that the mean Jan temp. is as cold as or
colder than Jan 1987?
z = (21.4 – 22.2)/4.4 = -0.18
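Instead of looking up the z-score in a printed table, the cumulative probability can be evaluated numerically. This is a minimal Python sketch using `statistics.NormalDist` from the standard library; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

mu, sigma = 22.2, 4.4  # climatological mean and standard deviation (deg F)
jan_1987 = 21.4        # mean Jan temperature in 1987 (deg F)

# Standardize, then evaluate the standard normal CDF at z.
z = (jan_1987 - mu) / sigma
prob = NormalDist().cdf(z)  # Pr{Z <= z}

print(f"z = {z:.2f}")
print(f"Pr{{T <= 21.4}} = {prob:.3f}")
```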
What about z in the positive range? Use the symmetry of the standard
normal distribution: $\Pr\{Z \le z\} = 1 - \Pr\{Z \le -z\}$.
Q2: The mean Jan temperature in
Ithaca is 22.2°F and σ is 4.4°F.
Assume it follows a Gaussian
distribution.
What is the probability that
20°F ≤ mean temp. ≤ 25°F?
z20 = (20 – 22.2)/4.4 = -0.50
z25 = (25 – 22.2)/4.4 = 0.64
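The probability of falling between the two temperatures is the difference of the two cumulative probabilities, Pr{Z ≤ z25} − Pr{Z ≤ z20}. A minimal Python sketch with `statistics.NormalDist` (variable names chosen for this illustration):

```python
from statistics import NormalDist

# Gaussian model for the mean Jan temperature in Ithaca (deg F).
climate = NormalDist(mu=22.2, sigma=4.4)

z20 = (20 - climate.mean) / climate.stdev
z25 = (25 - climate.mean) / climate.stdev

# Pr{20 <= T <= 25} = CDF(25) - CDF(20)
prob = climate.cdf(25) - climate.cdf(20)

print(f"z20 = {z20:.2f}, z25 = {z25:.2f}")
print(f"Pr{{20 <= T <= 25}} = {prob:.2f}")
```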
Continuous Distribution II: Gamma Distribution
Sometimes a variable is
constrained by a physical
limit on the left.
For example,
precipitation: it can’t be
lower than zero and it can
go to infinity (in theory).
So, the distribution is not
Gaussian, but skewed to
the right.
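- Random variable: x
- Two parameters:
1) α: the shape parameter,
2) β: the scale parameter.
$f(x) = \dfrac{(x/\beta)^{\alpha-1} \exp(-x/\beta)}{\beta\,\Gamma(\alpha)}, \quad x, \alpha, \beta > 0$
Γ(α) is the gamma function.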
Standard gamma distribution: with the standardized variable ξ = x/β,
$f(\xi) = \dfrac{\xi^{\alpha-1} e^{-\xi}}{\Gamma(\alpha)}$
Q1: Suppose Ithaca Jan precip follows
the Gamma distribution with α ≈ 4 and
β = 0.52 inches. For Jan 1987, the
mean precip in Ithaca is 3.15 inches.
Use the table below to find the
percentile value for Jan 1987 precip.
Step 1: standardize
ξ = 3.15/0.52 = 6.06
Step 2: For α ≈ 4, standard
variable of 6.06 falls in between
the cumulative prob. of 0.80 and
0.90. So, it’s about 0.85.
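The table lookup can be cross-checked numerically. This is a minimal sketch that assumes the shape parameter is exactly α = 4 (the slide says α ≈ 4): for an integer shape, the standard gamma CDF has the closed form 1 − e^(−ξ) Σ ξ^k / k!, summed over k = 0, ..., α − 1. For a non-integer shape this shortcut does not apply and a numerical incomplete gamma function would be needed. The function name is chosen for this illustration.

```python
from math import exp, factorial

def standard_gamma_cdf_int_shape(xi, alpha):
    """CDF of the standard gamma distribution at xi.
    Valid only for a positive integer shape parameter alpha."""
    if xi <= 0:
        return 0.0
    return 1.0 - exp(-xi) * sum(xi**k / factorial(k) for k in range(alpha))

# Step 1: standardize the Jan 1987 precipitation by the scale parameter.
xi = 3.15 / 0.52

# Step 2: cumulative probability, assuming alpha = 4 exactly.
cdf = standard_gamma_cdf_int_shape(xi, 4)

print(f"xi = {xi:.2f}")
print(f"cumulative probability = {cdf:.2f}")
```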
Assessments of the Goodness of Fit
Superimpose the fitted Gaussian and Gamma distribution
curves on the raw histogram (Jan 1987 Ithaca precip).
More will be covered later in class.
Difference between Binomial & Poisson distributions
- Binomial: predicts the number of
successes within a set number
of trials.
- Poisson: predicts the number of
occurrences per unit of time,
space, …