EVSC 495/EVAT 795
Data Analysis & Climate Change
Instructor: Michael E. Mann
Class hours: TuTh 2:00-3:15 pm
EVSC 495/EVAT 795 WEBPAGE
http://holocene.evsc.virginia.edu/~mann/COURSES/495homeFall04.html
•SYLLABUS
•LECTURES
•COURSE INFORMATION
•PROBLEM SETS
Although there is no ideal textbook for the class, the following book is helpful as supplementary material (two copies are available on reserve in the Science & Engineering library):
Statistical Methods in the Atmospheric Sciences, D. Wilks, 1995 (Academic Press)
LECTURE 1
INTRODUCTION
Hypothesis testing; Probability; Distributions
Supplementary Readings: Wilks, chapters 1-4
Applications of statistics to the study of climate variability and climate change
•Comparing theoretical predictions and observations
•Detecting statistically significant trends
•Characterizing spatial and temporal patterns of variation
The instrumental surface temperature record is not perfect.
[Figure: Global Temperature Trends. Combined global land air and sea surface temperatures, 1860-1997 (relative to the 1961-1990 average)]
Note how much sparser the data are before the 20th century...
[Figure: data coverage. Grayshade: 1902-1993; checkerboard: 1854-1993]
Note how much sparser the data are before the early 19th century...
[Figure: Global Temperature Trends]
Global Temperature Trends
Statistical methods can be used to estimate the associated uncertainty.
Recent history of the ENSO phenomenon
[Figure: Multivariate ENSO Index (“MEI”)]
For the hemisphere as a whole, the warming or cooling due to the NAO is probably a zero-sum game (note that cooling is expected over Greenland and most of the Arctic sea, where no data are available).
The NAO “explains” enhanced warming in certain regions of the Northern Hemisphere over the past couple of decades.
Hypothesis Testing
Test Statistic
Null Hypothesis (H0)
Alternative Hypothesis (HA)
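As a minimal sketch (not taken from the lecture), the three ingredients above can be made concrete with the coin example used later on: the test statistic is the number of heads n in N flips, H0 is a fair coin (p = 0.5), and HA is a coin biased toward heads (p > 0.5). The function name `binomial_p_value` is hypothetical; it returns the one-sided p-value, i.e. the probability under H0 of a test statistic at least as large as the one observed.

```python
from math import comb


def binomial_p_value(n_heads: int, n_flips: int, p0: float = 0.5) -> float:
    """One-sided p-value Pr{n >= n_heads | H0: p = p0} for a binomial test statistic."""
    return sum(
        comb(n_flips, k) * p0**k * (1 - p0) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


if __name__ == "__main__":
    # 7 heads in 10 flips: (120 + 45 + 10 + 1) / 1024 = 176 / 1024
    print(binomial_p_value(7, 10))  # 0.171875
```

Since 0.17 is well above conventional thresholds such as 0.05, seven heads in ten flips would not, on this sketch, lead us to reject H0.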
Statistical Model
[Observed Data] = [Signal] + [Noise]
The “noise” has to satisfy certain properties! If not, we must iterate on this process...
Bayesian/Subjectivist vs Frequentist Approach
Frequency Approach
$$\left|\frac{a}{N} - \Pr\{E\}\right| \to 0 \quad \text{as } N \to \infty$$
(i.e., the fraction a/N of occurrences of an event out of N opportunities converges to the probability of the event)
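The frequency definition can be illustrated with a quick simulation. This is a sketch under stated assumptions: the event is "heads" for a simulated fair coin with Pr{E} = 0.5, the helper `relative_frequency` is hypothetical, and Python's seeded pseudo-random generator stands in for real coin flips.

```python
import random


def relative_frequency(p: float, n_trials: int, seed: int = 0) -> float:
    """Return a/N: the fraction of n_trials simulated trials in which an event of probability p occurs."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    a = sum(1 for _ in range(n_trials) if rng.random() < p)
    return a / n_trials


if __name__ == "__main__":
    for n in (10, 100, 10_000, 1_000_000):
        frac = relative_frequency(0.5, n)
        print(f"N = {n:>9}: a/N = {frac:.4f}, |a/N - Pr(E)| = {abs(frac - 0.5):.4f}")
```

Any single run fluctuates, but the typical size of |a/N − Pr{E}| shrinks roughly like 1/√N.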
Bayesian Approach
Conditional Probability
$$\Pr\{E_1 \mid E_2\}$$
(Probability of E1 given the occurrence of E2)
Consider a Mutually Exclusive and Collectively Exhaustive (MECE) set of events {E_i} and an event A. Then (the law of total probability):
$$\Pr\{A\} = \sum_{i=1}^{I} \Pr\{A \mid E_i\}\,\Pr\{E_i\}$$
Bayes’ Theorem
$$\Pr\{E_i \mid A\} = \frac{\Pr\{A \mid E_i\}\,\Pr\{E_i\}}{\Pr\{A\}}, \qquad \Pr\{A\} = \sum_{i=1}^{I} \Pr\{A \mid E_i\}\,\Pr\{E_i\}$$
The central equation of Bayesian statistics combines the prior distribution and the likelihood function to reach the posterior distribution:
Bayes’ Theorem
$$\underbrace{\Pr\{E_i \mid A\}}_{\text{posterior}} = \frac{\overbrace{\Pr\{A \mid E_i\}}^{\text{likelihood}}\;\overbrace{\Pr\{E_i\}}^{\text{prior}}}{\sum_{j=1}^{J} \Pr\{A \mid E_j\}\,\Pr\{E_j\}}$$
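A small worked sketch of Bayes’ theorem, using a hypothetical MECE pair of hypotheses: E1 = "the coin is fair" (p = 0.5) and E2 = "the coin is weighted" (p = 0.7, an assumed value), with equal priors. The event A is observing 7 heads in 10 flips, and the denominator is the total-probability sum over both hypotheses. The function names are illustrative only.

```python
from math import comb


def binomial_likelihood(n: int, N: int, p: float) -> float:
    """Pr{A | E}: probability of n heads in N flips if the coin has Pr(heads) = p."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def posterior(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Bayes' theorem: Pr{E_i | A} = Pr{A | E_i} Pr{E_i} / sum_j Pr{A | E_j} Pr{E_j}."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # Pr{A}, by the law of total probability
    return [j / evidence for j in joint]


if __name__ == "__main__":
    hypotheses = {"fair (p=0.5)": 0.5, "weighted (p=0.7)": 0.7}  # hypothetical MECE set
    priors = [0.5, 0.5]
    liks = [binomial_likelihood(7, 10, p) for p in hypotheses.values()]
    for name, post in zip(hypotheses, posterior(priors, liks)):
        print(f"Pr{{{name} | 7 heads in 10}} = {post:.3f}")
```

Under these assumptions the data move roughly 70% of the posterior probability onto the weighted-coin hypothesis; a different assumed p for E2 or different priors would change that split.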
Coin Flipping Example (Probability Of Heads)
Binomial Distribution
$$P(n) = \binom{N}{n} p^n (1-p)^{N-n}, \qquad \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
Consider the probability of obtaining seven heads in ten flips: N = 10; n = 7.
What is “p”?
For a fair coin (p = 0.5): P(7) = 120/1024 ≈ 0.12
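Because the answer depends on the assumed p, a short sketch that evaluates P(7) for N = 10 at a few candidate values of p may help; the helper name `binomial_pmf` is illustrative, not from the lecture.

```python
from math import comb


def binomial_pmf(n: int, N: int, p: float) -> float:
    """P(n) = C(N, n) p^n (1 - p)^(N - n): probability of n successes in N independent trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


if __name__ == "__main__":
    N, n = 10, 7
    for p in (0.3, 0.5, 0.7, 0.9):
        print(f"p = {p:.1f}: P(7) = {binomial_pmf(n, N, p):.3f}")
```

Viewed as a function of p for the fixed outcome n = 7, P(7) is largest at p = n/N = 0.7, which is why the observed fraction is the natural frequentist estimate of p.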
What if the coin is weighted?
How does the frequentist deal with this issue?
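Beta Distribution
Weighted coin
$$P(x) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, x^{p-1} (1-x)^{q-1}, \qquad 0 < x < 1$$
Here x plays the role of the probability of heads, and p, q > 0 are shape parameters (not the p of the binomial formula).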
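A minimal sketch of the Beta density written directly from the formula above with `math.gamma`; the function name `beta_pdf` is hypothetical. It uses the standard fact that Beta(n + 1, N − n + 1) is proportional to p^n (1 − p)^(N − n), so Beta(8, 4) has the shape of the 7-heads-in-10 likelihood.

```python
from math import gamma


def beta_pdf(x: float, p: float, q: float) -> float:
    """Beta density Gamma(p+q) / (Gamma(p) Gamma(q)) * x^(p-1) * (1-x)^(q-1) on 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return gamma(p + q) / (gamma(p) * gamma(q)) * x ** (p - 1) * (1 - x) ** (q - 1)


if __name__ == "__main__":
    # p = q = 1 is uniform; larger p, q concentrate the density; p > q shifts it toward x = 1
    for p, q in ((1, 1), (2, 2), (8, 4)):
        print(p, q, [round(beta_pdf(x, p, q), 3) for x in (0.3, 0.5, 0.7)])
    # Crude midpoint-rule check that the density integrates to about 1
    M = 10_000
    print(sum(beta_pdf((k + 0.5) / M, 8, 4) for k in range(M)) / M)
```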
Probability density function (as a function of p, for fixed N and n):
$$P(p) \propto \binom{N}{n} p^n (1-p)^{N-n}$$
Bayesian analysis of a set of coin flips. The prior density was calculated assuming 20 heads from 40 tosses for a perfect coin (p = 0.5). The likelihood or data density was calculated assuming 7 heads from 10 tosses. The resulting posterior density is also plotted.
[Figure: prior, likelihood, and posterior densities as functions of p]
The posterior distribution for Pr(heads) peaks just above 0.5 because of the observed data of 7 heads from 10 tosses. How far the peak shifts away from the data value (0.7) is governed by the form of the prior distribution. In any case, as the number of observations (tosses) increases, the resulting distribution becomes more concentrated at the observed ratio of heads to tosses.
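The posterior described above can be sketched numerically on a grid of p values. Assumption: the "20 heads from 40 tosses" prior is taken to be proportional to p^20 (1 − p)^20, matching the likelihood form used in the figure; the function names are hypothetical. (With this conjugate, Beta-shaped prior the same answer is available in closed form, but the grid makes the multiply-and-normalize step explicit.)

```python
from math import exp, log


def posterior_on_grid(heads, tosses, prior_heads=20, prior_tosses=40, m=1000):
    """Posterior density of p = Pr(heads) on a grid of m - 1 interior points.

    Assumption: prior proportional to p^prior_heads (1 - p)^(prior_tosses - prior_heads);
    likelihood p^heads (1 - p)^(tosses - heads). Their product is normalized numerically.
    """
    grid = [i / m for i in range(1, m)]  # skip p = 0 and p = 1 so the logs stay finite
    a = prior_heads + heads
    b = (prior_tosses - prior_heads) + (tosses - heads)
    log_post = [a * log(p) + b * log(1 - p) for p in grid]
    peak = max(log_post)  # subtract the max before exp() to avoid underflow for large counts
    weights = [exp(lp - peak) for lp in log_post]
    norm = sum(weights) / m  # Riemann-sum estimate of the normalizing integral (spacing 1/m)
    return grid, [w / norm for w in weights]


def posterior_mode(heads, tosses):
    """Grid value of p at which the posterior density peaks."""
    grid, dens = posterior_on_grid(heads, tosses)
    return grid[dens.index(max(dens))]


if __name__ == "__main__":
    print(posterior_mode(7, 10))      # 0.54, i.e. (20 + 7) / (40 + 10)
    print(posterior_mode(700, 1000))  # closer to the observed ratio 0.7
```

The second call illustrates the last point of the slide: with 100 times more tosses at the same observed ratio, the prior matters much less and the peak moves toward 0.7.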
Probability Distributions and PDFs
•Binomial distribution
•Poisson distribution
•Gaussian distribution
•Chi-squared distribution
•Lognormal distribution
•Gamma Distribution
•Beta Distribution
Binomial Distribution
$$P(n) = \binom{N}{n} p^n (1-p)^{N-n}, \qquad \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
Describes the probability of n occurrences in N independent trials, each with the same fixed probability p of occurrence:
•Coin Flipping
•Dice Rolling
•Precipitation Occurrence?
Now consider the limit where p ≪ 1 and N ≫ 1, with the product pN held fixed. Under these circumstances, we have the approximation:
$$P(n) \approx \frac{1}{n!}\lambda^n e^{-\lambda}, \qquad \lambda = pN \ \text{(occurrence rate)}$$
“Poisson” Distribution
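A quick numerical check of this limit, as a sketch: it compares the exact binomial probabilities with the Poisson formula for an assumed N = 1000 and p = 0.01 (so λ = 10). The helper names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(n: int, N: int, p: float) -> float:
    """Exact binomial probability C(N, n) p^n (1 - p)^(N - n)."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


def poisson_pmf(n: int, lam: float) -> float:
    """Poisson probability lambda^n e^(-lambda) / n!."""
    return lam**n * exp(-lam) / factorial(n)


if __name__ == "__main__":
    N, p = 1000, 0.01  # p << 1 and N >> 1 (assumed example values)
    lam = p * N        # occurrence rate, here 10
    for n in (5, 10, 15):
        print(n, round(binomial_pmf(n, N, p), 4), round(poisson_pmf(n, lam), 4))
    worst = max(abs(binomial_pmf(n, N, p) - poisson_pmf(n, lam)) for n in range(60))
    print(f"largest |binomial - Poisson| over n < 60: {worst:.1e}")
```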
[Figure: histogram estimate of the PDF]
$$P(n) = \frac{1}{n!}\lambda^n e^{-\lambda}, \qquad \lambda = pN \ \text{(mean occurrence rate)}, \qquad \mathrm{VAR}(n) = \lambda$$
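The statement that the Poisson mean and variance are both λ can be checked by summing directly over the distribution. This is a sketch: the sum is truncated at an assumed n_max = 200 (the neglected tail is negligible for the λ values used), and the PMF is evaluated in log space via `math.lgamma` to avoid overflow.

```python
from math import exp, lgamma, log


def poisson_pmf(n: int, lam: float) -> float:
    """P(n) = lambda^n e^(-lambda) / n!, evaluated in log space to avoid overflow for large n."""
    return exp(n * log(lam) - lam - lgamma(n + 1))


def poisson_moments(lam: float, n_max: int = 200) -> tuple[float, float]:
    """Mean and variance of the Poisson distribution, summed directly over n = 0..n_max."""
    probs = [poisson_pmf(n, lam) for n in range(n_max + 1)]
    mean = sum(n * pr for n, pr in enumerate(probs))
    var = sum((n - mean) ** 2 * pr for n, pr in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    for lam in (0.5, 4.0, 25.0):
        mean, var = poisson_moments(lam)
        print(f"lambda = {lam}: mean = {mean:.6f}, VAR = {var:.6f}")
```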
Revisit the Binomial Distribution…
For a fair coin (p = 1/2), let s = n − N/2, so that n = N/2 + s and N − n = N/2 − s. Then
$$\binom{N}{n} = \frac{N!}{(N/2+s)!\,(N/2-s)!}$$
$$\ln\binom{N}{n} = \ln N! - \ln[(N/2+s)!] - \ln[(N/2-s)!]$$
Now consider the limit where N ≫ 1 (but p finite). We can now make use of Stirling’s approximation:
$$m! \approx (2\pi m)^{1/2}\, m^m \exp(-m)$$
$$\ln m! \approx \tfrac{1}{2}\ln(2\pi) + \left(m + \tfrac{1}{2}\right)\ln m - m$$
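To see how good Stirling’s approximation is already for modest m, the sketch below compares it with the exact ln m! obtained from `math.lgamma(m + 1)`. The error in ln m! is positive and slightly below 1/(12m), the first correction term of the Stirling series, so it shrinks as m grows.

```python
from math import lgamma, log, pi


def ln_factorial_stirling(m: float) -> float:
    """Stirling's approximation: ln m! ~ (1/2) ln(2 pi) + (m + 1/2) ln m - m."""
    return 0.5 * log(2 * pi) + (m + 0.5) * log(m) - m


if __name__ == "__main__":
    for m in (5, 10, 50, 100):
        exact = lgamma(m + 1)  # ln m! from the log-gamma function
        err = exact - ln_factorial_stirling(m)
        print(f"m = {m:>3}: ln m! = {exact:.6f}, error = {err:.2e}, 1/(12m) = {1 / (12 * m):.2e}")
```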
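Applying Stirling’s approximation to each factorial and keeping terms up to order s²/N:
$$\ln\binom{N}{n} = \ln N! - \ln[(N/2+s)!] - \ln[(N/2-s)!] \approx \frac{1}{2}\ln\!\left(\frac{2}{\pi N}\right) + N\ln 2 - \frac{2s^2}{N}$$
$$\binom{N}{n} \approx \left(\frac{2}{\pi N}\right)^{1/2} 2^N \exp\!\left(-\frac{2s^2}{N}\right) = \left(\frac{2}{\pi N}\right)^{1/2} 2^N \exp\!\left[-\frac{2(n-N/2)^2}{N}\right]$$
Writing x for n, with μ = N/2 and σ² = N/4 (the binomial mean Np and variance Np(1 − p) for p = 1/2):
$$\binom{N}{x} \approx \left(\frac{2}{\pi N}\right)^{1/2} 2^N \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
Multiplying by p^x (1 − p)^(N − x) = 2^(−N) cancels the factor 2^N, and since (2/(πN))^(1/2) = 1/(σ√(2π)), the result is a normalized Gaussian.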
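The end result of the derivation can be checked numerically. This sketch compares the exact fair-coin probability C(N, n)/2^N with the large-N form (2/(πN))^(1/2) exp[−2(n − N/2)²/N] for a few N; the function names are illustrative. Python's exact big-integer division keeps `comb(N, n) / 2**N` accurate even for N = 1000.

```python
from math import comb, exp, pi, sqrt


def binomial_fair(n: int, N: int) -> float:
    """Exact P(n) for a fair coin: C(N, n) / 2^N."""
    return comb(N, n) / 2**N


def gaussian_approx(n: float, N: int) -> float:
    """Large-N form (2 / (pi N))^(1/2) exp[-2 (n - N/2)^2 / N]: mean N/2, variance N/4."""
    return sqrt(2 / (pi * N)) * exp(-2 * (n - N / 2) ** 2 / N)


def worst_error(N: int) -> float:
    """Largest absolute difference between the exact and approximate P(n) over n = 0..N."""
    return max(abs(binomial_fair(n, N) - gaussian_approx(n, N)) for n in range(N + 1))


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(f"N = {N:>4}: max |exact - Gaussian| = {worst_error(N):.2e}")
```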
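Gaussian or “Normal” Distribution
For large N, the binomial distribution is thus well approximated by the Gaussian (normal) probability density:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$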
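A minimal sketch of the Gaussian density above, plus a numerical check of the familiar probabilities of falling within 1, 2, and 3 standard deviations of the mean. The values μ = 2.0 and σ = 0.5 are arbitrary assumptions; the closed-form comparison uses `math.erf`, since Pr{|x − μ| < kσ} = erf(k/√2).

```python
from math import erf, exp, pi, sqrt


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(x) = 1 / (sigma sqrt(2 pi)) * exp[-(1/2) ((x - mu) / sigma)^2]."""
    z = (x - mu) / sigma  # standardized variable
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0, m: int = 20_000) -> float:
    """Midpoint-rule integral of the PDF over [mu - k sigma, mu + k sigma]."""
    a, b = mu - k * sigma, mu + k * sigma
    h = (b - a) / m
    return h * sum(gaussian_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(m))


if __name__ == "__main__":
    for k in (1, 2, 3):
        # Numerical integral vs. closed form erf(k / sqrt(2))
        print(k, round(prob_within(k, mu=2.0, sigma=0.5), 4), round(erf(k / sqrt(2)), 4))
```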
[Figure: Gaussian PDF plotted against the standardized variable Z = (x − μ)/σ]
El Nino, La Nina, and ‘La Nada’
[Figure: 1998 global temperature pattern, heavily influenced by a huge El Nino]
NINO3 (90-150W, 5S-5N)
Mean
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Standard Deviation
$$s = \left[\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^{1/2}$$
Variance: s²
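The three estimators above translate directly into code. This sketch uses made-up toy values (NOT actual Nino3 data) and cross-checks against Python's `statistics` module, whose `stdev` and `variance` also use the N − 1 denominator; `sample_stats` is a hypothetical helper name.

```python
import statistics
from math import sqrt


def sample_stats(xs: list[float]) -> tuple[float, float, float]:
    """Return (mean, standard deviation s, variance s^2), using the N - 1 denominator."""
    N = len(xs)
    if N < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / N
    var = sum((x - mean) ** 2 for x in xs) / (N - 1)
    return mean, sqrt(var), var


if __name__ == "__main__":
    # Toy values for illustration only (NOT actual Nino3 data)
    xs = [-0.4, 0.1, 1.3, 2.6, -1.1, 0.3]
    print(sample_stats(xs))
    # Cross-check against the standard library
    print(statistics.mean(xs), statistics.stdev(xs), statistics.variance(xs))
```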
[Figure: histogram of the monthly Nino3 index, NINO3 region (90-150W, 5S-5N)]
Goodness of fit? Topic of next lecture…