1. Specify the test statistic t test


Standard Normal Distribution
• μ = 0 and σ² = 1
[Figure: histogram of the standard normal distribution; x-axis ticks from -6 to 4, y-axis counts from 0 to 5000]
Confidence Intervals
• Scientists often use a sample standard
deviation to construct a confidence
interval around the mean.
• For a normally distributed random
variable:
approximately 68% of the observations occur within ± 1
standard deviation of the mean
approximately 95% of the observations occur within ± 2
standard deviations of the mean
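The coverage of a normal distribution within a given number of standard deviations can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `coverage` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal observation lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1_sd = coverage(1.0)
within_2_sd = coverage(2.0)

# Multiplier that leaves exactly 2.5% in each tail (95% in the middle).
z_95 = std_normal.inv_cdf(0.975)

print(f"within 1 SD: {within_1_sd:.4f}")
print(f"within 2 SD: {within_2_sd:.4f}")
print(f"multiplier for exactly 95%: {z_95:.4f}")
```

The last value is why 1.96, rather than 2, appears as the multiplier in a 95% confidence interval.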
What does it mean?
$$P(\bar{Y} - 1.96\, s_{\bar{Y}} \le \mu \le \bar{Y} + 1.96\, s_{\bar{Y}}) = 0.95$$
where $s_{\bar{Y}} = s / \sqrt{n}$
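As a worked illustration of this expression, the sketch below computes the standard error of the mean and the resulting 95% interval. The data values are hypothetical and exist only to make the arithmetic concrete.

```python
import math
import statistics

# Hypothetical sample, made up for illustration only.
sample = [4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 5.0, 4.7, 5.2]

n = len(sample)
y_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)             # standard error of the mean

# 95% confidence interval using the normal multiplier 1.96.
lower = y_bar - 1.96 * se
upper = y_bar + 1.96 * se

print(f"mean = {y_bar:.3f}, SE = {se:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

With a sample this small, the multiplier 1.96 is only an approximation; a multiplier from the t-distribution would give a somewhat wider interval.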
Because our sample mean and sample standard error of
the mean are derived from a single sample, this
confidence interval WILL change if we sample again.
Thus, this expression asserts that the “true” population
mean μ will fall within the calculated confidence
interval in 95% of the iterations.
• By extension:
• If we were to repeatedly sample the
population (keeping sample size and all
conditions equal), 5% of the time we would
expect that the true population mean μ
would fall outside of this confidence
interval
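The repeated-sampling idea can be sketched as a small simulation. Everything here is an assumption chosen for illustration: the population mean and standard deviation, the sample size, and the number of replicates.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MU = 10.0      # hypothetical population mean
TRUE_SIGMA = 2.0    # hypothetical population standard deviation
N = 100             # sample size, held constant across replicates
REPLICATES = 2000   # number of repeated samples

hits = 0
for _ in range(REPLICATES):
    sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    y_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this replicate's interval contain the true mean?
    if y_bar - 1.96 * se <= TRUE_MU <= y_bar + 1.96 * se:
        hits += 1

coverage = hits / REPLICATES
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Each replicate produces a different interval; the true mean stays fixed. The printed fraction should land close to 0.95, with the remainder being the intervals that miss μ.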
Interpretation
“There is a 95% chance that the
true population mean μ occurs
within this interval.”
WRONG

“In 95% of the realizations, a
confidence interval calculated in
this way will contain the “true”
value of μ.”
Right

Bad news
• This is not satisfying!!!!
• It is not exactly what you would like to assert
when you construct a confidence interval!!
• You would like to say how confident you
are that the confidence interval contains
the population mean
• A frequentist statistician, however, can’t
assert that !!!!
Good news
• A Bayesian approach turns this around.
Because the confidence interval is fixed
(by your sample data), a Bayesian
statistician can calculate the probability
that the population mean (itself a random
variable) occurs within the confidence
interval.
• Bayesians refer to this as:
Credibility intervals
More news
• Bayesian credibility intervals and frequentist
confidence intervals are usually numerically
similar if the Bayesian prior probability
distribution is uninformative.
• Note that:
– When the intervals are identical, the choice does not
matter.
– When the intervals are different, only the Bayesian
approach provides logical results.
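The numerical similarity under an uninformative prior can be illustrated with the simplest textbook case: a normal prior on μ and a known population standard deviation. This is a sketch under those simplifying assumptions, not a general Bayesian analysis; the summary statistics and both priors are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary; sigma is treated as known for simplicity.
n = 25
y_bar = 5.0
sigma = 2.0

# Frequentist 95% confidence interval.
se = sigma / math.sqrt(n)
ci = (y_bar - 1.96 * se, y_bar + 1.96 * se)

def credible_interval(prior_mean, prior_sd):
    """95% credible interval for mu under a normal prior and known sigma."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * y_bar) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    posterior = NormalDist(post_mean, post_sd)
    return (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975))

vague = credible_interval(prior_mean=0.0, prior_sd=1000.0)    # nearly flat prior
informative = credible_interval(prior_mean=0.0, prior_sd=0.5)  # strong prior near 0

print("confidence interval:         ", tuple(round(x, 3) for x in ci))
print("credible, vague prior:       ", tuple(round(x, 3) for x in vague))
print("credible, informative prior: ", tuple(round(x, 3) for x in informative))
```

With the nearly flat prior the two intervals agree to several decimal places; with the strong prior the credible interval is pulled toward the prior mean.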
T-distribution
$$t = \frac{\bar{X}_k - \mu}{s_{\bar{X}_k}}$$
Generalized Confidence Intervals
$$P(\bar{Y} - t_{\alpha[n-1]}\, s_{\bar{Y}} \le \mu \le \bar{Y} + t_{\alpha[n-1]}\, s_{\bar{Y}}) = (1 - \alpha)$$
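A generalized interval replaces the fixed multiplier 1.96 with a critical value from the t-distribution with n − 1 degrees of freedom. The sketch below uses a hypothetical sample of nine observations and the tabulated critical value 2.306 (8 df, tail probability 0.025) from a Student's t table.

```python
import math
import statistics

# Hypothetical sample of n = 9 observations, so df = n - 1 = 8.
sample = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 13.4, 11.8, 12.5]

# Critical value for 8 df with 0.025 in each tail, read from a Student's t table.
T_CRIT_8DF = 2.306

n = len(sample)
y_bar = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

t_interval = (y_bar - T_CRIT_8DF * se, y_bar + T_CRIT_8DF * se)
z_interval = (y_bar - 1.96 * se, y_bar + 1.96 * se)

print("95% CI using t (8 df):", tuple(round(x, 3) for x in t_interval))
print("95% CI using 1.96:    ", tuple(round(x, 3) for x in z_interval))
```

The t-based interval is wider, reflecting the extra uncertainty from estimating the standard deviation with a small sample.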
Some t-distributions:
[Figure: three histograms of t-distributions, with panels for 4 df, 8 df, and 20 df; x-axis ticks from -5 to 5, y-axis counts from 0 to 1400]
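How the shape of the t-distribution depends on the degrees of freedom can be explored by simulation. This sketch draws repeated samples from a standard normal population (an assumption made for the illustration) and measures how often the t statistic falls beyond ±1.96; the function names are hypothetical.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def simulate_t(n, replicates=10000, mu=0.0, sigma=1.0):
    """Draw repeated normal samples of size n and return their t statistics."""
    values = []
    for _ in range(replicates):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        values.append((statistics.mean(sample) - mu) / se)
    return values

def tail_fraction(values, cutoff=1.96):
    """Fraction of values farther than cutoff from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)

frac_4df = tail_fraction(simulate_t(5))     # n = 5  gives 4 df
frac_20df = tail_fraction(simulate_t(21))   # n = 21 gives 20 df

print(f"4 df:  fraction beyond +/-1.96 = {frac_4df:.3f}")
print(f"20 df: fraction beyond +/-1.96 = {frac_20df:.3f}")
```

A standard normal variable falls beyond ±1.96 about 5% of the time. The simulated fractions are larger, and more so at 4 df than at 20 df, because the t-distribution has heavier tails when the degrees of freedom are few.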
t-distribution
http://www.statsoft.com/textbook/sttable.html#t
Calculating the tail probability:
Student’s t table
df\p  0.4       0.25      0.1       0.05      0.025    0.01      0.005     0.0005
1     0.32492   1         3.077684  6.313752  12.7062  31.82052  63.65674  636.6192
2     0.288675  0.816497  1.885618  2.919986  4.30265  6.96456   9.92484   31.5991
3     0.276671  0.764892  1.637744  2.353363  3.18245  4.5407    5.84091   12.924
4     0.270722  0.740697  1.533206  2.131847  2.77645  3.74695   4.60409   8.6103
5     0.267181  0.726687  1.475884  2.015048  2.57058  3.36493   4.03214   6.8688
6     0.264835  0.717558  1.439756  1.94318   2.44691  3.14267   3.70743   5.9588
7     0.263167  0.711142  1.414924  1.894579  2.36462  2.99795   3.49948   5.4079
8     0.261921  0.706387  1.396815  1.859548  2.306    2.89646   3.35539   5.0413
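The entries of a Student's t table are the t values whose upper-tail probability equals the column heading p. As a rough check, the tail probability can be computed by integrating the t density numerically. This is a minimal sketch using Simpson's rule from the standard library only; the function names are hypothetical, and statistical software would normally be used instead.

```python
import math

def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, via Simpson's rule on [0, t_value].

    steps must be even. The density is symmetric about zero, so the area
    to the right of zero is 0.5 and the tail is 0.5 minus the area on
    [0, t_value].
    """
    h = t_value / steps
    total = t_density(0.0, df) + t_density(t_value, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t

p_8df = upper_tail(2.306, 8)   # table entry for df = 8, p = 0.025
p_1df = upper_tail(1.0, 1)     # table entry for df = 1, p = 0.25

print(f"P(T > 2.306) with 8 df = {p_8df:.4f}")
print(f"P(T > 1) with 1 df     = {p_1df:.4f}")
```

Both results reproduce the column headings of the corresponding table entries, which confirms that p in the table is a one-tailed probability.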
Why use R?
• Search of Google Scholar from
2002-2008 with the search
phrase “R Development” and
Publication Name of “Ecology*
or Evolution*”
[Figure: bar chart of articles found per year, 2002 to 2008; x-axis "Year", y-axis "Articles found" from 0 to 240]
Why learn programming?
• “One of the most important things you can do is
to take the time to learn a real programming
language…
• Unfortunately, learning to program is like
learning to speak a foreign language—it takes
time and practice, and there is no immediate
payoff…But if you can overcome the steep
learning curve, the scientific payoff is
tremendous [emphasis added].”
• Excerpted from footnote on p. 116 of Gotelli & Ellison (2004)
Why learn programming? (cont.)
• “In offering advice to graduate students in almost
any branch of ecology, one of the most
important recommendations is to acquire at least
some programming skills.”
Excerpted from p. 320 of Fortin, M.-J. & M. Dale. 2005.
Spatial Analysis: A Guide for Ecologists. Cambridge
University Press.
NOTE: Gotelli, Ellison, Fortin, and Dale are all field
ecologists, not “just theoretical modelers”!