USC3002_2008.Lect4 - Department of Mathematics

USC3002 Picturing the World
Through Mathematics
Wayne Lawton
Department of Mathematics
S14-04-04, 65162749
Theme for Semester I, 2008/09 : The Logic of
Evolution, Mathematical Models of Adaptation
from Darwin to Dawkins
PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypothesis
4. Test Statistics for Gaussian Hypotheses
Sample Mean for Parameter Estimation
z-Test and t-Test Statistics
Rejection/Critical Region for z-Test Statistic
Hypothesis Test for Mean Height
5. General Hypotheses Tests
Type I and Type II Errors
Null and Alternative Hypotheses
6. Assign Tutorial Problems
POPULATIONS AND SAMPLES
Population - a specified collection of quantities: e.g.
heights of males in a country, glucose levels of a
collection of blood samples, batch yields of an
industrial compound for a chemical plant over a
specified time with and without the use of a catalyst
Sample Population – a population from which samples
are taken to be used for statistical inference
Sample - the subset of the sample population
consisting of the samples that are taken.
SAMPLE POPULATION PARAMETERS
Sample: $X_1, \ldots, X_n$
Sample Parameters
Sample Size: $n$
Sample Mean: $\mu_n = [X_1 + X_2 + \cdots + X_n] / n$
Sample Variance: $\sigma_n^2 = [(X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2] / n$
Sample Standard Deviation: $\sigma_n = \sqrt{\sigma_n^2}$
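The sample parameters defined on this slide can be computed directly. The following is a minimal sketch; the function name `sample_parameters` and the data values are hypothetical and are not taken from the lecture. The variance uses the divisor n, as in the definition above.

```python
import math

def sample_parameters(xs):
    """Return (n, mean, variance, std), with the variance divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    return n, mean, variance, math.sqrt(variance)

# Hypothetical heights in cm, chosen only for illustration.
heights = [170.0, 174.0, 178.0, 182.0]
n, mean, variance, std = sample_parameters(heights)
print(n, mean, variance, std)
```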
SAMPLE POPULATION PARAMETERS
Theorem 1 The variance of a population is related
to its mean and average squared values by
$\sigma_n^2 = [X_1^2 + X_2^2 + \cdots + X_n^2] / n - \mu_n^2$
Proof Since $(X_k - \mu_n)^2 = X_k^2 - 2 \mu_n X_k + \mu_n^2$
$n \sigma_n^2 = (X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2$
$= X_1^2 + \cdots + X_n^2 - 2 \mu_n (X_1 + \cdots + X_n) + n \mu_n^2$
$= X_1^2 + \cdots + X_n^2 - 2 n \mu_n^2 + n \mu_n^2$  Why ?
$= X_1^2 + \cdots + X_n^2 - n \mu_n^2$
Question How can the proof be completed ?
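Theorem 1 can be checked numerically by computing the variance in both ways. This is only a sanity check on hypothetical data, not a proof; the function names are illustrative.

```python
def variance_direct(xs):
    """Average squared deviation from the sample mean (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_theorem1(xs):
    """Average of the squared values minus the squared sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2

# Hypothetical data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_direct(data), variance_theorem1(data))
```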
STATISTICAL HYPOTHESES
are assertions about a population that describe
some statistical properties of the population.
Typically, statistical hypotheses assert that a
population consists of independent samples of a
random variable that has a certain type of distribution
and some of the parameters that describe this
distribution may be specified.
For Gaussian distributions there are four possibilities:
Neither the mean nor the variance is specified.
Only the variance is specified.
Only the mean is specified.
Both the mean and the variance are specified.
TEST STATISTICS
for Hypothesis with Gaussian Distributions
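The sample mean for $\mu$ unknown, $\sigma$ known
$\mu_n = [X_1 + X_2 + \cdots + X_n] / n$
is Gaussian with mean $\mu$ and variance $\sigma^2 / n$.
Proof (Outline) We let $\langle Y \rangle$ denote the mean of a
random variable $Y$. Then clearly
$\langle \mu_n \rangle = [\langle X_1 \rangle + \langle X_2 \rangle + \cdots + \langle X_n \rangle] / n = \mu$
Independence and Theorem 1 give
$\mathrm{variance}(\mu_n) = \langle (\mu_n - \mu)^2 \rangle = \langle [(X_1 - \mu) + \cdots + (X_n - \mu)]^2 \rangle / n^2$
$= \sum_{i=1}^{n} \sum_{j=1}^{n} \langle (X_i - \mu)(X_j - \mu) \rangle / n^2 = \sigma^2 / n$
where $\sigma^2 = \mathrm{variance}(X_i)$, $i = 1, \ldots, n$. The terms with $i \ne j$
vanish by independence, leaving n terms each equal to $\sigma^2$.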
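The statement that the sample mean of n independent Gaussian samples has mean equal to the population mean and variance equal to the population variance divided by n can be illustrated by simulation. This is a rough sketch: the function name, the number of trials, and the seed are arbitrary choices, and the population parameters are borrowed from the height example later in the lecture purely as convenient inputs.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n
            for _ in range(trials)]

mu, sigma, n = 174.204, 6.509, 20
means = simulate_sample_means(mu, sigma, n, trials=20000)

empirical_mean = statistics.fmean(means)      # should be close to mu
empirical_var = statistics.pvariance(means)   # should be close to sigma^2 / n
theoretical_var = sigma ** 2 / n
print(empirical_mean, empirical_var, theoretical_var)
```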
PARAMETER ESTIMATION
for Hypothesis with Gaussian Distributions
The sample mean for $\mu$ unknown, $\sigma$ known
$\mu_n = [X_1 + X_2 + \cdots + X_n] / n$
can be used to estimate the mean $\mu$
since the estimate error $\mu_n - \mu$
is unbiased, $\langle \mu_n - \mu \rangle = 0$,
and converges in the statistical sense that
standard deviation$(\mu_n) = \sigma / \sqrt{n} \to 0$
as $n \to \infty$
MORE TEST STATISTICS
for Hypothesis with Gaussian Distributions
The One Sample z-Test for $\mu$, $\sigma$ known
$z_n = (\mu_n - \mu) / (\sigma / \sqrt{n})$
is a Gaussian random variable with mean 0, variance 1.
The One Sample t-Test for $\mu$ known, $\sigma$ unknown
$t_n = (\mu_n - \mu) / (\sigma_n / \sqrt{n})$
is a t-distributed random variable with
n-1 degrees of freedom, provided $\sigma_n^2$ is computed
with divisor n-1 rather than n.
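The two test statistics can be sketched as small functions. The names `z_statistic` and `t_statistic` and the sample values are hypothetical. One assumption is made explicit here: the t statistic below estimates the standard deviation with divisor n-1, which is the convention under which the statistic is exactly t-distributed with n-1 degrees of freedom.

```python
import math

def z_statistic(xs, mu, sigma):
    """One-sample z statistic: hypothesized mean mu, known sigma."""
    n = len(xs)
    mean = sum(xs) / n
    return (mean - mu) / (sigma / math.sqrt(n))

def t_statistic(xs, mu):
    """One-sample t statistic: hypothesized mean mu, sigma estimated from xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: divisor n - 1 in the variance estimate.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return (mean - mu) / (s / math.sqrt(n))

# Hypothetical sample, chosen only for illustration.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_statistic(sample, mu=2.0, sigma=2.0)
t = t_statistic(sample, mu=2.0)
print(z, t)
```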
z-TEST STATISTIC ALPHAS
z        alpha(z)
0        0.5000
0.2000   0.4207
0.4000   0.3446
0.6000   0.2743
0.8000   0.2119
1.0000   0.1587
1.2000   0.1151
1.4000   0.0808
1.6000   0.0548
1.8000   0.0359
2.0000   0.0228
2.2000   0.0139
2.4000   0.0082
2.6000   0.0047
2.8000   0.0026
3.0000   0.0013

alpha    z(alpha)
0.0500   1.6449
0.0400   1.7507
0.0300   1.8808
0.0200   2.0537
0.0100   2.3263
0.0050   2.5758
0.0040   2.6521
0.0030   2.7478
0.0020   2.8782
0.0010   3.0902
0.0005   3.2905
0.0004   3.3528
0.0003   3.4316
0.0002   3.5401
0.0001   3.7190
0.0001*  3.8906
* z = 3.8906 corresponds to alpha = 0.00005, which displays as 0.0001
at four decimal places.

$\alpha(z) = \int_z^{\infty} p(x)\,dx$, where $p(x) = e^{-x^2/2} / \sqrt{2\pi}$,
and z(alpha) is the inverse function.
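Both columns of the table can be reproduced with the Python standard library. This is a sketch: the function names `alpha` and `z_of_alpha` are illustrative, and the upper-tail integral is evaluated through the complementary error function rather than by numerical integration.

```python
import math
from statistics import NormalDist

def alpha(z):
    """Upper-tail probability of a standard Gaussian beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def z_of_alpha(a):
    """Value z whose upper-tail probability equals a (inverse of alpha)."""
    return NormalDist().inv_cdf(1.0 - a)

# Compare with the tabulated entries for z = 1, z = 2 and alpha = 0.05, 0.01.
print(round(alpha(1.0), 4), round(alpha(2.0), 4))
print(round(z_of_alpha(0.05), 4), round(z_of_alpha(0.01), 4))
```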
CRITICAL REGION FOR alpha=0.05
HEIGHT HISTOGRAMS
HYPOTHESIS TEST FOR MEAN HEIGHT
You suspect that the height of males in a country has
increased due to diet or a Martian conspiracy. You aim
to support your Alternative Hypothesis by testing the
Null Hypothesis $\mu = 174.204$ cm, $\sigma = 6.509$ cm
You compute a sample mean $\mu_{20} = 177.115$ cm
using 20 samples then compute
$z_{20} = (\mu_{20} - 174.204 \text{ cm}) / (1.4555 \text{ cm}) = 2.000$
If the Null Hypothesis is true the probability that
$z_{20} \ge 2.000$ is $\alpha(2) = .0228$
Question Should the Null Hypothesis be rejected ?
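The arithmetic on this slide can be retraced in a few lines. The numbers come from the slide; the variable names are illustrative. The last line assumes a one-sided test at significance 0.05, with critical value 1.6449 taken from the z-test table; that choice of significance level is an assumption, and the decision depends on the level fixed before looking at the data.

```python
import math

mu0 = 174.204         # mean under the null hypothesis, in cm
sigma = 6.509         # known standard deviation, in cm
n = 20                # number of samples
sample_mean = 177.115 # observed sample mean, in cm

standard_error = sigma / math.sqrt(n)
z20 = (sample_mean - mu0) / standard_error
# Probability of a z value at least this large if the null hypothesis holds.
p_value = 0.5 * math.erfc(z20 / math.sqrt(2.0))

# Assumed one-sided test at significance 0.05 (critical value from the table).
reject_at_5_percent = z20 >= 1.6449
print(round(standard_error, 4), round(z20, 3), round(p_value, 4), reject_at_5_percent)
```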
GENERAL HYPOTHESES TESTS
involve
Type I Error: prob $\alpha$ of rejecting the null hypothesis
if it is true, also called the significance level
Type II Error: prob $\beta$ of failing to reject the null hypothesis
if it is false; $1 - \beta$ is also called the power of a test and
requires an Alternative Hypothesis that determines
the distribution of the test statistic.
and more complicated test statistics, such as the
One Sample t-Test statistic, whose distribution is
determined even though the distribution of the
Gaussian random samples used to compute it is not.
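For a one-sided z-test, the power can be written in closed form once the alternative hypothesis fixes the true mean. The sketch below assumes the alternative shifts the mean upward by `delta` while keeping the same known standard deviation, so the z statistic is Gaussian with mean delta/(sigma/sqrt(n)) and variance 1. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha):
    """Probability of rejecting the null when the true mean is mu + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # reject when z >= z_crit
    shift = delta / (sigma / math.sqrt(n))     # mean of z under the alternative
    return 1.0 - std_normal.cdf(z_crit - shift)

# Hypothetical inputs, chosen only for illustration.
print(power_one_sided_z(delta=1.0, sigma=2.0, n=16, alpha=0.05))
```

With `delta = 0` the function returns the significance level itself, which is a quick consistency check between the Type I error and the power.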
Homework 5. Due Monday 20.10.08
1. Compute the power of a hypothesis test whose null
hypothesis is that in vufoil #13, the alternative
hypothesis asserts that heights are normally distributed
with mean $\mu + 3.386$ cm and standard deviation $\sigma$,
where $\mu$ and $\sigma$ are the same as for the null hypothesis
and 20 samples are used and the significance $\alpha = .05$
Suggestion: if the alternative hypothesis is true, what
is the distribution of test statistic $z_{20} = (\mu_{20} - \mu) / (\sigma / \sqrt{20})$ ?
What is the probability that $z_{20} \ge z(\alpha)$ ?
2. Use a t-statistic table to describe how to test the
null hypothesis that heights are normal with mean $\mu$
and unknown variance based on 20 samples.
EXTRA TOPIC: CONFIDENCE INTERVALS
Given a sample mean $\mu_n$ for large n we can assume,
by the central limit theorem, that it is Gaussian with
mean $\mu$ = mean of the original population and
variance $\sigma^2 / n = \frac{1}{n} \times$ variance of the original population.
Furthermore, $\sigma^2 \approx \sigma_n^2$ = sample variance and if the
population is {0,1}-valued $\sigma^2 = \mu (1 - \mu)$
We say that $\mu \in [a, b]$ with confidence $c = \int_a^b p(x)\,dx$
where p(x) is the probability density of a Gaussian
with mean $\mu_n$ and standard deviation $\sigma / \sqrt{n} \approx \sigma_n / \sqrt{n}$
Theorem If $\mu$ is a random variable unif. on [-L,L]
then Bayes Theorem $\Rightarrow c = \lim_{L \to \infty} p(\mu \in [a, b] \mid \mu_n)$
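The confidence attached to an interval [a, b] is the area under a Gaussian density centered at the sample mean. The sketch below computes that area, and also inverts it to get a symmetric interval for a chosen confidence. The function names and the numbers are hypothetical, and the sample standard deviation is used in place of the unknown population value, as the slide suggests for large n.

```python
import math
from statistics import NormalDist

def confidence(a, b, sample_mean, sample_std, n):
    """Confidence that the population mean lies in [a, b]."""
    dist = NormalDist(mu=sample_mean, sigma=sample_std / math.sqrt(n))
    return dist.cdf(b) - dist.cdf(a)

def symmetric_interval(sample_mean, sample_std, n, c):
    """Interval centered at the sample mean with confidence c."""
    half_width = NormalDist().inv_cdf(0.5 + c / 2.0) * sample_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical sample summary, chosen only for illustration.
a, b = symmetric_interval(sample_mean=10.0, sample_std=2.0, n=100, c=0.95)
print(a, b, confidence(a, b, 10.0, 2.0, 100))
```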
EXTRA TOPIC: TWO SAMPLE TESTS
A null hypothesis may assert that two populations
have the same means; a special case for {0,1}-valued
populations asserts equality of population proportions.
Under these assumptions and if the variances of both
populations are known, hypothesis testing uses the
Two-Sample z-Test Statistic $z = (\mu_n - \tilde{\mu}_{\tilde{n}}) / \sqrt{\sigma^2 / n + \tilde{\sigma}^2 / \tilde{n}}$
where $\mu_n, \sigma^2, n$ are the sample mean, variance, and
sample size for one population, tildes for the other.
For unknown variances and other cases consult:
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
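The two-sample z statistic is a one-line computation once the two sample means, the known variances, and the sample sizes are available. The function name and the input values below are hypothetical.

```python
import math

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Two-sample z statistic for equal means with known variances."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical summaries of two samples, chosen only for illustration.
z = two_sample_z(mean1=10.0, var1=4.0, n1=4, mean2=8.0, var2=9.0, n2=9)
print(z)
```

Under the null hypothesis of equal means this statistic is standard Gaussian, so it is compared with the same z table as the one-sample statistic.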
EXTRA TOPIC: CHI-SQUARED TESTS
are used to determine goodness-of-fit for various
distributions. They employ test statistics of the form
$\chi^2 = \sum_{i=1}^{d} (\mathrm{obs}_i - \mathrm{exp}_i)^2 / \mathrm{exp}_i$ where $\mathrm{obs}_i$ are independent
observations & null hyp. $\Rightarrow$ expected value of $\mathrm{obs}_i = \mathrm{exp}_i$
and $\chi^2$ has a chi-squared distrib. with d-1 degrees of freedom.
Example [1,p.216] A geneticist claims that four species of
fruit flies should appear in the ratio 1:3:3:9. Suppose that the
sample of 4000 flies contained 226, 764, 733, and 2277 flies
of each species, respectively. For alpha = .1, is there
sufficient evidence to reject the geneticist’s claim ?
Answer: The expected values are 250, 750, 750, 2250
hence $\chi^2 = (226 - 250)^2 / 250 + \cdots = 3.27$
NO since 3 deg. freed. & alpha = .1 $\Rightarrow$ the critical value is $\chi^2 = 6.251$,
which exceeds 3.27
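The fruit fly example can be recomputed step by step. The observed counts, the ratio, and the critical value 6.251 are taken from the slide; the variable names are illustrative. The critical value is hard-coded from the slide rather than computed, since the standard library has no chi-squared quantile function.

```python
observed = [226, 764, 733, 2277]
ratio = [1, 3, 3, 9]

total = sum(observed)
# Expected counts if the species appear in the claimed ratio.
expected = [total * r / sum(ratio) for r in ratio]

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 6.251   # 3 degrees of freedom, alpha = .1, from the slide
reject = chi_squared > critical_value
print(expected, round(chi_squared, 2), reject)
```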
EXTRA TOPIC: POISSON APPROXIMATION
The Binomial Distribution $B(k) = \frac{n!}{k!(n-k)!} a^k (1-a)^{n-k}$
is the probability that k-events happen in n-trials if
a = prob. that an event happens in one trial
It has mean $\mu = na$ and variance $\sigma^2 = na(1-a)$
If $a \ll 1$ and $k \ll n$ then
$B(0) = (1 - \mu/n)^n \approx e^{-\mu}$
$B(1) = \frac{n!}{(n-1)!} a (1 - \mu/n)^{n-1} \approx \mu e^{-\mu}$
$B(k) = \frac{n!}{k!(n-k)!} a^k (1 - \mu/n)^{n-k} \approx \frac{\mu^k}{k!} e^{-\mu}$
The right side is the Poisson Distribution
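The quality of the Poisson approximation can be inspected by evaluating both distributions side by side. In this sketch the function names and the values n = 1000, a = 0.002 are hypothetical, chosen so that a is much smaller than 1 and the k values considered are much smaller than n.

```python
import math

def binomial_pmf(k, n, a):
    """Probability of k events in n trials with per-trial probability a."""
    return math.comb(n, k) * a ** k * (1.0 - a) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson probability of k events with mean mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

# Hypothetical parameters, chosen only for illustration.
n, a = 1000, 0.002
mu = n * a

for k in range(6):
    print(k, binomial_pmf(k, n, a), poisson_pmf(k, mu))

# Largest pointwise difference over the k values shown.
max_gap = max(abs(binomial_pmf(k, n, a) - poisson_pmf(k, mu)) for k in range(6))
```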
REFERENCES
1. Martin Sternstein, Statistics, Barron's College
Review Series, New York, 1996. Survey textbook
covers probability distributions, hypotheses tests,
populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses,
New York, 1959. Detailed development of the
Neyman-Pearson theory of hypotheses testing.
3. J. Neyman and E. S. Pearson, Joint Statistical Papers,
Cambridge University Press, 1967. Source materials.
4. Jan von Plato, Creating Modern Probability,
Cambridge University Press, 1994. Charts the history
and development of modern probability theory.