USC3002_2007.Lect3&4 - Department of Mathematics


USC3002 Picturing the World
Through Mathematics
Wayne Lawton
Department of Mathematics
S14-04-04, 65162749 [email protected]
Theme for Semester I, 2007/08 : The Logic of
Evolution, Mathematical Models of Adaptation
from Darwin to Dawkins
MOTIVATION
Probability and Statistics play an increasingly
crucial role in evolution research
http://www.springer.com/east/home/life+sci/bioinformatics?SGWID=5-10031-22-34952257-0
http://www-stat.stanford.edu/~susan/courses/s366/
http://findarticles.com/p/articles/mi_qa3746/is_199904/ai_n8829021/pg_16
SOURCE OF LECTURE VUFOILS
SP2170 Doing Science
Lecture 3: Random Variables, Distributions,
Inductive & Abductive Reasoning, Experiments
REFERENCES
[1] Rudolf Carnap, An Introduction
to the Philosophy of Science, Dover, N.Y., 1995.
[2] Leong Yu Kang, Living With Mathematics,
McGraw Hill, Singapore, 2004. (GEM Textbook)
(1 Reasoning, 2 Counting, 3 Graphing, 4 Clocking,
5 Coding, 6 Enciphering, 7 Chancing, 8 Visualizing)
MATLAB Demo Random Variables & Distributions
Discuss Topics in Chap. 2-4 in [1], Chap. 1, 7 in [2].
Bayes’ Theorem & the Envelope Problem,
Deductive, Inductive, and Abductive Reasoning.
Assign computational tutorial problems.
RANDOM VARIABLES
The number that faces up on an ‘unloaded’ (fair) die rolled
on a flat surface is in the set { 1, 2, 3, 4, 5, 6 }, and
the numbers are equally likely, so each has probability 1/6.
After the die is rolled, the number is fixed for those who
know it but remains an unknown, or random variable,
for those who do not. Even while the die is still
rolling, a person with a laser sensor connected to a
sufficiently powerful computer may be able to predict
with some accuracy the number that will come up.
This happened and the Casino was not amused!
MATLAB PSEUDORANDOM VARIABLES
The MATLAB (software) function rand generates
numbers that, shown in MATLAB's default four-decimal
display, look like d / 10000, where d behaves as if it were a
random variable with values in the set {0,1,2,…,9999}
taken with equal probability. It is a pseudorandom variable.
It provides an approximation of a random variable x
with values in the interval [0,1] of real numbers such
that for all 0 ≤ a < b ≤ 1 the probability that x is in the
interval [a,b] equals b - a = length of [a,b]. Such random
variables are called uniformly distributed.
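For readers without MATLAB, here is a rough Python analogue of this property. Using Python's `random.random()` in place of `rand` is an assumption of this sketch, not part of the lecture; the helper `fraction_in_interval` is hypothetical. It draws many pseudorandom numbers and checks that the fraction landing in [a, b] is close to b - a.

```python
import random

def fraction_in_interval(samples, a, b):
    """Fraction of the samples that lie in the closed interval [a, b]."""
    return sum(1 for x in samples if a <= x <= b) / len(samples)

random.seed(2007)  # fixed seed so the run is reproducible
# random.random() returns pseudorandom floats in [0.0, 1.0), playing the role of rand.
samples = [random.random() for _ in range(100_000)]

a, b = 0.2, 0.5
estimate = fraction_in_interval(samples, a, b)
print(f"estimated Prob(x in [{a}, {b}]) = {estimate:.4f}, b - a = {b - a:.4f}")
```

With 100,000 samples the estimate usually agrees with b - a to about two decimal places.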
PROBABILITY DISTRIBUTIONS
Random variables with values in a set of integers
are described by discrete distributions
Uniform (Dice): Prob(x = k) = 1/6 for k = 1,…,6
Binomial: Prob(x = k) = a^k (1-a)^(n-k) n!/((n-k)! k!)
for k = 0,1,…,n, where x = k is the event that an event with
probability a occurs k times in n independent trials, and
k! = 1*2*…*(k-1)*k is called k factorial (with 0! = 1).
Poisson: Prob(x = k) = a^k exp(-a) / k! for k = 0,1,2,…
where x = k is, e.g., the event that k atoms of radium decay in
a given time interval and a is the average number of atoms
expected to decay.
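The three formulas can be written as small Python functions (the function names are illustrative, not from the lecture). Summing each distribution over its values is a quick sanity check that the probabilities add up to 1, that the binomial mean is n·a and that the Poisson mean is a.

```python
from math import comb, exp, factorial

def die_pmf(k):
    """Uniform (dice): Prob(x = k) = 1/6 for k = 1, ..., 6."""
    return 1 / 6 if 1 <= k <= 6 else 0.0

def binomial_pmf(k, n, a):
    """Binomial: probability that an event of probability a occurs k times in n trials."""
    return a**k * (1 - a) ** (n - k) * comb(n, k)   # comb(n, k) = n!/((n-k)! k!)

def poisson_pmf(k, a):
    """Poisson: probability of k events when a events are expected on average."""
    return a**k * exp(-a) / factorial(k)

n, a = 10, 0.3
die_total = sum(die_pmf(k) for k in range(1, 7))
binom_total = sum(binomial_pmf(k, n, a) for k in range(n + 1))
binom_mean = sum(k * binomial_pmf(k, n, a) for k in range(n + 1))
# The Poisson sum runs over all k >= 0; 100 terms is far more than enough for a = 2.
poisson_total = sum(poisson_pmf(k, 2.0) for k in range(100))
poisson_mean = sum(k * poisson_pmf(k, 2.0) for k in range(100))
print(die_total, binom_total, binom_mean, poisson_total, poisson_mean)
```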
PROBABILITY DISTRIBUTIONS
Random variables with values in a set of real
numbers are described by continuous distributions
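Uniform over the interval [0,1]
$$\mathrm{Prob}(x \in [a,b]) = \int_a^b 1\,dx = b - a \quad \text{for } 0 \le a \le b \le 1$$
Gaussian or Normal
$$\mathrm{Prob}(x \in [a,b]) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$$
here $\mu$ = mean and $\sigma$ = standard deviation, $\sigma^2$ = variance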
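A minimal Python sketch of both probabilities (function names are hypothetical). The Gaussian integral has no elementary antiderivative, so the sketch uses the error function `math.erf` for a closed form and, as a cross-check, a simple midpoint-rule approximation of the integral exactly as written on the slide.

```python
from math import erf, exp, pi, sqrt

def uniform_prob(a, b):
    """Prob(x in [a, b]) for x uniform on [0, 1], assuming 0 <= a <= b <= 1."""
    return b - a

def gaussian_prob(a, b, mu=0.0, sigma=1.0):
    """Prob(x in [a, b]) for a Gaussian with mean mu and standard deviation sigma,
    via (erf(u_b) - erf(u_a)) / 2 with u = (x - mu) / (sigma * sqrt(2))."""
    s = sigma * sqrt(2.0)
    return 0.5 * (erf((b - mu) / s) - erf((a - mu) / s))

def gaussian_prob_midpoint(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """The same probability from the integral itself, approximated by the midpoint rule."""
    h = (b - a) / steps
    total = sum(exp(-((a + (i + 0.5) * h - mu) ** 2) / (2 * sigma**2)) for i in range(steps))
    return total * h / (sigma * sqrt(2 * pi))

print(uniform_prob(0.2, 0.5))
print(gaussian_prob(-1, 1), gaussian_prob_midpoint(-1, 1))   # about 0.6827
print(gaussian_prob(-2, 2), gaussian_prob_midpoint(-2, 2))   # about 0.9545
```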
MATLAB HELP COMMAND
>> help rand
RAND Uniformly distributed random numbers.
RAND(N) is an N-by-N matrix with random entries, chosen
from a uniform distribution on the interval (0.0,1.0).
RAND(M,N) is a M-by-N matrix with random entries.
>> help hist
HIST Histogram.
N = HIST(Y) bins the elements of Y into 10 equally spaced
containers and returns the number of elements in each
container. If Y is a matrix, HIST works down the
columns.
N = HIST(Y,M), where M is a scalar, uses M bins.
MATLAB DEMONSTRATION 1
[Figure: four histograms with x axes from 0 to 1 and counts up to about 16]
Why do these histograms look different?
MATLAB DEMONSTRATION 2
>> x = rand(10000,1);
>> hist(x,41)
[Figure: histogram of x with 41 bins; x axis from 0 to 1, counts from 0 to 300]
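One way to explore why small-sample histograms look different from each other while this one looks flat is sketched below in Python. The sample sizes (100 values in 10 bins versus 10000 values in 41 bins) and the helpers `histogram` and `max_relative_deviation` are assumptions of this sketch; also, MATLAB's hist places its bins between min(Y) and max(Y), whereas this helper fixes them on [0,1].

```python
import random

def histogram(values, bins, lo=0.0, hi=1.0):
    """Counts in `bins` equal-width bins covering [lo, hi]; a value equal to hi
    goes in the last bin. (MATLAB's hist spans min(Y) to max(Y) instead.)"""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def max_relative_deviation(counts):
    """Largest |count - expected| / expected, where expected = total / number of bins."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts) / expected

random.seed(3002)
small = histogram([random.random() for _ in range(100)], 10)     # about 10 per bin
large = histogram([random.random() for _ in range(10_000)], 41)  # about 244 per bin
print(small, round(max_relative_deviation(small), 2))
print(round(max_relative_deviation(large), 2))
```

Relative fluctuations shrink roughly like one over the square root of the expected count per bin, which is why the 10000-sample histogram looks much flatter than histograms built from a hundred values.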
MORE MATLAB HELP COMMANDS
>> help randn
RANDN Normally distributed random numbers.
RANDN(N) is an N-by-N matrix with random entries,
chosen from a normal distribution with mean zero,
variance one and standard deviation one.
RANDN(M,N) is a M-by-N matrix with random entries.
>> help sum
SUM Sum of elements.
For vectors, SUM(X) is the sum of the elements of X.
For matrices, SUM(X) is a row vector with the sum over
each column.
3 1
sum 
 7 6

4 5
MATLAB DEMONSTRATION 3
>> s = -4:.001:4;
>> plot(s,exp(-s.^2/2)/(sqrt(2*pi)))
>> grid
[Figure: the plotted curve for s from -4 to 4, with values from 0 to 0.4, on a grid]
MATLAB DEMONSTRATION 3
>> x = randn(10000,1);
>> hist(x,41)
[Figure: histogram of x with 41 bins; x axis from -5 to 4, counts from 0 to 800]
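A Python analogue of this demonstration, assuming `random.gauss(0, 1)` as a stand-in for randn (an assumption of this sketch): draw 10000 standard normal values and check their mean, standard deviation, and the fraction lying within one standard deviation of 0.

```python
import random
import statistics

random.seed(7)
# random.gauss(0, 1) draws from the standard normal distribution, like randn.
x = [random.gauss(0.0, 1.0) for _ in range(10_000)]

mean = statistics.fmean(x)
sd = statistics.pstdev(x)
within_one_sd = sum(1 for v in x if -1.0 <= v <= 1.0) / len(x)
print(f"mean {mean:.3f}, standard deviation {sd:.3f}, fraction in [-1, 1] {within_one_sd:.3f}")
```

For a standard normal variable the last number should be near 0.683, the Gaussian probability of [-1, 1].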
MATLAB DEMONSTRATION 3
>> x = rand(5000,10000);
>> y = sum(x);
>> hist(y,41)
[Figure: histogram of y with 41 bins; y from about 2420 to 2580, counts from 0 to 800]
CENTRAL LIMIT THEOREM
The sum of N real-valued random variables
y = x(1) + x(2) + … + x(N) is a random
variable. If the x(j) are independent and have the
same distribution, then as N increases the
distribution of y approaches (that is, gets
closer and closer to) a Gaussian distribution.
The mean of this Gaussian distribution
= N times the (common) mean of the x(j)
The variance of this Gaussian distribution
= N times the (common) variance of the x(j)
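The mean and variance predictions can be checked with a different example from the lecture's uniform one: sums of N fair-die rolls. This is a Python sketch under stated assumptions (N = 100 dice, 5000 simulated sums, hypothetical helper `sum_of_dice`); one fair die has mean 3.5 and variance 35/12.

```python
import random
import statistics

def sum_of_dice(n_dice, rng):
    """One value of y = x(1) + ... + x(N), each x(j) an independent fair-die roll."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

rng = random.Random(42)
N = 100
y = [sum_of_dice(N, rng) for _ in range(5_000)]

# One die has mean 3.5 and variance 35/12, so the theorem predicts
# mean N * 3.5 and variance N * 35/12 for y.
predicted_mean, predicted_var = N * 3.5, N * 35 / 12
print(statistics.fmean(y), predicted_mean)
print(statistics.pvariance(y), predicted_var)
```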
CONDITIONAL PROBABILITY
Recall that on my die the ‘numbers’ 1 and 4
are red and the numbers 2, 3, 5, 6 are blue.
I roll the die without letting you see how it rolls.
What is the probability that I rolled a 4?
I repeat the procedure BUT tell you that the number
is red. What is the probability that I rolled a 4?
This probability is called the conditional probability
that x = 4 given that x is red (i.e. x in {1,4}).
$\mathrm{Prob}(A \mid B)$ = Prob of event A given event B
CONDITIONAL PROBABILITY
If A and B are two events then $A \cap B$ denotes the
event that BOTH event A and event B happen.
Common sense implies the following LAW:
$$\mathrm{Prob}(A \cap B) = \mathrm{Prob}(B)\,\mathrm{Prob}(A \mid B)$$
Example Consider the roll of a die. Let A be the
event x = 4 and let B be the event x is red (= 1 or 4).
$\mathrm{Prob}(A \cap B) = \mathrm{Prob}(A) = 1/6$, $\mathrm{Prob}(B) = 1/3$, $\mathrm{Prob}(A \mid B) = 1/2$
Question What does the LAW say here ?
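The die example can be checked by exact enumeration with Python's `fractions.Fraction`. The helpers below are hypothetical; they assume each face is equally likely and compute a conditional probability by restricting attention to the outcomes where B happens.

```python
from fractions import Fraction

FACES = range(1, 7)          # a fair die: each face has probability 1/6
RED = {1, 4}                 # the red faces on the lecturer's die

def prob(event):
    """Prob(event), where event is a predicate on die faces."""
    return Fraction(sum(1 for x in FACES if event(x)), len(FACES))

def cond_prob(event_a, event_b):
    """Prob(A | B): the fraction of B's outcomes in which A also happens."""
    b_faces = [x for x in FACES if event_b(x)]
    return Fraction(sum(1 for x in b_faces if event_a(x)), len(b_faces))

def is_four(x):
    return x == 4

def is_red(x):
    return x in RED

def four_and_red(x):
    return is_four(x) and is_red(x)

print(prob(four_and_red), prob(is_red), cond_prob(is_four, is_red))  # 1/6 1/3 1/2
print(prob(four_and_red) == prob(is_red) * cond_prob(is_four, is_red))  # True
```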
BAYES’ THEOREM
http://en.wikipedia.org/wiki/Bayes'_theorem
For an event A, $A^c$ denotes the event not A.
Question Why does $\mathrm{Prob}(A) + \mathrm{Prob}(A^c) = 1$?
Prob(A) and Prob(B) are called marginal probabilities.
Question Why does the following hold?
$$\mathrm{Prob}(B) = \mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A) + \mathrm{Prob}(B \mid A^c)\,\mathrm{Prob}(A^c)$$
Question Why does the following hold?
$$\mathrm{Prob}(A \mid B) = \frac{\mathrm{Prob}(B \mid A)\,\mathrm{Prob}(A)}{\mathrm{Prob}(B)}$$
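Both formulas can be tried on the die example with exact arithmetic. In this sketch `bayes` is a hypothetical helper; A is "x = 4", B is "x is red", and Prob(B | A^c) = 1/5 because exactly one of the five faces other than 4 (face 1) is red.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return (Prob(B), Prob(A | B)) from Prob(B | A), Prob(A) and Prob(B | A^c)."""
    p_not_a = 1 - p_a                                        # Prob(A^c) = 1 - Prob(A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a      # marginal Prob(B)
    return p_b, p_b_given_a * p_a / p_b                      # Bayes' Theorem

# Die example: Prob(B | A) = 1, Prob(A) = 1/6, Prob(B | A^c) = 1/5.
p_b, p_a_given_b = bayes(Fraction(1), Fraction(1, 6), Fraction(1, 5))
print(p_b, p_a_given_b)   # 1/3 1/2
```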
INDUCTIVE & ABDUCTIVE REASONING
http://en.wikipedia.org/wiki/Inductive_reasoning
Inductive reasoning is the process of reasoning in which the premises of an
argument support the conclusion but do not ensure it.
This is in contrast to Deductive reasoning in which the conclusion is necessitated
by, or reached from, previously known facts.
http://en.wikipedia.org/wiki/Abductive_reasoning
Abductive reasoning is the process of reasoning to the best explanations.
In other words, it is the reasoning process that starts from a set of facts and
derives their most likely explanations.
The philosopher Charles Peirce introduced abduction into modern logic.
In his works before 1900, he mostly used the term to mean the use of a known rule to
explain an observation, e.g., “if it rains the grass is wet” is a known rule used to explain
that the grass is wet. He later used the term to mean creating new rules to explain new
observations, emphasizing that abduction is the only logical process that actually
creates anything new. Namely, he described the process of science as a combination
of abduction, deduction and induction, stressing that new knowledge is only created
by abduction.
EXPERIMENTS
http://www.holah.karoo.net/experimental_method.htm
Carnap p. 41 [1]: “One of the great distinguishing
features of modern science, as compared to the science
of earlier periods, is its emphasis on what is called the
‘experimental method’.”
Question How does the experimental method differ
from the method of observation ?
Question What fields favor the experimental methods
and what fields do not and why ?
Ideal Gas Law - one of the greatest experiments !
TUTORIAL QUESTIONS
Question 1. The uniform distribution on [0,1] has mean ½
and variance 1/12. Use the Central Limit Theorem to compute
the mean and variance of the random variable y whose
histogram is shown in vufoil # 13.
Question 2. I roll a die to get a random variable x in
{1,2,3,4,5,6}, then put x dollars in one envelope and put 2x in
another envelope, then flip a coin to decide which envelope to
give you (so that you receive the smaller or larger amount with
equal probability). Use Bayes’ Theorem to compute the
probability that you received the smaller amount
CONDITIONED on YOUR FINDING THAT YOU
HAVE 1, 2, 3, 4, 5, 6, 8, 10, or 12 dollars. Then use these
conditional probabilities to explain the Envelope Paradox.
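A sketch for checking answers to Question 2 by exact enumeration (the function name is hypothetical). It adds up the probability of every (die, coin) outcome that leaves you holding a given amount, split by whether that amount is the smaller or the larger one, and divides as in Bayes' Theorem.

```python
from fractions import Fraction

def prob_smaller_given_amount(amount):
    """Prob(you hold the smaller envelope | you find `amount` dollars).

    Setup from the question: x is a fair die roll, the envelopes hold x and 2x,
    and a fair coin decides which one you get."""
    joint_smaller = Fraction(0)   # Prob(find amount AND it is the smaller one)
    joint_larger = Fraction(0)    # Prob(find amount AND it is the larger one)
    for x in range(1, 7):
        p = Fraction(1, 6) * Fraction(1, 2)   # Prob(this die roll and this coin side)
        if x == amount:
            joint_smaller += p
        if 2 * x == amount:
            joint_larger += p
    return joint_smaller / (joint_smaller + joint_larger)

for m in [1, 2, 3, 4, 5, 6, 8, 10, 12]:
    print(m, prob_smaller_given_amount(m))
```

Amounts that cannot occur (such as 7) would make the denominator zero, which is why the loop only uses the amounts listed in the question.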
SOURCE OF LECTURE VUFOILS
USC2170 Lecture 4:
Hypothesis Testing
PLAN FOR LECTURE
1. Populations and Samples
2. Sample Population Statistics
3. Statistical Hypothesis
4. Test Statistics for Gaussian Hypotheses
Sample Mean for Parameter Estimation
z-Test and t-Test Statistics
Rejection/Critical Region for z-Test Statistic
Hypothesis Test for Mean Height
5. General Hypotheses Tests
Type I and Type II Errors
Null and Alternative Hypotheses
6. Assign Tutorial Problems
POPULATIONS AND SAMPLES
Population - a specified collection of quantities: e.g.
heights of males in a country, glucose levels of a
collection of blood samples, batch yields of an
industrial compound for a chemical plant over a
specified time with and without the use of a catalyst
Sample Population – a population from which samples
are taken to be used for statistical inference
Sample - the subset of the sample population
consisting of the samples that are taken.
SAMPLE POPULATION PARAMETERS
Sample
$X_1, \ldots, X_n$
Sample Parameters
Sample Size $n$
Sample Mean
$$\mu_n = [X_1 + X_2 + \cdots + X_n]/n$$
Sample Variance
$$\sigma_n^2 = [(X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2]/n$$
Sample Standard Deviation
$$\sigma_n = \sqrt{\sigma_n^2}$$
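The four sample parameters translate directly into Python. This is a sketch with a hypothetical helper name and a made-up data set; note that the variance uses divisor n as defined above, which matches the standard library's `statistics.pvariance` (whereas `statistics.variance` uses n - 1).

```python
import math
import statistics

def sample_parameters(xs):
    """Return (n, mu_n, sigma_n^2, sigma_n) with the variance divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return n, mu, var, math.sqrt(var)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample
print(sample_parameters(data))            # (8, 5.0, 4.0, 2.0)
print(statistics.pvariance(data))         # the standard library's divisor-n variance agrees
```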
SAMPLE POPULATION PARAMETERS
Theorem 1 The variance of a population is related
to its mean and average squared values by
$$\sigma_n^2 = [X_1^2 + X_2^2 + \cdots + X_n^2]/n - \mu_n^2$$
Proof Since $(X_k - \mu_n)^2 = X_k^2 - 2\mu_n X_k + \mu_n^2$,
$$\begin{aligned}
n\sigma_n^2 &= (X_1 - \mu_n)^2 + \cdots + (X_n - \mu_n)^2 \\
&= X_1^2 + \cdots + X_n^2 - 2\mu_n (X_1 + \cdots + X_n) + n\mu_n^2 \\
&= X_1^2 + \cdots + X_n^2 - 2n\mu_n^2 + n\mu_n^2 \qquad \text{Why?} \\
&= X_1^2 + \cdots + X_n^2 - n\mu_n^2
\end{aligned}$$
Question How can the proof be completed?
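Theorem 1 can be checked numerically. This Python sketch (hypothetical helper names, made-up data) computes the variance both from the definition and from the Theorem 1 formula, using exact fractions so the two agree exactly.

```python
from fractions import Fraction

def variance_direct(xs):
    """sigma_n^2 from the definition: average squared deviation from mu_n."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

def variance_theorem1(xs):
    """sigma_n^2 via Theorem 1: average of the X_k^2 minus mu_n^2."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu * mu

data = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
print(variance_direct(data), variance_theorem1(data))   # 423/64 423/64
```

In floating point the Theorem 1 form can lose accuracy when the mean is large compared with the spread, because it subtracts two nearly equal numbers; the definition is the safer formula for computation.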
STATISTICAL HYPOTHESES
are assertions about a population that describe
some statistical properties of the population.
Typically, statistical hypotheses assert that a
population consists of independent samples of a
random variable that has a certain type of distribution
and some of the parameters that describe this
distribution may be specified.
For Gaussian distributions there are four possibilities:
Neither the mean nor the variance is specified.
Only the variance is specified.
Only the mean is specified.
Both the mean and the variance are specified.
TEST STATISTICS
for Hypothesis with Gaussian Distributions
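The sample mean for $\mu$ unknown, $\sigma$ known,
$$\mu_n = [X_1 + X_2 + \cdots + X_n]/n,$$
is Gaussian with mean $\mu$ and variance $\sigma^2/n$.
Proof (Outline) We let $\langle Y \rangle$ denote the mean of a
random variable $Y$. Then clearly
$$\langle \mu_n \rangle = [\langle X_1 \rangle + \cdots + \langle X_n \rangle]/n = \mu.$$
Independence and Theorem 1 give
$$\mathrm{variance}(\mu_n) = \langle (\mu_n - \mu)^2 \rangle = \langle [(X_1 - \mu) + \cdots + (X_n - \mu)]^2 \rangle / n^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle (X_i - \mu)(X_j - \mu) \rangle / n^2 = \sigma^2/n,$$
because independence makes $\langle (X_i - \mu)(X_j - \mu) \rangle = 0$ for $i \neq j$,
while each of the $n$ terms with $i = j$ equals $\sigma^2$.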
PARAMETER ESTIMATION
for Hypothesis with Gaussian Distributions
The sample mean for $\mu$ unknown, $\sigma$ known,
$$\mu_n = [X_1 + X_2 + \cdots + X_n]/n,$$
can be used to estimate the mean $\mu$, since the estimate
error $\epsilon_n = \mu_n - \mu$ is unbiased, $\langle \epsilon_n \rangle = 0$,
and converges in the statistical sense that
$$\text{standard deviation}(\epsilon_n) = \sigma/\sqrt{n} \to 0.$$
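The shrinking error can be seen in a small simulation. This Python sketch assumes a hypothetical Gaussian population (mean 10, standard deviation 2) and a hypothetical helper `error_sd`; it estimates the standard deviation of the error μ_n - μ over 2000 samples for several n and compares it with σ/√n.

```python
import random
import statistics

def error_sd(n, mu, sigma, trials, rng):
    """Empirical standard deviation of the estimate error mu_n - mu over `trials` samples of size n."""
    errors = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) - mu
              for _ in range(trials)]
    return statistics.pstdev(errors)

rng = random.Random(0)
MU, SIGMA = 10.0, 2.0        # hypothetical population mean and standard deviation
results = {n: error_sd(n, MU, SIGMA, 2_000, rng) for n in (5, 20, 80)}
for n, sd in results.items():
    print(n, round(sd, 3), round(SIGMA / n ** 0.5, 3))   # empirical vs sigma / sqrt(n)
```

Each fourfold increase in n should roughly halve the error's standard deviation.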
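MORE TEST STATISTICS
for Hypothesis with Gaussian Distributions
The One Sample z-Test for $\mu$, $\sigma$ known uses
$$z_n = (\mu_n - \mu)/(\sigma/\sqrt{n}),$$
which is a Gaussian random variable with mean 0, variance 1.
The One Sample t-Test for $\mu$ known, $\sigma$ unknown uses
$$t_n = (\mu_n - \mu)/(\sigma_n/\sqrt{n-1}),$$
which is a t-distributed random variable with
n-1 degrees of freedom. Because $\sigma_n$ is defined with divisor $n$,
this equals $(\mu_n - \mu)/(s/\sqrt{n})$, where $s^2 = n\sigma_n^2/(n-1)$
is the sample variance with divisor $n-1$.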
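Both statistics are one-liners in Python. This sketch uses hypothetical function names and a made-up sample; `statistics.stdev` uses divisor n - 1, so dividing it by √n gives the same denominator as σ_n/√(n-1).

```python
import math
import statistics

def z_statistic(xs, mu, sigma):
    """One Sample z-Test statistic (mu_n - mu) / (sigma / sqrt(n)), sigma known."""
    return (statistics.fmean(xs) - mu) / (sigma / math.sqrt(len(xs)))

def t_statistic(xs, mu):
    """One Sample t-Test statistic, sigma unknown.

    statistics.stdev uses divisor n - 1, so stdev(xs) / sqrt(n) equals
    sigma_n / sqrt(n - 1) with sigma_n the divisor-n sample standard deviation."""
    return (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(len(xs)))

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical sample, n = 8
print(z_statistic(data, mu=4, sigma=2))   # (5 - 4) / (2 / sqrt(8)) = 1.414...
print(t_statistic(data, mu=4))            # compare with t tables at 7 degrees of freedom
```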
z-TEST STATISTIC ALPHAS
$$\alpha(z) = \int_z^{\infty} p(x)\,dx, \qquad p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$$
z(α) denotes the value of z with α(z) = α.

z       α(z)       α          z(α)
0.0     0.5000     0.0500     1.6449
0.2     0.4207     0.0400     1.7507
0.4     0.3446     0.0300     1.8808
0.6     0.2743     0.0200     2.0537
0.8     0.2119     0.0100     2.3263
1.0     0.1587     0.0050     2.5758
1.2     0.1151     0.0040     2.6521
1.4     0.0808     0.0030     2.7478
1.6     0.0548     0.0020     2.8782
1.8     0.0359     0.0010     3.0902
2.0     0.0228     0.0005     3.2905
2.2     0.0139     0.0004     3.3528
2.4     0.0082     0.0003     3.4316
2.6     0.0047     0.0002     3.5401
2.8     0.0026     0.0001     3.7190
3.0     0.0013     0.00005    3.8906
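The table can be regenerated with Python's `statistics.NormalDist` (Python 3.8 or later). The function names `alpha_of_z` and `z_of_alpha` are hypothetical; they compute the right-tail area α(z) and its inverse z(α).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)

def alpha_of_z(z):
    """alpha(z): area under the standard Gaussian density p(x) to the right of z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)

def z_of_alpha(alpha):
    """z(alpha): the point with area alpha to its right, so alpha_of_z(z_of_alpha(a)) == a."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)

for z in (0.0, 1.0, 2.0, 3.0):
    print(f"{z:.1f}  {alpha_of_z(z):.4f}")
for a in (0.05, 0.01, 0.001, 0.00005):
    print(f"{a}  {z_of_alpha(a):.4f}")
```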
CRITICAL REGION FOR alpha=0.05
HEIGHT HISTOGRAMS
HYPOTHESIS TEST FOR MEAN HEIGHT
You suspect that the height of males in a country has
increased due to diet or a Martian conspiracy; you aim
to support your Alternative Hypothesis by testing the
Null Hypothesis $\mu = 174.204$ cm (with known $\sigma = 6.509$ cm).
You compute a sample mean $\mu_{20} = 177.115$ cm
using 20 samples, then compute
$$z_{20} = (\mu_{20} - 174.204\ \text{cm})/(1.4555\ \text{cm}) = 2.000,$$
where $1.4555\ \text{cm} = \sigma/\sqrt{20}$.
If the Null Hypothesis is true, the probability that
$z_{20} \ge 2.000$ is $\alpha(2) = 0.0228$.
Question Should the Null Hypothesis be rejected ?
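The computation can be reproduced in Python as a check. This sketch assumes a one-sided test at significance 0.05 (the level of the critical region slide) and uses a hypothetical helper `one_sided_z_test`; the p-value is the probability, under the Null Hypothesis, of a z at least as large as the one observed.

```python
import math
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject) for H0: mean = mu0 against the alternative mean > mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)          # Prob(z_n >= z) if H0 is true
    critical = NormalDist().inv_cdf(1.0 - alpha)  # z(alpha), e.g. 1.6449 for alpha = 0.05
    return z, p_value, z > critical

z, p, reject = one_sided_z_test(177.115, 174.204, 6.509, 20)
print(f"z20 = {z:.3f}, p = {p:.4f}, reject at 0.05: {reject}")
```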
GENERAL HYPOTHESES TESTS
involve
Type I Error: rejecting the null hypothesis when it is
true; its probability $\alpha$ is also called the significance level.
Type II Error: failing to reject the null hypothesis when
it is false; if its probability is $\beta$, then $1 - \beta$ is called
the power of the test. Computing $\beta$ requires an Alternative
Hypothesis that determines the distribution of the test statistic.
and more complicated test statistics, such as the
One Sample t-Test statistic, whose distribution is
determined even though the distribution of the
Gaussian random samples used to compute it is not
(their variance is unknown).
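For the one-sided z-test the power has a closed form, sketched below in Python (the helper name is hypothetical, and σ is assumed known). If the true mean is μ + δ, then z_n is Gaussian with mean δ√n/σ and variance 1, so the power is the probability that this shifted Gaussian exceeds z(α).

```python
import math
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha):
    """Power 1 - beta of the one-sided z-test of H0: mean = mu against mean > mu,
    when the true mean is mu + delta and sigma is known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # reject H0 when z_n > z(alpha)
    shift = delta * math.sqrt(n) / sigma        # mean of z_n under the alternative
    return 1.0 - std_normal.cdf(z_alpha - shift)

print(power_one_sided_z(0.0, 6.509, 20, 0.05))   # delta = 0: H0 true, so this equals alpha
```

Plugging in the numbers of Tutorial Question 1 gives a way to check a hand calculation.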
TUTORIAL QUESTIONS
1. Compute the power of a hypothesis test whose null
hypothesis is that in vufoil #13, whose alternative
hypothesis asserts that heights are normally distributed
with mean $\mu + 3.386$ cm and standard deviation $\sigma$,
where $\mu$ and $\sigma$ are the same as for the null hypothesis,
and where 20 samples are used and the significance is $\alpha = .05$.
Suggestion: if the alternative hypothesis is true, what
is the distribution of the test statistic $z_{20} = (\mu_{20} - \mu)/(\sigma/\sqrt{20})$?
What is the probability that $z_{20} > z(\alpha)$?
2. Use a t-statistic table to describe how to test the
null hypothesis that heights are normal with mean $\mu$
and unknown variance based on 20 samples.
EXTRA TOPIC: CONFIDENCE INTERVALS
Given a sample mean $\mu_n$, for large $n$ we can assume,
by the central limit theorem, that it is Gaussian with
mean $\mu$ = mean of the original population and
variance $\sigma^2/n$, where $\sigma^2$ = variance of the original population.
Furthermore, $\sigma^2 \approx \sigma_n^2$ = sample variance, and if the
population is {0,1}-valued, $\sigma^2 = \mu(1 - \mu)$.
We say that $\mu \in [a,b]$ with confidence
$$c = \int_a^b p(x)\,dx,$$
where $p(x)$ is the probability density of a Gaussian
with mean $\mu_n$ and standard deviation $\sigma/\sqrt{n} \approx \sigma_n/\sqrt{n}$.
Theorem If $\mu$ is a random variable uniformly distributed on $[-L, L]$,
then by Bayes’ Theorem
$$c = \lim_{L \to \infty} \mathrm{Prob}(\mu \in [a,b] \mid \mu_n).$$
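A symmetric interval with confidence c can be computed in Python as follows. This is a sketch under stated assumptions: the helper name is hypothetical, the {0,1}-valued data (540 ones among 1000 observations) are made up, and σ is replaced by σ_n as the slide suggests. For {0,1}-valued data σ_n² equals μ_n(1 - μ_n) exactly.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, sample_sd, n, c=0.95):
    """[a, b] centred on mu_n such that a Gaussian with mean mu_n and standard
    deviation sigma_n / sqrt(n) puts probability c on [a, b]."""
    se = sample_sd / math.sqrt(n)
    half_width = NormalDist().inv_cdf(0.5 + c / 2) * se
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical {0,1}-valued sample: 540 ones among n = 1000 observations.
n = 1000
mu_n = 540 / n
sigma_n = math.sqrt(mu_n * (1 - mu_n))   # sigma^2 = mu (1 - mu) for {0,1}-valued data
a, b = confidence_interval(mu_n, sigma_n, n)
print(f"95% confidence interval: [{a:.3f}, {b:.3f}]")
```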
EXTRA TOPIC: TWO SAMPLE TESTS
A null hypothesis may assert that two populations
have the same means; a special case for {0,1}-valued
populations asserts equality of population proportions.
Under these assumptions, and if the variances of both
populations are known, hypothesis testing uses the
Two-Sample z-Test Statistic
$$z = \frac{\mu_n - \tilde{\mu}_{\tilde{n}}}{\sqrt{\sigma^2/n + \tilde{\sigma}^2/\tilde{n}}}$$
where $\mu_n$, $\sigma^2$, $n$ are the sample mean, variance, and
sample size for one population, and the tildes denote the same quantities for the other.
For unknown variances and other cases consult:
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
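The statistic is straightforward to compute. In this Python sketch the helper name and all numbers (two groups of 50 with known variances 16 and 25) are hypothetical, and the two-sided p-value is one common choice that the slide does not specify.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Two-Sample z-Test statistic for H0: the two population means are equal (variances known)."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

z = two_sample_z(31.0, 16.0, 50, 29.0, 25.0, 50)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # Prob(|Z| >= |z|) under H0
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```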
EXTRA TOPIC: CHI-SQUARED TESTS
are used to determine goodness-of-fit for various
distributions. They employ test statistics of the form
$$\chi^2 = \sum_{i=1}^{d} (\mathrm{obs}_i - \mathrm{exp}_i)^2 / \mathrm{exp}_i,$$
where the $\mathrm{obs}_i$ are independent observations, the null hypothesis
asserts that the expected value of $\mathrm{obs}_i$ is $\mathrm{exp}_i$, and $\chi^2$
then has (approximately) a chi-squared distribution with $d-1$ degrees of freedom.
Example [1,p.216] A geneticist claims that four species of
fruit flies should appear in the ratio 1:3:3:9. Suppose that the
sample of 4000 flies contained 226, 764, 733, and 2277 flies
of each species, respectively. For alpha = .1, is there
sufficient evidence to reject the geneticist’s claim?
Answer: The expected values are 250, 750, 750, 2250, hence
$$\chi^2 = \frac{(226-250)^2}{250} + \frac{(764-750)^2}{750} + \frac{(733-750)^2}{750} + \frac{(2277-2250)^2}{2250} \approx 3.27$$
NO, since with 3 degrees of freedom and alpha = .1 the critical
value is 6.251, and 3.27 < 6.251.
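The fruit fly example can be recomputed in a few lines of Python (the helper name is hypothetical). The critical value 6.251 is taken from the slide rather than computed, since Python's standard library has no chi-squared distribution.

```python
def chi_squared(observed, expected):
    """Goodness-of-fit statistic: sum over i of (obs_i - exp_i)^2 / exp_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [226, 764, 733, 2277]
ratio = [1, 3, 3, 9]
total = sum(observed)                                   # 4000 flies
expected = [total * r / sum(ratio) for r in ratio]      # [250.0, 750.0, 750.0, 2250.0]
stat = chi_squared(observed, expected)
CRITICAL_3DF_ALPHA_0_1 = 6.251   # chi-squared table value quoted on the slide
print(f"chi^2 = {stat:.2f}; reject claim: {stat > CRITICAL_3DF_ALPHA_0_1}")
```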
EXTRA TOPIC: POISSON APPROXIMATION
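The Binomial Distribution
$$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1-a)^{n-k}$$
is the probability that k events happen in n trials if
a = prob. that an event happens in one trial.
It has mean $\mu = na$ and variance $\sigma^2 = na(1-a)$.
If $a \ll 1$ and $k \ll n$ then, writing $a = \mu/n$,
$$B(0) = (1 - \mu/n)^n \approx e^{-\mu}$$
$$B(1) = \frac{n!}{(n-1)!}\, a\, (1 - \mu/n)^{n-1} \approx \mu e^{-\mu}$$
$$B(k) = \frac{n!}{k!\,(n-k)!}\, a^k (1 - \mu/n)^{n-k} \approx \frac{\mu^k}{k!}\, e^{-\mu},$$
using $n!/(n-k)! \approx n^k$, so that $n!\,a^k/(n-k)! \approx (na)^k = \mu^k$.
The right side is the Poisson Distribution.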
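The quality of the approximation can be checked numerically. This Python sketch uses hypothetical values n = 1000 and a = 0.003 (so μ = 3) and compares the two probabilities for k = 0, …, 29, beyond which both are negligible.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, a):
    """B(k) = n!/(k!(n-k)!) a^k (1-a)^(n-k)."""
    return comb(n, k) * a**k * (1 - a) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson approximation mu^k e^(-mu) / k!."""
    return mu**k * exp(-mu) / factorial(k)

n, a = 1000, 0.003          # hypothetical: many trials, small probability per trial
mu = n * a                  # mean of both distributions
for k in range(6):
    print(k, round(binomial_pmf(k, n, a), 5), round(poisson_pmf(k, mu), 5))
worst = max(abs(binomial_pmf(k, n, a) - poisson_pmf(k, mu)) for k in range(30))
print("largest difference for k < 30:", worst)
```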
REFERENCES
1. Martin Sternstein, Statistics, Barron's College
Review Series, New York, 1996. Survey textbook
covers probability distributions, hypothesis tests,
populations, samples, chi-squared analysis, regression.
2. E. L. Lehmann, Testing Statistical Hypotheses,
New York, 1959. Detailed development of the
Neyman-Pearson theory of hypotheses testing.
3. J.Neyman and E.S. Pearson, Joint Statistical Papers,
Cambridge University Press, 1967. Source materials.
4. Jan von Plato, Creating Modern Probability,
Cambridge University Press, 1994. Charts the history
and development of modern probability theory.