Transcript, lecture 3
Hypothesis testing and parameter estimation
Bhuvan Urgaonkar
“Empirical methods in AI” by P. Cohen
System behavior in unknown situations
Self-tuning systems ought to behave properly in situations they have not encountered before
How can we quantify how well a system deals with unknown situations?
Statistical inference is one way
Statistical inference
The process of drawing inferences about an unseen population from a relatively small sample
Populations and samples
Statistics: Functions on samples
Parameters: Functions on populations
Examples
Example 1: Toss a fair coin
– Parameter: e.g., the expected number of heads in 10 tosses
– Can be determined analytically
Example 2: Two chess programs A and B play 15 games; A wins 10, draws 2, and loses 3.
– Parameter: probability that A wins
– The population of all possible chess games is too large to enumerate, so we cannot know the exact value
• We can estimate p_win as p = 10/15 ≈ 0.67
• p is a statistic derived from the above sample
Two kinds of statistical inference
Hypothesis testing: Answer a yes-or-no question
about a population and assess the probability
that the answer is wrong
– Assume p_win = 0.5 and assess the probability of the sample result p = 0.67
– If this probability is very small, conclude that A and B are not equally strong
Parameter estimation: Estimate the true value of
a parameter given a statistic
– If p = 0.67, what is the “best” estimate of p_win?
– How wide an interval must we draw around p to be confident that p_win falls within it?
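A minimal Python sketch of both kinds of inference on the chess sample, assuming draws are simply counted as "not a win", so each game is a win/no-win trial with probability p_win. The function names `upper_tail_prob` and `normal_approx_interval` are illustrative, and the interval uses the normal approximation with z = 1.96 (roughly 95% confidence), which is only a rough guide at n = 15.

```python
from math import comb, sqrt

# Sample from the slides: A wins 10 of 15 games (2 draws, 3 losses).
# Assumption: draws count as "not a win", so each game is a Bernoulli trial.
WINS, GAMES = 10, 15


def upper_tail_prob(k: int, n: int, p: float) -> float:
    """Pr(X >= k) for X ~ Binomial(n, p): the chance of k or more wins."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def normal_approx_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval p +/- z * sqrt(p(1-p)/n) around p = k/n."""
    p = k / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


p_hat = WINS / GAMES                               # the statistic p
p_value = upper_tail_prob(WINS, GAMES, 0.5)        # hypothesis test, H0: p_win = 0.5
low, high = normal_approx_interval(WINS, GAMES)    # parameter estimation

print(f"p = {p_hat:.2f}")
print(f"Pr(>= {WINS} wins | p_win = 0.5) = {p_value:.3f}")
print(f"approx. 95% interval for p_win: [{low:.2f}, {high:.2f}]")
```

With these numbers the tail probability is about 0.15, which is not very small, so 15 games alone would not justify concluding that A and B differ; the same small sample is why the interval around p is so wide.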
Hypothesis testing example
Two programs A and B that summarize news stories
– Performance is measured as recall: the proportion of the important parts of a story that make it into the summary
Suppose you run A every day for 120 days, each day on a sample of 10 stories, and record the mean recall score for each day
Then you run B and want to answer:
– Is B better than A?
Hypothesis testing steps
Formulate a null hypothesis
– mean(A) = mean(B)
Gather a sample of 10 news stories and run them through B; call the sample mean Emean(B)
Assuming the null hypothesis is true, estimate the distribution of mean recall scores over all possible samples of size 10 run through B
Calculate the probability of obtaining a sample mean at least as large as Emean(B) under this distribution
If this probability is low, reject the null hypothesis
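The steps above can be sketched in Python, assuming that A's 120 daily means (each over 10 stories) serve as the estimated sampling distribution under the null hypothesis. The recall scores below are synthetic, generated only to make the sketch runnable; `empirical_p_value`, the value of Emean(B) and the 0.05 cutoff are hypothetical choices.

```python
import random
import statistics


def empirical_p_value(null_means: list[float], observed_mean: float) -> float:
    """Estimated Pr(sample mean >= observed_mean) under the null hypothesis.

    null_means: A's daily means (each over 10 stories). If mean(A) = mean(B),
    these approximate the sampling distribution of Emean(B).
    """
    at_least = sum(1 for m in null_means if m >= observed_mean)
    return at_least / len(null_means)


# Synthetic data, for illustration only: 120 days x 10 recall scores for A.
rng = random.Random(0)
daily_means_a = [statistics.fmean(rng.uniform(0.3, 0.7) for _ in range(10))
                 for _ in range(120)]
emean_b = 0.62  # hypothetical sample mean of B on 10 stories

p = empirical_p_value(daily_means_a, emean_b)
print(f"estimated Pr(mean >= {emean_b} | H0) = {p:.3f}")
# The 0.05 cutoff is a conventional assumption, not part of the method.
print("reject H0: B looks better" if p < 0.05 else "cannot reject H0")
```

With only 120 reference means, the estimate has a resolution of 1/120, so very small probabilities are reported as 0.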
Sampling distributions
Distribution of a statistic calculated from all possible
samples of a given size, drawn from a given
population
Example: two tosses of a fair coin; let the sample statistic be the number of heads
– Sampling distribution is discrete
– Elements are 0, 1, 2 with probabilities 0.25, 0.5, 0.25
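For a case this small, the sampling distribution can be obtained by brute-force enumeration of all equally likely outcomes. A short Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair-coin tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
heads_counts = Counter(seq.count("H") for seq in outcomes)

# Sampling distribution of the statistic "number of heads".
sampling_dist = {k: Fraction(c, len(outcomes)) for k, c in sorted(heads_counts.items())}
print({k: float(v) for k, v in sampling_dist.items()})  # {0: 0.25, 1: 0.5, 2: 0.25}
```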
How do we get sampling distributions?
Exact sampling distributions
A coin is tossed 20 times and lands heads 16 times
– Is the coin fair?
Consider the sampling distribution of the proportion of heads, p_h, under the null hypothesis that the coin is fair
It is easy to calculate the exact probability of every value of p_h for N coin tosses
– Possible values: 0/N, 1/N, …, N/N
– $\Pr(p_h = i/N) = \frac{N!}{i!\,(N-i)!}\,0.5^{N}$
– $\Pr(p_h = 16/20) \approx 0.0046$: next to impossible if the coin is fair!
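The formula can be checked directly. A Python sketch using `math.comb` (Python 3.8+); `prob_proportion` is an illustrative name:

```python
from math import comb


def prob_proportion(i: int, n: int) -> float:
    """Exact Pr(p_h = i/n) for n tosses of a fair coin: n!/(i!(n-i)!) * 0.5**n."""
    return comb(n, i) * 0.5**n


n = 20
dist = [prob_proportion(i, n) for i in range(n + 1)]  # the whole sampling distribution
print(f"Pr(p_h = 16/20) = {dist[16]:.4f}")            # 0.0046
print(f"Pr(p_h >= 16/20) = {sum(dist[16:]):.4f}")     # 0.0059
print(f"total probability = {sum(dist):.1f}")         # 1.0
```

The second line, the probability of 16 or more heads, is the tail probability a one-sided test would typically report.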
Estimated sampling distributions
Unlike the sampling distribution of the proportion, the sampling distribution of the mean usually cannot be calculated exactly, because the population distribution is unknown
– Recall the news story example
It can, however, be estimated thanks to a remarkable theorem
Central limit theorem
The sampling distribution of the mean of
samples of size N approaches a normal
distribution as N increases.
– If samples are drawn from a population with mean M and standard deviation SD, the sampling distribution has mean M and standard deviation SD/sqrt(N)
– This holds irrespective of the shape of the population distribution (as long as its variance is finite)!
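A quick simulation illustrates the theorem. This Python sketch assumes an exponential population with M = 1 and SD = 1, chosen only because it is strongly skewed; the number of samples and the seed are arbitrary. As N grows, the mean of the sample means should stay near M and their standard deviation should track SD/sqrt(N).

```python
import random
import statistics

# Assumption: exponential population with mean M = 1 and SD = 1 (skewed, not normal).
M, SD = 1.0, 1.0
rng = random.Random(42)


def sample_means(n: int, num_samples: int = 5000) -> list[float]:
    """Means of num_samples independent samples of size n from the population."""
    return [statistics.fmean(rng.expovariate(1 / M) for _ in range(n))
            for _ in range(num_samples)]


for n in (1, 4, 16, 64):
    means = sample_means(n)
    print(f"N={n:3d}  mean of means={statistics.fmean(means):.3f}  "
          f"sd of means={statistics.stdev(means):.3f}  SD/sqrt(N)={SD / n ** 0.5:.3f}")
```

A histogram of `means` for increasing N would also show the shape becoming more symmetric and bell-like.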
The missing piece in hypothesis testing
Null hypothesis
– mean(A) = mean(B)
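We don’t know the sampling distribution of Emean(B), but we do know (approximately) the sampling distribution of Emean(A)!
– CLT: the sampling distribution of Emean(A) is approximately normal with mean mean(A), which equals mean(B) under the null hypothesis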
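One standard way to use this is a one-sided z-test, sketched below in Python. It assumes the standard deviation of individual recall scores is known (for instance, estimated from A's many runs); all numbers are hypothetical, and `z_test_upper` is an illustrative name.

```python
from math import erf, sqrt


def z_test_upper(sample_mean: float, null_mean: float, pop_sd: float,
                 n: int) -> tuple[float, float]:
    """One-sided z-test: returns (z, Pr(Z >= z)) under the null hypothesis.

    By the CLT, means of samples of size n are approximately normal with
    mean null_mean and standard deviation pop_sd / sqrt(n).
    """
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p_upper = 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z), Phi = standard normal CDF
    return z, p_upper


# Hypothetical values: mean(A) = 0.50, SD of A's recall scores = 0.12,
# and Emean(B) = 0.58 measured on 10 stories.
z, p = z_test_upper(0.58, 0.50, 0.12, 10)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If the SD has to be estimated from the same small sample, a t-test is the usual alternative.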
Computer-aided methods for estimating sampling distributions
Use simulation to estimate the sampling distribution
Monte Carlo tests
– Used when the population distribution is known but the sampling distribution of the test statistic is not
– Draw many samples from this known distribution and compute the test statistic on each
Bootstrap methods
– Used when the population distribution is unknown
– Idea: Resample from the sample (treat the sample as
the population!)
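Both ideas fit in a few lines of Python. In this sketch, the Monte Carlo test reuses the coin example (the population, a fair coin, is known), and the bootstrap resamples a small, hypothetical set of recall scores to estimate the standard error of their mean. Function names, trial counts and the seed are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def monte_carlo_tail(n_tosses: int, observed_heads: int, trials: int = 20_000) -> float:
    """Monte Carlo test: the population (a fair coin) is known, so simulate it.

    Estimates Pr(at least observed_heads heads in n_tosses fair tosses).
    """
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= observed_heads:
            hits += 1
    return hits / trials


def bootstrap_means(sample: list[float], resamples: int = 2000) -> list[float]:
    """Bootstrap: the population is unknown, so treat the sample as the
    population and resample it with replacement (same size as the sample)."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(resamples)]


# Monte Carlo estimate of Pr(>= 16 heads in 20 tosses) for a fair coin.
mc_p = monte_carlo_tail(20, 16)
print(f"Monte Carlo Pr(>= 16 heads | fair coin) ~ {mc_p:.4f}")

# Hypothetical recall scores for 10 stories (illustrative values only).
recall_sample = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52, 0.66, 0.45, 0.58, 0.63]
boot = bootstrap_means(recall_sample)
print(f"sample mean = {statistics.fmean(recall_sample):.3f}")
print(f"bootstrap standard error of the mean ~ {statistics.stdev(boot):.4f}")
```

For the coin, the Monte Carlo estimate should land close to the exact tail probability 6196/2^20 ≈ 0.0059.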
Other related concepts/techniques
Hypothesis tests that work under different conditions
– Z-test (large N), t-test (small N)
– Ref: Paul Cohen
Parameter estimation
– Confidence intervals
– Analysis of variance: interaction among variables
– Contingency tables
– Ref: Paul Cohen
Expectation maximization
– X: observed data, Z: unobserved data; let Y = X ∪ Z
– Searches for the hypothesis h that maximizes E[ln P(Y | h)]
– Ref: “Machine Learning” by Tom Mitchell
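A compact illustration, in the spirit of the standard introductory EM example: the observed data X come from two equally likely Gaussians with a known, shared standard deviation, Z records which Gaussian produced each point, and h is the pair of unknown means. This Python sketch (`em_two_means` and the data are hypothetical) alternates the E-step and M-step for a fixed number of iterations.

```python
from math import exp


def em_two_means(xs: list[float], mu: list[float], sigma: float = 1.0,
                 iters: int = 50) -> list[float]:
    """EM for the means of a mixture of two equally likely Gaussians with
    known, equal standard deviation sigma (only the means are unknown).

    X = the observed xs; Z = which Gaussian generated each x (unobserved);
    h = the current pair of means.
    """
    mu = list(mu)
    for _ in range(iters):
        # E-step: expected value of each hidden indicator z_ij given the current h.
        # (No log-space safeguard: points very far from both means could underflow.)
        resp = []
        for x in xs:
            w = [exp(-((x - m) ** 2) / (2 * sigma**2)) for m in mu]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: the h maximizing E[ln P(Y | h)] is the responsibility-weighted mean.
        mu = [sum(r[j] * x for r, x in zip(resp, xs)) / sum(r[j] for r in resp)
              for j in range(len(mu))]
    return mu


data = [-0.5, 0.0, 0.5, 1.0, 9.5, 10.0, 10.5, 11.0]  # hypothetical observations
print(em_two_means(data, mu=[1.0, 8.0]))
```

Here the two clusters are far apart, so the estimates settle near the two cluster averages (about 0.25 and 10.25); overlapping clusters would make the soft responsibilities matter much more.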