APStat – Notes CH 1 - Woodside Priory School


APSTAT
UNIT 4A
INFERENCE PART 1
APSTAT
Chapter 18
Sampling Distributions
and Sample Means
Let's Just DO IT!!!!
Proportion of correct answers on last AP Stat Exam
Student:      1    2    3    4    5    6    7    8    9   10   11   12   13
Proportion: .60  .65  .55  .70  .85  .80  .75  .70  .65  .75  .80  .85  .75
Regular Old Distribution:
.55-.59 .60-.64 .65-.69 .70-.74 .75-.79 .80-.84 .85-.89
• Now, Everyone take a 5-person random sample
• Do randint(1,13,5) to choose your subjects
• Add their scores and divide by 5 to get x-bar
(sample mean)
• Now we will do a distribution of our sample means
– a SAMPLING DISTRIBUTION!!!!!
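The class activity above can be sketched in Python. This is only an illustration: the seed, the 2000 repetitions and the helper name `sample_mean` are assumptions made here, not part of the lesson. Like randint(1,13,5), it allows the same student to be picked twice.

```python
import random
from statistics import mean

# Class scores from the slide (proportion correct for students 1-13).
scores = [.60, .65, .55, .70, .85, .80, .75, .70, .65, .75, .80, .85, .75]

def sample_mean(rng, n=5):
    """Pick n subjects the way randint(1,13,5) does (repeats allowed) and average."""
    picks = [rng.randint(1, 13) for _ in range(n)]
    return mean(scores[i - 1] for i in picks)

rng = random.Random(18)  # arbitrary seed (assumption), only for repeatability
sample_means = [sample_mean(rng) for _ in range(2000)]

population_mean = mean(scores)
print(f"population mean      = {population_mean:.3f}")
print(f"mean of sample means = {mean(sample_means):.3f}")
print(f"range of scores      = {min(scores):.2f} to {max(scores):.2f}")
print(f"range of x-bars      = {min(sample_means):.2f} to {max(sample_means):.2f}")
```

The mean of the sample means should land very close to the population mean, and the x-bars should bunch up more tightly than the individual scores.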
Class Sample Means:
Sampling Distribution
.55-.59 .60-.64 .65-.69 .70-.74 .75-.79 .80-.84 .85-.89
We Just DID IT!!!!
Give me at least 2 things that are different between
the regular distribution and the sampling
distributions:
1.
2.
3.
4.
5.
Bias

Unbiased Statistic


Mean of Sampling distribution should
equal True population mean.
How did ours look earlier? The true
mean of the population was about
.723 (72.3%).
Sample Proportions


Mean of Sample Proportion
In last section:
$\mu_X = np$

Proportion is just outcome divided by n,
so….
$\mu_{\hat{p}} = \frac{np}{n} = p$
Standard Deviation of
Sample Proportion

In last section:
$\sigma_X = \sqrt{np(1-p)}$

Proportion is just outcome divided by n,
so throw down a little Algebra….
$\sigma_{\hat{p}} = \frac{\sqrt{np(1-p)}}{n} = \frac{\sqrt{np(1-p)}}{\sqrt{n^2}} = \sqrt{\frac{np(1-p)}{n^2}}$
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
Now try it…Sample
Proportions

Find mean and standard deviation for:

60 samples of 10 coin flips, p=.5

60 samples of 50 coin flips, p=.5

What does this say about variability in
regards to sample size?
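A quick way to check your answers, sketched in Python. The function name `phat_mean_sd` is made up for this example. Note that the number of samples (60) does not appear in either formula; only the sample size n matters.

```python
from math import sqrt

def phat_mean_sd(n, p):
    """Mean and SD of the sampling distribution of p-hat for samples of size n."""
    return p, sqrt(p * (1 - p) / n)

for n in (10, 50):
    mu, sd = phat_mean_sd(n, 0.5)
    print(f"n = {n:2d}: mean = {mu:.2f}, SD = {sd:.4f}")
```

The SD comes out near .158 for n = 10 and near .071 for n = 50: bigger samples, less variability.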
2 Rules of Thumb Assumptions/Conditions

Population size large enough



Population should be at least 10 times the
sample size
10% Condition
Normalness


n should be large enough to produce an
approximately normal sampling distribution.
np > 10 AND n(1-p) >10
Try them out

A San Jose firm decides to sample 25
residents to determine if they oppose
off-shore oil drilling. They predict that
P(oppose) = 0.4

Large enough population?

Normalness?
Example….

If the true proportion of students who
pass the APStat exam is .64, what is
the probability that a random sample
of 100 students will have at least 70
students pass?
p = .64, n = 100, 70 or more

Check Conditions - Briefly Explain
10%
 np and n(1-p) > 10


Draw Picture- (Find SD too)

Find P-Value

Conclusion
Same Problem Data…

Within what range would we expect to
find 95% of sample proportions of size
100.
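Both questions on the p = .64, n = 100 problem can be worked with a Normal model for p-hat. The sketch below uses Python's standard `statistics.NormalDist`; the variable names are illustrative, and the "95%" range uses the Empirical Rule's 2 SD rather than 1.96.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.64, 100
sd = sqrt(p * (1 - p) / n)           # SD of p-hat
model = NormalDist(mu=p, sigma=sd)   # Normal model for p-hat

# Conditions: np and n(1-p) should both be > 10.
conditions_ok = n * p > 10 and n * (1 - p) > 10

prob_70_or_more = 1 - model.cdf(0.70)   # P(p-hat >= .70)
low, high = p - 2 * sd, p + 2 * sd      # middle 95% by the Empirical Rule

print(f"conditions met: {conditions_ok}")
print(f"SD = {sd:.3f}, z = {(0.70 - p) / sd:.2f}")
print(f"P(p-hat >= .70) = {prob_70_or_more:.4f}")
print(f"about 95% of sample proportions fall between {low:.3f} and {high:.3f}")
```

Here SD = .048 and z = 1.25, so the probability is a little over 10%, and about 95% of sample proportions should fall between .544 and .736.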
Sample Means
Take a whole bunch of samples and
find the means
 Why sample means?
 Remember our sample of class
scores?

Less variable
 More normal

Mean and Standard
Deviation of X-BAR


If we take all the possible samples from a
population, the mean of the sampling
distribution will equal the population
mean (if the population mean was
accurate in the first place, but more on
that later)
$\mu_{\bar{x}} = \mu_X$
Mean and Standard
Deviation of X-BAR


Standard Deviation of a sampling
distribution is:
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$
Let’s try it!

If adult males have height N(68,2)
what would be the mean and standard
deviation for the distribution if:
n=10
 n=40

What happened to the Standard
deviation when n was quadrupled?
 What would happen to the standard
deviation if n was multiplied by 9?
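A small Python check of the height questions, using mean = 68 and SD = 2 from N(68,2). The helper name `xbar_mean_sd` and the choice of n = 90 (which is 9 times 10) are assumptions made for illustration.

```python
from math import sqrt

mu, sigma = 68, 2  # adult male heights, N(68, 2)

def xbar_mean_sd(n):
    """Mean and SD of the sampling distribution of x-bar for samples of size n."""
    return mu, sigma / sqrt(n)

for n in (10, 40, 90):
    mean_xbar, sd_xbar = xbar_mean_sd(n)
    print(f"n = {n:2d}: mean = {mean_xbar}, SD = {sd_xbar:.4f}")

# Ratios relative to n = 10
print(f"n x 4 -> SD ratio {xbar_mean_sd(40)[1] / xbar_mean_sd(10)[1]:.3f}")
print(f"n x 9 -> SD ratio {xbar_mean_sd(90)[1] / xbar_mean_sd(10)[1]:.3f}")
```

Quadrupling n cuts the SD in half; multiplying n by 9 cuts it to one third, because n sits under a square root.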

CLT – The Central Limit
Theorem


If the population we are sampling from is
already normal with $N(\mu, \sigma)$, the sampling
distribution will be normal as well with
mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
But what if the population we are
sampling from is not normal?
Age of Pennies

Riebhoff has 50 pennies. He subtracted the
date on each penny from the current year
to obtain the following
data…
Penny Ages
Penny Age   Penny Age   Penny Age   Penny Age   Penny Age
  1     0    11    11    21     2    31     2    41     3
  2     1    12     1    22     8    32     1    42    28
  3     2    13    12    23     9    33     7    43     7
  4     4    14     2    24     5    34     3    44     4
  5    14    15     5    25    15    35    17    45     7
  6     3    16     5    26     6    36     1    46     4
  7     3    17     6    27     0    37    19    47     0
  8     7    18     0    28     4    38     4    48     0
  9     8    19     1    29     1    39    23    49    10
 10     0    20     0    30     1    40     2    50     5
Sample Size n=1
[Histogram: # of Pennies (0 to 7) vs. penny age (0 to 18)]
Sample Size n=4
(everyone do 3 SRS)
[Blank plot of sample means, axis from 0 to 19]
Sample Size n=8
(everyone do 3 SRS)
[Blank plot of sample means, axis from 0 to 19]
What happened?



The distribution got “normaler” as the
sample size increased. Cool?
Central Limit Theorem says that even if a
population is not normal, the
sampling distribution of the sample mean will approach
normality when n is large.
Allows us to use z-scores and such, even
when the larger population is not normally
distributed.
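The penny activity can be simulated to watch this happen. The sketch below is an illustration under stated assumptions: it uses the 50 penny ages from the slide, takes 3000 SRS at each sample size (the class did 3 each), uses an arbitrary seed, and measures "normaler" with a simple skewness number (average cubed z-score), which is not part of the lesson.

```python
import random
from statistics import mean, pstdev

# Ages of pennies 1-50 from the slide.
ages = [0, 1, 2, 4, 14, 3, 3, 7, 8, 0,
        11, 1, 12, 2, 5, 5, 6, 0, 1, 0,
        2, 8, 9, 5, 15, 6, 0, 4, 1, 1,
        2, 1, 7, 3, 17, 1, 19, 4, 23, 2,
        3, 28, 7, 4, 7, 4, 0, 0, 10, 5]

def skewness(xs):
    """Average cubed z-score: near 0 for a symmetric shape, positive if skewed right."""
    m, s = mean(xs), pstdev(xs)
    return mean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(2005)  # arbitrary seed (assumption), only for repeatability
results = {}
for n in (1, 4, 8):
    means = [mean(rng.sample(ages, n)) for _ in range(3000)]  # 3000 SRS of size n
    results[n] = (pstdev(means), skewness(means))
    print(f"n = {n}: SD of sample means = {results[n][0]:.2f}, "
          f"skewness = {results[n][1]:.2f}")
```

As n goes from 1 to 4 to 8, the spread of the sample means shrinks and the skewness moves toward 0, even though the penny ages themselves are strongly skewed right.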
Assumptions/Conditions



Random Sample - Always describe
Independence - Describe
10% Condition
Try it

If the APSTAT EXAM 2005 had a
mean score of 3.2 with a standard
deviation of 1.2…
Old Skool - Find the probability that a
single student would have a score of 4
or higher?
New Skool – find the probability that
an SRS of 20 students would have a
mean score of 4 or higher?

Check Conditions - Briefly Explain
10%
 Independence and Random Sample


Draw Picture- (Find SD too)

Find P-Value

Conclusion
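A sketch of both calculations in Python. Assumptions to flag: the Old Skool part treats individual scores as roughly Normal, which is rough for whole-number scores from 1 to 5, and the New Skool part reads the question as the sample mean of 20 students being 4 or higher.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.2, 1.2, 20

# Old Skool: one student, Normal model N(3.2, 1.2) (an approximation)
p_single = 1 - NormalDist(mu, sigma).cdf(4)

# New Skool: mean of an SRS of 20, SD shrinks to sigma / sqrt(n)
sd_xbar = sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, sd_xbar).cdf(4)

print(f"single student: z = {(4 - mu) / sigma:.2f}, P = {p_single:.4f}")
print(f"mean of 20:     SD = {sd_xbar:.4f}, z = {(4 - mu) / sd_xbar:.2f}, P = {p_mean:.4f}")
```

One student scoring 4 or higher is fairly common (about 1 in 4 under this model), but a sample of 20 averaging 4 or higher is very unusual, because sample means vary much less than individuals.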
Standard Error

Sometimes we do not have the
population standard deviation. If we
have to estimate it, we call it Standard
Error and roll an SE.
APSTAT
Chapter 19
Confidence Intervals for
Proportions
Confidence Interval for a Proportion
(aka One-Proportion Z-Interval)
At Woodside High, 80 students are
surveyed and 32% of them had tried
marijuana.
 How confident am I that the true
proportion of WH students that have
tried marijuana is at or near 32%?
 CONFIDENCE INTERVAL!!!

The Dealio…

If I do know the population mean
• If I sample, I know the sample mean might be quite
different than the population mean


BUT…That difference is predictable.
For instance, if N(0.70,0.1) and n=4



Mean of sample means = 0.70, SD of sample means = 0.1/sqrt(4) = .05
We expect 95% of sample means (Empirical Rule) to
fall within 2 SD of the mean
Therefore 95% of sample means will fall between 0.6
and 0.8.
Confidence Intervals
Work in reverse
 (From Woodside High Example) I
sampled 80 and got sample p = .32
 I want to know the true population
proportion.
 The true population proportion will lie
within 2SD of the Sample Proportion
in 95/100 samples of this size.
 Let’s Do It!!!!

Do It!

List what you know


p-hat=.32, n=80
Conditions/Assumptions

10% for Independence
• Woodside HS has over 800 students

np and nq > 10 to use Normal Model
• Both .32 x 80 and .68 x 80 > 10

Find Standard Error

SE(p-hat)=
Do It!

Draw the Picture

Conclusion:

We are 95% confident that the TRUE
proportion of ____________
falls between ____ and ____
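To check the blanks for the Woodside High example, here is a minimal Python sketch. It uses 2 standard errors for 95% (the Empirical Rule version used on these slides) and also prints the z* = 1.96 version for comparison; the variable names are illustrative.

```python
from math import sqrt

p_hat, n = 0.32, 80

# Conditions: n*p-hat and n*q-hat both > 10
conditions_ok = p_hat * n > 10 and (1 - p_hat) * n > 10

se = sqrt(p_hat * (1 - p_hat) / n)           # standard error of p-hat
low, high = p_hat - 2 * se, p_hat + 2 * se   # 2 SE version
low_z, high_z = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"SE(p-hat) = {se:.4f}")
print(f"95% CI (2 SE):    {low:.3f} to {high:.3f}")
print(f"95% CI (z*=1.96): {low_z:.3f} to {high_z:.3f}")
```

The standard error is about .052, so the interval runs from roughly .22 to .42.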
Do It!

We can also write confidence intervals
in the form:
(estimate) ± (margin of error)
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the standard error
What Does 95% Confidence
Mean?
If we did a whole bunch of confidence
intervals at this sample size, we would
expect 95 out of 100 intervals to
contain the true proportion.
 Picture of this:

TRUE POPULATION
PROPORTION
AHOY!



We do not always want 95% confidence.
Example, if a part on an airplane’s landing
gear needs to be a certain size to work,
wouldn’t you want a little more confidence in
the sample being within certain
parameters?
Common Intervals are 90, 95 and 99%

Denote as C=.90, C=.95, or C=.99
But 90 and 99% aren’t
Empirically Cool
Area = 90%
p


We need this z-score! It’s critical!
So critical, it is called the critical value and denoted as z*
Mas z*
Now check t distribution critical values
chart (back of book or formula sheet)
 Look at bottom. It gives you C and
right above it is…..
 Yeah!

Try it!

A poll asked who would you vote for if
an election were held today between
Sen. Barack Obama and Sen. John
McCain. 115 of the 250 respondents
chose Sen. McCain. Construct and
interpret a 90% confidence interval for
the proportion of voters choosing
McCain.
Try It!

Conditions:

Mean, SE, z*

Calculate CI

Conclusion
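One way to check your work on the poll problem, sketched in Python. The critical value comes from `statistics.NormalDist().inv_cdf`, which gives the same z* you would read off the table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, conf = 115, 250, 0.90
p_hat = x / n

# Conditions: n*p-hat and n*q-hat both > 10
conditions_ok = x > 10 and (n - x) > 10

z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for C = .90
se = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se
low, high = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, z* = {z_star:.3f}")
print(f"90% CI: {low:.3f} to {high:.3f}")
```

We are 90% confident that the true proportion of voters choosing McCain falls between about .408 and .512.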
Last thing
Finding sample size needed for a CI
with a given level of confidence and a
given margin of error
NBC News is doing a poll on who will
be the next Governor of California.
They want a 3% margin of error at
95% confidence. What
sample size should they use?

Sample size needed
Margin of Error is the part after the ± in $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Sample size needed
Why 0.5? Gives us largest n value. Safety First!
$z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le .03$
$1.960 \sqrt{\frac{0.5(1-0.5)}{n}} \le .03$
$1.960 \sqrt{\frac{.25}{n}} \le .03$
$\frac{1.960(0.5)}{.03} \le \sqrt{n}$
$(32.67)^2 \le n$
$1067.1 \le n$
YOU SHOULD ALWAYS ROUND
UP TO STAY WITHIN THE MARGIN OF
ERROR! n SHOULD BE 1068.
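The same calculation as a short Python sketch. The function name `sample_size` and its defaults are made up for this example; `ceil` does the "always round up" step.

```python
from math import ceil

def sample_size(margin, z_star=1.960, p=0.5):
    """Smallest n whose margin of error is at most `margin` (p = 0.5 is the safe choice)."""
    return ceil((z_star / margin) ** 2 * p * (1 - p))

print(sample_size(0.03))        # 3% margin at 95% confidence
print(sample_size(0.03, 2.576)) # same margin at 99% confidence needs more people
```

With z* = 1.960 and p = 0.5 the formula gives about 1067.1, which rounds up to 1068.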
APSTAT
Chapter 20
One Proportion
Hypothesis Tests
Significance Tests

Example. AP Stat Exam 2005:



National Proportion Who Passed = .58
Priory Students n = 32, p-hat=.78
Two Possibilities


Higher WPS proportion just happened by
chance (natural variation of a sample)
The likelihood of 78% of 32 students passing
is so remote we must conclude that Priory
Students are likely better at APStat than
national average.
Hypothesis Testing

Reflect our two possibilities from above:



NOTHING IS STRANGE (difference could
have been by natural variation of sample)
SOMETHIN’ IS GOIN’ ON (difference is so
improbable we must assume there is a
difference)
Here is how we write them:


H0: Null Hypothesis (Nothing Strange)
Ha: Alternative Hypothesis (Somethin’ is goin’
on)
In our WPS AP Stat Example

In practice, we describe the hypotheses in
both symbols and words



H0: p = .58, Priory students perform at the
same level as the National Average
Ha: p > .58, Priory students perform better
than the National Average
We will perform test(s) that give evidence
against the H0 (kinda like a trial)
What to do with the
Hypothesis…

After we conduct a test we will have
evidence based on our understanding of
probability and sample variation. With
this info we can:


Reject H0 in favor of Ha
• if there is SIGNIFICANT evidence that the
result did not likely happen by chance
variation.
Fail to Reject H0
• if there is not enough evidence to reject it.
The variation could likely have happened
by chance
Be Carefull…

Notice we NEVER, NEVER, NEVER
Accept either Hypothesis
 Say one or the other is true or false

We only have evidence, we could still
be wrong….
 BUT….the stronger the evidence the
more confident we can be!

Where do we get evidence?


One way, P-value from a z-score. What is
the probability that this event happened
given the population proportion, standard
deviation and # in our trial?
Our old friend, the z-score:
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
We are using a sample here, so we
throw in the standard deviation of the
sample proportion.
Let’s Do It! WPS AP Stat
Example

Step 1 Define Parameter:


p the true passing proportion of WPS APstat
test-takers
Step 2 Hypotheses


H0: p = .58, Priory students perform at the
same level as the national proportion
Ha: p > .58, Priory students perform better
than the national proportion
WPS AP Stat Example
Continued

Step 3 Assumptions:


SRS
• No, but we will assume WPS Students
are a representative sample of the
population of all AP Stat test-takers.
Independence
• Priory sample of 32 is less than 10%
of population of AP Stat test-takers
Normalness
• .58(32) and .42(32) both > 10
WPS AP Stat Example
Continued

Step 4 Name Test and DO IT
One Sample Z-Test for a Proportion
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
WPS AP Stat Example
Continued

Step 5

P-value and sketch of normal curve:
P(z> 2.31)= .01053
[Sketch: normal curve with .58 and .78 marked]
Step 6 Interpret P-value and Conclusion

A P-Value of .01053 indicates that, if the true proportion were .58, there is about a
1 in 100 chance of getting a sample proportion this far above p
merely by chance. Therefore, reject H0
in favor of Ha. There is strong evidence that the passing proportion
for WPS students is higher than the National
proportion on the 2005 APStat exam
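The test statistic and P-value can be reproduced in Python. One assumption is needed: the slides give p-hat = .78 with n = 32, and the sketch takes that to mean 25 of 32 students passed (25/32 = .78125), which is what reproduces z = 2.31. The function name `one_prop_z_test` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """z statistic and upper-tail P-value for H0: p = p0 vs Ha: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)

# x = 25 is an assumption: 25/32 = .78125, which rounds to the slide's .78
z, p_value = one_prop_z_test(x=25, n=32, p0=0.58)
print(f"z = {z:.2f}, P-value = {p_value:.5f}")
```

This gives z of about 2.31 and a P-value of about .0105, in line with the numbers on the slide.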
PHAT-PI (MUCH LOVE TO AL YOUNG)
P - Parameters (What are we studying)
 H - Hypothesis (In words and symbols)
 A - Assumptions (depends on type of test)
 T - Test (Name it. Do it.)
 P - P-Value (Calculate it-Draw it)
 I - Interpret (Reject/Fail to Reject, Why, ATQ)

Alternative Hypotheses


Can be:
Greater Than (Ha: parameter > blah)
Less Than (Ha: parameter < blah)
Not Equal (Ha: parameter ≠ blah)
Double your one-sided P-value
On TI-83
Still have to do all of Phat Pi, but
helps with calculations.
 Stat>Test>1-PropZTest

p0 - Population proportion
 x - successes in sample n(p-hat)
 n - sample size


Do it for AP Stat Example
Defective Products

A company claims that just 3% of its
products are defective. A simple
random sample of 400 of their
products yielded 14 defective items.
Do these sample data suggest that
the company’s claim is too low?
PHAT-PI

P

H

A
PHAT-PI

T

P

I
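After you have worked through PHAT-PI by hand, the T and P steps for the defective products problem can be checked with a short Python sketch. It assumes the one-sided alternative Ha: p > .03 (the claim is too low); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, x, n = 0.03, 14, 400
p_hat = x / n

# A: np0 and n(1 - p0) both > 10
conditions_ok = n * p0 > 10 and n * (1 - p0) > 10

# T: one-sample z-test for a proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# P: upper-tail P-value, since Ha is p > .03
p_value = 1 - NormalDist().cdf(z)

print(f"p-hat = {p_hat:.3f}, z = {z:.3f}, P-value = {p_value:.4f}")
```

With z near .59 and a P-value near .28, a sample like this could easily happen by chance if the claim were true, so we fail to reject H0.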
How Much Evidence?
GTang (and many texts) give a rule of
thumb of 5%. If there is a 5%
probability or less that the outcome
would happen by chance, you can
throw down the “enough evidence to
reject H0…”
 If it is 1% or less, you can throw down
the “very strong evidence against H0.
Reject H0 in favor…”

Significance level…




Sometimes a problem will specify a certain
amount of evidence that is needed.
α = Significance Level
Usually α = 0.05 or 0.01
Basically, your P-value must be below that
level to reject the null hypothesis.
Example: your p-value is .03 and α = 0.05, so you reject H0.
Be careful with one and two-sided
alternatives and significance levels
Your p-value doubles in a 2-sided test.
APSTAT
Chapter 21
More Stuff About
Hypothesis Tests
Great Chapter
Make sure you read it
 Important concepts:

What a Null Hypothesis is and isn’t
What P-Value Means
Significance (α) Level
Critical Value - One v. Two sided
Confidence Intervals and Tests of
Significance - Relationship Between

Great Chapter…But….

Goes further than you need in
explaining:

Types of Error
• Type I
• Type II

Power
Errors - Can We Make
Mistakes?

Sure. We can reject a “Good” Shipment
For example, I need batteries that work 99% of
the time. My significance test of a sample from a
battery shipment tells me to “reject” the shipment,
but it is actually ok.
We can also Fail to Reject a “Bad” Shipment
I accept a shipment that was actually bad
because my sample proportion ended up close to
the proportion I was looking for.
Which of these is worse in real life?
Errors

Type I – Reject H0 when it is actually true
Usually not so bad
Rejecting a “good” shipment
Probability is equal to α
Type II – Failing to Reject H0 when it is actually false
Usually bad
Accepting a “bad” shipment
Probability (β) is a bear to calculate
• Check book to see how! Ooooo, fun!
• Be happy you will NEVER be asked to do it
Errors - #2

Decrease both Type I and II errors by:
Increasing n
Decrease Type II Errors by:
Increasing α
• You end up rejecting more/failing to reject
less
• Causes an increase in Type I errors
POWER





Basically, how sure we are that we will not
get a Type II error
Power = 1 – P(Type II)
OR Power = 1 - β
Never will you be asked to compute (unless
the probability of a type II error is given)
Increase Power by:
Increasing n (Sample size)
Increasing α (say from .01 to .05)
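You will not be asked to compute power, but a rough sketch can show why bigger n and bigger α both raise it. Everything specific here is an assumption for illustration: the function name `power_one_sided`, the null value p0 = .58 borrowed from the AP Stat example, and a hypothetical true proportion of .70.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha):
    """Power of the upper-tail one-proportion z-test when the true proportion is p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)   # reject H0 when p-hat > cutoff
    sd_true = sqrt(p_true * (1 - p_true) / n)        # SD of p-hat under the true p
    beta = NormalDist(p_true, sd_true).cdf(cutoff)   # P(Type II) = P(fail to reject)
    return 1 - beta

for n, alpha in [(32, 0.01), (32, 0.05), (100, 0.01), (100, 0.05)]:
    print(f"n = {n:3d}, alpha = {alpha}: power = {power_one_sided(0.58, 0.70, n, alpha):.3f}")
```

Reading down the output, power goes up when α moves from .01 to .05 and goes up again when n moves from 32 to 100.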
Power and Error Wrap

What you have to know:
Explain Power, Type I, and Type II
errors in context of the problem.
Calculate P(Type I error) given α
 How to Decrease:

• Type I Error
• Type II Error

How to increase Power
APSTAT
Chapter 22
Two Proportion
Hypothesis Tests
Let’s Hop Right In…

A recent report found that men wash
their hands 75% of the time after
using the restroom and women 85%
of the time. If SRS’s of 1200 men and
1100 women were surveyed, can we
statistically say there is a significant
difference between hand washing
habits of men and women?
Handwashing

Parameter (group 1=female, 2=male)
p1-p2: Difference between female and male
hand washing proportions
Hypotheses
H0: p1-p2=0 No difference in hand washing
Ha: p1-p2≠0 Is a difference in hand washing

Handwashing

Assumptions
SRS’s: Yep
Independent samples: Safe to Assume
n1p1>5 and n1(1-p1)>5, n2p2>5 and n2(1-p2)>5: Yep
Population 10X Sample: Yep
Handwashing

Test – Two Sample Proportion Z-Test
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{.85 - .75}{\sqrt{.7978(1-.7978)\left(\frac{1}{1100}+\frac{1}{1200}\right)}} = 5.965$
POOLED
$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{935 + 900}{1100 + 1200} = .7978$
Pool if variances are equal (since our null
theorizes that the populations, and thus
the variances, are equal)
Handwashing

P-Value


2*P(Z>5.965)=Really Really Really Small
Interpretation

P Value is so small, there is VERY
significant evidence against the
assumption that males and females wash
hands at the same proportion. Reject Null
Hypothesis in favor of the Alternative.
Males and females almost assuredly have
different hand washing proportions.
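The handwashing numbers can be reproduced with a short Python sketch of the pooled two-proportion z-test. The function name `two_prop_z_test` is made up for this example; the counts 935 and 900 come from 85% of 1100 women and 75% of 1200 men.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0, plus the pooled p-hat."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled

# Group 1 = female (935 of 1100), group 2 = male (900 of 1200)
z, pooled = two_prop_z_test(935, 1100, 900, 1200)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided, so double it

print(f"pooled p-hat = {pooled:.4f}, z = {z:.3f}, P-value = {p_value:.2e}")
```

The pooled p-hat is .7978 and z is about 5.965, so the P-value is far below any usual α.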
Pooled vs. Non-Pooled
•Use Pooled when you hypothesize
populations have the same variance (in
proportions, the same p = same variance)
• Use Non-pooled when populations are
likely to have separate variances. (If your
null shows a non-zero difference)
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$
Confidence Interval
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Use Non-Pooled because there is no null to test for.
So, To Review…

PhatPi is on, but with these changes:






P Parameter of interest is now the
difference between ___ and ___
H H0: p1=p2 (or p1-p2=0)
Ha: p1>p2 (or p1-p2>0)
or Ha: p1<p2 (or p1-p2<0)
or Ha: p1≠p2 (or p1-p2 ≠ 0)
Plus, you have to choose Pooled v. NonPooled (Pooled if Null is p1=p2)
Using TI 83
Stat>Test>2-PropZTest
 Can Also Do Interval:


Stat>Test>2-PropZInt
• Put in C-Level (usually .9, .95, or .99)
Let’s do one!

Some scientists suggest that sickle-cell traits
protect against malaria. A study in Africa
tested 543 children for sickle-cell trait and also for
malaria. In all, 136 of the children had
sickle-cell trait and 36 of these had malaria.
The other 407 children lacked the sickle-cell
trait and 157 of them had malaria. Is there
evidence that malaria infection is lower
among children with the sickle-cell trait?
Malaria v. Sickle Cell

P

H

A
Malaria v. Sickle Cell

T

P

I
Do Using a 95% C-Interval

Assumptions:

Interval Calculation

Interpretation
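Once you have done both versions by hand, this Python sketch can check the calculations for the malaria problem. Assumptions: group 1 = children with sickle-cell trait, group 2 = children without it, and the test is one-sided (Ha: p1 - p2 < 0). The test pools because the null is p1 = p2; the interval does not pool.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 36, 136     # sickle-cell trait, had malaria
x2, n2 = 157, 407    # no trait, had malaria
p1, p2 = x1 / n1, x2 / n2

# Test (pooled): H0: p1 - p2 = 0, Ha: p1 - p2 < 0
pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled
p_value = NormalDist().cdf(z)        # lower tail

# 95% confidence interval (non-pooled)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.975)
low, high = (p1 - p2) - z_star * se, (p1 - p2) + z_star * se

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}")
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: {low:.3f} to {high:.3f}")
```

Both approaches tell the same story: the P-value is well under .01, and the whole interval sits below 0, so there is evidence that the malaria rate is lower among children with the trait.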
That is it!

Just one section left to go!