2% - Project Maths


Night 1
Modular Course 5
Summary or Descriptive Statistics: numerical and graphical summaries of data.
INFERENTIAL STATISTICS: USING SAMPLE STATISTICS TO INFER POPULATION PARAMETERS.
Making Decisions Based on the Empirical Rule (Standard Normal Curve)
[Figure: standard normal curve, z from −3 to 3, with 68%, 95% and 99.7% of the area within ±1, ±2 and ±3]
Empirical Rule
$\mu$ = population mean, $\sigma$ = population standard deviation
[Figure: normal curve marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ; about 68% of the data lies between μ − σ and μ + σ, 95% between μ − 2σ and μ + 2σ, and 99.7% between μ − 3σ and μ + 3σ]
Most Important for Inferential Stats on our Syllabus
[Figure: normal curve with 95% of the area between μ − 2σ and μ + 2σ]
95% of normal data lies within 2 standard deviations of the mean.
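As a quick numerical check of the 68%, 95% and 99.7% figures: for a normal distribution, the proportion of data within k standard deviations of the mean is erf(k/√2). A minimal sketch using only Python's standard library (the function name `within_k_sd` is our own):

```python
import math

def within_k_sd(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    # For the standard normal Z: P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.1%}")
```

The exact value for 2 standard deviations is about 95.4%; the rule rounds it to 95%. The exact multiplier for 95% is about 1.96, which is the value used at Higher Level.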
Example 1
IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
Use the Empirical Rule to show that 95% of IQ scores in the population are between 70 and 130.
Solution
95% of the IQ scores are within ±2 standard deviations of the mean.
$100 + 2(15) = 100 + 30 = 130$
$100 - 2(15) = 100 - 30 = 70$
So 95% of IQ scores in the population lie between 70 and 130.
Example 2
The number of sandwiches sold by a shop from 12 noon to 2 pm each day is normally distributed.
The distribution has a mean of 42.6 sandwiches and a standard deviation of 8.2 sandwiches.
Use the Empirical Rule to identify the range of values around the mean that includes 68%
of the sale numbers.
Solution
68% of the sales are within ±1 standard deviation of the mean.
$42.6 + 1(8.2) = 42.6 + 8.2 = 50.8$
$42.6 - 1(8.2) = 42.6 - 8.2 = 34.4$
Answer: 68% of the daily sales are between 34.4 and 50.8 sandwiches.
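Both examples use the same calculation, mean ± k(standard deviation). A small sketch (the helper name `empirical_range` is hypothetical) that reproduces the two answers:

```python
def empirical_range(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Return (mean - k*sd, mean + k*sd), rounded to one decimal place."""
    return round(mean - k * sd, 1), round(mean + k * sd, 1)

# Example 1: IQ scores, mean 100, SD 15; 95% lie within 2 SDs
print(empirical_range(100, 15, 2))
# Example 2: sandwich sales, mean 42.6, SD 8.2; 68% lie within 1 SD
print(empirical_range(42.6, 8.2, 1))
```

Use k = 1 for the 68% range, k = 2 for 95% and k = 3 for 99.7%.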
Your Turn
Question
The table below shows the prices charged per room of 40 B&B houses in Galway.
Race Week B&B prices per room (€):
56, 75, 60, 70, 80, 70, 50, 90, 80, 75,
75, 50, 75, 50, 70, 60, 65, 60, 50, 70,
84, 70, 70, 60, 60, 70, 70, 70, 40, 60,
70, 80, 60, 65, 55, 50, 70, 80, 50, 55
(i) Calculate, correct to one decimal place, the mean and standard deviation of the data.
(ii) Show that the empirical rule holds true for 1 standard deviation around the mean.
(iii) Show that the empirical rule holds true for 2 standard deviations around the mean.
Solution
(i) Using a calculator: mean = 65.5, standard deviation = 11.2
(ii)
Upper limit = mean + 1(standard deviation) = 65.5 + 11.2 = 76.7
Lower limit = mean − 1(standard deviation) = 65.5 − 11.2 = 54.3
Of the forty houses, 27 (67.5%) charge between €54.30 and €76.70.
Therefore approximately 68% of the prices lie within 1 standard deviation of the mean.
(iii)
Upper limit = mean + 2(standard deviation) = 65.5 + 22.4 = 87.9
Lower limit = mean − 2(standard deviation) = 65.5 − 22.4 = 43.1
Of the forty houses, 38 (95%) charge between €43.10 and €87.90.
Therefore approximately 95% of the prices lie within 2 standard deviations of the mean.
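The calculator work and the counting can be checked in a few lines. A minimal sketch with Python's `statistics` module: `pstdev` is the population standard deviation (divide by n), which is the value that matches the 11.2 used in the solution.

```python
import statistics

# Race Week B&B prices per room (EUR): the 40 values from the table
prices = [56, 75, 60, 70, 80, 70, 50, 90, 80, 75,
          75, 50, 75, 50, 70, 60, 65, 60, 50, 70,
          84, 70, 70, 60, 60, 70, 70, 70, 40, 60,
          70, 80, 60, 65, 55, 50, 70, 80, 50, 55]

mean = statistics.mean(prices)
sd = statistics.pstdev(prices)  # population standard deviation

for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    count = sum(low < p < high for p in prices)  # prices strictly inside the range
    print(f"within {k} SD ({low:.1f} to {high:.1f}): {count}/40 = {count / 40:.1%}")
```

The counts use the unrounded standard deviation; with this data, rounding it to 11.2 does not change which prices fall inside either range.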
Inferential Statistics: Sampling
For Leaving Cert we deal with two types of sampling:
1. Sample Proportion (Ordinary Level and Higher Level)
2. Sample Means (Higher Level)
Sampling
We are usually unable to collect information about an entire population.
The aim of sampling is to draw reasonable conclusions about a population
by obtaining information from a relatively small sample of that population.
When a sample from a population is selected we hope that the data we get
represents the population as a whole.
To ensure this:
1. The sample must be random.
2. Every member of the population must have an equal chance of being included.
[Figure: a population with several samples (Sample 1 to Sample 6) drawn from it]
Population Proportions and Margin of Error
A sample of 25 students in a school were asked if they spent over €5 on mobile phone calls over the last week. 10 students had spent over €5.
The proportion of the sample of 25 who spent over €5 was $\frac{10}{25} = 0.4 = 40\%$.
Can we say that 40% of the students in the school (population) spent over €5?
The answer is no (unless the sample size was the same as the population size); we can't say for certain.
However, we could say with a certain degree of confidence that, if the sample was large enough and representative, the proportion of the sample would be approximately the same as the proportion of the population.
Population Proportions and Margin of Error
How confident we are is usually expressed as a percentage.
We already saw (from the empirical rule) that approximately 95% of the area of a normal curve lies within ±2 standard deviations of the mean.
This means that we are 95% confident that the population proportion is within ±2 standard deviations of the sample proportion. ±2 standard deviations is our margin of error, and the percentage margin of error that this represents depends on the sample size.
If n = 1000, the margin of error is approximately ±3%.
95% is the confidence level we are working with, but other confidence levels also exist (e.g. 90% and 99%), for which a different margin of error applies depending on the sample size.
At the 95% level of confidence:
Margin of Error $= \pm\frac{1}{\sqrt{n}}$, where n is the sample size.
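Where does $\frac{1}{\sqrt{n}}$ come from? A short derivation, assuming the sample proportion is approximately normal with standard deviation $\sqrt{p(1-p)/n}$ and using the empirical rule's 2 standard deviations for 95%:

```latex
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \le \sqrt{\frac{1/4}{n}} = \frac{1}{2\sqrt{n}}
\qquad\Longrightarrow\qquad
2\,\sigma_{\hat{p}} \le \frac{1}{\sqrt{n}}
```

The inequality uses the fact that $p(1-p)$ is largest, at $\frac{1}{4}$, when $p = \frac{1}{2}$. So $\frac{1}{\sqrt{n}}$ is a slightly cautious margin of error (never too small) whatever the true proportion is.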
Confidence interval for population proportion using Margin of Error
LCOL: Margin of error $= \pm\frac{1}{\sqrt{n}}$, where n is the sample size.
95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$, where $\hat{p}$ is the sample proportion and p is the population proportion.
We are 95% confident that the population proportion is inside this confidence interval.
Showing a 95% confidence interval.
Question. A sample of 25 students in a school were asked if they spent over €5 on mobile phone calls over the last week. 10 students had spent over €5.
The proportion of the sample of 25 who spent over €5 was $\hat{p} = \frac{10}{25} = 0.4$.
Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{25}} = \pm 0.2$, so the 95% confidence interval is $0.4 - 0.2 < p < 0.4 + 0.2$, i.e. $0.2 < p < 0.6$.
[Figure: 20 different 95% confidence intervals, each running from $\hat{p} - \frac{1}{\sqrt{n}}$ to $\hat{p} + \frac{1}{\sqrt{n}}$]
About 95% of the intervals made this way (sampled proportion ± margin of error) contain the true population proportion.
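The "20 different confidence intervals" idea can be simulated. A minimal sketch, assuming a hypothetical true population proportion of 0.4 and surveys of 25 people each, using Python's `random` module: it repeats the survey many times and counts how often $\hat{p} \pm \frac{1}{\sqrt{n}}$ contains the true value.

```python
import math
import random

def interval_covers(true_p: float, n: int, rng: random.Random) -> bool:
    """Simulate one survey of size n; check whether p-hat +/- 1/sqrt(n) contains true_p."""
    successes = sum(rng.random() < true_p for _ in range(n))  # each person says "yes" with probability true_p
    p_hat = successes / n
    return abs(p_hat - true_p) <= 1 / math.sqrt(n)

rng = random.Random(2024)  # fixed seed so the run is repeatable
trials = 2000
hits = sum(interval_covers(0.4, 25, rng) for _ in range(trials))
print(f"{hits} of {trials} intervals contain the true proportion ({hits / trials:.1%})")
```

Expect a coverage rate at or a little above 95%, because $\frac{1}{\sqrt{n}}$ is a cautious margin of error; changing the seed changes the exact count.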
Some Notes on Margin of Error
• As the sample size increases, the margin of error decreases.
• A sample of about 50 has a margin of error of about 14% at the 95% level of confidence: $\frac{1}{\sqrt{50}} = \pm 14.14\%$
• A sample of about 1000 has a margin of error of about 3% at the 95% level of confidence: $\frac{1}{\sqrt{1000}} = \pm 3.16\%$
• The size of the population does not matter (provided the population is much larger than the sample).
• If we double the sample size (1000 to 2000), we do not halve the margin of error: it only falls from about ±3.16% to about ±2.24%.
• Margin of error estimates how accurately the results of a poll reflect the "true" feelings of the population.
Margin of error $= \pm\frac{1}{\sqrt{n}}$

Sample size | Margin of error
25          | ±20%
64          | ±12.5%
100         | ±10%
256         | ±6.25%
400         | ±5%
625         | ±4%
1111        | ±3%
1600        | ±2.5%
2500        | ±2%
10000       | ±1%
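The table can be regenerated directly from the formula. A minimal sketch (the function name `margin_of_error` is our own):

```python
import math

def margin_of_error(n: int) -> float:
    """Approximate 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (25, 64, 100, 256, 400, 625, 1111, 1600, 2500, 10000):
    print(f"n = {n:>5}: +/- {margin_of_error(n):.2%}")
```

Because n sits under a square root, doubling the sample size only multiplies the margin of error by $\frac{1}{\sqrt{2}} \approx 0.71$; to halve it you need four times the sample.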
Example 1
A company claims that 30% of people who eat their "Rice Crispy Bun" product really liked it.
The confidence level is cited as 95%.
In June an independent survey was carried out on 625 randomly selected people to see if they liked the "Rice Crispy Bun" product.
(i) Calculate the margin of error.
The result of the survey in June was that 125 people liked the "Rice Crispy Bun" product.
(ii) According to the June survey, would you say that, at a 5% level of significance, the company was correct in stating that 30% of people who eat their "Rice Crispy Bun" product really liked it?
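Solution
(i) Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{625}} = \pm 0.04 = \pm 4\%$
(ii) According to the survey, 125 out of 625 liked the product: $\hat{p} = \frac{125}{625} = 0.2 = 20\%$.
95% confidence interval: $20\% - 4\% = 16\%$ to $20\% + 4\% = 24\%$.
Reason: The company claims 30% like the product. The margin of error is ±0.04, and 30% is outside the confidence interval from 16% to 24%, so the survey does not support the company's claim at the 5% level of significance.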
Example 2
In a survey I want a margin of error of ±5% at the 95% level of confidence.
What sample size must I pick in order to achieve this?
Solution
Margin of Error = ±0.05
$\pm 0.05 = \pm\frac{1}{\sqrt{n}}$
$(0.05)^2 = \frac{1}{n}$
$0.0025 = \frac{1}{n}$
$n = \frac{1}{0.0025} = 400$
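Rearranging $E = \frac{1}{\sqrt{n}}$ gives $n = \frac{1}{E^2}$, rounded up to a whole number of people. A minimal sketch (the helper name `required_sample_size` is hypothetical):

```python
import math

def required_sample_size(margin: float) -> int:
    """Smallest n with 1/sqrt(n) <= margin, i.e. n >= 1/margin**2."""
    # Subtract a tiny tolerance so floating-point rounding cannot push an
    # exact whole-number answer (such as 400) up by one.
    return math.ceil(1 / margin**2 - 1e-9)

print(required_sample_size(0.05))  # Example 2: +/- 5% margin
print(required_sample_size(0.04))  # +/- 4% margin
```

Rounding up, rather than to the nearest whole number, guarantees the margin of error is no bigger than the one requested.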
Your Turn
Question
A sweet company claims that 10% of the M&M's it produces are green.
Students found that in a large sample of 500 M&M's 60 were green.
(i) Calculate the margin of error.
(ii) State whether 60 greens from 500 is an unusually high proportion of green M&M's if the claim by the company is assumed to be true.
Solution
(i) Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{500}} = \pm 0.045 = \pm 4.5\%$
(ii) Sample proportion $\hat{p} = \frac{60}{500} = 0.12 = 12\%$.
95% confidence interval: $12\% - 4.5\% = 7.5\%$ to $12\% + 4.5\% = 16.5\%$.
Reason: 10% is between 7.5% and 16.5% (inside the confidence interval), so 60 greens from 500 does not seem to be unusual.
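The two worked answers above (Rice Crispy Bun and green M&M's) follow the same recipe: build $\hat{p} \pm \frac{1}{\sqrt{n}}$ and check whether the claimed percentage falls inside. A minimal sketch (the name `lcol_interval` is our own; the code keeps the unrounded margin, e.g. 0.0447 rather than 0.045):

```python
import math

def lcol_interval(successes: int, n: int) -> tuple[float, float]:
    """95% interval p-hat +/- 1/sqrt(n) for a population proportion (Ordinary Level method)."""
    p_hat = successes / n
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin

low, high = lcol_interval(125, 625)  # Rice Crispy Bun survey
print(f"Rice Crispy Bun: {low:.1%} to {high:.1%}; 30% inside? {low < 0.30 < high}")

low, high = lcol_interval(60, 500)   # green M&M's
print(f"M&M's: {low:.1%} to {high:.1%}; 10% inside? {low < 0.10 < high}")
```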
Recognising the Concept of a Hypothesis Test
Testing claims about a population.
Null Hypothesis: The null hypothesis, denoted by H0, is a claim or statement about a population. We assume this statement is true until proven otherwise (the null hypothesis means that nothing is wrong with the claim or statement).
Alternative Hypothesis: The alternative hypothesis, denoted by H1, is a claim or statement which opposes the original statement about a population.
Courtroom Analogy to Teach Formal Language
• At the start of a trial it is assumed the defendant is not guilty.
• Then the evidence is presented to the judge and jury.
• The null hypothesis is that the defendant is not guilty (H0).
• If the jury reject the null hypothesis (H0), this means that they find the defendant guilty.
• If the jury fail to reject the null hypothesis (H0), this means that they find the defendant not guilty.
Often we need to make a decision about a population based on a sample.
1. Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses?
Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0).
Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1).
2. Does a new machine produce fewer faulty parts than an old machine during a 5 minute period?
Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0).
Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1).
3. Does a new drug for hay fever work effectively?
Assuming that the new drug does not work effectively is called a NULL HYPOTHESIS (H0).
Assuming that the new drug does work effectively is called an ALTERNATIVE HYPOTHESIS (H1).
Hypothesis test on a population proportion using Margin of Error
LCOL: ME $= \pm\frac{1}{\sqrt{n}}$, where n is the sample size.
95% confidence interval: $\hat{p} - \frac{1}{\sqrt{n}} < p < \hat{p} + \frac{1}{\sqrt{n}}$
If the claimed percentage (H0) is inside the confidence interval: fail to reject H0.
If the claimed percentage (H0) is outside the confidence interval (on either side): reject H0.
Example 1
Go Fast Airlines
Go Fast Airlines provides internal flights in Ireland, short haul flights to Europe and long haul flights to
America and Asia. Each month the company carries out a survey among 1000 passengers. The
company repeatedly advertises that 70% of their customers are satisfied with their overall service. 664
of the sample stated they were satisfied with the overall service. Would you say that the company was correct in saying that 70% of their customers were satisfied? State the null and alternative hypotheses and state your conclusions clearly.
Null Hypothesis (H0): The proportion of passengers who are satisfied with the service is unchanged at 70%: $p = 0.7$.
Alternative Hypothesis (H1): The proportion of passengers who are satisfied with the service is not 70%: $p \ne 0.7$.
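Evidence:
Sample Proportion $= \frac{664}{1000} = 0.664 = 66.4\%$
Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{1000}} = \pm 0.0316 = \pm 3.16\%$
95% confidence interval: $66.4\% - 3.16\% = 63.24\%$ to $66.4\% + 3.16\% = 69.56\%$
The claimed 70% lies outside this interval: reject H0.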
Conclusion
The 70% is outside the range 63.24% to 69.56% of our confidence interval.
There is sufficient evidence to reject the claim that the percentage of passengers who are
happy with the service is 70% at the 5% level of significance.
Possible Actions: Change the advertisement from 70% to 65%.
Meet with staff to come up with suggestions about how to improve the level of satisfaction.
Do a further survey to find out more detail about why the level of satisfaction has changed.
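The decision rule for the Ordinary Level test can be written as a short function. A minimal sketch (the name `lcol_test` is hypothetical), shown on the Go Fast Airlines figures:

```python
import math

def lcol_test(claim: float, successes: int, n: int) -> str:
    """Reject H0 if the claimed proportion lies outside p-hat +/- 1/sqrt(n)."""
    p_hat = successes / n
    margin = 1 / math.sqrt(n)
    if p_hat - margin < claim < p_hat + margin:
        return "fail to reject H0"
    return "reject H0"

print(lcol_test(0.70, 664, 1000))  # Go Fast Airlines: claim 70%, 664 of 1000 satisfied
```

Using strict inequalities means a claim lying exactly on an interval end counts as "reject H0"; the notes do not cover that edge case, so this is our own choice.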
Your Turn
Question 1
It is generally agreed that 40% of the voting public are in favour of a change of government.
A survey was carried out on 900 randomly selected people to see if there was a
change in support for the government. The result was that 42% are now in favour
of a change of government.
(i) Calculate the margin of error.
(ii) State the null and alternative hypotheses.
(iii) At a 5% level of significance, would you accept or reject the null hypothesis? Give a reason for your conclusion.
Solution
(i) Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{900}} = \pm\frac{1}{30} \approx \pm 0.033 = \pm 3.3\%$
(ii) Null hypothesis, H0: "There is no change in the support for the government."
Alternative hypothesis, H1: "There is a change in the support for the government."
(iii) We fail to reject (accept) the null hypothesis H0.
Reason: The 95% confidence interval is $42\% \pm 3.3\%$, i.e. 38.7% to 45.3%, and 40% is inside this interval.
[Figure: 95% confidence interval around the sample proportion of 42%; the claimed 40% lies inside it, so we fail to reject H0]
Question 2
RTÉ claim that 60% of all viewers watch the Late Late Show every Friday
night. An independent survey was carried out on 400 randomly selected
viewers to see if the claim were true. The result of the survey was that 180
were watching the Late Late Show.
I. Calculate the margin of error.
II. State the Null and Alternative Hypotheses.
III. Would you accept or reject the Null Hypothesis according to this survey?
Give a reason for your conclusion.
Question 2: Solution
I. Margin of Error $= \pm\frac{1}{\sqrt{n}} = \pm\frac{1}{\sqrt{400}} = \pm 0.05 = \pm 5\%$
II. Null hypothesis: 60% of viewers watch the Late Late Show.
Alternative hypothesis: the percentage of viewers who watch the Late Late Show is not 60%.
III. Sample proportion $\hat{p} = \frac{180}{400} = 0.45 = 45\%$.
95% confidence interval: $45\% - 5\% = 40\%$ to $45\% + 5\% = 50\%$.
According to the survey, there is sufficient evidence to reject the null hypothesis. Reason: 60% is outside the confidence interval from 40% to 50%.
Empirical Rule (recap)
$\mu$ = population mean, $\sigma$ = population standard deviation
[Figure: normal curve marked from μ − 3σ to μ + 3σ, with 68%, 95% and 99.7% of the data within 1, 2 and 3 standard deviations of the mean]
What about 1.5 standard deviations or 0.8 standard deviations?
Night 2
Normal Distribution to Standard Normal Distribution
Different sets of data have different means and standard deviations, but any that are normally distributed have the same bell-shaped normal distribution curve.
[Figure: a normal distribution curve and the standard normal curve]
In order to avoid unnecessary calculations and graphing, the scale of a normal distribution curve is converted to a standard scale called the z-score or standard unit scale.
[Figure: normal distributions with μ = 13, σ = 3 (axis 4 to 22) and μ = 278, σ = 12 (axis 242 to 314), each converted to the standard normal distribution with μ = 0, σ = 1 (axis −3 to 3)]
Standard Normal Distribution
If $\mu = 0$ and $\sigma = 1$ we would plot $y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$.
This graph gives the Standard Normal Graph with a standardised scale.
Total area under the curve: $P(-\infty < z < \infty) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = 1$
[Figure: standard normal curve with z-scores −3 to 3 aligned with μ − 3σ to μ + 3σ]
The area between the Standard Normal Curve and the z-axis between −∞ and +∞ is 1.
Standard Units (z-scores)
$z = \frac{x - \mu}{\sigma}$
x is a data point
$\mu$ is the population mean
$\sigma$ is the standard deviation of the population
z-scores define the position of a score in relation to the mean, using the standard deviation as a unit of measurement.
z-scores are very useful for comparing data points in different distributions.
The z-score is the number of standard deviations by which the score departs from the mean.
This standardises the distribution.
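The z-score formula translates directly into code. A minimal sketch (the function name `z_score` is our own), comparing a point from each of the two example distributions used in this section (μ = 13, σ = 3 and μ = 278, σ = 12):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(19, 13, 3))     # 2.0
print(z_score(302, 278, 12))  # 2.0
```

Although 19 and 302 are very different raw values, both sit exactly 2 standard deviations above their own means, which is why z-scores make data from different distributions comparable.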
Reading z-values From Tables
Example 1 (Tables, pp. 36-37)
Using the tables find $P(Z \le 1.31)$.
For a given z, the table gives $P(Z \le z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\,dt$.
$P(Z \le 1.31)$ can be read from the tables directly:
$P(Z \le 1.31) = 0.9049 = 90.49\%$
[Figure: standard normal curve with the area to the left of z = 1.31 shaded]
Example 2 (Tables, pp. 36-37)
Using the tables find $P(Z \ge 1.32)$.
$P(Z \ge z) = 1 - P(Z \le z)$
$P(Z \ge 1.32) = 1 - P(Z \le 1.32)$
$P(Z \ge 1.32) = 1 - 0.9066 = 0.0934 = 9.34\%$
The table only gives values to the left of z, but the fact that the total area under the curve equals 1 allows us to use $P(Z \ge z) = 1 - P(Z \le z)$.
[Figure: standard normal curve with the area to the right of z, $P(Z \ge z) = 1 - P(Z \le z)$, shaded]
Example 3 (Tables, pp. 36-37)
Using the tables find $P(Z \le -0.74)$.
The tables only work for positive values, but as the curve is symmetrical about z = 0:
$P(Z \le -0.74) = P(Z \ge 0.74)$
$P(Z \le -0.74) = 1 - P(Z \le 0.74)$
$P(Z \le -0.74) = 1 - 0.7704 = 0.2296 = 22.96\%$
Both areas are the same, and hence both probabilities are equal, as the curve is symmetrical about the mean, 0.
[Figure: equal shaded tails $P(Z \le -z)$ and $P(Z \ge z)$ on either side of 0]
Example 4 (Tables, pp. 36-37)
Using the tables find $P(-1.32 \le Z \le 1.29)$.
[Figure: area to the left of 1.29 minus area to the left of −1.32]
$P(-1.32 \le Z \le 1.29)$ = Area to the left of 1.29 − Area to the left of −1.32
$= P(Z \le 1.29) - [1 - P(Z \le 1.32)]$
$= 0.9015 - [1 - 0.9066] = 0.8081 = 80.81\%$
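The table look-ups in Examples 1 to 4 can be checked with Python's `statistics.NormalDist`, whose `cdf` method gives the same left-hand area $P(Z \le z)$ that the tables list. Tiny differences in the fourth decimal place can appear, because the tables are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

print(round(Z.cdf(1.31), 4))                 # Example 1: P(Z <= 1.31)
print(round(1 - Z.cdf(1.32), 4))             # Example 2: P(Z >= 1.32)
print(round(Z.cdf(-0.74), 4))                # Example 3: P(Z <= -0.74), negative z handled directly
print(round(Z.cdf(1.29) - Z.cdf(-1.32), 4))  # Example 4: P(-1.32 <= Z <= 1.29)
```

Unlike the printed tables, `cdf` accepts negative z directly, so the symmetry step in Example 3 is not needed in code; it is still the reason the two answers agree.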
Your Turn
Question 1
The amounts due on a mobile phone bill in Ireland are normally distributed with a mean of €53 and a standard deviation of €15. If a monthly phone bill is chosen at random, find the probability that the amount due is between €47 and €74.
Solution
$z_1 = \frac{x - \mu}{\sigma} = \frac{47 - 53}{15} = -0.4$
$z_2 = \frac{x - \mu}{\sigma} = \frac{74 - 53}{15} = 1.4$
$P(-0.4 < Z < 1.4) = P(Z \le 1.4) - [1 - P(Z \le 0.4)]$
$P(-0.4 < Z < 1.4) = 0.9192 - [1 - 0.6554]$
$P(-0.4 < Z < 1.4) = 0.5746$
So the probability that the amount due is between €47 and €74 is approximately 0.5746 (57.46%).
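`NormalDist` can also work on the original scale (euro) without converting to z-scores by hand. A minimal sketch for the phone bill question:

```python
from statistics import NormalDist

bills = NormalDist(mu=53, sigma=15)  # monthly phone bill amounts in euro

# P(47 < X < 74) = area to the left of 74 minus area to the left of 47
p = bills.cdf(74) - bills.cdf(47)
print(f"{p:.4f}")
```

The result agrees with the table answer 0.5746 to within rounding (the code does not round z-values or table entries).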
Question 2
The mean percentage achieved by students in a statistics exam is 60%.
The standard deviation of the exam marks is 10%. Assume the marks are normally distributed.
(i) What is the probability that a randomly selected student scores above 80%?
(ii) What is the probability that a randomly selected student scores below 45%?
(iii) What is the probability that a randomly selected student scores between 50% and 75%?
(iv) Suppose you were sitting this exam and you are offered a prize for getting a mark which is greater than the marks of 90% of all the other students sitting the exam. What percentage would you need to get in the exam to win the prize?
Solution
(i) $z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2$
$P(Z > 2) = 1 - P(Z < 2)$
$P(Z > 2) = 1 - 0.9772 = 0.0228 = 2.28\%$
[Figure: normal curve with marks 30 to 90 aligned with z = −3 to 3]
(ii) $z = \frac{x - \mu}{\sigma} = \frac{45 - 60}{10} = -1.5$
$P(Z < -1.5) = P(Z > 1.5) = 1 - P(Z < 1.5)$
$P(Z < -1.5) = 1 - 0.9332 = 0.0668 = 6.68\%$
Question 2: Solution (continued)
(iii) $z_1 = \frac{x - \mu}{\sigma} = \frac{50 - 60}{10} = -1$
$z_2 = \frac{x - \mu}{\sigma} = \frac{75 - 60}{10} = 1.5$
$P(-1 < Z < 1.5) = P(Z \le 1.5) - [1 - P(Z \le 1)]$
$P(-1 < Z < 1.5) = 0.9332 - [1 - 0.8413]$
$P(-1 < Z < 1.5) = 0.7745$
(iv) From the tables, the z-value for an area of 90% (0.9) is 1.28, so $Z = 1.28$.
$z = \frac{x - \mu}{\sigma} \Rightarrow 1.28 = \frac{x - 60}{10} \Rightarrow x = 72.8$ marks
[Figure: normal curve with marks 30 to 90 aligned with z = −3 to 3, marking 72.8 at z = 1.28]
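All four parts of the exam question can be checked with `NormalDist`, assuming, as the solution does, that the marks are normally distributed. Part (iv) is the reverse look-up: `inv_cdf(0.90)` returns the mark with 90% of the distribution below it, the job done by reading the table "backwards".

```python
from statistics import NormalDist

marks = NormalDist(mu=60, sigma=10)  # assumes exam marks are normally distributed

p_above_80 = 1 - marks.cdf(80)             # (i)
p_below_45 = marks.cdf(45)                 # (ii)
p_between = marks.cdf(75) - marks.cdf(50)  # (iii)
prize_mark = marks.inv_cdf(0.90)           # (iv) mark with 90% of students below it

print(f"(i) {p_above_80:.4f}  (ii) {p_below_45:.4f}  (iii) {p_between:.4f}  (iv) {prize_mark:.1f}")
```

`inv_cdf` uses the unrounded z-value (about 1.2816 rather than 1.28), which still gives 72.8 to one decimal place.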
For Higher Level Leaving Cert use z-scores
[Figure: standard normal curve with 95% of the area between −1.96 and +1.96 and 2.5% in each tail]
Confidence interval for population proportion
LCHL: Confidence Limits $= \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
95% confidence interval: $\hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
We are 95% confident that the population proportion p is inside this confidence interval.
Example 1
Skygo provides Wifi in the Galway area. In March the company carries out a survey among 625 of its customers. The company advertises that 60% of their customers were satisfied with their download speeds. 370 of the sample stated they were satisfied with their download speed. Create a 95% confidence interval based on your sample.
Sample Proportion $= \frac{370}{625} = 0.592 = 59.2\%$
Confidence Limits $= \pm 1.96\sqrt{\frac{0.592(1 - 0.592)}{625}} = \pm 0.0385 = \pm 3.85\%$
95% confidence interval: $59.2\% - 3.85\% < p < 59.2\% + 3.85\%$, i.e. $55.35\% < p < 63.05\%$
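The Higher Level interval $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ as a small function (the name `proportion_ci` is our own), applied to the Skygo survey. The code keeps the unrounded confidence limit (about ±0.03853), which gives 55.35% to 63.05%.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval for a population proportion; z = 1.96 gives 95% confidence."""
    p_hat = successes / n
    limit = z * math.sqrt(p_hat * (1 - p_hat) / n)  # confidence limit, not rounded
    return p_hat - limit, p_hat + limit

low, high = proportion_ci(370, 625)  # Skygo survey
print(f"{low:.2%} < p < {high:.2%}")
```

The advertised 60% lies inside this interval.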
Your Turn
Question 1:
The Sunday Independent reports that the government's approval rating is at 65%. The
paper states that the poll is based on a random sample of 972 voters and that the margin
of error is 3%
Show that the pollsters used a 95% level of confidence.
Solution
Confidence Limits $= \pm z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$0.03 = z\sqrt{\frac{0.65(1 - 0.65)}{972}}$
$0.03 = z(0.0153)$
$z = \frac{0.03}{0.0153} = 1.96$
Therefore they are using a 95% level of confidence.
Question 2
It is known that 30% of a certain kind of apple seed will germinate. In an experiment 85 out of 300 seeds germinated. Construct a 95% confidence interval for the population proportion, based on this sample.
Sample Proportion $= \frac{85}{300} = 0.283 = 28.3\%$
Confidence Limits $= \pm 1.96\sqrt{\frac{0.283(1 - 0.283)}{300}} = \pm 0.051$
95% confidence interval: $0.283 - 0.051 < p < 0.283 + 0.051$, i.e. $0.232 < p < 0.334$
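A quick check of both answers. Question 1 runs the confidence-limit formula backwards to find the multiplier z; Question 2 runs it forwards. The sketch uses the exact sample proportion 85/300 rather than the rounded 0.283, which does not change the interval to three decimal places.

```python
import math

# Question 1: find the multiplier z that turns the standard error into the quoted 3% margin
p_hat, n = 0.65, 972
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
z = 0.03 / standard_error
print(f"z = {z:.2f}")  # compare with 1.96, the 95% multiplier

# Question 2: 95% confidence interval from 85 germinated seeds out of 300
p_hat, n = 85 / 300, 300
limit = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat - limit:.3f} < p < {p_hat + limit:.3f}")
```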
Sample Means
The data below are the heights, in cm, of a population of 100 fifteen-year-old students:
165, 161, 170, 182, 176, 185, 180, 155, 154, 166,
165, 152, 174, 167, 165, 171, 172, 150, 181, 165,
166, 161, 174, 158, 166, 168, 164, 150, 155, 170,
168, 144, 164, 154, 177, 173, 178, 158, 165, 175,
180, 174, 152, 167, 148, 175, 153, 162, 180, 175,
157, 172, 155, 140, 147, 160, 152, 166, 168, 158,
153, 165, 160, 143, 166, 167, 167, 163, 158, 160,
150, 157, 172, 167, 184, 172, 165, 159, 158, 177,
179, 174, 156, 178, 165, 179, 174, 148, 175, 166,
157, 159, 163, 165, 162, 153, 145, 170, 176, 180
From the list above, the mean of the population is $\mu = \frac{\sum x}{n} = 164.72$.
From the list above, the standard deviation of the population is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} = 10.21$.
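The two population parameters can be confirmed from the listed heights. A minimal sketch with Python's `statistics` module; `pstdev` divides by n, matching the population formula for $\sigma$.

```python
import statistics

# Heights (cm) of the population of 100 fifteen-year-old students, in the order listed
heights = [
    165, 161, 170, 182, 176, 185, 180, 155, 154, 166,
    165, 152, 174, 167, 165, 171, 172, 150, 181, 165,
    166, 161, 174, 158, 166, 168, 164, 150, 155, 170,
    168, 144, 164, 154, 177, 173, 178, 158, 165, 175,
    180, 174, 152, 167, 148, 175, 153, 162, 180, 175,
    157, 172, 155, 140, 147, 160, 152, 166, 168, 158,
    153, 165, 160, 143, 166, 167, 167, 163, 158, 160,
    150, 157, 172, 167, 184, 172, 165, 159, 158, 177,
    179, 174, 156, 178, 165, 179, 174, 148, 175, 166,
    157, 159, 163, 165, 162, 153, 145, 170, 176, 180,
]

mu = statistics.mean(heights)       # population mean
sigma = statistics.pstdev(heights)  # population standard deviation (divides by n)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```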
Whatever the shape of the original distribution, the sample means will always be (approximately) normally distributed. Use Java Applets to demonstrate this.
A single sample of 5 data points.
A single sample of 10 data points.
[Figure: data points of a single sample shown as black arrows on an x-axis, with the sample mean as a red dot]
The black arrows are the data points. The mean of the sample is the red dot.
Naturally, if we choose a sample size of 100 (the original population size), the mean of the sample will be the same as the mean of the population.
As the sample size increases, the standard error $\frac{\sigma}{\sqrt{n}}$ will decrease.
Why?
The sample means are normally distributed.
For a sample size of 30:
1. $\mu_{\bar{x}} \approx \mu$ or $m \approx \mu$
2. $\sigma_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}}$ or $s \approx \frac{\sigma}{\sqrt{n}}$
                   | Population | Large Sample | Sample Means
Mean               | $\mu$      | $\bar{x}$    | $\mu_{\bar{x}}$
Standard Deviation | $\sigma$   | $s$          | $\sigma_{\bar{x}}$ (Standard Error)
Summary
In practice, from the table above, we can say that for $n \ge 30$:
1. The sample means are (approximately) normally distributed.
2. The mean of the sample means is the same as the population mean: $\mu_{\bar{x}} = \mu$.
3. The standard deviation of the sample means is equal to $\frac{\sigma}{\sqrt{n}}$; this is called the standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
KEY IDEA: sampling distribution simulation
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
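The standard error result can also be checked by simulation, in the spirit of the applet. A minimal sketch under stated assumptions: a hypothetical flat population of the whole numbers 1 to 100 (deliberately not bell-shaped), samples of size 30 drawn with replacement so that $\frac{\sigma}{\sqrt{n}}$ holds exactly (sampling without replacement from a small population gives a slightly smaller spread), and a fixed random seed.

```python
import math
import random
import statistics

rng = random.Random(7)               # fixed seed for repeatability
population = list(range(1, 101))     # hypothetical flat population: 1, 2, ..., 100
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 30
# Draw many samples of size n (with replacement) and record each sample mean
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(5000)]

print(f"mean of sample means = {statistics.mean(sample_means):.2f}  (mu = {mu:.2f})")
print(f"SD of sample means   = {statistics.pstdev(sample_means):.2f}  (sigma/sqrt(n) = {sigma / math.sqrt(n):.2f})")
```

A histogram of `sample_means` would show the bell shape even though the population itself is flat.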
In the Standard Normal Distribution we want the value of $z_1$ such that 95% of the population lies in the interval $-z_1 \le z \le z_1$:
$P(z \le z_1) = 0.95 + 0.025 = 0.975 \Rightarrow z_1 = 1.96$ and $-z_1 = -1.96$
Therefore in a Normal Distribution 95% of the population lies within 1.96 standard deviations of the mean.
Applied to the sample means, 95% of them lie within $1.96\sigma_{\bar{x}}$ of $\mu$ (the population mean), so
$\bar{x} - 1.96\sigma_{\bar{x}} < \mu < \bar{x} + 1.96\sigma_{\bar{x}}$
As $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, the confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.
Example 1
A random sample of 250 cars was taken; the mean age of the cars was 4.5 years and the standard deviation was 2.2 years.
(i) Find the 95% confidence interval for the mean age of all cars.
(ii) What size sample is required to estimate the mean age, with 95% confidence, to within ±0.3 years?
(i) The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 4.5 \pm 1.96\frac{2.2}{\sqrt{250}}$
$4.5 - 1.96(0.139) < \mu < 4.5 + 1.96(0.139)$
$4.23 < \mu < 4.77$
This means that we can say with 95% confidence that the mean age of all cars in the population is between 4.23 years and 4.77 years.
(ii) $1.96\frac{\sigma}{\sqrt{n}} = 0.3 \Rightarrow 1.96\frac{2.2}{\sqrt{n}} = 0.3$
$\Rightarrow \sqrt{n} = \frac{(1.96)(2.2)}{0.3} = 14.373$
$\Rightarrow n = (14.373)^2 = 206.6$, so a sample of 207 cars is needed (rounding up).
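Both parts of Example 1 as code. A minimal sketch (the function names `mean_ci` and `sample_size_for_mean` are our own), treating the sample standard deviation of a large sample as $\sigma$, as the example does:

```python
import math

def mean_ci(x_bar: float, sigma: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Confidence limits x-bar +/- z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    return x_bar - z * se, x_bar + z * se

def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

low, high = mean_ci(4.5, 2.2, 250)             # Example 1 (i)
print(f"{low:.2f} < mu < {high:.2f}")
print(sample_size_for_mean(2.2, 0.3), "cars")  # Example 1 (ii)
```

The same `mean_ci` call works for any of the mean-interval questions, e.g. `mean_ci(175, 9, 144)` for sample heights.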
Example 2
A random sample of 144 male students in a large university was taken and their heights measured. The mean height was 175 cm. The standard deviation of the heights of all the male students in the university was 9 cm.
(i) Give a 95% confidence interval for the mean height of all the male students.
(ii) Show that the confidence interval would be narrower if the sample size was 225 instead of 144.
(i) n = 144, $\bar{x}$ (mean of the sample) = 175, $\sigma$ (standard deviation of the population) = 9, $\mu$ (population mean) is unknown.
We calculate the standard error of the mean using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$:
$\sigma_{\bar{x}} = \frac{9}{\sqrt{144}} = 0.75$
As the sample size is large, the best possible estimate of $\mu$ is $\bar{x}$, which is 175 cm.
Now we have to give a range of values in which the true population mean ($\mu$) lies, with a 95% level of confidence.
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
$175 - 1.96(0.75) < \mu < 175 + 1.96(0.75)$
$173.53 < \mu < 176.47$
The true population mean lies within the range 173.53 cm to 176.47 cm with 95% confidence.
(ii) If a sample of 225 were taken, the standard error would be
$\sigma_{\bar{x}} = \frac{9}{\sqrt{225}} = 0.6$
$175 - 1.96(0.6) < \mu < 175 + 1.96(0.6)$
$173.82 < \mu < 176.18$
The true population mean lies within the range 173.82 cm to 176.18 cm with 95% confidence.
This is narrower than the previous confidence interval: as you increase the sample size, you decrease the width of the confidence interval.
A study addressed the issue of whether pregnant women can correctly guess the sex of their baby. Among 104 recruited subjects, 57 correctly guessed the sex of the baby.
Use these sample data to test the claim that the success rate of such guesses is no different from the 50% success rate expected with random chance guesses. Use a 5% significance level.
(Based on data from "Are Women Carrying 'Basketballs' Really Having Boys? Testing Pregnancy Folklore," by Perry, DiPietro, and Costigan, Birth, Vol. 26, No. 3.)
Solution:
The original claim is that the success rate is no different from 50%.
H0: $p = 0.5$
H1: $p \ne 0.5$
$\hat{p} = \frac{57}{104} = 0.548$
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.548 - 0.50}{\sqrt{\frac{(0.5)(0.5)}{104}}} = 0.98$
At the 5% level of significance the critical values are ±1.96.
As 0.98 is between −1.96 and 1.96, we fail to reject the null hypothesis.
There is not sufficient evidence to warrant rejection of the claim that women who guess the sex of their babies have a success rate equal to 50%.
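The test statistic and decision for this two-tailed test, as a minimal sketch (the function name `proportion_z_test` is our own; 1.96 is the 5% critical value from the notes):

```python
import math

def proportion_z_test(successes: int, n: int, p0: float, critical: float = 1.96) -> tuple[float, bool]:
    """Two-tailed z-test of H0: p = p0. Returns (z, reject_H0)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard error uses the claimed p0
    return z, abs(z) > critical

z, reject = proportion_z_test(57, 104, 0.5)
print(f"z = {z:.2f}, reject H0: {reject}")
```

Note that the standard error here uses the claimed value $p_0$, as in the worked solution, rather than the sample proportion used for confidence intervals.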
Your Turn
2005 LC HL Q9 (c)
A survey was carried out to find the weekly rental costs of holiday apartments in a certain country. A random sample of 400 apartments was taken. The mean of the sample was €320 and the standard deviation was €50.
Form a 95% confidence interval for the mean weekly rental costs of holiday apartments in that country.
The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 320 \pm 1.96\frac{50}{\sqrt{400}}$
$320 - 1.96(2.5) < \mu < 320 + 1.96(2.5)$
$315.1 < \mu < 324.9$
The mean weekly rental cost is between €315.10 and €324.90 (95% confidence).
Night 3
Slide79
Often we need to make a decision about a population based on a sample.
In a trial, the accused is presumed innocent until proven guilty.
Assuming that an accused person is innocent (nothing has happened) is
called a NULL HYPOTHESIS (H0).
Assuming that an accused person is not innocent is called an ALTERNATIVE
HYPOTHESIS (H1).
1. Is a coin biased if we get a run of 8 heads in 10 tosses?
Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0)
Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1)
2. Does a new machine produce fewer faulty parts than an old machine during a 5-minute period?
Assuming that the new machine is no better than the old one is called a NULL
HYPOTHESIS (H0)
Assuming that the new machine is better than the old one is called an ALTERNATIVE
HYPOTHESIS (H1)
3. Does a new drug for Hay-Fever work effectively?
Assuming that there is no difference between the new drug and the current most popular
drug is called a NULL HYPOTHESIS (H0).
Assuming that the new drug is better than the current most popular drug is called an
ALTERNATIVE HYPOTHESIS (H1).
Testing the Null Hypothesis using z-values
A Two-Tailed Test
[Figure: standard normal curve with "Reject" regions in both tails and a "Fail to Reject" region in the centre]
The critical values for a 5% level of significance are $z = -1.96$ or $z = 1.96$.
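The critical value 1.96 can be recovered from the standard normal distribution rather than from the tables. This sketch uses Python's standard-library `statistics.NormalDist`. In a two-tailed test the 5% is split equally between the two tails.

```python
from statistics import NormalDist

alpha = 0.05  # 5% level of significance
# Two-tailed test: alpha is split equally between the two tails,
# so the upper critical value cuts off 2.5% in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))  # 1.96
```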
The statistical method used to decide whether or not to reject H0 is called
HYPOTHESIS TESTING.
Statisticians speak of "rejecting or failing to reject H0 at a certain level". This
level is called the LEVEL OF SIGNIFICANCE. (The 5% level of significance is on the
syllabus.)
If the value of z lies outside the range $-1.96 < z < 1.96$, i.e. in the critical region,
we reject H0.
[Figure: standard normal curve; the critical (reject) regions are z < −1.96 and z > 1.96]
Testing hypotheses about a population mean (large samples)
Suppose we take a large sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and calculate the sample mean $\bar{x}$.
We want to test the hypothesis that the sample comes from a population with a
particular value of $\mu$, called $\mu_0$.
The sampling distribution of the mean has mean $\mu_{\bar{x}} = \mu$ and standard deviation (the standard error) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. When $\sigma$ is unknown, we use the sample standard deviation $s$ in its place.
Step 1. State the null and alternative hypotheses.
Null hypothesis: $\mu = \mu_0$
Alternative hypothesis: $\mu \ne \mu_0$
Note 1: We do not use $\mu > \mu_0$ or $\mu < \mu_0$, because no direction is stipulated.
Therefore this is a two-tailed test. (Only the two-tailed test is on the Leaving Cert course.)
Note 2: The null hypothesis always has an equals sign and uses population parameters.
Step 2. Convert the observed results into z units (calculate the test statistic).
For a single value X, the z score is $Z = \frac{X - \mu}{\sigma}$.
As we are dealing with the sampling distribution of the mean,
the test statistic is a standard normal z score:
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
This is the difference between the value we have observed from our sample
and the hypothesised value for the population, divided by the standard error:
$Z = \frac{\text{Observed value} - \text{Hypothesised value}}{\text{Standard error}}$
Step 3. Write down the critical values (a sketch also helps).
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
Once we have the value of Z, we compare it to our critical values and decide
whether or not to reject the null hypothesis.
Review of the steps involved in Hypothesis Testing:
1. Write down the null hypothesis H0 and the alternative hypothesis H1
2. Convert the observed results into z units. (Calculate the test statistic).
3. Write down the critical values. (a sketch also helps).
4. Reject H0 if z is in the critical regions, otherwise fail to reject H0.
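The four steps can be sketched as a small Python function. `z_test_mean` is a hypothetical helper name. The 1.96 cut-off assumes the 5% two-tailed test used throughout. `sd` can be $\sigma$ or, for a large sample where $\sigma$ is unknown, $s$.

```python
import math

Z_CRITICAL = 1.96  # two-tailed critical value at the 5% level of significance

def z_test_mean(sample_mean: float, mu0: float, sd: float, n: int) -> tuple[float, str]:
    """Two-tailed z-test for a population mean using a large sample.

    Step 1 is implicit: H0: mu = mu0 against H1: mu != mu0.
    """
    # Step 2: test statistic = (observed - hypothesised) / standard error
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    # Steps 3 and 4: compare with the critical values -1.96 and 1.96
    decision = "reject H0" if abs(z) > Z_CRITICAL else "fail to reject H0"
    return z, decision
```

For example, `z_test_mean(497, 500, 10, 81)` uses the figures from Example 1 below and returns a z of about −2.7 together with "reject H0".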
Example 1
A company manufactures pens with a mean writing life of 500 hours and
a standard deviation of 10 hours. A retailer examines a sample of 81 pens from
a supplier who claims to only sell pens from this company and finds their mean
life is 497 hours. Are these pens genuine products from the company?
Step 1. State the null and alternative hypotheses.
Null hypothesis H0: The pens in the sample are genuine products from the company ($\mu = 500$).
Alternative hypothesis H1: The pens in the sample are not genuine products from the company ($\mu \ne 500$).
Note: If we are not given a level of significance, we must write one down: 5% (the only level on the Leaving Cert course).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{497 - 500}{10/\sqrt{81}} = -2.7$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−2.7 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −2.7 is in the reject region.
This means that there is sufficient evidence to conclude that the pens are not genuine.
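As a quick check of the arithmetic in Example 1, here is a minimal Python sketch. The variable names are illustrative.

```python
import math

# Example 1 (pens): mu0 = 500 hours, sigma = 10 hours, n = 81, sample mean = 497 hours
standard_error = 10 / math.sqrt(81)  # 10 / 9
z = (497 - 500) / standard_error
print(round(z, 2), abs(z) > 1.96)    # -2.7 True
```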
Example 2
A tyre company claims that the mean life of tyres that it produces is 11,000 miles
with a standard deviation of 552 miles. An independent supplier of tyres wants to investigate
the company's claim. A test on a random sample of 36 tyres from the company gave a mean
life of 10,000 miles.
Carry out a hypothesis test using a significance level of 5% to see if there is evidence to support
the company's claim.
Step 1. State the null and alternative hypotheses.
Null hypothesis H0: The company produces tyres with a mean life of 11,000 miles ($\mu = 11{,}000$).
Alternative hypothesis H1: The company produces tyres whose mean life is not 11,000 miles ($\mu \ne 11{,}000$).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{10{,}000 - 11{,}000}{552/\sqrt{36}} = -10.87$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−10.87 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −10.87 is in the reject region.
We can conclude that there is evidence to suggest that the company's claim is not true.
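The same check for Example 2, as a minimal Python sketch. It also shows how far the statistic lies beyond the critical value.

```python
import math

# Example 2 (tyres): mu0 = 11,000 miles, sigma = 552 miles, n = 36, sample mean = 10,000 miles
standard_error = 552 / math.sqrt(36)  # 92
z = (10_000 - 11_000) / standard_error
print(round(z, 2))                    # -10.87
print(abs(z) > 1.96)                  # True: far inside the reject region
```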
Your Turn
Question 1
A neurologist is testing the effect of a drug on response time by injecting 36 rats
with a unit dose of a new drug.
The neurologist measures the response time of each rat to a stimulus.
The neurologist knows that the mean response time for rats not injected is 0.75 seconds.
The mean of the 36 injected rats' response time is 0.6 seconds with a standard deviation of 0.2 seconds.
Can you conclude that the drug has an effect on response time?
Question 1: Solution
Step 1. State the null and alternative hypotheses.
Null hypothesis H0: The drug has no effect ($\mu = 0.75$ seconds).
Alternative hypothesis H1: The drug has an effect ($\mu \ne 0.75$ seconds).
Step 2. Convert the observed results into z units (calculate the test statistic).
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.6 - 0.75}{0.2/\sqrt{36}} = -4.5$
Note: we are approximating $\sigma$ with $s$, as we don't know $\sigma$.
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−4.5 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −4.5 is in the reject region.
We can conclude that there is evidence to suggest that the drug has an effect on response time.
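A minimal Python check of this solution follows. As in the note above, the sample standard deviation $s = 0.2$ stands in for the unknown $\sigma$.

```python
import math

# Rats: mu0 = 0.75 s, n = 36, sample mean = 0.6 s, s = 0.2 s (used in place of sigma)
standard_error = 0.2 / math.sqrt(36)
z = (0.6 - 0.75) / standard_error
print(round(z, 2))  # -4.5
```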
Example 2
In an examination taken by a large number of students the mean mark was 51.5 and the
standard deviation was 8.5. In a random sample of 49 students in a particular town,
it was found that among the students in this town the mean mark was 50.
At the 5% level of significance, investigate if there is evidence to conclude that the
students of this town did as well as students in general.
STEP 1. State the null and alternative hypotheses.
Null hypothesis H0: The students in this town did as well as all other students ($\mu = 51.5$).
Alternative hypothesis H1: The students in this town did not perform the same as all other students ($\mu \ne 51.5$).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{50 - 51.5}{8.5/\sqrt{49}} = -1.24$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−1.24 is in the fail to reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We fail to reject the null hypothesis as −1.24 is in the fail to reject region.
There is not sufficient evidence to conclude that the students in this town
performed differently from students in general.
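The Python sketch below reproduces this example. Here the statistic falls inside the fail to reject region, so the decision changes.

```python
import math

# Exam marks: mu0 = 51.5, sigma = 8.5, n = 49, sample mean = 50
standard_error = 8.5 / math.sqrt(49)
z = (50 - 51.5) / standard_error
decision = "reject H0" if abs(z) > 1.96 else "fail to reject H0"
print(round(z, 2), decision)  # -1.24 fail to reject H0
```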
Example 3
The weights of newborn babies in Ireland are known to have a mean of 3.42 kg
and a standard deviation of 0.9 kg. Assuming that the weights are normally distributed,
a random sample of 500 babies whose mothers smoked heavily during pregnancy is taken.
If the mean weight of this sample is 3.28 kg, can we conclude at the 5% significance level
that heavy smoking by mothers during pregnancy has an effect on the weight of their babies at birth?
STEP 1. State the null and alternative hypotheses.
Null hypothesis H0: Heavy smoking by mothers during pregnancy has no effect on the weight
of their babies at birth ($\mu = 3.42$ kg).
Alternative hypothesis H1: Heavy smoking by mothers during pregnancy has an effect on the weight
of their babies at birth ($\mu \ne 3.42$ kg).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.28 - 3.42}{0.9/\sqrt{500}} = -3.48$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−3.48 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −3.48 is in the reject region.
We can conclude that there is evidence to suggest that babies' weights are
affected if their mothers smoke heavily during pregnancy.
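A minimal Python check of Example 3, using the standard library only.

```python
import math

# Birth weights: mu0 = 3.42 kg, sigma = 0.9 kg, n = 500, sample mean = 3.28 kg
standard_error = 0.9 / math.sqrt(500)
z = (3.28 - 3.42) / standard_error
print(round(z, 2))  # -3.48
```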
p-value
Instead of comparing the value of our test statistic to the critical values, we can find a specific p-value
for our test statistic by looking up its value in the tables.
The p-value measures the strength of the evidence in the data against the null hypothesis.
The smaller the p-value, the less likely it is that sample results as extreme as ours would occur
if the null hypothesis were true.
p-value at the 5% Significance Level
If p ≤ 0.05: there is strong enough evidence to reject the null hypothesis H0 (if p is low, H0 must go).
If p > 0.05: there is not enough evidence to reject the null hypothesis H0, so we fail to reject it.
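The table look-up can be sketched in Python with the standard-library `statistics.NormalDist`. `two_tailed_p_value` is a hypothetical helper name. The z values 1.13 and 2.53 come from the examples that follow. Results can differ from the tables in the fourth decimal place, because the tables use rounded z values.

```python
from statistics import NormalDist

def two_tailed_p_value(z: float) -> float:
    """Probability of a standard normal value at least as far from 0 as z, in either direction."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2 * upper_tail                      # two-tailed test: count both tails

alpha = 0.05
for z in (1.13, 2.53):
    p = two_tailed_p_value(z)
    print(z, round(p, 4), "reject H0" if p <= alpha else "fail to reject H0")
```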
Example 1
Medical consultants for large companies are concerned about the effects of stress
on company executives. The mean systolic blood pressure for males aged
35 to 44 years of age is, according to national health statistics, 128 with
a standard deviation of 15. A sample of 72 male executives in this age
group was selected from companies. Their mean blood pressure was 130.
(i) Construct a 95% confidence interval for the mean systolic blood pressure
for the executives. Interpret this interval.
(ii) Carry out a hypothesis test using a significance level of 5% to see if there
is evidence to suggest that the mean systolic blood pressure for executives
is different to the national average. Clearly state the null and alternative
hypothesis and your conclusion. Give a p-value for this hypothesis test
and interpret this p-value.
(i)
$n = 72,\ \sigma = 15,\ \bar{x} = 130$
95% confidence interval: $\bar{x} \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right) = 130 \pm 1.96\left(\frac{15}{\sqrt{72}}\right) = 130 \pm 3.46$
$[126.54,\ 133.46]$
This means that we are 95% confident that the mean systolic blood pressure ($\mu$) for all male executives aged 35 to 44
in large companies lies in the range 126.54 to 133.46.
This range includes the national average of 128.
(ii) Carry out a hypothesis test.
STEP 1. State the null and alternative hypotheses.
Null hypothesis H0: The mean systolic blood pressure for male executives aged 35 to 44
is the same as the national average ($\mu = 128$).
Alternative hypothesis H1: The mean systolic blood pressure for male executives aged 35 to 44
is not the same as the national average ($\mu \ne 128$).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{130 - 128}{15/\sqrt{72}} = 1.13$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
1.13 is in the fail to reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We fail to reject the null hypothesis as 1.13 is in the fail to reject region.
There is not sufficient evidence to conclude that the mean systolic blood pressure
for male executives aged 35 to 44 differs from the national average of 128.
Step 5. p-value in a two-tailed test.
From the tables, the probability of getting a value > 1.13 is 1 − 0.8708 = 0.1292.
The probability of getting a value < −1.13 is also 0.1292.
The p-value is the sum of these two probabilities: 2(0.1292) = 0.2584.
This p-value is high: it is greater than 0.05,
so we fail to reject the null hypothesis.
Two things to note:
1. The p-value answers the question: if the null hypothesis were true, what is the probability
that sheer randomness would produce an observed value (130) at least this far away from
the value I expected to get (128)? So a p-value of 0·26 means in this case that
there is a 26% chance that the mean blood pressure of a sample of this size would be
2 or more units (130 − 128 = 2) away from the population mean, just because of random
variation in sampling. This is not enough evidence to reject the null hypothesis: the
5% level of significance means that we only reject the null hypothesis if the probability
of the observed value being this far away from the expected value through sheer randomness
is less than 5%. At 26%, this probability is too high.
2. The tail probability is doubled to get the p-value because we are doing a
two-tailed test.
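The whole blood-pressure example (interval, test statistic and p-value) can be checked with a short Python sketch using only the standard library. The exact p-value differs slightly from the table value 0.2584, because the tables use z rounded to 1.13.

```python
import math
from statistics import NormalDist

n, sigma, x_bar, mu0 = 72, 15, 130, 128
se = sigma / math.sqrt(n)                       # standard error
low, high = x_bar - 1.96 * se, x_bar + 1.96 * se
z = (x_bar - mu0) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
print(f"CI [{low:.2f}, {high:.2f}], z = {z:.2f}, p = {p:.3f}")
# CI [126.54, 133.46], z = 1.13, p = 0.258
```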
Example 2
A new diet is advertised with the claim that participants will lose an average of 4 kg during the
first week on this diet. A random sample of 40 people on this diet showed a mean weight loss
of 3.6 kg, with a standard deviation of 1 kg.
(i) Calculate a 95% confidence interval for the mean weight loss of all participants on this diet.
Interpret this interval.
(ii) Test the claim made in the advertisement for this diet at a 5% level of significance.
Clearly state your null and alternative hypotheses and your conclusion.
Give a p-value for this hypothesis test and interpret this p-value.
(i)
$n = 40,\ s = 1,\ \bar{x} = 3.6$
95% confidence interval: $\bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right) = 3.6 \pm 1.96\left(\frac{1}{\sqrt{40}}\right) = 3.6 \pm 0.31$
$[3.29,\ 3.91]$
This means that we are 95% confident that the mean weight loss ($\mu$) lies in the range 3.29 kg to 3.91 kg.
This range does not include the advertised weight loss of 4 kg.
(ii) Carry out a hypothesis test.
STEP 1. State the null and alternative hypotheses.
Null hypothesis H0: The average weight loss during the first week of this diet is 4 kg ($\mu = 4$ kg).
Alternative hypothesis H1: The average weight loss during the first week of this diet is not 4 kg ($\mu \ne 4$ kg).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.6 - 4}{1/\sqrt{40}} = -2.53$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−2.53 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −2.53 is in the reject region.
There is evidence to suggest that the average weight loss during the first week of this diet is not 4 kg,
so the advertising claim seems not to be true.
Step 5. p-value in a two-tailed test.
From the tables, the probability of getting a value > 2.53 is 1 − 0.9943 = 0.0057.
The probability of getting a value < −2.53 is also 0.0057.
The p-value is the sum of these two probabilities: 2(0.0057) = 0.0114.
"The p-value is very small: if the true mean weight loss were 4 kg, there would be only about
a 1.1% chance of a sample mean this far from 4 kg due to sampling variability alone. This is very
strong evidence for rejecting the company's claim."
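A short Python sketch that reproduces the diet example, with the sample standard deviation $s$ in place of $\sigma$ as in the solution. It uses only the standard library.

```python
import math
from statistics import NormalDist

n, s, x_bar, mu0 = 40, 1, 3.6, 4
se = s / math.sqrt(n)                           # estimated standard error
low, high = x_bar - 1.96 * se, x_bar + 1.96 * se
z = (x_bar - mu0) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
print(f"CI [{low:.2f}, {high:.2f}], z = {z:.2f}, p = {p:.3f}")
# CI [3.29, 3.91], z = -2.53, p = 0.011
```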
Your Turn
Question 1
The mean hourly wage in an EU country is €10. A sample of 35 individuals in the capital city
of the country has a mean hourly wage of €10.83 with a standard deviation of €3.35 per hour.
(i) Construct a 95% confidence interval for the mean hourly wage in the capital city.
Interpret this interval.
(ii) Is there evidence to suggest that hourly wages for workers in the capital city are
different from the national hourly wage?
Test the hypothesis using a 5% level of significance.
Clearly state the null and alternative hypotheses and your conclusion.
Give a p-value for this hypothesis test and interpret this p-value.
Question 1: Solution
(i)
$n = 35,\ s = 3.35,\ \bar{x} = 10.83$
95% confidence interval: $\bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right) = 10.83 \pm 1.96\left(\frac{3.35}{\sqrt{35}}\right) = 10.83 \pm 1.11$
$[9.72,\ 11.94]$
This means that we are 95% confident that the mean hourly wage ($\mu$) for workers in the capital city
lies in the range €9.72 to €11.94.
This range includes the mean hourly rate for the country (โ‚ฌ10).
(ii) Carry out a hypothesis test.
Step 1. State the null and alternative hypotheses.
Null hypothesis H0: The average hourly wage for a worker in the capital is the same as that of a worker
in the rest of the country ($\mu = €10$).
Alternative hypothesis H1: The average hourly wage for a worker in the capital is not the same as that
of a worker in the rest of the country ($\mu \ne €10$).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10.83 - 10}{3.35/\sqrt{35}} = 1.466$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
1.466 is in the fail to reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We fail to reject the null hypothesis as 1.466 is in the fail to reject region.
There is not sufficient evidence to conclude that the hourly wage for workers
in the capital differs from that in the rest of the country.
Step 5. p-value in a two-tailed test.
From the tables, the probability of getting a value > 1.466 is 1 − 0.9286 = 0.0714.
The probability of getting a value < −1.466 is also 0.0714.
The p-value is the sum of these two probabilities: 2(0.0714) = 0.1428.
This p-value is greater than 0.05,
so we fail to reject the null hypothesis.
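The hourly-wage solution can be reproduced the same way. Here is a minimal Python sketch using only the standard library. The exact p-value agrees with the table value to about three decimal places.

```python
import math
from statistics import NormalDist

n, s, x_bar, mu0 = 35, 3.35, 10.83, 10
se = s / math.sqrt(n)                           # estimated standard error
low, high = x_bar - 1.96 * se, x_bar + 1.96 * se
z = (x_bar - mu0) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
print(f"CI [{low:.2f}, {high:.2f}], z = {z:.3f}, p = {p:.3f}")
# CI [9.72, 11.94], z = 1.466, p = 0.143
```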
Question 2
A machine filling bottles of natural mineral water is set to deliver 0.725 litres with a
standard deviation of 0.01 litres. A sample of 50 bottles is checked and the mean quantity
is found to be 0.721 litres.
At the 5% level of significance, investigate whether there is evidence to suggest that the mean
quantity delivered is different from the expected mean of 0.725 litres.
Question 2: Solution
STEP 1. State the null and alternative hypotheses.
Null hypothesis H0: The mean volume delivered is the same as the expected volume
($\mu = 0.725$ litres).
Alternative hypothesis H1: The mean volume delivered is not the same as the expected volume
($\mu \ne 0.725$ litres).
Step 2. Convert the observed results into z units (calculate the test statistic).
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.721 - 0.725}{0.01/\sqrt{50}} = -2.83$
Step 3. Write down the critical values (a sketch also helps).
[Figure: standard normal curve; reject H0 if z < −1.96 or z > 1.96, fail to reject H0 otherwise]
−2.83 is in the reject region.
Step 4. Reject H0 if Z is in the critical regions; otherwise fail to reject H0.
We reject the null hypothesis as −2.83 is in the reject region.
We can conclude that there is evidence to suggest that the mean volume delivered
is not the same as the expected volume of 0.725 litres.
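A minimal Python check of Question 2. The machine's stated standard deviation of 0.01 litres is used as $\sigma$.

```python
import math

# Bottling machine: mu0 = 0.725 l, sigma = 0.01 l, n = 50, sample mean = 0.721 l
standard_error = 0.01 / math.sqrt(50)
z = (0.721 - 0.725) / standard_error
decision = "reject H0" if abs(z) > 1.96 else "fail to reject H0"
print(round(z, 2), decision)  # -2.83 reject H0
```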