Transcript 10 C 7

Goals
To perform the Sign Test.
To evaluate and interpret the
p-value for the Sign Test.
In appropriate cases to apply the
normal approximation.
Practical
Perform a series of Sign Tests.
2.11
3:09 AM Friday, 08 April 2016
Remember
Read this week's, and more importantly, last week's
lectures in advance.
This week's lecture links to hypothesis testing and p-values.
Typically all practicals will include examples with
both small and large p-values.
You can attend additional tutorials if desired.
You can now start Assignment 1.
2.22
Using Statistics To Make
Inferences 2
Summary
Review the Binomial Distribution
Introduce the Sign Test. The Sign test answers
the question “How Often?”, whereas other tests
answer the question “How Much?”.
Use the Sign Test to assess the Median
2.33
Using Statistics To Make
Inferences 2
I will often discuss how to perform calculations manually.
This will lead to a better understanding of the mechanics of the test,
which a computer analysis alone will not clarify.
Of course, appropriate software, if available (SPSS not Excel),
will save time and should avoid errors.
2.44
Factorial
We need some notation.
2.55
Factorial
The number 5!, read “five factorial”, or “factorial
five”, is a shorthand notation for the expression
5 × 4 × 3 × 2 × 1 = 120
Note that 1! equals 1, and by convention 0! is
defined as 1 also.
In general
1! = 1
0! = 1
n! = n × (n - 1) × (n - 2) ... 5 × 4 × 3 × 2 × 1
2.66
Essentially how many ways can n objects be arranged.
Combination
This is the number of ways we may
achieve m successes from n trials,
ignoring the order in which they
occur, written nCm (alternate
notation $\binom{n}{m}$). It is
$$ {}^{n}C_{m} = \frac{n!}{m!\,(n-m)!} $$
Note that many terms will cancel
when performing the division.
C - Combination
2.77
Example 1
Consider
10C6
Now 10! = 10x9x8x7x6x5x4x3x2x1 = 3,628,800
6! = 6x5x4x3x2x1 = 720
And (10-6)! = 4! = 4x3x2x1 = 24
Finally
10C6
= 10!/(6!x4!) = 3,628,800/(720 x 24) = 210
A very naïve approach.
In Excel use =COMBIN(10,6)
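The arithmetic above can be cross-checked with a few lines of Python. This is only an illustrative sketch (Python is not used in the lecture, and the function name n_choose_m is hypothetical); it evaluates 10C6 from the factorial definition and compares it with the standard library's math.comb.

```python
import math

def n_choose_m(n, m):
    """Number of ways to obtain m successes from n trials: n! / (m! (n - m)!)."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

print(math.factorial(10))  # 3628800
print(n_choose_m(10, 6))   # 210
print(math.comb(10, 6))    # 210, the built-in equivalent
```

Integer division is used because the quotient of factorials is always a whole number.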
2.88
Example 1
Consider
10C6
$$ {}^{10}C_{6} = \frac{10!}{6!\,(10-6)!} = \frac{10!}{6!\,4!} $$
$$ = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(6 \times 5 \times 4 \times 3 \times 2 \times 1)(4 \times 3 \times 2 \times 1)} $$
$$ = \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = 10 \times 3 \times 7 = 210 $$
2.99
Binomial Distribution
1. There are a fixed number of trials, n.
2. Each trial has only two possible outcomes.
3. The outcomes are mutually exclusive.
4. The outcomes are independent.
5. The probability of the outcome, p, remains fixed across the trials.
Mean = np
Variance = np(1-p)
2.10
10
Binomial Distribution
Typical examples arise in game
play: tossing a coin (n tosses,
p = 0.5 for a head or a tail) or rolling a
die (n rolls, p = ⅙ for a given
number).
2.11
11
Binomial Distribution - Notation
p probability of success
q = 1 – p probability of failure
n trials
m successes
n-m failures
nCm is the number of ways the m successes may occur:
$$ {}^{n}C_{m} = \frac{n!}{m!\,(n-m)!} $$
The binomial probability of m successes from n trials
is Prob(m;n,p) (the probability of m given n and p):
$$ \mathrm{Prob}(m; n, p) = {}^{n}C_{m}\, p^{m} q^{n-m} $$
2.12
12
Notation For n=1
Prob(0;n,p) = 1-p (no success)
Prob(1;n,p) = p (one success)
2.13
13
Notation For n=2
$\mathrm{Prob}(0;n,p) = (1-p)^{2}$ (no success in either trial)
$\mathrm{Prob}(1;n,p) = 2p(1-p)$ (one success in either the first or the second trial)
$\mathrm{Prob}(2;n,p) = p^{2}$ (success in both trials)
2.14
14
Notation For n=3
$\mathrm{Prob}(0;n,p) = (1-p)^{3}$ (no success in any trial)
$\mathrm{Prob}(1;n,p) = 3p(1-p)^{2}$ (one success and two failures in any order: (s,f,f), (f,s,f), (f,f,s))
$\mathrm{Prob}(2;n,p) = 3p^{2}(1-p)$ (two successes and one failure in any order: (f,s,s), (s,f,s), (s,s,f))
$\mathrm{Prob}(3;n,p) = p^{3}$ (three successes)
2.15
15
Binomial Distribution
Summary
$$ \mathrm{Prob}(m; n, p) = {}^{n}C_{m}\, p^{m} (1-p)^{n-m} = \frac{n!}{m!\,(n-m)!}\, p^{m} (1-p)^{n-m} $$
$$ n! = 1 \times 2 \times 3 \times \dots \times n $$
0! = 1
1! = 1
2! = 1 x 2 = 2
3! = 1 x 2 x 3 = 6
4! = 1 x 2 x 3 x 4 = 24
Note that nCm is sometimes written as $\binom{n}{m}$.
2.16
16
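The summary formula for Prob(m; n, p) translates directly into code. The following is a minimal Python sketch, not part of the lecture; the function name binomial_prob is an assumption made for illustration. It reproduces the n = 3 cases listed earlier with p = ½.

```python
import math

def binomial_prob(m, n, p):
    """Prob(m; n, p) = nCm * p**m * (1 - p)**(n - m)."""
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

# The four outcomes for n = 3 trials with p = 0.5
for m in range(4):
    print(m, binomial_prob(m, 3, 0.5))
```

The four probabilities should sum to 1, which is a quick sanity check on any hand calculation.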
Pascal's Triangle
An Aid To Calculation
                         1
                       1   1
                     1   2   1
                   1   3   3   1
                 1   4   6   4   1
               1   5  10  10   5   1
             1   6  15  20  15   6   1
           1   7  21  35  35  21   7   1
         1   8  28  56  70  56  28   8   1
       1   9  36  84 126 126  84  36   9   1
     1  10  45 120 210 252 210 120  45  10   1
Each entry is the sum of the values diagonally above it.
Find the next row.
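The rule "each entry is the sum of the values diagonally above it" can be sketched in Python as follows. This is an illustration only; the function name pascal_rows is hypothetical and not from the lecture.

```python
def pascal_rows(last_row):
    """Return rows 0..last_row of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(last_row):
        prev = rows[-1]
        # Pad with a zero on each side so every new entry is the sum
        # of the two entries diagonally above it.
        rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows

for row in pascal_rows(10):
    print(row)
```

Calling pascal_rows(11) is one way to check a hand-calculated answer for the next row.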
2.17
17
Pascal's Triangle
An Aid To Calculation
Notational form of Pascal's triangle, an overlay for the
previous triangle. A 3D diagram!
${}^{n}C_{r} = {}^{n}C_{n-r}$
Note the symmetry
2.18
18
Pascal's Triangle
For example from the final row
10C0   10C1   10C2   10C3   10C4   10C5
1      10     45     120    210    252
10C10  10C9   10C8   10C7   10C6   10C5
1      10     45     120    210    252
Note that we have symmetry: 10C3 = 10C(10-3) = 10C7
2.19
19
Using The Binomial To Carry
Out Hypothesis Tests
Classical Example – Coin Tossing
10 trials results in 7 heads
H0 is that the probability of
getting a head on any trial is ½
2.20
20
Binomial
Since there were 10 trials the
probability of 7 or more heads is
Prob (x≥7)
= Prob(7;10,½) + Prob(8;10,½)
+ Prob(9;10,½) + Prob(10;10,½)
2.21
21
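Coefficients from Pascal's
Triangle
10C0  10C1  10C2  10C3  10C4  10C5  10C6  10C7  10C8  10C9  10C10
1     10    45    120   210   252   210   120   45    10    1
$$ \mathrm{Prob}(x \ge 7) = \mathrm{Prob}(7;10,\tfrac{1}{2}) + \mathrm{Prob}(8;10,\tfrac{1}{2}) + \mathrm{Prob}(9;10,\tfrac{1}{2}) + \mathrm{Prob}(10;10,\tfrac{1}{2}) $$
$$ = {}^{10}C_{7}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{8}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{9}\left(\tfrac{1}{2}\right)^{10} + {}^{10}C_{10}\left(\tfrac{1}{2}\right)^{10} $$
$$ = \left({}^{10}C_{7} + {}^{10}C_{8} + {}^{10}C_{9} + {}^{10}C_{10}\right)\left(\tfrac{1}{2}\right)^{10} $$
$$ = (120 + 45 + 10 + 1)\left(\tfrac{1}{2}\right)^{10} $$
$$ = 0.1719 $$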
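The tail sum for 7 or more heads in 10 tosses can be reproduced with a short Python sketch. It is offered only as a cross-check on the manual calculation; the function name upper_tail is an assumption, not something used in the lecture.

```python
import math

def upper_tail(k, n, p):
    """Prob(x >= k) for a binomial with n trials and success probability p."""
    return sum(math.comb(n, m) * p**m * (1 - p)**(n - m) for m in range(k, n + 1))

print(round(upper_tail(7, 10, 0.5), 4))  # 0.1719
```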
2.22
22
Conclusion
So the required probability is 0.1719.
2.23
23
Accepting Or Rejecting The
Null Hypothesis - p-Value
p-value        | Significance                                | Decision
p>0.1          | Not significant at the 10% level            | Accept H0
0.1>p>0.05     | Significant at the 10% level, but not at 5% | Accept H0, investigate further
0.05>p>0.01    | Significant at the 5% level                 | Reject H0
0.01>p>0.001   | Significant at the 1% level                 | Reject H0, with some confidence
0.001>p        | Significant beyond the 0.1% level           | Reject H0, with reasonable certainty
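The decision table above can be expressed as a small lookup function. This Python sketch is illustrative only; the function name decision is hypothetical, and the handling of p-values that fall exactly on a boundary (such as 0.05) is an assumption, since the table uses strict inequalities.

```python
def decision(p_value):
    """Return (significance, decision) following the lecture's p-value table.

    Assumption: a p-value exactly on a boundary is placed in the
    more significant (smaller p) band.
    """
    if p_value > 0.1:
        return "Not significant at the 10% level", "Accept H0"
    if p_value > 0.05:
        return "Significant at the 10% level, but not at 5%", "Accept H0, investigate further"
    if p_value > 0.01:
        return "Significant at the 5% level", "Reject H0"
    if p_value > 0.001:
        return "Significant at the 1% level", "Reject H0, with some confidence"
    return "Significant beyond the 0.1% level", "Reject H0, with reasonable certainty"

print(decision(0.1719))  # the coin tossing example
```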
2.24
24
Conclusion
The required probability is 0.1719.
Since the p-value exceeds 0.05, the
null hypothesis, H0, is accepted at
the 5% significance level. The probability of
getting a head on any trial would
appear to be ½; the coin is
apparently fair.
2.25
25
Check in SPSS
Start SPSS and put a value in the first column of the
data sheet. The only reason you enter this is because
SPSS won’t let you use the Transform command
unless you have data.
Transform > Compute Variable
Select a name for the target variable, say, p6.
2.26
26
Check in SPSS
Want 7 or more heads, with p = .5 and n = 10.
Calculate the probability of 6 or fewer heads.
Finally select OK.
2.27
27
Check in SPSS
Now go to the Data Editor. You will see a new
column (p6) with the probability in it. Since you need
to see more than two decimal places, go to the
Variable View and increase the number of decimal
places for the probability column.
2.28
28
Check in SPSS
Go back to the data view and see the computed
probability, 0.828125 in this case.
Prob (x≤6) = 0.8281
Prob (x≥7) = 1 - Prob (x≤6) =1 - 0.8281 = 0.1719
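The complement route used in the SPSS check, Prob(x≥7) = 1 - Prob(x≤6), can be mirrored in Python. This is a sketch under the assumption that a plain cumulative sum is an adequate stand-in for a cumulative binomial function; binom_cdf is a hypothetical name, not an SPSS or library call.

```python
import math

def binom_cdf(k, n, p):
    """Prob(x <= k) for a binomial with n trials and success probability p."""
    return sum(math.comb(n, m) * p**m * (1 - p)**(n - m) for m in range(k + 1))

lower = binom_cdf(6, 10, 0.5)
print(lower)      # 0.828125
print(1 - lower)  # 0.171875
```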
2.29
29
Example 2
Given the data set
Sample  1   2   3   4   5   6   7   8   9   10
A       26  25  38  33  42  40  44  26  43  35
B       44  30  34  47  35  46  35  47  48  34
Is there a significant difference
between subsets A and B taken
from samples 1,…,10?
First calculate the differences.
2.30
30
Example 2
Given the data set
Sample      1    2    3    4    5    6    7    8    9    10
A           26   25   38   33   42   40   44   26   43   35
B           44   30   34   47   35   46   35   47   48   34
Difference  -18  -5   4    -14  7    -6   9    -21  -5   1
The calculated differences.
Now count the signs.
2.31
31
Analysis
There are 4 plus signs and 6 minus signs. By
direct calculation,
$$ \mathrm{Prob}(x \ge 6) = {}^{10}C_{6}\, p^{6}(1-p)^{4} + {}^{10}C_{7}\, p^{7}(1-p)^{3} + {}^{10}C_{8}\, p^{8}(1-p)^{2} + {}^{10}C_{9}\, p^{9}(1-p)^{1} + {}^{10}C_{10}\, p^{10}(1-p)^{0} $$
Simplest to work with the larger total
so choose 6 (rather than 4).
So consider 6 out of 10.
2.32
32
Analysis
Since p = q = ½
$$ \mathrm{Prob}(x \ge 6) = \left({}^{10}C_{6} + {}^{10}C_{7} + {}^{10}C_{8} + {}^{10}C_{9} + {}^{10}C_{10}\right)\left(\tfrac{1}{2}\right)^{10} $$
$$ = (210 + 120 + 45 + 10 + 1)\left(\tfrac{1}{2}\right)^{10} = 0.3770 $$
Coefficients from Pascal's Triangle
10C0  10C1  10C2  10C3  10C4  10C5  10C6  10C7  10C8  10C9  10C10
1     10    45    120   210   252   210   120   45    10    1
2.33
33
SPSS Check
P5 = CDF.BINOM(5,10,0.5)
Returns the value 0.6230
That is Prob(x≤5) = 0.6230
And Prob(x≥6) = 1- Prob(x≤5)
= 1 – 0.6230 = 0.3770
2.34
34
Alternate SPSS Check
Analyze
Nonparametric Tests
Legacy Dialogs
2 Related Samples
2.35
35
Alternate SPSS Check
Note that two
variables must
be selected
Note that the Sign option has been ticked and
that Exact was also selected.
2.36
36
Alternate SPSS Check
Test Statistics (b)
                        B - A
Exact Sig. (2-tailed)   .754 (a)
Exact Sig. (1-tailed)   .377
Point Probability       .205
a. Binomial distribution used.
b. Sign Test
The output summarises the data and then
produces the expected test statistic.
2.37
37
Conclusion
Since the choice of considering the
pluses was arbitrary, the overall (two-tailed)
probability is 0.754 (2 x 0.3770).
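The whole of Example 2 (differences, sign counts, one-tailed and two-tailed probabilities) can be sketched in Python. This is an illustration rather than the lecture's method, which uses SPSS; the variable names are assumptions, and dropping zero differences is a common convention that is not discussed in the slides (no ties occur in this data set).

```python
import math

a = [26, 25, 38, 33, 42, 40, 44, 26, 43, 35]
b = [44, 30, 34, 47, 35, 46, 35, 47, 48, 34]

diffs = [x - y for x, y in zip(a, b)]  # A - B, as in the slides
plus = sum(d > 0 for d in diffs)
minus = sum(d < 0 for d in diffs)
n = plus + minus                       # zero differences would be dropped
k = max(plus, minus)                   # work with the larger total

# With p = q = 1/2 every outcome has probability (1/2)**n
one_tailed = sum(math.comb(n, m) for m in range(k, n + 1)) / 2**n
two_tailed = min(1.0, 2 * one_tailed)

print(diffs)
print(plus, minus)                                  # 4 6
print(round(one_tailed, 4), round(two_tailed, 4))   # 0.377 0.7539
```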
2.38
38
Accepting Or Rejecting The
Null Hypothesis - p-Value
p-value        | Significance                                | Decision
p>0.1          | Not significant at the 10% level            | Accept H0
0.1>p>0.05     | Significant at the 10% level, but not at 5% | Accept H0, investigate further
0.05>p>0.01    | Significant at the 5% level                 | Reject H0
0.01>p>0.001   | Significant at the 1% level                 | Reject H0, with some confidence
0.001>p        | Significant beyond the 0.1% level           | Reject H0, with reasonable certainty
2.39
39
Conclusion
Since the choice of considering the
pluses was arbitrary, the overall (two-tailed)
probability is 0.754.
There is no significant difference
between the medians of the two
distributions (p > 0.05).
2.40
40
Example 3
It is claimed that students born in the first
quarter of the academic year have an
advantage over those born at other times. In a
class of 100 students 31 were born in this
period.
How strong is the evidence for the predicted
effect?
Under H0 there should be no relationship
between birth date and tendency to go on to
higher education.
2.41
41
Calculation
The probability of a student having a
birthday in this period is ¼. There are
x=31 successes in the sample of n=100.
p30 = CDF.BINOM(30,100,.25)
Returns the value 0.8962
Prob (x≤30) = 0.8962
Prob (x≥31) = 1 - Prob (x≤30) =1 - 0.8962 = 0.1038
2.42
42
Approximation
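Since n is large use a normal approximation,
mean np = 100 x ¼ = 25 = μ
variance np(1-p) = 100 x ¼ x ¾ = 18.75 = σ²
x = 31
$$ z = \frac{x - \mu}{\sigma} $$
$$ \mathrm{Prob}(\mathrm{Bin} \ge 31) \approx \mathrm{Prob}\left(z \ge \frac{31 - 0.5 - 25}{\sqrt{18.75}}\right) $$
Continuity correction: the 0.5 subtracted from 31.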
2.43
43
Normal/Binomial Fit
[Figure: binomial probabilities with the fitted normal curve; horizontal axis 0 to 100, vertical axis 0 to 0.1.]
Using μ and σ² from the previous slide for
the normal curve.
2.44
44
Continuity Correction
Want ≥ 31
[Figure: binomial probabilities for x = 25 to 35; vertical axis 0 to 0.1.]
2.45
45
Continuity Correction ≥ 31
Cut at
31?
2.46
46
Continuity Correction ≥ 31
Cut at
30?
2.47
47
Continuity Correction ≥ 31
Cut at
30.5?
2.48
48
Normal Approximation
The approximation generally improves as n increases and
is better when p is not near to 0 or 1. Various rules of
thumb may be used to decide whether n is large enough,
and p is far enough from the extremes of zero or unity.
One rule is that both np and n(1 − p) must be greater
than 5. However, the specific threshold varies from
source to source, and depends on how good an
approximation one wants; some sources give 10.
Box, Hunter and Hunter. Statistics for Experimenters.
Wiley. p. 53.
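The rule of thumb can be written as a one-line check. This Python sketch is illustrative; the function name normal_approx_ok and its threshold parameter are assumptions, with the default of 5 and the alternative of 10 taken from the text above.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: both n*p and n*(1 - p) must exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_ok(100, 0.25))      # True: np = 25, n(1 - p) = 75
print(normal_approx_ok(10, 0.5))        # False: np = 5 is not greater than 5
print(normal_approx_ok(100, 0.25, 10))  # True under the stricter threshold of 10
```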
2.49
49
Normal Table
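$$ \mathrm{Prob}(\mathrm{Bin} \ge 31) \approx \mathrm{Prob}\left(z \ge \frac{31 - 0.5 - 25}{\sqrt{18.75}}\right) = \mathrm{Prob}(z \ge 1.27) = 0.1020 $$
Z     | 0.00  | -0.01 | -0.02 | -0.03 | -0.04 | -0.05 | -0.06 | -0.07 | -0.08 | -0.09
-1.2  | 0.115 | 0.113 | 0.111 | 0.109 | 0.107 | 0.106 | 0.104 | 0.102 | 0.100 | 0.099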
The exact Binomial probability is 0.1038.
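The comparison between the normal approximation (with continuity correction) and the exact binomial probability can be reproduced in Python. This is a sketch only: the normal tail is computed from the standard library's math.erfc instead of a printed table, and the names normal_upper_tail, approx and exact are assumptions.

```python
import math

def normal_upper_tail(z):
    """Prob(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, x = 100, 0.25, 31
mu = n * p                          # mean np
sigma = math.sqrt(n * p * (1 - p))  # square root of the variance np(1 - p)

z = (x - 0.5 - mu) / sigma          # 0.5 is the continuity correction
approx = normal_upper_tail(z)
exact = sum(math.comb(n, m) * p**m * (1 - p)**(n - m) for m in range(x, n + 1))

print(round(z, 2))  # 1.27
print(round(approx, 4), round(exact, 4))
```

The two printed probabilities should agree to about two decimal places, in line with the values of 0.1020 and 0.1038 quoted on the slide.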
2.50
50
Conclusion
Since the p-value exceeds 0.05, the
null hypothesis, H0, is accepted.
There is no significant evidence of a relationship between
birth date and entry to higher
education.
2.51
51
The Sign Test - Example
Under H0 subjects will be equally
likely to balance a rod for longer
whilst speaking or silent.
50 subjects were tested of which
29 balanced for longer while silent.
2.52
52
Normal Approximation
mean np = 50 × ½ = 25
variance np(1-p) = 50 × ½ × ½ = 12.5
x = 29
$$ \mathrm{Prob}(\mathrm{Bin} \ge 29) \approx \mathrm{Prob}\left(z \ge \frac{29 - 0.5 - 25}{\sqrt{12.5}}\right) = \mathrm{Prob}(z \ge 0.99) = 0.161 $$
Z     | 0.00  | -0.01 | -0.02 | -0.03 | -0.04 | -0.05 | -0.06 | -0.07 | -0.08 | -0.09
-0.9  | 0.184 | 0.181 | 0.179 | 0.176 | 0.174 | 0.171 | 0.169 | 0.166 | 0.164 | 0.161
p-value is 2 x 0.161 = 0.322 (exact Binomial value .3222).
Multiply by 2 since we are seeking a difference in either direction
(not specifically a positive or a negative difference).
2.53
53
Conclusion
Prob (x≥29) = 1 - Prob (x≤28) =1 - 0.8389 = 0.1611
p-value is 2 x 0.1611 = 0.3222
Multiply by 2 since we are seeking a
difference in either direction (not specifically
a positive or a negative difference).
Since the p-value exceeds 0.05, the null
hypothesis, H0, is accepted. There is no
relationship between speaking and balancing.
2.54
54
Median Test
The Binomial distribution may also
be employed to test whether the
median of a population takes a value
(M). This is called the Median
Test. The null hypothesis is that
the median is M, so half the
observations should lie below M and
half above.
2.55
55
Example 4
The median reaction time is 0.25
seconds.
The subject is retested after
taking alcohol.
In 10 trials, 8 cases exhibited
greater reaction times.
2.56
56
Analysis
The null hypothesis, H0, of no effect, is
that reaction times should not be longer
after taking alcohol (10 trials).
$$ \mathrm{Prob}(8 \text{ or more}) = \mathrm{Prob}(8) + \mathrm{Prob}(9) + \mathrm{Prob}(10) $$
$$ = (45 + 10 + 1)\left(\tfrac{1}{2}\right)^{10} $$
$$ = 0.0547 $$
10C0  10C1  10C2  10C3  10C4  10C5  10C6  10C7  10C8  10C9  10C10
1     10    45    120   210   252   210   120   45    10    1
2.57
57
Conclusion
The required probability is 0.0547.
A one-tailed test is appropriate
(seeking a longer reaction time).
Since the p-value is greater than
0.05, the null hypothesis is
accepted. Alcohol appears to have
no effect on reaction time.
2.58
58
Example 5
Example values for 29 items were
determined.
0 50 56 72 80 80 80 99 101 110 110
110 120 140 144 145 150 180 201 210
220 240 290 309 320 325 400 500 507
Records indicate the population median
for similar items was 115. This test will
determine if there is sufficient evidence
for judging if the median for the items
was greater than 115 using α = 0.05.
2.59
59
Analysis
Choose Analyze > Nonparametric tests
> Legacy Dialogs > Binomial
Under Define Dichotomy in the lower left hand corner
choose Cut Point and specify the null value, 115 in this
case. Finally select OK.
2.60
60
Analysis
2.61
61
Analysis
Binomial Test
V1       | Category | N  | Observed Prop. | Test Prop. | Asymp. Sig. (2-tailed)
Group 1  | <= 115   | 12 | .41            | .50        | .458 (a)
Group 2  | > 115    | 17 | .59            |            |
Total    |          | 29 | 1.00           |            |
a. Based on Z Approximation.
For a one sided test divide the p value by 2, giving
0.458/2 = 0.229.
2.62
62
Normal Approximation
mean np = 29 × ½ = 14.5
variance np(1-p) = 29 × ½ × ½ = 7.25
x = 17
$$ \mathrm{Prob}(\mathrm{Bin} \ge 17) \approx \mathrm{Prob}\left(z \ge \frac{17 - 0.5 - 14.5}{\sqrt{7.25}}\right) = \mathrm{Prob}(z \ge 0.74) = 0.23 $$
Z     | 0.00  | -0.01 | -0.02 | -0.03 | -0.04 | -0.05 | -0.06 | -0.07 | -0.08 | -0.09
-0.7  | 0.242 | 0.239 | 0.236 | 0.233 | 0.230 | 0.227 | 0.224 | 0.221 | 0.218 | 0.215
Binomial p-value is 0.23
2.63
63
Conclusion
Of the 29 items, 12 are below and 17 are
above the hypothesis value, 115. Because
an upper one-sided test was chosen, the
p-value is the binomial probability of
observing 17 or more observations greater
than 115 if p is ½. If your α level is
less than the p-value of 0.2291 (for example α = .05), you
would fail to conclude that the population
median was greater than 115, which seems
likely for most situations.
2.64
64
Conclusion
If you had performed a two-sided test
using the same sample (H0: median = 115
versus H1 median ≠ 115), there would be
12 observations below 115 and 17 above.
Since you would be performing a two-sided test, you would look at the number
of observations below and above 115,
and take the larger of these, 17.
2.65
65
Conclusion
The binomial probability of
observing this many observations or
more is 0.2291, and the p-value of
the two-sided test is twice this
value, or 2 × 0.2291 = 0.4582.
If n had been > 50, use a normal
approximation to the binomial in
calculating the p-value.
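The counts and exact binomial p-values for Example 5 can be checked with a short Python sketch. It is not the SPSS procedure from the slides; the variable names are assumptions, and excluding values equal to the hypothesised median is a common convention that is not needed here (no value equals 115).

```python
import math

values = [0, 50, 56, 72, 80, 80, 80, 99, 101, 110, 110, 110, 120, 140, 144,
          145, 150, 180, 201, 210, 220, 240, 290, 309, 320, 325, 400, 500, 507]
median_h0 = 115

above = sum(v > median_h0 for v in values)
below = sum(v < median_h0 for v in values)
n = above + below  # values equal to 115 would be excluded

def upper_tail_half(k, n):
    """Prob(x >= k) for a binomial with n trials and p = 1/2."""
    return sum(math.comb(n, m) for m in range(k, n + 1)) / 2**n

one_sided = upper_tail_half(above, n)                           # H1: median > 115
two_sided = min(1.0, 2 * upper_tail_half(max(above, below), n))  # H1: median != 115

print(below, above)  # 12 17
print(round(one_sided, 4), round(two_sided, 4))
```

Doubling the unrounded one-sided value may differ in the last decimal place from the slide's 2 × 0.2291 = 0.4582, which doubles a rounded figure.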
2.66
66
Read
Read Howitt and Cramer 170-172
Read Russo (e-text) pages 75-77
2.67
67
Practical 2
This material is available from the
module web page.
http://www.staff.ncl.ac.uk/mike.cox
Module Web Page
2.68
68
Practical 2
This material for the practical is
available.
Instructions for the practical
Practical 2
Material for the practical
Practical 2
2.69
69
Tables
Always check in advance which tables
are necessary for lectures in successive
weeks. These are generally included with
the print version of the lecture notes.
2.70
70
Assignment 1
You will find submission details on the
module web site
Note the dialers lower down the
page give access to your individual
assignment. It is necessary to enter
your student number exactly as it
appears on your smart card.
2.71
71
Assignment 1
As a general rule make sure you can
also perform the calculations manually.
They might appear in an examination
question!
It is a good idea to check your
calculations using a software package.
Some software (for example Excel)
employs non-standard definitions and
should be used with caution.
2.72
72
Assignment 1
All submissions must be typed.
If desired additional material (which
can be hand written) may be attached as
an appendix to the formal submission.
2.73
73
Whoops!
Well, fairer in the sense that everyone's got an
equal chance, but not fair in the sense that, well, it
is literally a lottery.
BBC Radio 4 Today
4 March 2008
2.74
74
Whoops!
Imagine we are throwing a five-sided die 50 times.
On average, out of these 50 throws how many times
would this five-sided die show an odd number (1, 3
or 5)? ______ out of 50 throws.
Berlin Numeracy Test
Measuring Risk Literacy: The Berlin Numeracy
Test
by Edward T. Cokely et al., Judgment and Decision
Making, January 2012
2.75
75
Hint
Platonic solids
Tetrahedron
Cube
(or Regular
hexahedron)
Octahedron
Dodecahedron
Icosahedron
2.76
76
Moral
Always plot your data
2.77
77