Transcript SEEDSM12_6f
Scientific Methods 1
‘Scientific evaluation, experimental design
& statistical methods’
COMP80131
Lecture 6: Statistical Methods-Significance
Barry & Goran
www.cs.man.ac.uk/~barry/mydocs/myCOMP80131
3 Dec 2012
COMP80131-SEEDSM12_6
1
Continuous random processes
• Characterised by probability density functions (pdf)
Uniform pdf: pdf(x) = 1 for 0 ≤ x ≤ 1. Prob of the random variable x
lying between a and b (with 0 ≤ a ≤ b ≤ 1) is:
$\int_a^b \mathrm{pdf}(x)\,dx = b - a$
Gaussian (Normal) pdf with mean m & std dev σ:
$\mathrm{pdf}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2}$
$\mathrm{Prob}(a < x < b) = \int_a^b \mathrm{pdf}(x)\,dx$
68% for m ± σ
95.5% for m ± 2σ
99.7% for m ± 3σ
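The 68%, 95.5% and 99.7% figures can be checked numerically. The following is a minimal sketch, assuming only base MATLAB (the erf function) and the standard identity Prob(|x - m| < kσ) = erf(k/√2) for a Gaussian variable; the variable names are illustrative.

```matlab
% Probability that a Gaussian variable lies within k std devs of its mean.
% Uses Prob(|x-m| < k*sigma) = erf(k/sqrt(2)), which holds for any m and sigma.
k = [1 2 3];
P = erf(k / sqrt(2));
for i = 1:length(k)
    fprintf('within m +/- %d sigma: %.2f%%\n', k(i), 100*P(i));
end
```

The result does not depend on m or σ, which is why the same three percentages apply to every Gaussian pdf.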
pdf & Histograms
Ru = rand(10000,1);          % 10000 uniform samples
figure; hist(Ru,20);         % new figure so the second plot does not overwrite it
Rg = randn(10000,1);         % 10000 Gaussian samples with m=0, std=1
figure; hist(Rg,20);
[Figure: 20-bin histograms of Ru (x from 0 to 1) and Rg (x from -4 to 5).]
Convert histogram to estimate of pdf
• Divide each column by number of samples
• Then divide by width of bins.
• For better approximation, increase number of bins
MATLAB illustration
Rg = randn(100000,1); %100000 Gaussians with m=0, std=1
widthBin = 0.2;
X = -4 : widthBin : 4 ;
H = hist(Rg,X); % Histogram with bins centred on elements of X
figure(2); bar(X,(H/100000)/widthBin); ylabel('pdf estimate');
[Figure: Histogram as pdf estimate. x from -5 to 5; y axis 'pdf estimate' from 0 to 0.4.]
Gaussian (normal) pdf
• Measurements {xi} of many naturally occurring phenomena
tend to be normally distributed with some mean µ & stdev σ.
• Let zi = (xi - µ)/σ.
• Then {zi} has standard normal pdf with mean = 0 & std = 1.
• Conversely, if you generate a set of pseudo-random
numbers {zi} with mean = 0 & std = 1, let
xi = σ zi + µ
to scale the mean & std as required.
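The scaling in both directions can be sketched as follows. The target mean and std dev (10 and 2) are assumed values chosen purely for illustration, and the estimates printed will vary slightly from run to run.

```matlab
% Scale standard normal samples z to a chosen mean and std dev.
mu = 10; sigma = 2;            % assumed target mean and std dev (illustrative)
z = randn(100000,1);           % standard normal: mean 0, std 1
x = sigma*z + mu;              % scaled samples
fprintf('mean(x) = %.3f, std(x) = %.3f\n', mean(x), std(x));
zBack = (x - mu)/sigma;        % standardising recovers z (up to rounding)
```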
Plot true standard normal pdf
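Mean = 0; Std = 1;
widthBin = 0.2;                            % same spacing as the histogram bins
K = 1/( Std*sqrt(2*pi) );                  % normalising constant
X = -4*Std : widthBin : 4*Std ;
G = K * exp( -(X-Mean).^2 / (2*Std^2) );   % vectorised, so no loop is needed
figure(4); plot(X,G); xlabel('x'); ylabel('pdf');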
$\mathrm{pdf}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2}$
[Figure: Gaussian pdf for m=0, std=1; x from -4 to 4, pdf from 0 to 0.4.]
Plot Gaussian cdf
X=-4:0.1:4;
C = normcdf(X,0,1);
figure(1); plot(X,C); grid on;
xlabel('x'); ylabel('prob that var < x');
Cumulative distribution function (cdf)
Probability of Gaussian variable (m=0, std=1) being < x.
No closed-form formula for this.
Use MATLAB function:
normcdf(X,m,std)
[Figure: Gaussian cdf for m=0, std=1; x from -4 to 4, y axis 'prob that rand variable < x' from 0 to 1.]
Complementary Gaussian cdf
This is just
1 - normcdf(x,m,σ)
It is prob of Gaussian
random variable
(mean = m, std = σ)
being > x.
[Figure: complementary Gaussian cdf; x from -4 to 4, y axis 'prob that var > x' from 0 to 1.]
Complementary error function
• Some call the complementary Gaussian cdf (m=0, σ=1)
the ‘complementary error function’ Q(z)
• But ‘erfc’ is also called this.
• Q(z) = comp-Gaussian cdf = 0.5 erfc(z/√2).
• Used to rely on tables & graphs of Q(z).
• When m ≠ 0 or σ ≠ 1, use Q((z-m)/σ)
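Instead of tables and graphs, Q(z) can be evaluated directly. This is a minimal sketch that assumes the standard identity Q(z) = 0.5 erfc(z/√2) and uses only erfc from base MATLAB; the handle names Q and Qgen and the example values (12, 10, 2) are illustrative, not from the lecture.

```matlab
% Q(z): probability that a standard Gaussian variable exceeds z.
Q = @(z) 0.5 * erfc(z / sqrt(2));
% General Gaussian (mean m, std sigma): standardise first.
Qgen = @(x, m, sigma) Q((x - m) / sigma);
fprintf('Q(0) = %.4f\n', Q(0));          % half the area lies above the mean
fprintf('Q(1) = %.4f\n', Q(1));
fprintf('Prob(x > 12) for m=10, sigma=2: %.4f\n', Qgen(12, 10, 2));
```

Where the Statistics Toolbox is available, 1 - normcdf(x, m, sigma) should give the same values as Qgen(x, m, sigma).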
Use of ‘normcdf’ function
[Figure: Gaussian pdf, x from -4 to 4, with the area between points D and E marked.]
Prob of random var being between D & E is:
$\int_D^E \mathrm{pdf}(x)\,dx = \mathrm{normcdf}(E, m, \sigma) - \mathrm{normcdf}(D, m, \sigma)$
Tail of distribution
[Figure: Gaussian pdf, x from -4 to 4, with the tail above point D marked.]
Prob of random variable being greater than D is:
$\int_D^{\infty} \mathrm{pdf}(x)\,dx = 1 - \mathrm{normcdf}(D, m, \sigma)$
An Engineering Question
• Rectangular 1 V & 0 V pulses used to transmit a binary signal.
• Affected by additive white Gaussian noise (AWGN).
• Mean of noise = 0 & power (variance) σ² = 0.01.
• Estimate the bit-error probability.
• Bit-error may occur if noise adds voltage > 0.5 V to 0 V or < -0.5 V to 1 V.
• Assume same no. of 1’s & 0’s.
[Figure: voltage against time t for the pulse train, with levels +1 and +1/2 marked.]
Solution
prob(error) = prob(bit = 0) × prob(noise > 0.5)
+ prob(bit = 1) × prob(noise < -0.5)
= 0.5 prob(noise > 0.5) + 0.5 prob(noise < -0.5)
= prob(noise > 0.5) because of symmetry
= 1 - normcdf(0.5, 0, 0.1)
= 2.9×10^-7
Or, using graph of Q(z/σ) on next page,
prob(error) = Q(0.5/σ) = Q(0.5/0.1) = Q(5) ≈ 3×10^-7
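The same calculation can be sketched in code. This version assumes only base MATLAB and uses erfc in place of normcdf, so no toolbox is required; the names sigma, threshold and Pe are illustrative.

```matlab
% Bit-error probability for 0 V / 1 V pulses in zero-mean Gaussian noise.
sigma     = sqrt(0.01);              % noise std dev from variance 0.01
threshold = 0.5;                     % decision level halfway between 0 V and 1 V
Q  = @(z) 0.5 * erfc(z / sqrt(2));   % complementary Gaussian cdf
% Equal numbers of 0's and 1's: bit=0 term plus bit=1 term (equal by symmetry).
Pe = 0.5*Q(threshold/sigma) + 0.5*Q(threshold/sigma);
fprintf('bit-error probability = %.2e\n', Pe);
```

A direct simulation is not practical here: with an error probability of this size, billions of simulated bits would be needed to observe enough errors for a stable estimate.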
[Figure: graph of Q(z/σ) plotted against z/σ.]
Back to sampling
• Assume a population has true mean µ & stdev σ.
• Take a sample of N measurements from it; say N=50.
• Calculate sample-mean m1 & stdev s1.
• Cannot expect m1 = µ & s1 = σ exactly.
• Take another sample, & calculate m2 & s2.
• Repeat to obtain m1, m2, …, mM & s1, s2, …, sM.
• Now have distributions for sample-mean & sample-stdev.
• If population is Gaussian, pdf of sample-means will be
Gaussian with mean = µ & stdev = σ/√N.
• Can confirm by increasing M & estimating mean & stdev
of sample-mean from m1, m2, …, mM
• What about mean & stdev of sample-variances? (later)
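The repeated-sampling procedure can be sketched as below. The population parameters (mean 0, stdev 1) and M = 2000 are assumed values for illustration; the printed estimates vary from run to run and should move closer to the expected values as M is increased.

```matlab
% Draw M samples of N values each from a Gaussian population and
% look at the spread of the M sample-means.
mu = 0; sigma = 1;             % assumed population parameters
N = 50; M = 2000;              % sample size and number of samples
R = sigma*randn(N, M) + mu;    % each column is one sample of N values
m = mean(R);                   % 1-by-M vector of sample-means
s = std(R);                    % 1-by-M vector of sample-stdevs
fprintf('mean of sample-means  = %.4f (expect about %.4f)\n', mean(m), mu);
fprintf('stdev of sample-means = %.4f (expect about %.4f)\n', std(m), sigma/sqrt(N));
```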
Significance testing
• Assume pop-mean µ (‘mu’) may change.
• Assume we know pop-stdev σ & that it will not change.
• Assume we can only take one sample of 50 values.
• Calculate m1 to decide whether µ has changed.
• Null Hypothesis – it has not changed,
i.e. new pop-mean µNew = µ.
• If Null Hyp is true, pdf of sample-mean is on next slide:
[Figure: Gaussian pdf of sample-mean m1, centred on µ, with stdev s1 = σ/√50; axis marked at µ-2s1, µ-s1, µ, µ+s1, µ+2s1, … µ+4s1.]
• Here s1 denotes the stdev of the sample-mean, s1 = σ/√50.
• Assume value we got was m1 = µ + 2.5s1.
E.g. if µ=0 & σ=1, then m1 = 2.5/√50 ≈ 0.35
• How unlikely if Null Hypothesis is true?
Concept of a ‘null-hypothesis’
• A null-hypothesis is an assumption that is made and
then tested by a set of experiments designed to reveal
that it is likely to be false, if it is false.
• Testing is done by considering how probable the results
are, assuming the null hypothesis is true.
• If the results appear very improbable the researcher may
conclude that the null-hypothesis is likely to be false.
• This is usually the outcome the researcher hopes for
when he or she is trying to prove that a new technique is
likely to have some value.
p-value
• “Probability of obtaining a test result at least as extreme as
the one observed, assuming that null-hypothesis is true”.
• Reject null-hypothesis if the p-value is less than some
value α (significance level) which is often 0.05 or 0.01.
• When null-hypothesis is rejected, result is statistically
significant.
• Here p-value is 1 - normcdf(m1, µ, s1) …with s1 = σ/√N
= 1 - normcdf(µ+2.5s1, µ, s1)
= 1 - normcdf(2.5s1, 0, s1)
= 1 - normcdf(2.5, 0, 1) = 0.0062
• Less than 0.01 so reject NH at 1% significance level.
• Conclude that mean has changed.
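The p-value calculation can be sketched as follows, using the example values µ = 0, σ = 1 and N = 50. This version assumes only base MATLAB: erfc replaces normcdf, and the variable names are illustrative.

```matlab
% One-sided p-value for an observed sample-mean m1, with known pop-stdev.
mu = 0; sigma = 1; N = 50;     % values from the example
s1 = sigma / sqrt(N);          % stdev of the sample-mean
m1 = mu + 2.5*s1;              % observed sample-mean
z  = (m1 - mu) / s1;           % standardised distance from mu
p  = 0.5 * erfc(z / sqrt(2));  % same quantity as 1 - normcdf(m1, mu, s1)
alpha = 0.01;                  % significance level
fprintf('p-value = %.4f, reject null-hypothesis: %d\n', p, p < alpha);
```

Standardising first shows why the result depends only on the 2.5, not on the particular values of µ, σ or N.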
Our two assumptions
• That was easy because we made 2 assumptions:
population is Gaussian & pop-stdev is known to us.
• Now need to eliminate these 2 assumptions.
• We have some help from the Central Limit Theorem:
Central Limit Theorem
• If samples of size N are ‘randomly’ chosen from a pop
with mean µ & std σ, the pdf of their sample-means, m1,
approaches a Normal (Gaussian) pdf with mean µ & std
σ/√N as N is made larger & larger.
• Regardless of whether population is Gaussian or not!
• Previous example can be made to work for non-Gaussian pop provided N is ‘large enough’.
• More on this next week.
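The theorem can be illustrated with a population that is clearly not Gaussian. This is a minimal sketch assuming a uniform population on [0,1] (mean 0.5, std √(1/12)) and M = 5000 repeated samples; both choices are illustrative, and the printed estimates vary from run to run.

```matlab
% Sample-means of a non-Gaussian (uniform on [0,1]) population.
N = 50; M = 5000;              % sample size and number of samples
popMean = 0.5;                 % mean of uniform [0,1]
popStd  = sqrt(1/12);          % std of uniform [0,1]
m  = mean(rand(N, M));         % M sample-means, one per column
s1 = popStd / sqrt(N);         % std predicted by the theorem
within1 = mean(abs(m - popMean) < s1);   % fraction within one s1 of the mean
fprintf('stdev of sample-means = %.4f (predicted %.4f)\n', std(m), s1);
fprintf('fraction within +/- s1 = %.3f (Gaussian gives about 0.68)\n', within1);
```

Plotting hist(m, 20) should show the familiar bell shape even though the individual values are uniformly distributed.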
Another example
• Assume we wish to find out if a technique designed to
benefit users of a system is likely to have any value.
• Divide users into two groups & offer proposed technique to
one group, and something different to the other group.
• The null-hypothesis would be that the proposed technique
offers no measurable advantage over the other techniques.
The testing
• Look for differences between the sets of results obtained
for each of the two groups.
• Careful experimental design will try to eliminate
differences not caused by techniques being compared.
• Take a large number of users in each group & randomize
the way the users are assigned to groups.
• Once other differences have been eliminated as far as
possible, remaining difference will hopefully be indicative
of the effectiveness of the techniques being investigated.
• Vital question is whether the remaining differences are likely to be due to the
advantages of the new technique, or the inevitable
random variations that arise from the other factors.
• Are the differences statistically significant?
• Can employ a statistical significance test to find out.
Failure of the experiment
• If results are not found to look improbable under the null-hypothesis,
i.e. if the differences between the two groups are not
statistically significant,
then no conclusion can be drawn.
• Null-hypothesis could be true, or it could still be false.
• Mistake to conclude that the ‘null-hypothesis’ has been
proved likely to be true in this circumstance.
• It is quite possible that the results of the experiment give
insufficient evidence to make any conclusions at all.
Question: fair coin test
Checking whether a coin is fair
Suppose we obtain heads 14 times out of 20 flips.
The p-value for this test result would be the probability of a
fair coin landing on heads at least 14 times out of 20 flips.
From binomial distribution formula (Lecture 4), this is:
(20C14 + 20C15 + 20C16 + 20C17 + 20C18 + 20C19 + 20C20) / 2^20
= 0.058
This is probability that a fair coin would give a result as
extreme or more extreme than 14 heads out of 20.
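The sum of binomial coefficients can be evaluated directly. The sketch below assumes only base MATLAB (nchoosek); the variable names are illustrative.

```matlab
% p-value for at least 14 heads in 20 flips of a fair coin.
n = 20; kObs = 14;
counts = arrayfun(@(k) nchoosek(n, k), kObs:n);   % 20C14 ... 20C20
p = sum(counts) / 2^n;                            % each sequence has prob 1/2^20
fprintf('p-value = %.4f\n', p);
```

Changing kObs shows how quickly the p-value falls as the observed number of heads moves further from 10.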
Significance test for fair coin question
• Reject null-hypothesis if p-value ≤ α.
• If α= 0.05, rejection of null-hypothesis is:
“at the 5% (significance) level”.
• Probability of wrongly rejecting null-hypothesis
(Type 1 error) will be equal to α.
• This is often considered ‘sufficiently low’.
• In our example, p-value = 0.058 > 0.05.
• Observation is consistent with null-hypothesis & we
cannot reject it.
• Cannot conclude that coin is likely to be unfair.
• But we have NOT proved that coin is likely to be fair.
• 14 heads out of 20 flips can be ascribed to chance alone
• It falls within the range of what could happen 95% of the
time with a fair coin.
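A natural follow-up is how many heads would have been needed to reject the null-hypothesis. The sketch below assumes the same one-sided test (probability of at least k heads) and α = 0.05, and uses only base MATLAB; the variable names are illustrative.

```matlab
% Smallest number of heads in 20 flips that a one-sided test rejects at alpha.
n = 20; alpha = 0.05;
pmf  = arrayfun(@(k) nchoosek(n, k), 0:n) / 2^n;  % Prob(exactly k heads), k = 0..n
tail = fliplr(cumsum(fliplr(pmf)));               % tail(k+1) = Prob(at least k heads)
kCrit = find(tail <= alpha, 1) - 1;               % first k whose tail prob is <= alpha
fprintf('reject at %d or more heads (p = %.4f)\n', kCrit, tail(kCrit+1));
```

With these values the threshold is 15 heads, so the observed 14 heads falls just short of significance at the 5% level.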
Questions from Lecture 2
• Analyse the fictitious exam results & comment on features.
• Compute means, stdevs & vars for each subject & histograms for the
distributions.
• Make observations about performance in each subject & overall
• Do marks support the hypothesis that people good at Music are also
good at Maths?
• Do they support the hypothesis that people good at English are also
good at French?
• Do they support the hypothesis that people good at Art are also good
at Maths?
• If you have access to only 50 rows of this data, investigate the same
hypotheses
– What conclusions could you draw, and with what degree of certainty?
Questions from L4
1. A patient goes to a doctor with a bad cough & a fever. The doctor
needs to decide whether he has ‘swine flu’. Let statement S = ‘has
bad cough and fever’ & statement F = ‘has swine flu’. The doctor
consults his medical books and finds that about 40% of patients with
swine-flu have these same symptoms. Assuming that, currently,
about 1% of the population is suffering from swine-flu and that
currently about 5% have bad cough and fever (due to many possible
causes including swine-flu), we can apply Bayes theorem to estimate
the probability of this particular patient having swine-flu.
2. A doctor in another country knows from his text-books that for 40% of
patients with swine-flu, the statement S, ‘has bad cough and fever’ is
true. He sees many patients and comes to believe that the
probability that a patient with ‘bad cough and fever’ actually has
swine-flu is about 0.1 or 10%. If there were reason to believe that,
currently, about 1% of the population have a bad cough and fever,
what percentage of the population is likely to be suffering from swine-flu?