Statistical Quality Control in Textiles
Module 3:
Statistical Inferences on Quality
Dr. Dipayan Das
Assistant Professor
Dept. of Textile Technology
Indian Institute of Technology Delhi
Phone: +91-11-26591402
E-mail: [email protected]
Terminologies and Definitions
Population & Sample [1]
By population we mean the aggregate or totality of objects or
individuals about which inferences are to be made.
The number of objects or individuals present in the population is
known as the size of the population.
By sample we mean a part of the objects or individuals of a
population, selected as a basis for making inferences about certain
population facts.
The number of objects or individuals present in the sample is known
as the size of the sample.
The technique of obtaining a sample is called sampling.
Parameter and Statistic
A parameter is a population fact which depends upon the values of
the individuals comprising the population. For example, the mean,
variance, etc. associated with a population are known as population
parameters. They are constants for a given population.
A statistic is a sample fact which depends upon the values of the
individuals comprising the sample. For example, the mean, variance, etc.
associated with a sample are known as sample statistics. Many
samples of a given size can be formed from a given population, so
the statistics vary from sample to sample. They are therefore not
constants but variables.
The difference between the value of a population parameter and the
value of the corresponding statistic for a particular sample is known
as the sampling error.
Estimation of Population Parameters
We can calculate the value of a statistic for a sample, but it is
usually practically impossible to calculate the value of a parameter
for a population. Therefore, we often estimate the population
parameters from the sample statistics.
The two methods of estimation are
• point estimation
• interval estimation
Sampling Distribution
Sampling distribution of a sample statistic is the relative frequency
distribution of a large number of determinations of the value of this
statistic, each determination being based on a separate sample of the
same size and selected independently but by the same sampling
technique from the same population.
Standard Error
The standard error of any statistic is the standard deviation of its
sampling distribution.
Bias
If the mean of the sampling distribution of a statistic is equal to
the corresponding population parameter, then the statistic is said
to be unbiased.
If, on the other hand, the mean of the sampling distribution of a
statistic is not equal to the corresponding population
parameter, then the statistic is said to be biased.
Bias may arise from two sources:
(1) Technique of sample selection (troublesome, as there is no way to
assess its magnitude)
(2) Character of the statistic (less troublesome, as it is possible to
find its magnitude and direction and then make allowance accordingly).
Sampling Technique
Simple Random Sample
If a sample of a given size is selected from a given population
in such a way that all possible samples of this size which could be
formed from this population have an equal chance of selection, then
such a sample is called a simple random sample.
Simple Random Sampling Scheme
Step 1: Assign an identification number to each individual of
the population.
Step 2: Take out an individual randomly, using a “Random
Number Table”.
Step 3: Repeat Step 2 until you obtain the desired number of
individuals in the sample.
Note:
No individual can be taken more than once to form this sample; if a
random number that has already been used comes up again, it is skipped.
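The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pseudo-random number generator stands in for the printed random number table, and the population size (500), sample size (10) and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return identification numbers of a simple random sample.

    A pseudo-random generator replaces the printed random number table
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Step 1: identification numbers 1, 2, ..., population_size
    ids = range(1, population_size + 1)
    # Steps 2 and 3: draw individuals at random, never taking one twice
    return rng.sample(ids, sample_size)

# Hypothetical example: 10 individuals out of a population of 500
sample_ids = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sample_ids)
```

`random.sample` draws without replacement, which corresponds to the note that no individual may enter the sample twice.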
Random Number Table
This is a huge collection of digits, printed here in five-digit groups,
such that the ten digits 0 to 9 not only occur with equal frequencies
but are also arranged in a random order.
51772 24033 45939 30586 03585 64937 15630 09448 21631 91097 …..
74640 23491 60173 02133 79353 03355 64759 56301 91157 17480 …..
42331 83587 52078 75797 81938 95863 51135 57683 77331 29414 …..
29044 06568 25424 45406 82322 20790 98527 30277 60710 06829 …..
46621 21960 11645 31041 96799 65304 62586 94623 52290 87843 …..
62898 21387 55870 86707 85659 55189 41889 85418 16835 28195 …..
93582 76105 56974 12973 36081 00745 25439 68829 48653 27279 …..
04186 10863 37428 17169 50884 65253 88036 06652 71590 47152 …..
19640 97453 93507 88116 14070 11822 24034 41982 16159 35683 …..
87056 90581 94271 42187 74950 15804 67283 49159 14676 47280 …..
An Assumption
It is often practically impossible to numerically identify each
individual of a population, either because of the large size of the
population or because of the inaccessibility or current non-existence
of some of the individuals. In such situations, some of the available
individuals may be used as a sample. These
should be selected at random from those available. Although such
samples do not comply with the definition of a simple random
sample, they are often treated as such. That is, they are simply
assumed to be random samples of the population involved.
Note: This assumption has been widely used in the sampling of textile
materials.
Point Estimation of Population Parameters
Estimation of Population Mean
Suppose from a population, we draw a number of samples each
containing n random variables $x_1, x_2, \ldots, x_n$. Then, each sample has
a mean $\bar{x}$, which is a random variable. The mean (expected) value
of this variable is

$$\mu_{\bar{x}} = E(\bar{x}) = E\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) = \frac{1}{n} E(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\left[E(x_1) + E(x_2) + \cdots + E(x_n)\right] = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{1}{n}\, n\mu = \mu,$$

where $\mu$ is the population mean. Thus, the sample mean $\bar{x}$ is an
unbiased estimator of the population mean $\mu$.
Estimation of Population Mean (Contd.)
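The variance of the variable $\bar{x}$ is

$$\begin{aligned}
s_{\bar{x}}^2 &= E\left[(\bar{x} - \mu_{\bar{x}})^2\right] = E\left[(\bar{x} - \mu)^2\right] = E\left[\left(\frac{x_1 + x_2 + \cdots + x_n - n\mu}{n}\right)^2\right] = \frac{1}{n^2} E\left[\left((x_1 - \mu) + (x_2 - \mu) + \cdots + (x_n - \mu)\right)^2\right] \\
&= \frac{1}{n^2} E\left[(x_1 - \mu)^2 + (x_1 - \mu)(x_2 - \mu) + \cdots + (x_1 - \mu)(x_n - \mu) + (x_2 - \mu)(x_1 - \mu) + (x_2 - \mu)^2 + \cdots + (x_2 - \mu)(x_n - \mu) + \cdots + (x_n - \mu)(x_1 - \mu) + (x_n - \mu)(x_2 - \mu) + \cdots + (x_n - \mu)^2\right] \\
&= \frac{1}{n^2} \sum_{i=1}^{n} E\left[(x_i - \mu)^2\right] + \frac{1}{n^2} \sum_{\substack{i, j = 1 \\ i \neq j}}^{n} \underbrace{E\left[(x_i - \mu)(x_j - \mu)\right]}_{=\,0} = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n},
\end{aligned}$$

where $\sigma^2$ is the variance of the population. The cross terms vanish
because the observations are independent. Note that the standard
deviation of the variable $\bar{x}$ is $\sigma/\sqrt{n}$, which is known as the standard error
of the variable $\bar{x}$. Clearly, larger samples give more precise
estimates of the population mean than do smaller samples.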
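As a small numerical sketch of the result $\sigma/\sqrt{n}$, the following Python lines show how the standard error shrinks as the sample size grows. The value $\sigma = 1.30$ and the sample sizes are hypothetical inputs chosen only for illustration, and `standard_error_of_mean` is an illustrative name.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation and sample sizes
sigma = 1.30
for n in (10, 100, 450):
    print(n, round(standard_error_of_mean(sigma, n), 4))
```

Quadrupling the sample size halves the standard error, since n enters through its square root.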
Estimation of Population Variance
Suppose from a large population of mean $\mu$ and variance $\sigma^2$, we
draw a number of samples each containing n random variables
$x_1, x_2, \ldots, x_n$. Let $\bar{x}$ be the mean and $s^2$ the
variance of such a sample. Clearly, $\bar{x}$ and $s^2$ are variables. The
expected value of the sample variance is

$$\begin{aligned}
E(s^2) &= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}\left((x_i - \mu) - (\bar{x} - \mu)\right)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}\left((x_i - \mu)^2 - 2(x_i - \mu)(\bar{x} - \mu) + (\bar{x} - \mu)^2\right)\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right] - E\left[\frac{2}{n}(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \mu)\right] + E\left[\frac{1}{n}(\bar{x} - \mu)^2 \sum_{i=1}^{n} 1\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right] - E\left[\frac{2}{n}(\bar{x} - \mu)\, n(\bar{x} - \mu)\right] + E\left[\frac{1}{n}(\bar{x} - \mu)^2\, n\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2\right] - 2E\left[(\bar{x} - \mu)^2\right] + E\left[(\bar{x} - \mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} E\left[(x_i - \mu)^2\right] - E\left[(\bar{x} - \mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n}\sigma^2 - \frac{\sigma^2}{n} = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2.
\end{aligned}$$
Estimation of Population Variance (Contd.)
Since $E(s^2) \neq \sigma^2$, the estimator $s^2$ is said to be a biased estimator of
the population variance.
Because $E(s^2) = \frac{n-1}{n}\sigma^2$, we can write $\sigma^2 = E\left(\frac{n}{n-1}s^2\right)$. Further,

$$\sigma^2 = E\left(\frac{n}{n-1}s^2\right) = E\left[\frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = E(S^2).$$

Since $E(S^2) = \sigma^2$, we can say that

$$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$

is an unbiased estimate of the population variance.
Note: In the case when $n \to \infty$ we get

$$\lim_{n\to\infty} E(s^2) = \lim_{n\to\infty}\frac{n-1}{n}\sigma^2 = \sigma^2 \lim_{n\to\infty}\left(1 - \frac{1}{n}\right) = \sigma^2,$$

then we say that, in this limit, $s^2$ is an unbiased estimate of $\sigma^2$.
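The bias of $s^2$ can be checked exactly on a tiny example. The sketch below is an illustration, not part of the original derivation: it assumes a hypothetical population {1, 2, 3, 4}, enumerates every equally likely sample of size n = 2 drawn with replacement (so the observations are independent), and averages both estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Exact arithmetic mean."""
    return Fraction(sum(values), len(values))

def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / divisor

population = [1, 2, 3, 4]   # hypothetical population
n = 2                       # sample size
sigma2 = variance(population, len(population))   # population variance

# All equally likely samples of size n, drawn with replacement
samples = list(product(population, repeat=n))
E_s2 = mean([variance(s, n) for s in samples])       # divisor n
E_S2 = mean([variance(s, n - 1) for s in samples])   # divisor n - 1

print(sigma2, E_s2, E_S2)
```

Here the average of $s^2$ equals $\frac{n-1}{n}\sigma^2$, while the average of $S^2$ equals $\sigma^2$.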
Estimation of Population Standard Deviation
Suppose from a large population of mean $\mu$ and variance $\sigma^2$, we
draw a number of samples each containing n random variables
$x_1, x_2, \ldots, x_n$. Let $\bar{x}$ be the mean and $s^2$ the
variance of such a sample. Clearly, $\bar{x}$ and $s^2$ are variables. The
expected value of the sample variance is

$$E(s^2) = \frac{n-1}{n}\sigma^2.$$

It is possible to derive that the expected value of the sample standard
deviation is

$$E(s) = c_n \sigma, \qquad c_n = \sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)},$$

where $\Gamma$ denotes the gamma function, $\Gamma(n) = (n-1)!$ for a positive integer n.
Then, $E\left(\frac{s}{c_n}\right) = \sigma$. Thus, $\frac{s}{c_n}$ gives an unbiased estimate of $\sigma$.
Estimation of Population Standard Deviation (Contd.)
The variance of s is

$$s_s^2 = E(s^2) - \left[E(s)\right]^2 = \frac{n-1}{n}\sigma^2 - c_n^2\sigma^2 = \left(\frac{n-1}{n} - c_n^2\right)\sigma^2.$$

Hence the standard error of s is

$$s_s = \sigma\sqrt{\frac{n-1}{n} - c_n^2}.$$
Estimation of Difference Between Two Population Means
Suppose we have two independent populations with means $\mu_x$
and $\mu_y$ and variances $\sigma_x^2$ and $\sigma_y^2$, respectively. Let N
pairs of random samples be formed from the two
populations, each pair having $n_1$ variates from the first and $n_2$ from the
second population. Let the means of these samples be $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_N$
from the first population and $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_N$
from the second. Then,
consider the differences of the means $\bar{x}_1 - \bar{y}_1, \bar{x}_2 - \bar{y}_2, \ldots, \bar{x}_N - \bar{y}_N$. The
mean of the difference of the means is

$$\mu_{\bar{x} - \bar{y}} = E(\bar{x} - \bar{y}) = E(\bar{x}) - E(\bar{y}) = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_x - \mu_y.$$

Thus, the difference of the means of two samples is an unbiased
estimator of the difference of the means of two populations.
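Estimation of Difference Between Two Population Means (Contd.)
The variance of the difference of the sample means is

$$\begin{aligned}
s_{\bar{x} - \bar{y}}^2 &= E\left[\left((\bar{x} - \bar{y}) - E(\bar{x} - \bar{y})\right)^2\right] = E\left[\left((\bar{x} - \bar{y}) - (\mu_x - \mu_y)\right)^2\right] = E\left[\left((\bar{x} - \mu_x) - (\bar{y} - \mu_y)\right)^2\right] \\
&= E\left[(\bar{x} - \mu_x)^2 + (\bar{y} - \mu_y)^2 - 2(\bar{x} - \mu_x)(\bar{y} - \mu_y)\right] = E\left[(\bar{x} - \mu_x)^2\right] + E\left[(\bar{y} - \mu_y)^2\right] - 2\underbrace{E(\bar{x} - \mu_x)}_{=\,0}\,\underbrace{E(\bar{y} - \mu_y)}_{=\,0} \\
&= E\left[(\bar{x} - \mu_x)^2\right] + E\left[(\bar{y} - \mu_y)^2\right] = s_{\bar{x}}^2 + s_{\bar{y}}^2 = \frac{\sigma_x^2}{n_1} + \frac{\sigma_y^2}{n_2},
\end{aligned}$$

where $s_{\bar{x}}^2$ and $s_{\bar{y}}^2$ are the variances of $\bar{x}$ and $\bar{y}$, respectively.
Note that the standard deviation of the variable $\bar{x} - \bar{y}$ is $s_{\bar{x} - \bar{y}}$,
which is known as the standard error of the variable $\bar{x} - \bar{y}$.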
Estimation of Population Proportion
Assume a population consists of “good” items and “bad” items. Let
the proportion of good items in this population be p. Hence, the
proportion of bad items in this population is 1 − p. Let us draw n
items from this population such that it resembles a series of n
independent Bernoulli trials with constant probability p of
selection of a good item in each trial. Then the probability of
selection of x good items in n trials, as given by the binomial
distribution, is ${}^{n}C_{x}\, p^{x}(1-p)^{n-x}$, where $x = 0, 1, 2, \ldots, n$.
Then, the mean (expected) number of good items in n trials is $np$ and the
standard deviation of the number of good items in n trials is $\sqrt{np(1-p)}$.
Suppose we draw n items from this population to form a sample.
Let x be the number of good items in this sample. Hence, the
proportion of good items in this sample is $p' = x/n$.
The mean (expected) proportion of good items in this sample is
Estimation of Population Proportion (Contd.)
$$E(p') = E\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{1}{n}\, np = p.$$

Hence, the sample proportion is an unbiased estimator of the
population proportion.
The variance of the sample proportion is given by

$$E\left[\left(p' - E(p')\right)^2\right] = E\left[\left(\frac{x}{n} - E\left(\frac{x}{n}\right)\right)^2\right] = E\left[\left(\frac{x}{n} - \frac{1}{n}E(x)\right)^2\right] = \frac{1}{n^2}E\left[\left(x - E(x)\right)^2\right] = \frac{1}{n^2}\, np(1-p) = \frac{p(1-p)}{n}.$$

Then, the standard deviation of the sample proportion is $\sqrt{\dfrac{p(1-p)}{n}}$.
This is known as the standard error of the sample proportion.
Interval Estimation of Population Parameters
Probability Distribution of Sample Means
If the population from which samples are taken is normally
distributed with mean $\mu$ and variance $\sigma^2$, then the sample means
of size n are also normally distributed with mean $\mu$ and variance $\sigma^2/n$.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$

$$f(\bar{x}) = \frac{1}{\frac{\sigma}{\sqrt{n}}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2\right]$$

If the population from which samples are taken is not normally
distributed, but has mean $\mu$ and variance $\sigma^2$, then the sample
means of size n are normally distributed with mean $\mu$ and variance
$\sigma^2/n$ when $n \to \infty$ (large sample).

$$f(x) \neq \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$

$$f(\bar{x}) = \frac{1}{\frac{\sigma}{\sqrt{n}}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2\right] \quad \text{when } n \to \infty$$
Estimation of Population Mean
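Assume the population distribution is normal, regardless of sample size,
or take large samples ($n \to \infty$), regardless of population distribution.
The sample mean $\bar{x}$ follows a normal distribution with mean $\mu$ and
standard deviation $\sigma/\sqrt{n}$.
Let $u = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where u is known as the standard normal variable.
Then,

$$P\left(-u_{\alpha/2} < u < u_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-u_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < u_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\bar{x} - u_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + u_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

[Figure: normal density of $\bar{x}$ centred at $\mu$, with an area of $\alpha/2$ in each tail beyond $\mu - u_{\alpha/2}\,\sigma/\sqrt{n}$ and $\mu + u_{\alpha/2}\,\sigma/\sqrt{n}$]
$\bar{x} - u_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} + u_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
are called the $(1 - \alpha)100\%$ confidence limits of $\mu$.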
Popular Confidence Intervals for Population Mean
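Often, the 95 percent confidence interval of the population mean µ
is estimated; it is

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

The 99 percent confidence interval of the population mean µ is

$$\left(\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right)$$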
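The interval can be sketched in Python as follows. This is a minimal illustration that assumes the population standard deviation $\sigma$ is known; the sample mean, $\sigma$ and n used in the example call are hypothetical, and the critical value $u_{\alpha/2}$ is taken from the standard library's normal distribution rather than from a printed table.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """(1 - alpha) confidence interval for the population mean, sigma known."""
    alpha = 1.0 - confidence
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # u_{alpha/2}
    half_width = u * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs: sample mean 14.50, sigma 1.30, n = 450
low95, high95 = mean_confidence_interval(14.50, 1.30, 450, confidence=0.95)
low99, high99 = mean_confidence_interval(14.50, 1.30, 450, confidence=0.99)
print(round(low95, 4), round(high95, 4))
print(round(low99, 4), round(high99, 4))
```

A higher confidence level gives a wider interval for the same data.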

Illustration
Suppose the theoretical population of yarn strength follows a normal
distribution with mean 14.56 cN·tex⁻¹ and standard deviation
1.30 cN·tex⁻¹. Then, for random samples of 450 yarns selected from
this population, the sample means follow a
normal distribution with mean 14.56 cN·tex⁻¹ and standard
deviation 0.0613 cN·tex⁻¹ (1.30/√450 = 1.30/21.21 = 0.0613 cN·tex⁻¹).
This distribution is shown in the next slide.
Illustration (Contd.)
[Figure: normal density f(x̄) of the sample mean x̄ (cN·tex⁻¹), centred at µ = 14.56 cN·tex⁻¹]
In the long run, 68.26 percent of the mean strengths of random
samples of 450 yarns selected from this population will
involve sampling errors of less than 0.0613 cN·tex⁻¹. Or, the
probability of a sample mean being in error by 0.1201 cN·tex⁻¹
(1.96 × 0.0613) or more is 0.05.
Probability Distribution of Sample Means (Contd.)
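Assume that the population from which samples are taken is
normally distributed, but the mean $\mu$ and variance $\sigma^2$ are unknown.
Then, we consider a statistic $T_1$ as defined below

$$T_1 = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \quad \text{where} \quad S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}.$$

The statistic $T_1$ follows the t-distribution, with PDF

$$f(T_1) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi(n-1)}\;\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{T_1^2}{n-1}\right)^{-\frac{n}{2}},$$

where n − 1 is known as the degree of freedom and $\Gamma$ denotes the gamma
function, $\Gamma(n) = (n-1)!$ for a positive integer n.
[Figure: f(T_1) against T_1 for n = 3, n = 10 and n = 30, compared with the normal distribution]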
Probability Distribution of Sample Means (Contd.)
One can see here that when $n \geq 30$ the
t-distribution practically approaches
a normal distribution.
Small sample (practically): $n < 30$
Large sample (practically): $n \geq 30$
Note: For a large sample, one can then
find the confidence interval of the
population mean based on the normal
distribution as discussed earlier.
[Figure: f(T_1) against T_1 for n = 3, n = 10 and n = 30, compared with the normal distribution]
Estimation of Population Mean (Contd.)
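The statistic $T_1$ follows the t-distribution with n − 1 degrees of freedom.

$$P\left(-t_{\alpha/2} < T_1 < t_{\alpha/2}\right) = 1 - \alpha$$

$$\text{or, } P\left(-t_{\alpha/2} < \frac{\bar{x} - \mu}{S/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha$$

$$\text{or, } P\left(\bar{x} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

The $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right).$$

The value of $t_{\alpha/2}$ for n − 1 degrees of freedom can be found from the t table.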
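A small-sample interval can be sketched as below. The ten fiber lengths are hypothetical data made up for illustration, and the critical value $t_{0.025} = 2.262$ for 9 degrees of freedom is passed in by hand, as it would be read from a t table (the Python standard library has no t-distribution quantile function).

```python
import math

def t_confidence_interval(values, t_crit):
    """Confidence interval for the mean using S (divisor n - 1) and a tabulated t value."""
    n = len(values)
    x_bar = sum(values) / n
    S = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    half_width = t_crit * S / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical fiber lengths in mm (n = 10, so 9 degrees of freedom)
lengths_mm = [24.6, 25.3, 25.1, 24.8, 25.9, 24.2, 25.4, 25.0, 24.7, 25.0]
low, high = t_confidence_interval(lengths_mm, t_crit=2.262)
print(round(low, 3), round(high, 3))
```

Passing the table value explicitly keeps the sketch free of third-party dependencies; with a statistics library available, the quantile could be computed instead.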
Illustration
Consider a population of cotton fibers with mean length 25 mm and
standard deviation of length 0.68 mm. Then, for random samples
of 10 fibers selected from this population, the sample means
follow a t-distribution with mean length
25 mm, standard deviation of length 0.23 mm, and 9 degrees of
freedom.
This distribution is shown in the next slide.
Illustration
The probability of the mean length
of 10 fibers selected from the
population being in error by
0.52 mm (2.262 × 0.23) or more is
0.05.
[Figure: f(T_1) plotted on a T_1 scale, with T_1 = (x̄ − µ)/0.23, and on the corresponding x̄ scale (mm), marked from 23.85 to 26.15 in steps of 0.23 and centred at µ = 25.00]
Probability Distribution of Difference Between Two
Sample Means
Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$
be two independent sample
observations from two normal populations with means $\mu_x$, $\mu_y$ and
variances $\sigma_x^2$, $\sigma_y^2$, respectively.
Or, let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$
be two independent large
sample observations from two populations with means $\mu_x$, $\mu_y$ and
variances $\sigma_x^2$, $\sigma_y^2$, respectively.
Then the variable U

$$U = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2}},$$

where $s_{\bar{x}}^2$ and $s_{\bar{y}}^2$ are the variances of $\bar{x}$ and $\bar{y}$,
is a standard normal variable with mean zero and variance one.
Estimation of Difference Between Two Populations
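$$P\left(-U_{\alpha/2} < U < U_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-U_{\alpha/2} < \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2}} < U_{\alpha/2}\right) = 1 - \alpha$$

$$P\left((\bar{x} - \bar{y}) - U_{\alpha/2}\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + U_{\alpha/2}\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2}\right) = 1 - \alpha$$

Hence the $100(1 - \alpha)\%$ confidence interval for $\mu_x - \mu_y$ is

$$\left((\bar{x} - \bar{y}) - U_{\alpha/2}\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2},\ (\bar{x} - \bar{y}) + U_{\alpha/2}\sqrt{s_{\bar{x}}^2 + s_{\bar{y}}^2}\right)$$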
Probability Distribution of Difference Between Two
Sample Means (Contd.)
Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two independent small sample
observations from two populations with means $\mu_x$, $\mu_y$
and variances $\sigma_x^2$, $\sigma_y^2$, respectively. Then the variable $T_2$

$$T_2 = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{S_{x-y}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

where

$$S_{x-y}^2 = \frac{(n_1 - 1)S_x^2 + (n_2 - 1)S_y^2}{(n_1 - 1) + (n_2 - 1)} = \frac{n_1 s_x^2 + n_2 s_y^2}{n_1 + n_2 - 2},$$

follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom.
Estimation of Difference Between Two Population Means
(Contd.)
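$$P\left(-t_{\alpha/2} < T_2 < t_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-t_{\alpha/2} < \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{S_{x-y}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} < t_{\alpha/2}\right) = 1 - \alpha$$

$$P\left((\bar{x} - \bar{y}) - t_{\alpha/2}\, S_{x-y}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + t_{\alpha/2}\, S_{x-y}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = 1 - \alpha$$

Hence the $100(1 - \alpha)\%$ confidence interval for $\mu_x - \mu_y$ is

$$\left((\bar{x} - \bar{y}) - t_{\alpha/2}\, S_{x-y}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{x} - \bar{y}) + t_{\alpha/2}\, S_{x-y}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$$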
Probability Distribution of Sample Variances
Suppose we draw a number of
samples each containing n random
variables $x_1, x_2, \ldots, x_n$
from a population that is normally
distributed with mean $\mu$ and
variance $\sigma^2$. Then the sample
means $\bar{x}$ of size n are also normally
distributed with mean $\mu$ and
variance $\sigma^2/n$. Then the variable

$$\chi^2 = \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n s^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

follows the $\chi^2$ distribution with n − 1
degrees of freedom.
PDF:

$$f(\chi^2) = \frac{e^{-\chi^2/2}\,(\chi^2)^{\frac{n}{2} - 1}}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}, \quad 0 \le \chi^2 < \infty, \quad \text{with n d.f.}$$

[Figure: f(χ²) against χ² for n = 1, 2, 3, 4 and 5]
Estimation of Population Variance
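Then

$$P\left(\chi^2_{1-\alpha/2,\,n-1} < \chi^2 < \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$

$$P\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{n s^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$

$$\text{or, } P\left(\frac{n s^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{n s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$$

$$\text{or, } P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$$

The $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is

$$\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right).$$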
Probability Distribution of Sample Variances (Contd.)
When $n \to \infty$, the following statistic

$$\tau = \sqrt{2\chi^2} - \sqrt{2n - 1} = \sqrt{\frac{2 n s^2}{\sigma^2}} - \sqrt{2n - 1} = \sqrt{\frac{2(n-1)S^2}{\sigma^2}} - \sqrt{2n - 1}$$

approaches a standard normal distribution with mean zero and
variance one.
Estimation of Population Variance (Contd.)
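$$P\left(-\tau_{\alpha/2} < \tau < \tau_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-\tau_{\alpha/2} < \sqrt{\frac{2(n-1)S^2}{\sigma^2}} - \sqrt{2n - 1} < \tau_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\frac{2(n-1)S^2}{\left(\sqrt{2n - 1} + \tau_{\alpha/2}\right)^2} < \sigma^2 < \frac{2(n-1)S^2}{\left(\sqrt{2n - 1} - \tau_{\alpha/2}\right)^2}\right) = 1 - \alpha$$

The $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is

$$\left(\frac{2(n-1)S^2}{\left(\sqrt{2n - 1} + \tau_{\alpha/2}\right)^2},\ \frac{2(n-1)S^2}{\left(\sqrt{2n - 1} - \tau_{\alpha/2}\right)^2}\right)$$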
Probability Distribution of Sample Proportions
Take large samples ($n \to \infty$); then we know that the binomial distribution
approaches the normal distribution. Then the variable V is a
standard normal variable with mean zero and variance one.

$$V = \frac{p' - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

where $p' = x/n$, x is the number of successes in the observed sample,
n being the sample size.
Note: Earlier we have shown that the standard deviation of p′ is
$\sqrt{p(1-p)/n}$. But, when p is not known, p′ can be taken as an unbiased
estimator of p; then the standard deviation of p′ can be written as
$\sqrt{p'(1-p')/n}$.
Estimation of Population Proportion
$$P\left(-V_{\alpha/2} < V < V_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-V_{\alpha/2} < \frac{p' - p}{\sqrt{\dfrac{p'(1-p')}{n}}} < V_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(p' - V_{\alpha/2}\sqrt{\frac{p'(1-p')}{n}} < p < p' + V_{\alpha/2}\sqrt{\frac{p'(1-p')}{n}}\right) = 1 - \alpha$$

The $100(1 - \alpha)\%$ confidence interval is

$$\left(p' - V_{\alpha/2}\sqrt{\frac{p'(1-p')}{n}},\ p' + V_{\alpha/2}\sqrt{\frac{p'(1-p')}{n}}\right)$$
Illustration
Consider a population consisting of “good” garments and “bad”
garments. A random sample of 100 garments selected from this
population showed that 20 garments were bad; hence the proportion of
good garments in this sample was p′ = 0.80.
Then, for random
samples of 100 garments taken from this population, the
distribution of p′ is taken to be normal with mean 0.8 and
standard deviation 0.04 (= √(0.8 × 0.2/100)).
This distribution is shown in the next slide.
Illustration
In the long run, 68.26 percent of the
proportions in random samples of
100 garments selected from this
population will involve
sampling errors of less than
0.04. Or, the probability of a
sample proportion being in error by
0.0784 (1.96 × 0.04) or more is
0.05.
[Figure: normal density f(p′) of the sample proportion p′]
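The garment illustration can be reproduced with a short Python sketch. It uses the figures from the text (80 good garments out of 100) and estimates the standard error with p′ in place of the unknown p, as described earlier; the function name is illustrative and the critical value is taken from the standard library's normal distribution.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Sample proportion, its estimated standard error, and a confidence interval."""
    p_hat = x / n                                   # p' = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)       # sqrt(p'(1 - p') / n)
    v = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # V_{alpha/2}
    return p_hat, se, (p_hat - v * se, p_hat + v * se)

# 80 good garments in a sample of 100, as in the illustration
p_hat, se, (low, high) = proportion_confidence_interval(80, 100)
print(p_hat, round(se, 4), round(low, 4), round(high, 4))
```

The half-width of the 95 percent interval is about 1.96 × 0.04 = 0.0784, the same figure quoted in the illustration.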
Testing of Hypothesis
Need for Testing
Testing of statistical hypothesis is a process for drawing some
inference about the value of a population parameter from the
information obtained in a sample selected from the population.
Types of Test
1. One-tailed test
2. Two-tailed test
Illustration
Sometimes we may be interested only in the extreme values on one
side of the statistic, i.e., the so-called one “tail” of the distribution,
as for example, when we are testing the hypothesis that one process
is better than the other. Such tests are called one-tailed tests or
one-sided tests. In such cases, the critical region lies on one side of
the distribution, with an area equal to the level of significance.
Sometimes we may be interested in the extreme values on both
sides of the statistic, i.e., the so-called two “tails” of the distribution,
as for example, when we are testing the hypothesis that one process
is not the same as the other. Such tests are called two-tailed tests
or two-sided tests. In such cases, the critical region lies on both
sides of the distribution, with the total area of both sides equal to the
level of significance.
Testing Procedure
Step 1 : State the statistical hypothesis.
Step 2 : Select the level of significance to be used.
Step 3 : Specify the critical region to be used.
Step 4 : Find out the value of the test statistic.
Step 5 : Take decision.
Statement of Hypothesis
Suppose we are given a sample from which a certain statistic such
as the mean is calculated. We assume that this sample is drawn from a
population for which the corresponding parameter can tentatively
take a specified value. We call this the null hypothesis. It is
usually denoted by H. The null hypothesis is tested for
possible rejection under the assumption that it is
true.
The alternative hypothesis is complementary to the null hypothesis. It
is usually denoted by $H_A$.
For example, if $H: \mu = \mu_0$ is rejected, then $H_A$ may be $\mu \neq \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$.
Selection of Level of Significance
The level of significance, usually denoted by $\alpha$, is stated in terms of
some small probability value such as 0.10 (one in ten) or 0.05 (one in
twenty) or 0.01 (one in a hundred) or 0.001 (one in a thousand),
which is equal to the probability of the test statistic falling in the
critical region, thus indicating falsity of H.
Specification of Critical Region
A critical region is a portion of the scale of possible values of the
statistic so chosen that if the particular obtained value of the
statistic falls within it, rejection of the hypothesis is indicated.
Test Statistic
The phrase “test statistic” is simply used here to refer to the statistic
employed in effecting the test of hypothesis.
The Decision
In this step, we refer the value of the test statistic as obtained in
Step 4 to the critical region adopted. If the value falls in this region,
reject the hypothesis. Otherwise, retain or accept the hypothesis as a
tenable (not disproved) possibility.
Illustration: A Problem
A fiber purchaser placed an order with a fiber producer for a large
quantity of basalt fibers of 1.4 GPa breaking strength. Upon
delivery, the fiber purchaser found that the basalt fibers, “on the
whole”, were weaker and asked the fiber producer for replacement
of the basalt fibers. The fiber producer, however, replied that the fibers
produced met the specification of the fiber purchaser; hence, no
replacement would be made. The matter went to court and a
technical advisor was appointed to find out the truth. The advisor
conducted a statistical test.
Illustration: The Test
Step 1 : Null hypothesis $H: \mu = 1.4$ GPa
Alternative hypothesis $H_A: \mu < 1.4$ GPa
where $\mu$ is the population mean breaking strength of basalt fibers
as ordered by the fiber purchaser.
Step 2 : The level of significance was chosen as $\alpha = 0.01$.
Step 3 : The advisor wanted to know the population standard
deviation $\sigma$ of strength. So he took a random sample of
65 fibers (n = 65) and observed that the sample standard
deviation s of breaking strength was 0.80 GPa. Then, he
estimated the population standard deviation $\hat{\sigma}$ of strength
as follows:

$$\hat{\sigma} = \sqrt{\frac{n}{n-1}}\; s = \sqrt{\frac{65}{65-1}} \times 0.80 = 0.8062 \text{ GPa}$$
Illustration: The Test Continued
Then, the critical region for the mean breaking
strength $\bar{x}$ is found as:

$$\bar{x} < \mu - u_{\alpha}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 - (2.33 \times 0.10) = 1.1670 \text{ GPa}$$

[Figure: density f(x̄) of the sample mean x̄ (GPa), centred at µ, with the critical region in the lower tail]
Step 4 : The advisor observed that the sample mean breaking
strength $\bar{x}$ was 1.12 GPa.
Step 5 : The advisor referred the observed value $\bar{x} = 1.12$ GPa to the
critical region he established and noted that it fell in this
region. Hence, he rejected the null hypothesis and thus
accepted the alternative hypothesis $\mu < 1.4$ GPa.
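The advisor's calculation can be retraced with a short Python sketch. All numbers come from the illustration above (hypothesized mean 1.4 GPa, s = 0.80 GPa, n = 65, observed mean 1.12 GPa, and the table value u = 2.33 for a significance level of 0.01); the function name `lower_tail_test` is an assumption of this sketch.

```python
import math

def lower_tail_test(mu0, s, n, x_bar, u_alpha):
    """One-tailed (lower) test of H: mu = mu0 against H_A: mu < mu0."""
    sigma_hat = math.sqrt(n / (n - 1)) * s          # estimate of population std. dev.
    critical = mu0 - u_alpha * sigma_hat / math.sqrt(n)
    reject = x_bar < critical                       # does x_bar fall in the critical region?
    return sigma_hat, critical, reject

sigma_hat, critical, reject = lower_tail_test(
    mu0=1.4, s=0.80, n=65, x_bar=1.12, u_alpha=2.33
)
print(round(sigma_hat, 4), round(critical, 4), reject)
```

Since the observed mean of 1.12 GPa lies below the critical value, the sketch reaches the same decision as the advisor and rejects H.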
Errors Associated with Testing of Hypothesis
Let us analyze the following situations:
Possibilities | Course of Action
True H | Accept (Desired correct action) | Reject (Undesired erroneous action)
False H | Accept (Undesired erroneous action) | Reject (Desired correct action)
Type I Error: Rejecting H when it is true.
Type II Error: Accepting H when it is false.
In situations where a Type I error is possible, the level of significance
$\alpha$ represents the probability of such an error. The higher the
level of significance, the higher the probability of a Type I error.
Type I Error
$\alpha = 0$ means complete elimination of the occurrence of Type I error.
Of course, it implies that no critical region exists, hence H is
always retained. In this case, in fact, there is no need to analyze or
even collect any data at all. Obviously, while such a procedure
would completely eliminate the possibility of making a Type I
error, it does not provide a guarantee against error, for every time
that the H stated is false, a Type II error would necessarily occur.
Similarly, by letting $\alpha = 1$ it would be possible to eliminate entirely
the occurrence of Type II error at the cost of committing a Type I
error for every true H tested.
Thus, the choice of a level of significance represents a compromise
aimed at controlling the two types of errors that may occur in testing
a statistical hypothesis.
Type II Error
We see that for a given choice of $\alpha$, there is always a probability of
Type II error. Let us denote this probability by $\beta$. This depends
upon:
(1) the value of $\alpha$ chosen
(2) the location of the critical region
(3) the variability of the statistic
(4) the amount by which the actual population parameter differs
from the hypothesized value of it, stated in H.
Because in any real situation, the actual value of a population
parameter can never be known, the degree of control exercised by a
given statistical test on Type II error can never be determined.
Illustration: The beta value
Let us assume that the actual population
mean breaking strength of basalt fibers
supplied by the fiber producer was 1.0
GPa. In this case, a Type II error will occur
when $\bar{x} > 1.1670$ GPa.
The probability $\beta$
of this can be found as under:

$$\beta = P(\bar{x} > 1.1670) = P\left(\mu + u\frac{\hat{\sigma}}{\sqrt{n}} > 1.1670\right) = P\left(u > \frac{1.1670 - 1.0}{0.10}\right) = P(u > 1.67) = 0.0475$$

[Figure: density f(x̄) of the sample mean x̄ (GPa) centred at the actual mean µ, with the critical region marked and β shown as the area outside it]
Hence, the probability of Type II error is 0.0475. That is, if in this
situation, this test were to be repeated indefinitely, 4.75 percent of
the decisions would be Type II errors.
Illustration: Effect of alpha on beta
Let us take $\alpha = 0.001$. Then the critical region is

$$\bar{x} < \mu - u_{\alpha}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 - (3.09 \times 0.10) = 1.091 \text{ GPa}$$

Then,

$$\beta = P(\bar{x} > 1.091) = P\left(\mu + u\frac{\hat{\sigma}}{\sqrt{n}} > 1.091\right) = P\left(u > \frac{1.091 - 1.0}{0.10}\right) = P(u > 0.91) = 0.1814$$

In this way, we obtain
α | β
0.001 | 0.1814
0.005 | 0.0778
0.010 | 0.0475
0.050 | 0.0094
0.100 | 0.0033
As alpha increases,
beta decreases.
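The trade-off between alpha and beta can be recomputed with a short Python sketch. It assumes the setting of the illustration (hypothesized mean 1.4 GPa, actual mean 1.0 GPa, standard error 0.10 GPa) and uses exact normal quantiles from the standard library, so the results differ in the third or fourth decimal place from figures obtained with rounded table values of u.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal distribution

def type_ii_error(alpha, mu0=1.4, mu_actual=1.0, std_error=0.10):
    """Probability of a Type II error for the lower-tailed test of H: mu = mu0."""
    u_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)       # critical value u_alpha
    critical = mu0 - u_alpha * std_error            # boundary of the critical region
    # beta = P(x_bar > critical) when the true mean is mu_actual
    return 1.0 - STD_NORMAL.cdf((critical - mu_actual) / std_error)

for alpha in (0.001, 0.005, 0.010, 0.050, 0.100):
    print(f"alpha = {alpha:.3f}   beta = {type_ii_error(alpha):.4f}")
```

The printed values decrease steadily as alpha increases, in line with the tabulated figures.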
Illustration:
Effect of Location of Critical Region on Beta Value
Let us suppose that $H: \mu = 1.4$ GPa and $H_A: \mu \neq 1.4$ GPa. It means
$H_A: \mu < 1.4$ GPa or $H_A: \mu > 1.4$ GPa. Then, the two critical regions are:

$$\bar{x} < \mu - u_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 - (2.58 \times 0.10) = 1.1420 \text{ GPa}$$

$$\bar{x} > \mu + u_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 + (2.58 \times 0.10) = 1.6580 \text{ GPa}$$

Assume the actual $\mu = 1.0$ GPa. Then, the value of beta is the probability that
$\bar{x}$ lies between 1.1420 and 1.6580 GPa. This is the same as the
probability of u lying between +1.42 and +6.58, which for all practical
purposes, is simply the probability of u > +1.42. Hence $\beta = 0.0778$.
Illustration: Effect of Sample Variability on
Critical Region on Beta Value
Let us suppose that $H: \mu = 1.4$ GPa and $H_A: \mu < 1.4$ GPa. Assume the
estimated standard error $\hat{\sigma}/\sqrt{n}$ is increased to 0.20 GPa. Then the
critical region of $\bar{x}$ at $\alpha = 0.01$ is

$$\bar{x} < \mu - u_{\alpha}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 - (2.33 \times 0.20) = 0.9340 \text{ GPa}$$

Now, if we assume the actual $\mu = 1.0$ GPa,

$$\beta = P(\bar{x} > 0.9340) = P\left(\mu + u\frac{\hat{\sigma}}{\sqrt{n}} > 0.9340\right) = P\left(u > \frac{0.9340 - 1.0}{0.20}\right) = P(u > -0.33) = 0.6293$$
Illustration: Effect of Difference Between Actual
& Hypothesized Values of Population Parameter
Let us suppose that $H: \mu = 1.4$ GPa and $H_A: \mu < 1.4$ GPa. Choose
$\alpha = 0.01$. Take $\hat{\sigma}/\sqrt{n} = 0.10$ GPa. The critical region is

$$\bar{x} < \mu - u_{\alpha}\frac{\hat{\sigma}}{\sqrt{n}} = 1.4 - (2.33 \times 0.10) = 1.1670 \text{ GPa}$$

Let us find out $\beta$ assuming the actual value of $\mu$ is 0.9 GPa.

$$\beta = P(\bar{x} > 1.1670) = P\left(\mu + u\frac{\hat{\sigma}}{\sqrt{n}} > 1.1670\right) = P\left(u > \frac{1.1670 - 0.9}{0.10}\right) = P(u > 2.67) = 0.0038$$

Let us now find out $\beta$ assuming the actual value of $\mu$ is 1.30 GPa.

$$\beta = P(\bar{x} > 1.1670) = P\left(\mu + u\frac{\hat{\sigma}}{\sqrt{n}} > 1.1670\right) = P\left(u > \frac{1.1670 - 1.3}{0.10}\right) = P(u > -1.33) = 0.9082$$
Power of A Statistical Test
Suppose that the actual value of a population parameter differs by
some particular amount from the value hypothesized for it in H, such
that the rejection of H is the desired correct outcome. The
probability that this outcome will be reached is the probability that
the test statistic falls in the critical region. Let us refer to this
probability as the power (P) of the statistical test. Since $\beta$
represents the probability that the test statistic does not fall in
the critical region, the probability P that the test statistic falls in
the critical region is $P = 1 - \beta$.
Hence, the power of a test is the
probability that it will detect falsity in the hypothesis.
Power Curve
The power curve of a test of a statistical hypothesis, H, is the plot of
the values of the power P which correspond to all values that are possible
alternatives to H.
In other words, the power curve may be used to read the probability of
rejecting H for any given possible alternative value of $\mu$.
Power Curve: Illustration
Let us draw the power curve for the previous example. There exists
an infinite collection of $\mu$-values ($\mu_{[\mathrm{GPa}]} < 1.4$) that are possible
alternative values to the hypothesized value of 1.4 and, accordingly,
values of the power P exist. Some are shown here.

$\mu_{[\mathrm{GPa}]}$    $u$       $\beta$     $P = 1 - \beta$
0.7        4.67     0          1
0.8        3.67     0.0001     0.9999
0.9        2.67     0.0038     0.9962
1.0        1.67     0.0475     0.9525
1.1        0.67     0.2514     0.7486
1.2       -0.33     0.6293     0.3707
1.3       -1.33     0.9082     0.0918
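The tabulated values can be regenerated with a few lines of code. This is a minimal sketch assuming the lower-tailed test of the previous example (hypothesized mean 1.4 GPa, standard error 0.10 GPa, critical value 2.33); the function name `power` is illustrative.

```python
from statistics import NormalDist

mu_0 = 1.4         # hypothesized mean (GPa)
std_error = 0.10   # standard error of the mean (GPa)
u_alpha = 2.33     # one-tailed critical value at alpha = 0.01

# H is rejected when the sample mean falls below this value (1.1670 GPa)
critical_value = mu_0 - u_alpha * std_error


def power(mu_true):
    """Probability of rejecting H when the actual mean is mu_true (P = 1 - beta)."""
    u = (critical_value - mu_true) / std_error
    return NormalDist().cdf(u)


alternatives = (0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)
for mu in alternatives:
    p = power(mu)
    print(f"mu = {mu:.1f}  beta = {1.0 - p:.4f}  P = {p:.4f}")
```

Plotting `power(mu)` against `mu` over a fine grid of alternatives gives the power curve itself.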
Power Curve: Illustration (Contd.)
[Figure: power curve, P plotted against $\mu_{[\mathrm{GPa}]}$]
Inadequacy of Statistical Hypothesis Test
We have seen that the decision of a statistical hypothesis test is
based on whether the null hypothesis is rejected or is not rejected at
a specified level of significance ($\alpha$-value). Often, this decision is
inadequate because it gives the decision maker no idea about
whether the computed value of the test statistic is just barely in the
rejection region or whether it is very far into this region. Also, some
decision makers might be uncomfortable with the risks implied by a
specified level of significance, say $\alpha = 0.05$. To avoid these difficulties,
the P-value approach has been widely used.
P-value Approach [2]
The P-value is the probability that the test statistic takes on a value
that is at least as extreme as the observed (computed) value of the
test statistic when the null hypothesis H is true. Alternatively, it can be
said that the P-value is the smallest level of significance that would
lead to rejection of the null hypothesis with the given data.
How can one calculate the P-value?
$$
\text{P-value} =
\begin{cases}
2\left[1 - \Phi(|z_0|)\right] & \text{for a two-tailed test: } H: \mu = \mu_0,\ H_A: \mu \neq \mu_0 \\
1 - \Phi(z_0) & \text{for an upper-tailed test: } H: \mu = \mu_0,\ H_A: \mu > \mu_0 \\
\Phi(z_0) & \text{for a lower-tailed test: } H: \mu = \mu_0,\ H_A: \mu < \mu_0
\end{cases}
$$
$\Phi(z)$ is the cumulative distribution function of the standard normal variable $z$.
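The three cases can be collected in one small function. This is a sketch only: the function name `p_value` and the `tail` labels ("two", "upper", "lower") are naming assumptions made for this example.

```python
from statistics import NormalDist


def p_value(z0, tail):
    """P-value of a test based on a standard normal test statistic z0.

    tail: "two", "upper" or "lower" (labels chosen for this sketch).
    """
    phi = NormalDist().cdf   # cumulative distribution function of z
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z0)))
    if tail == "upper":
        return 1.0 - phi(z0)
    if tail == "lower":
        return phi(z0)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


# Example with an arbitrary computed value of the test statistic
z0 = 1.96
for tail in ("two", "upper", "lower"):
    print(f"{tail}-tailed: P-value = {p_value(z0, tail):.4f}")
```

For any given value of the test statistic, the upper-tailed and lower-tailed P-values add up to one, and the two-tailed P-value is twice the smaller of the two.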
Testing Procedure
Step 1: State the statistical hypothesis.
Step 2: Find out the value of the test statistic.
Step 3: Find out the P-value.
Step 4: Select the level of significance to be used.
Step 5: Take decision.
Illustration: The Test
We refer to the problem of strength of basalt fiber.
Step 1: Null hypothesis $H: \mu_{[\mathrm{GPa}]} = 1.4$
Alternative hypothesis $H_A: \mu_{[\mathrm{GPa}]} < 1.4$
where $\mu$ is the population mean breaking strength of basalt fibers
as ordered by the fiber purchaser.
Step 2: The test statistic is computed as follows:
$z_0 = \frac{\bar{x}_{[\mathrm{GPa}]} - \mu_{[\mathrm{GPa}]}}{\hat{\sigma}_{[\mathrm{GPa}]}/\sqrt{n}} = \frac{1.12 - 1.4}{0.8062/\sqrt{65}} = \frac{1.12 - 1.4}{0.10} = -2.8$
Step 3: The P-value is computed as follows: $\Phi(-2.80) = 0.0026$
Illustration: The Test (Continued)
Step 4: The level of significance $\alpha$ is chosen as $\alpha \geq$ P-value $= 0.0026$.
Step 5: The null hypothesis would be rejected at any level of
significance $\alpha \geq 0.0026$. For example, the null hypothesis would
be rejected if $\alpha = 0.01$, but it would not be rejected if $\alpha = 0.0010$.
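The steps of this illustration can be followed in code. This is a minimal sketch using the sample values quoted above (sample mean 1.12 GPa, estimated standard deviation 0.8062 GPa, sample size 65); the variable names and the helper `reject_h` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H: mu = 1.4 against H_A: mu < 1.4 (lower-tailed test)
mu_0 = 1.4          # hypothesized mean strength (GPa)

# Step 2: test statistic from the sample values
x_bar = 1.12        # sample mean (GPa)
sigma_hat = 0.8062  # estimated population standard deviation (GPa)
n = 65              # sample size
z0 = (x_bar - mu_0) / (sigma_hat / sqrt(n))

# Step 3: P-value of a lower-tailed test is Phi(z0)
p = NormalDist().cdf(z0)


# Steps 4 and 5: reject H whenever the chosen alpha is at least the P-value
def reject_h(alpha):
    return p <= alpha


print(f"z0 = {z0:.2f}, P-value = {p:.4f}")
for alpha in (0.01, 0.001):
    decision = "reject H" if reject_h(alpha) else "do not reject H"
    print(f"alpha = {alpha}: {decision}")
```

Reporting the P-value lets each decision maker apply a preferred level of significance without repeating the test.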
Frequently Asked Questions & Answers
Q1: What is the difference between the parameter and statistic?
A1: The parameter, representing a statistical characteristic of the population, is a
constant for the given population, but, the statistic, representing a statistical
characteristic of the sample, is a variable.
Q2: What is the standard error of mean fiber length?
A2: The standard deviation of distribution of mean fiber length is the standard error
of mean fiber length.
Q3: Why is it not practically possible to obtain a simple random sample as defined?
A3: It is practically impossible to numerically identify each individual of a
population, either because of the large size of the population or because some of
the individuals are inaccessible or do not currently exist. Therefore, it is not
possible to obtain a simple random sample as defined.
Q4: Why is it often said that a larger sample can give a more precise estimate of
the population mean?
A4: Because, as the sample size increases, the standard deviation of the
sample mean (that is, the standard error of the sample mean) decreases.
Q5: While calculating variance, the divisor is sometimes found to be n-1, where n
is the sample size, and sometimes n. Why is it so?
A5: When the divisor is n, the resulting sample variance is a biased estimator
of population variance. When the divisor is n-1, the
resulting expression is an unbiased estimator of population variance.
Q6: In order to know whether the newly developed process is superior to the
existing process, which test – one-tailed test or two-tailed test - is recommended?
A6: One-tailed test.
Q7: In order to know whether the newly developed process is different from the
existing process, which test – one-tailed test or two-tailed test - is recommended?
A7: Two-tailed test.
Q8: Is it true that the probability of type II error can be reduced by choosing a
larger sample size?
A8: Yes.
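The answers to Q4 and Q8 can be illustrated numerically. The sketch below reuses the estimated standard deviation (0.8062 GPa), hypothesized mean (1.4 GPa) and critical value (2.33) from the basalt fiber example; the actual mean of 1.2 GPa and the sample sizes 25 and 200 are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

sigma_hat = 0.8062   # estimated population standard deviation (GPa)
mu_0 = 1.4           # hypothesized mean (GPa)
mu_true = 1.2        # assumed actual mean (GPa), for illustration only
u_alpha = 2.33       # one-tailed critical value at alpha = 0.01


def standard_error(n):
    """Standard error of the sample mean for sample size n."""
    return sigma_hat / sqrt(n)


def beta(n):
    """Type II error probability of the lower-tailed test for sample size n."""
    se = standard_error(n)
    critical_value = mu_0 - u_alpha * se
    return 1.0 - NormalDist().cdf((critical_value - mu_true) / se)


sample_sizes = (25, 65, 200)   # 25 and 200 are hypothetical sample sizes
for n in sample_sizes:
    print(f"n = {n:3d}  standard error = {standard_error(n):.4f}  beta = {beta(n):.4f}")
```

Both the standard error and beta fall as the sample size grows, for a fixed level of significance and a fixed actual mean.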
References
1. Gupta, S. C. and Kapoor, V. K., Fundamentals of Mathematical Statistics,
Sultan Chand & Sons, New Delhi, 2002.
2. Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for
Engineers, John Wiley & Sons, Inc., New Delhi, 2003.
Sources of Further Reading
1. Leaf, G. A. V., Practical Statistics for the Textile Industry: Part I, The Textile
Institute, UK, 1984.
2. Leaf, G. A. V., Practical Statistics for the Textile Industry: Part II, The Textile
Institute, UK, 1984.
3. Gupta, S. C. and Kapoor, V. K., Fundamentals of Mathematical Statistics,
Sultan Chand & Sons, New Delhi, 2002.
4. Gupta, S. C. and Kapoor, V. K., Fundamentals of Applied Statistics, Sultan
Chand & Sons, New Delhi, 2007.
5. Montgomery, D. C., Introduction to Statistical Quality Control, John Wiley &
Sons, Inc., Singapore, 2001.
6. Grant, E. L. and Leavenworth, R. S., Statistical Quality Control, Tata McGraw
Hill Education Private Limited, New Delhi, 2000.
7. Montgomery, D. C. and Runger, G. C., Applied Statistics and Probability for
Engineers, John Wiley & Sons, Inc., New Delhi, 2003.