Module Eight: Comparative Studies


Module Eight: Comparative Study for Inter-laboratory Testing
When inter-laboratory testing is conducted, the analysis of the testing results may include:
• Determining the best estimate of the variable of interest and its corresponding uncertainty: $x_{Best} \pm U_{x_{Best}}$
• Making an interval estimate of the variable of interest based on the corresponding distribution, that is, a confidence interval: $x_{Best} \pm k\,U_{x_{Best}}$
• Conducting a comparative study:
1. Comparing with a reference standard.
2. Comparing the effects between two groups when the two samples are tested independently. For example, two testing procedures are to be compared. Twenty units of similar material are randomly assigned to the two methods, 10 to each method. The purpose is to assess the difference between the two testing methods.
3. Comparing the change in a response before and after (or with and without) a treatment. For example, ten labs test the toxicity of a chemical compound with and without an additive. Each compound is divided into two sub-samples. Each lab tests one pair of sub-samples, one with the additive and the other without. The difference within each pair tested by a lab is due to the additive. Note that in this comparative study the two sub-samples in each pair are the same or very similar. This is a paired sample problem.
4. Comparing the effects among several groups, when a treatment has more than two levels. This type of comparative study is common in inter-laboratory testing. For example, one is interested in studying the compressive strength of concrete made with five different formulas. Ten specimens are produced using each formula, and their compressive strengths are tested. This is a one-factor experiment with five factor levels. Our interest is to compare the strengths and to determine which formula gives the highest strength. If the only difference among the formulas is the dosage of an additive, say 1%, 1.5%, 2%, 2.5% and 3%, then, in addition to comparing the strength among the formulas, we can also fit a prediction model to determine the dosage level that results in the maximum strength.
5. In many experiments, there may be more than one factor. The study aims not only to understand the effect of each factor, but also to study the interaction effect between factors. This is a multifactor study. For example, in the concrete compressive strength study, in addition to the five levels of formula, the temperature during concrete formation is another critical factor. We should consider both the formula factor and the temperature factor when producing the specimens for the strength test. Suppose we would like to test three levels of temperature. We then have a 5 x 3 two-factor factorial design. For each treatment combination, four specimens are produced, giving a total of 3 x 5 x 4 = 60 specimens for strength testing. We are interested in comparing the strength among the different formulas, among the different temperatures, and among the different formulas at each temperature level.
6. Another type of study in lab testing examines the variance components of factors, for the purpose of identifying factor levels that reduce the variability of the response variable. For example, in a metal alloy casting process, each casting is broken into small bars that are used for other applications. The tensile strength of the alloy is critical to its intended use, and there is a specification for the strength. If the variation of the strength is excessively large, a large number of bars will not meet the specification limits. An experiment can be designed to identify factors and level combinations that produce bars with small variability. This is a variance component problem.
In this module, we discuss comparative studies 1, 2 and 3. Module Ten discusses comparative study 4, the one-factor design and analysis. Module Eleven focuses on comparative study 5, multifactor designs and analysis. Module Twelve covers variance components problems.
Comparative Study One: Comparing testing results with a given reference or a given standard
In a lab testing study, one may be interested in comparing the testing results with a given standard or a reference measurement. The following steps may be applied to plan such a study:
1. Identify the given standard or reference measurement, and make sure the source that developed the standard suits your purpose.
2. Set up an adequate lab testing environment and testing procedure.
3. Make sure the operator is adequately trained, to reduce unexpected errors.
4. Plan the experimental procedure and determine the number of experimental runs to be conducted.
5. Prepare the needed experimental units, and make sure these units are as homogeneous as possible.
6. Conduct the lab testing and carefully collect the data of interest. It is good practice to record any special events that occur during the testing.
Now that a data set is collected, we would like to make a comparison with a given reference. Steps for this analysis may include:
1. Carefully check the data for unusual measurements that may be due to systematic error or special causes. Techniques for detecting outliers can be applied here.
2. Compute descriptive summaries, and graph a histogram, a box plot for identifying outliers, and a normal probability plot for checking the normality assumption.
3. If there is a serious violation of the normality assumption, one may choose to make a data transformation. If there are outliers, one should go back to check for possible special causes, and decide whether to keep or drop these outliers before the analysis.
4. The comparison itself is a one-sample test. The procedure for conducting it is described next.
One-sample t-test for comparing the testing results with a given reference
Example: The brightness of a certain type of paper is measured on a scale of 1 to 100. The reference brightness for this type of paper is 60. A lab is experimenting with a new process for producing this type of paper, and would like to test its brightness to see if the paper meets the required brightness. A random sample of 30 sheets is chosen and tested by the lab. Here are the collected data:
55  42  59  64  59  68  60  52  56  59
55  62  59  57  63  58  52  55  58  61
65  63  52  58  62  54  58  59  64  63
A quick visual check immediately identifies a value of 42, which is much smaller than the rest.
We first draw a box plot and a normal probability plot to identify outliers and to check the normality assumption.
[Figure: Boxplot of brightness using the entire data set]
[Figure: Boxplot of brightness_1, with the outlier '42' deleted]
[Figure: Normality test for the brightness, excluding the outlier (normal probability plot of brightness_1)]
• A review of the lab testing records showed that the reading of 42 was due to a special cause, wrong timing in the testing process. It is therefore removed from further analysis.
• The normal probability plot shows that the data follow the normal curve very well.
Summary for brightness_1: Average: 58.9655, StDev: 4.11862, N: 29
Anderson-Darling Normality Test: A-Squared: 0.302, P-Value: 0.554
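As a minimal sketch, the summary statistics reported for brightness_1 can be reproduced with Python's standard library. The variable names are illustrative, and the reading of 42 is dropped by value only because the lab records attribute it to a special cause; the Anderson-Darling statistic is not recomputed here.

```python
import statistics

# The 30 brightness readings from the example.
brightness = [55, 42, 59, 64, 59, 68, 60, 52, 56, 59,
              55, 62, 59, 57, 63, 58, 52, 55, 58, 61,
              65, 63, 52, 58, 62, 54, 58, 59, 64, 63]

# Drop the reading traced to a special cause (wrong timing in the test).
brightness_1 = [b for b in brightness if b != 42]

n = len(brightness_1)
average = statistics.mean(brightness_1)
stdev = statistics.stdev(brightness_1)  # sample s.d. (divisor n - 1)

print(f"N = {n}, Average = {average:.4f}, StDev = {stdev:.5f}")
```

The printed values can be compared with the Average, StDev and N shown in the normality test summary.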
The Concept and Procedure for Performing the One-sample t-test
When we conduct a hypothesis test for comparing with a given reference, there are usually two choices: one is the hypothesis we intend to establish in our study, and the other is its opposite. To make the testing procedure easier, we define these two hypotheses as H0 and Ha, where Ha is the one we intend to establish. For the paper brightness test, Ha is that the actual average brightness of the paper is different from the given reference.
Typical notation for the hypotheses is:
$H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$
For the paper brightness study, we have:
$H_0: \mu = 60$, $H_a: \mu \neq 60$
Q: When and how do we decide to take H0 or Ha?
If the average of the sample data is either much larger or much smaller than 60, we choose Ha; otherwise, we choose H0.
Q: But how far is far enough to make such a conclusion?
If the sample average is, say, 59.5 or 60.4, we would not consider it far enough from 60 to conclude Ha. Therefore, we need two critical average brightness values, $\bar{x}_1$ and $\bar{x}_2$, so that when the sample average obtained from the data falls beyond these two values, we conclude Ha, that is, the brightness of the paper is significantly different from the reference brightness, 60.
Q: How do we determine the two critical values?
This can be answered by bringing in the distribution of $\bar{X}$. The following distribution is the distribution of $\bar{X}$ under H0.
[Figure: Distribution of $\bar{X}$ under H0, centered at 60. H0 is accepted between the critical values $\bar{x}_1$ and $\bar{x}_2$ and rejected beyond them, with area $\alpha/2 = 0.025$ in each tail. On the standardized scale the critical values are $-t(\alpha/2, n-1)$ and $t(\alpha/2, n-1)$.]
Common experience suggests that the probability of rejecting H0 should be small, so that we conclude Ha only when the sample average is far away from 60. Therefore, a typical probability for rejecting H0 is 5% or 1%.
The standardized form of $\bar{X}$, which follows the t-distribution, is used for making a proper comparison:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Procedure for conducting the one-sample t-test:
1. Set up H0 and Ha.
2. Determine the regions for rejecting and accepting H0, based on the type of hypothesis and the t-distribution.
3. From the sample data, compute the t-value from the sample average:
$t_{observed} = \dfrac{\bar{x}_{observed} - \mu_0}{s/\sqrt{n}}$
4. Compare $t_{observed}$ with the critical t-values, $-t(\alpha/2, n-1)$ and $t(\alpha/2, n-1)$, from the t-table to determine whether $t_{observed}$ falls in the acceptance region or in the rejection region.
NOTE: Computer output gives both $t_{observed}$ and the observed level of significance, namely, the p-value.
The p-value for this two-sided test is $2P(t > |t_{observed}|)$.
The decision based on the p-value is:
If p-value < α, we reject H0, that is, we decide to take Ha.
If p-value ≥ α, we do not reject H0.
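The procedure can be sketched for the paper brightness data as follows. This is an illustrative calculation, not output from the slides: the helper name one_sample_t is hypothetical, and the critical value t(0.025, 28) = 2.048 is copied from a standard t-table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t_observed, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# Brightness readings with the outlier 42 already removed (N = 29).
brightness_1 = [55, 59, 64, 59, 68, 60, 52, 56, 59, 55,
                62, 59, 57, 63, 58, 52, 55, 58, 61, 65,
                63, 52, 58, 62, 54, 58, 59, 64, 63]

t_observed, df = one_sample_t(brightness_1, mu0=60)

T_CRITICAL = 2.048  # t(0.025, 28), assumed from a t-table
reject_h0 = abs(t_observed) > T_CRITICAL  # two-sided rejection rule

print(f"t_observed = {t_observed:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the observed t-value falls inside the acceptance region, so the data do not show that the mean brightness differs from 60 at α = 5%.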
Right-side and Left-side Tests
Ha is the hypothesis we intend to establish. Therefore, in applications, other than two-side tests, there are two common types of hypotheses:
• Right-side test: $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$
• Left-side test: $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$
How do we choose the test for our need?
• If our intention is to find out whether the mean is much larger than the reference value, the right-side test should be applied. For example, suppose the reference brightness of the paper, 60, is the minimum, and our goal is to decide whether the new process produces significantly brighter paper: $H_0: \mu \le 60$, $H_a: \mu > 60$
• If our intention is to find out whether the mean is much lower than the reference value, the left-side test should be applied. For example, suppose the reference brightness of the paper, 60, is the maximum allowed, and our goal is to decide whether the new process produces significantly less bright paper: $H_0: \mu \ge 60$, $H_a: \mu < 60$
• If our intention is to find out whether the mean differs from the reference value in either direction, the two-side test should be applied. For example, suppose the reference brightness of the paper, 60, is the given standard, and our goal is to decide whether the new process produces paper of significantly different brightness: $H_0: \mu = 60$, $H_a: \mu \neq 60$
Hands-on Activity: Comparative Study with a Given Reference
In testing the tensile strength of a new type of concrete, the goal is to make sure that the tensile strength meets the minimum of 300 psi. A lab is assigned to test this new concrete. Twenty samples are tested. The tensile strengths (psi) are:
320  305  293  295  313  306  298  325  304  316
307  308  307  305  319  294  295  295  300  312
Perform an appropriate test to determine if the new type of concrete meets the minimum tensile strength of 300 psi.
Comparative Study for Inter-laboratory Testing: Two-group Cases
Using the example of paper brightness, there are many situations in which the testing involves two treatment groups. Here are some possible situations:
1. When a chemical component is changed, the brightness could change dramatically. A comparative study can be planned to compare the effect of two different levels of this chemical component.
2. When papers are tested by two different labs, there may be between-lab differences. Such differences should be controlled to minimize the systematic error of a given lab when testing the same material using the same testing procedure.
3. When papers are tested using two different testing procedures, it is important to identify the difference between the two procedures.
A comparative two-group study may compare two types of material, two different treatments, two testing procedures, or two labs. We now discuss a method for making the two-group comparison. As with the comparison between a given reference and sample data, it is important to keep in mind that we need to conduct outlier analysis and distribution checking.
The Issue of Designing Experiments for a Two-sample Comparative Study
Consider the example of comparing the reaction of a chemical component in lab testing.
Treatment: Two levels of the chemical component.
We will discuss two types of experimental designs:
1. Design A, paired sample design: Each specimen is split into two sub-samples. The Level A component is added to one sub-sample and the Level B component to the other. n = 15 pairs are tested, and each pair is tested together. The units assigned to the two treatments each time are very similar, since they come from the same specimen.
2. Design B, independent sample design: Each treatment is assigned to 15 units, which are independent of the units for the other treatment. The Level A component is added to n = 15 units and the Level B component to another n = 15 units, and each group is tested.
NOTE: A paired-sample comparison is often referred to as a Before/After Treatment or Pre/Post Treatment experiment. The variable of interest is observed before and after a treatment. This type of design occurs often in testing the effect of a treatment over time. For example, one may be interested in studying the chemical residue 5 days and 10 days after the chemical is sprayed on a certain vegetable.
[Figure: Timeline for the residue study. Treatment: spray the chemical on n randomly chosen subjects. Five days after, test the residue from the subjects. Ten days after, test the residue from the same subjects.]
[Figure: Timeline for a diet study. Before the diet treatment, observe weight, BMI, age, gender, etc., from each subject. The treatment is given, e.g., a diet treatment for three months. Three months after, observe weight, BMI, etc., from the same subjects.]
Hands-on Activity
For the same study, one can design a two-independent-sample study as well. Design a two-independent-sample study for studying the chemical residue, and discuss the advantages and disadvantages of paired-sample vs. independent-sample designs.
The difference between Experiment A and Experiment B is:
Samples obtained from Experiment A can be considered as 15 pairs, with each pair taken from the same specimen. The possible sources of error are the same for the two samples in a pair, except for the level of the component. The experimental units are similar.
Samples obtained from Experiment B are two independent samples. Each is obtained from a process that is independent of the other. Possible sources of error include not only the levels of the component but also the differences between the processes. Therefore, the paper units for testing the brightness may have higher variation.
The analyses of data from these two experiments are different. Experiment A is a paired sample problem, while Experiment B is an independent sample problem.
Hands-on Activity
From the projects you have conducted, identify one paired sample project and one independent sample project.
Analysis of Paired Sample Problem
Consider the experiment for testing the chemical residue.
Experiment: 15 pots of a certain vegetable are used as the experimental units. The residue is measured and recorded five days and ten days after the spray.
X: the residue five days after the chemical treatment.
Y: the residue ten days after the chemical treatment.
Testing Procedure: Each residue value is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.
Pot        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x         59  61  64  62  59  63  58  59  64  65  64  60  67  65  63
y         54  52  59  60  61  60  56  61  58  59  62  61  61  58  57
d = y-x   -5  -9  -5  -2   2  -3  -2   2  -6  -6  -2   1  -6  -7  -6
For each pot, the residue is observed five days and ten days after the spray. Hence the difference Y - X is the change in residue over the five-day period, and a negative difference is a reduction. To understand whether the reduction of residue is statistically significant, we can perform a one-sample test based on the difference, d. The hypotheses are:
$H_0: \mu_d = 0$, $H_a: \mu_d \neq 0$
Recall: To perform a one-sample t-test, we need $\bar{d}$, $s_d$ and $SE_{\bar{d}}$.
The following is the output from Minitab:
Paired T for 10 days - 5 days
                  N     Mean    StDev   SE Mean
10 days (y)      15   58.600    2.849     0.735
5 days (x)       15   62.200    2.731     0.705
Difference (d)   15   -3.600    3.376     0.872
95% CI for mean difference: (-5.470, -1.730)
T-Test of mean difference = 0 (vs not = 0): T-Value = -4.13  P-Value = 0.001
[Figure: Boxplot of differences of residues between 10-days and 5-days (with H0 and 95% t-confidence interval for the mean)]
• Based on the p-value = 0.001 < 5%, we conclude that the residue reduction is statistically significant at α = 5%. The average reduction is 3.6, based on data from 15 pots.
• The 95% confidence interval is -5.47 to -1.73. That is, we are 95% confident that the mean change in residue lies within
$\bar{d} \pm t(.025, 14)\,SE_{\bar{d}} = -3.6 \pm 2.145(0.872) = -3.6 \pm 1.87$
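As a check on the Minitab output, the paired analysis can be sketched with Python's standard library. The differences are computed directly from the x and y columns; the variable names are illustrative, and the critical value t(0.025, 14) = 2.145 is the one quoted in the confidence interval calculation.

```python
import math
import statistics

# Residue five days (x) and ten days (y) after the spray, pots 1 to 15.
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]

# Paired differences d = y - x, one per pot.
d = [yi - xi for xi, yi in zip(x, y)]

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)
se_d = s_d / math.sqrt(n)
t_obs = d_bar / se_d  # tests H0: mean difference = 0

T_CRITICAL = 2.145  # t(0.025, 14), as quoted in the text
ci = (d_bar - T_CRITICAL * se_d, d_bar + T_CRITICAL * se_d)

print(f"mean d = {d_bar:.3f}, s_d = {s_d:.3f}, SE = {se_d:.3f}")
print(f"t = {t_obs:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The p-value is not computed here because the standard library has no t-distribution; comparing |t| with the critical value gives the same decision.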
Analysis of Two-independent Samples Problem
Consider the experiment for testing the chemical residue. We can also design a two-independent-sample experiment for the residue study.
Experiment: 30 pots of a certain vegetable are used as the experimental units. 15 pots are randomly chosen for the 5-day residue testing. The other 15 are for the 10-day residue testing.
X: the residue five days after the chemical treatment, from 15 randomly selected pots.
Y: the residue ten days after the chemical treatment, from the other 15 pots.
Testing Procedure: Each residue value is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.
NOTE: This design is appropriate if each pot can be used for only one residue test.
Pot    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x     59  61  64  62  59  63  58  59  64  65  64  60  67  65  63

Pot   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
y     54  52  59  60  61  60  56  61  58  59  62  61  61  58  57
For each pot, the residue can be measured only once, either five days or ten days after the spray. The assignment of pots to residue testing is random, and thus the two samples are considered independent. The difference Y - X no longer reflects only the residue reduction; it also includes the differences between pots.
The residue after 5 days is a population X with mean $\mu_1$ and variance $\sigma_1^2$. Similarly, the residue after 10 days is a different population Y with mean $\mu_2$ and variance $\sigma_2^2$.
Our purpose is to test whether $\mu_2$ is statistically lower than $\mu_1$. This is a left-side test:
$H_0: \mu_2 - \mu_1 \ge 0$, $H_a: \mu_2 - \mu_1 < 0$
Ha is concluded if the corresponding sample mean difference, $\bar{y} - \bar{x}$, is indeed much lower than zero. How far below zero is considered significant?
Similar to the one-sample problem, we need to determine the distribution of $\bar{y} - \bar{x}$, or equivalently, the distribution of the standardized form,
$(\bar{y} - \bar{x}) / SE_{\bar{y} - \bar{x}}$
NOTE: Most statistical hypothesis testing and estimation problems require the distribution of the best estimate of the variable of interest. This is usually accomplished by finding the distribution of the standardized best estimate. This is true for any test that involves the t-distribution, the chi-square distribution, the F-distribution, and so on.
What is the distribution of $(\bar{y} - \bar{x}) / SE_{\bar{y} - \bar{x}}$? How do we determine $SE_{\bar{y} - \bar{x}}$?
Based on statistical theory, the t-distribution holds when the samples are randomly chosen from each population. The quantity $SE_{\bar{y} - \bar{x}}$ is the uncertainty of the mean difference. The way of determining $SE_{\bar{y} - \bar{x}}$ depends on the sample sizes and on whether the variances of the two populations are homogeneous or not.
When the population variances are not equal, the variance of the mean difference is estimated by
$s^2_{(\bar{y} - \bar{x})} = \dfrac{s_x^2}{n_1} + \dfrac{s_y^2}{n_2}$, and therefore $SE_{\bar{y} - \bar{x}}$ is given by:
$SE_{\bar{y} - \bar{x}} = \sqrt{\dfrac{s_x^2}{n_1} + \dfrac{s_y^2}{n_2}}$
However, the degrees of freedom for this uncertainty measurement is a weighted d.f. based on $n_1$ and $n_2$:
$df = \dfrac{\left(\dfrac{s_x^2}{n_1} + \dfrac{s_y^2}{n_2}\right)^2}{\dfrac{(s_x^2/n_1)^2}{n_1 - 1} + \dfrac{(s_y^2/n_2)^2}{n_2 - 1}}$
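A minimal sketch of the unequal-variance calculation, applied to the residue data from the two-independent-sample experiment. The function name unequal_var_se_df is hypothetical; it simply evaluates the standard error and the weighted degrees of freedom described above.

```python
import math
import statistics

def unequal_var_se_df(sx_sq, n1, sy_sq, n2):
    """SE of (ybar - xbar) and weighted d.f. when variances are not pooled."""
    a = sx_sq / n1
    b = sy_sq / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return se, df

# Residue after 5 days (pots 1-15) and after 10 days (pots 16-30).
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]

se, df = unequal_var_se_df(statistics.variance(x), len(x),
                           statistics.variance(y), len(y))

print(f"SE = {se:.4f}, weighted df = {df:.2f}")
```

The weighted d.f. is generally not an integer; software commonly reports it rounded down to a whole number.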
When the population variances can be assumed equal, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we can combine the two samples to obtain a better estimate of the common measurement uncertainty for $\bar{y} - \bar{x}$:
1. Obtain the pooled estimate of the common variance, $\sigma^2$, by:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
2. Compute the SE of $\bar{y} - \bar{x}$:
$SE_{\bar{y} - \bar{x}} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
The 100(1-α)% confidence interval for $\mu_2 - \mu_1$ can be determined by:
$(\bar{y} - \bar{x}) \pm t(\alpha/2, df)\,SE_{\bar{y} - \bar{x}}$
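The pooled calculation can be sketched for the same residue data. Variable names are illustrative, and the critical value t(0.025, 28) = 2.048 is taken from a standard t-table, using df = n1 + n2 - 2 = 28 for the pooled case. Note that the difference here is taken as ybar - xbar, so its sign is opposite to software output that reports mu(1) - mu(2).

```python
import math
import statistics

# Residue after 5 days (x) and after 10 days (y), independent samples.
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]
n1, n2 = len(x), len(y)

# Step 1: pooled estimate of the common variance.
sp_sq = ((n1 - 1) * statistics.variance(x)
         + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
sp = math.sqrt(sp_sq)

# Step 2: standard error of ybar - xbar.
se = sp * math.sqrt(1 / n1 + 1 / n2)

diff = statistics.mean(y) - statistics.mean(x)
t_obs = diff / se

T_CRITICAL = 2.048  # t(0.025, 28), assumed from a t-table
ci = (diff - T_CRITICAL * se, diff + T_CRITICAL * se)

print(f"sp = {sp:.3f}, SE = {se:.4f}, t = {t_obs:.2f}")
print(f"95% CI for mu2 - mu1: ({ci[0]:.2f}, {ci[1]:.2f})")
```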
To test whether the population mean $\mu_2$ is statistically different from (greater than or less than) the population mean $\mu_1$:
Two-side test: $H_0: \mu_2 - \mu_1 = 0$, $H_a: \mu_2 - \mu_1 \neq 0$
Right-side test: $H_0: \mu_2 - \mu_1 \le 0$, $H_a: \mu_2 - \mu_1 > 0$
Left-side test: $H_0: \mu_2 - \mu_1 \ge 0$, $H_a: \mu_2 - \mu_1 < 0$
We apply the t-test by:
1. Compute the t-value: $t_{obs} = \dfrac{\bar{y} - \bar{x}}{SE_{\bar{y} - \bar{x}}}$
2. Compare $t_{obs}$ with the critical t-value:
For the two-side test: if $t_{obs}$ falls outside of $-t(\alpha/2, df)$ and $t(\alpha/2, df)$, then reject $H_0$.
For the right-side test: if $t_{obs} > t(\alpha, df)$, then reject $H_0$.
For the left-side test: if $t_{obs} < -t(\alpha, df)$, then reject $H_0$.
Or, when computer software is available, the p-value is used for decision making. The same rule applies when using the p-value, regardless of the type of test:
If p-value < α, then reject H0 and conclude Ha.
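The comparison with the critical value can be written as a small helper. This is a sketch with a hypothetical function name; the caller is assumed to supply the appropriate positive critical value from a t-table, t(α/2, df) for a two-side test and t(α, df) for a one-side test.

```python
def reject_h0(t_obs, t_critical, side):
    """Decide whether to reject H0 from the observed t-value.

    t_critical is the positive table value: t(alpha/2, df) for
    side="two", t(alpha, df) for side="right" or side="left".
    """
    if side == "two":
        return abs(t_obs) > t_critical
    if side == "right":
        return t_obs > t_critical
    if side == "left":
        return t_obs < -t_critical
    raise ValueError("side must be 'two', 'right' or 'left'")

# Illustration: left-side test with t_obs = -3.53 and df = 28,
# using t(0.05, 28) = 1.701 from a t-table.
print(reject_h0(-3.53, 1.701, "left"))
```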
Case Example: A chemical residue study
Purpose: To determine whether the chemical residue ten days after the spray is significantly reduced compared with five days after.
Experiment: 30 pots of a certain vegetable are used as the experimental units. 15 pots are randomly chosen for the 5-day residue testing. The other 15 are for the 10-day residue testing. X: the residue five days after the chemical treatment, from 15 randomly selected pots. Y: the residue ten days after the chemical treatment, from the other 15 pots.
Testing Procedure: Each residue value is the average of the residues of two specimens taken from the same pot, for the purpose of reducing random error.
Pot    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x     59  61  64  62  59  63  58  59  64  65  64  60  67  65  63

Pot   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
y     54  52  59  60  61  60  56  61  58  59  62  61  61  58  57
Variable   Treatment    N    Mean   Median   StDev   SE Mean
Residue    5-days      15   62.20    63.00   2.731     0.705
           10-days     15   58.60    59.00   2.849     0.735
[Figure: Normal probability plot of residue after 5 days] Average: 62.2, StDev: 2.73078, N: 15. Anderson-Darling Normality Test: A-Squared: 0.413, P-Value: 0.296
[Figure: Normal probability plot of residues after 10 days] Average: 58.6, StDev: 2.84856, N: 15. Anderson-Darling Normality Test: A-Squared: 0.594, P-Value: 0.101
[Figure: Test for equal variances for residue, showing 95% confidence intervals for sigmas by factor level and boxplots of raw data]
F-Test: Test Statistic: 0.919, P-Value: 0.877
Levene's Test: Test Statistic: 0.044, P-Value: 0.835
Diagnosis of assumptions:
• Both samples follow the normal distribution.
• Variances are similar.
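The F-test statistic in the equal-variance diagnosis is the ratio of the two sample variances, which can be checked with a short sketch. Only the statistic is reproduced; the p-value requires the F-distribution, which is not available in Python's standard library, and Levene's test is not recomputed.

```python
import statistics

# Residue after 5 days (treatment 1) and after 10 days (treatment 2).
x = [59, 61, 64, 62, 59, 63, 58, 59, 64, 65, 64, 60, 67, 65, 63]
y = [54, 52, 59, 60, 61, 60, 56, 61, 58, 59, 62, 61, 61, 58, 57]

var_1 = statistics.variance(x)  # sample variance, 5-day group
var_2 = statistics.variance(y)  # sample variance, 10-day group

f_statistic = var_1 / var_2  # ratio close to 1 suggests similar variances

print(f"F = {f_statistic:.3f}")
```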
Two-Sample T-Test and CI: Residue, Treatment (without assuming equal variances)
Treatment    N    Mean   StDev   SE Mean
1           15   62.20    2.73      0.71
2           15   58.60    2.85      0.74
Difference = mu (1) - mu (2)
Estimate for difference: 3.60
95% CI for difference: (1.51, 5.69)
T-Test of difference = 0 (vs >): T-Value = 3.53  P-Value = 0.001  DF = 27
Note: DF = 27 is computed to adjust for the unequal variances.

Two-Sample T-Test: Residue, Treatment (assuming equal variances)
Difference = mu (1) - mu (2)
Estimate for difference: 3.60
T-Test of difference = 0 (vs >): T-Value = 3.53  P-Value = 0.001  DF = 28
Both use Pooled StDev = 2.79
Note: sp is used as the common s.d.
[Figure: Box plots for the residues, two independent samples, by treatment; the sample means are 62.2 (treatment 1) and 58.6 (treatment 2)]
Conclusion: The s.d.'s are similar. Levene's test of homogeneity of variances gives p-value = 0.835. We can use either t-test to test the hypothesis that the residue 10 days after is significantly reduced from 5 days after. The two t-test results (assuming or not assuming equal variances) give the same conclusion: the p-value is less than 5%; therefore, the reduction of residue from 5 days to 10 days after the chemical spray is statistically significant.
Hands-on Activity
Perform the two-independent sample test manually, and compare
with the computer output.