Design of Engineering Experiments Part 2


Chapter 2: Simple Comparative Experiments (SCE)
• Simple comparative experiments: experiments that compare two conditions (treatments)
  – The hypothesis testing framework
  – The two-sample t-test
  – Checking assumptions, validity
• Homework:
  – P2-1, P2-5, and P2-6 due Sunday 7/3/2010
  – P2-11 and P2-17 due Sunday 14/3/2010
Portland Cement Formulation (page 23)
• The average tension bond strengths (ABS) differ by what seems to be a nontrivial amount.
• It is not obvious that this difference is large enough to imply that the two formulations really are different.
• The difference may be due to sampling fluctuation, with the two formulations really being identical.
• Possibly another two samples would give the opposite result, with the strength of MM exceeding that of UM.
• Hypothesis testing can be used to assist in comparing these formulations.
• Hypothesis testing allows the comparison to be made on objective terms, with knowledge of the risks associated with reaching the wrong conclusion.
Graphical View of the Data
Dot Diagram, Fig. 2-1, p. 24
• The response variable is a random variable
• Random variables are either:
  1. Discrete
  2. Continuous
Box Plots, Fig. 2-3, p. 26
• Displays the min, max, lower and upper quartiles, and the median
• Histogram
Probability Distributions
• The probability structure of a random variable, y, is described by its probability distribution.
• If y is discrete: p(y) is the probability function of y (Fig. 2-4a)
• If y is continuous: f(y) is the probability density function of y (Fig. 2-4b)
Probability Distributions
Properties of probability distributions
• y discrete:
  $0 \le p(y_j) \le 1$ for all values of $y_j$
  $P(y = y_j) = p(y_j)$ for all values of $y_j$
  $\sum_{\text{all } y_j} p(y_j) = 1$
• y continuous:
  $f(y) \ge 0$
  $P(a \le y \le b) = \int_a^b f(y)\,dy$
  $\int_{-\infty}^{\infty} f(y)\,dy = 1$
Probability Distributions
Mean, variance, and expected values
$\mu = E(y) = \begin{cases} \int_{-\infty}^{\infty} y\, f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} y\, p(y) & y \text{ discrete} \end{cases}$

$V(y) = \sigma^2 = E[(y-\mu)^2] = \begin{cases} \int_{-\infty}^{\infty} (y-\mu)^2 f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} (y-\mu)^2 p(y) & y \text{ discrete} \end{cases}$
Probability Distributions
Basic Properties
1. E(c) = c
2. E(y) = μ
3. E(cy) = c E(y) = cμ
4. V(c) = 0
5. V(y) = σ²
6. V(cy) = c² V(y) = c²σ²
Probability Distributions
Basic Properties
• E(y1 + y2) = E(y1) + E(y2) = μ1 + μ2
• V(y1 + y2) = V(y1) + V(y2) + 2 Cov(y1, y2)
• Cov(y1, y2) = E[(y1 − μ1)(y2 − μ2)]
• Covariance: a measure of the linear association between y1 and y2
• If y1 and y2 are independent: E(y1·y2) = E(y1)·E(y2) = μ1·μ2
• In general, $E\!\left(\frac{y_1}{y_2}\right) \ne \frac{E(y_1)}{E(y_2)}$
Sampling and Sampling Distributions
• The objective of statistical inference is to draw conclusions about a population using a sample from that population.
• Random sampling: each of the N!/[(N − n)! n!] possible samples has an equal probability of being chosen.
• Statistic: any function of the observations in a sample that does not contain unknown parameters.
• The sample mean and sample variance are both statistics (see the sketch below):
  $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
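As a quick illustration (not part of the original slides), here is a minimal Python sketch of both statistics; the data values are hypothetical:

    import numpy as np

    y = np.array([16.85, 16.40, 17.21, 16.35, 16.52])  # hypothetical sample
    n = len(y)

    ybar = y.sum() / n                        # sample mean
    S2 = ((y - ybar) ** 2).sum() / (n - 1)    # sample variance, divisor n - 1

    # NumPy's built-ins agree (ddof=1 gives the n - 1 divisor)
    assert np.isclose(ybar, y.mean())
    assert np.isclose(S2, y.var(ddof=1))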
Properties of sample mean and variance
• The sample mean is a point estimator of the population mean μ
• The sample variance is a point estimator of the population variance σ²
• A point estimator should be unbiased: its long-run average should be the parameter that is being estimated.
• An unbiased estimator should have minimum variance: a minimum-variance point estimator has a variance that is smaller than that of any other estimator of the parameter.
  $E(S^2) = \frac{1}{n-1} E(SS), \qquad E(SS) = (n-1)\sigma^2$
Degrees of freedom
• The (n − 1) in the previous equation is called the number of degrees of freedom (NDOF) of the sum of squares.
• The NDOF of a sum of squares is equal to the number of independent elements in the sum of squares.
• Because $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, only (n − 1) of the n elements are independent, implying that SS has (n − 1) DOF.
The normal and other sampling distributions
• Normal distribution:
  $f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}[(y-\mu)/\sigma]^2}, \qquad -\infty < y < \infty$
• y is distributed normally with mean μ and variance σ²: $y \sim N(\mu, \sigma^2)$
• Standard normal distribution: μ = 0 and σ² = 1, i.e. $z \sim N(0, 1)$ where
  $z = \frac{y - \mu}{\sigma}$
Central Limit Theorem
• If y1, y2, …, yn is a sequence of n independent and identically distributed random variables with E(yi) = μ and V(yi) = σ² (both finite), and x = y1 + y2 + … + yn, then
  $z_n = \frac{x - n\mu}{\sqrt{n\sigma^2}}$
  has an approximate N(0, 1) distribution, in the sense that if $F_n(z)$ is the distribution function of $z_n$ and $F(z)$ is the distribution function of the N(0, 1) random variable, then $\lim_{n\to\infty} F_n(z)/F(z) = 1$ (a simulation sketch follows).
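A minimal simulation sketch of the theorem, assuming NumPy is available; the exponential population (μ = σ² = 1) and the sample size are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 30, 100_000
    mu, sigma2 = 1.0, 1.0   # mean and variance of the Exp(1) population

    # x is the sum of n iid draws; z_n standardizes it as in the theorem
    x = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
    z = (x - n * mu) / np.sqrt(n * sigma2)

    print(z.mean(), z.std())   # close to 0 and 1
    print((z <= 1.96).mean())  # close to 0.975, the N(0,1) value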
Chi-Square or χ² distribution
• If z1, z2, …, zk are normally and independently distributed random variables with mean 0 and variance 1, NID(0, 1), then the random variable
  $x = z_1^2 + z_2^2 + \cdots + z_k^2$
  follows the chi-square distribution with k DOF, with density
  $f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{(k/2)-1} e^{-x/2}, \qquad x > 0$
Chi-Square or χ² distribution
• The distribution is asymmetric (skewed), with mean μ = k and variance σ² = 2k
• Percentage points are given in Appendix III
• See Fig. 2-6
Chi-Square or χ² distribution
• If y1, y2, …, yn is a random sample from N(μ, σ²), then
  $\frac{SS}{\sigma^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sigma^2} \sim \chi^2_{n-1}$
• That is, SS/σ² is distributed as chi-square with n − 1 DOF (a simulation check follows)
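This sampling-distribution fact is easy to check by simulation; a sketch under the assumption that NumPy and SciPy are available (the sample size, μ, and σ are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 5.0, 2.0, 8, 50_000

    # For each replicate, compute SS / sigma^2 from a normal sample of size n
    y = rng.normal(mu, sigma, size=(reps, n))
    ss = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    stat = ss / sigma**2

    # Chi-square with n - 1 = 7 DOF has mean 7 and variance 14
    print(stat.mean(), stat.var())
    print(stats.kstest(stat, "chi2", args=(n - 1,)).pvalue)  # typically not small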
Chi-Square or χ² distribution
• If the observations in the sample are NID(μ, σ²), then the distribution of S² is
  $S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}$
• Thus, the sampling distribution of the sample variance is a constant times the chi-square distribution, if the population is normally distributed
t distribution with k DOF
• If z and $\chi^2_k$ are independent standard normal and chi-square random variables, then the random variable
  $t_k = \frac{z}{\sqrt{\chi^2_k / k}}$
  follows the t distribution with k DOF, with density
  $f(t) = \frac{\Gamma[(k+1)/2]}{\sqrt{k\pi}\,\Gamma(k/2)} \cdot \frac{1}{[(t^2/k)+1]^{(k+1)/2}}, \qquad -\infty < t < \infty$
t distribution with k DOF
• μ = 0 and σ² = k/(k − 2) for k > 2
• As k → ∞, t becomes the standard normal
• If y1, y2, …, yn is a random sample from N(μ, σ²), then
  $t = \frac{\bar{y} - \mu}{S/\sqrt{n}}$
  is distributed as t with n − 1 DOF
F distribution
• If $\chi^2_u$ and $\chi^2_v$ are two independent chi-square random variables with u and v DOF, then the ratio
  $F_{u,v} = \frac{\chi^2_u / u}{\chi^2_v / v}$
  follows the F distribution with u numerator DOF and v denominator DOF, with density
  $h(x) = \frac{\Gamma\!\left(\frac{u+v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2)-1}}{\Gamma\!\left(\frac{u}{2}\right)\Gamma\!\left(\frac{v}{2}\right)\left[\frac{u}{v}x + 1\right]^{(u+v)/2}}, \qquad 0 < x < \infty$
F distribution
• Consider two independent normal populations with common variance σ². If y11, y12, …, y1n1 is a random sample of n1 observations from the 1st population and y21, y22, …, y2n2 is a random sample of n2 observations from the 2nd population, then
  $\frac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$
The Hypothesis Testing Framework
• Statistical hypothesis testing is a useful framework for many experimental situations
• We will use a procedure known as the two-sample t-test
The Hypothesis Testing Framework
• Sampling from a normal distribution
• Statistical hypotheses:
  $H_0: \mu_1 = \mu_2$
  $H_1: \mu_1 \ne \mu_2$
Estimation of Parameters
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ estimates the population mean μ
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ estimates the variance σ²
Summary Statistics (pg. 36)
  Formulation 1 ("New recipe"):      $\bar{y}_1 = 16.76$, $S_1^2 = 0.100$, $S_1 = 0.316$, $n_1 = 10$
  Formulation 2 ("Original recipe"): $\bar{y}_2 = 17.04$, $S_2^2 = 0.061$, $S_2 = 0.248$, $n_2 = 10$
How the Two-Sample t-Test Works:
Use the sample means to draw inferences about the population means.
• Difference in sample means: $\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$
• The variance of a sample mean is $\sigma_{\bar{y}}^2 = \sigma^2/n$, so the standard deviation of the difference in sample means is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
• This suggests a statistic:
  $Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
How the Two-Sample t-Test Works:
• Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$
• The previous ratio becomes
  $\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
• However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$
• Pool the individual sample variances:
  $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
How the Two-Sample t-Test Works:
The test statistic is
  $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
• Values of t0 that are near zero are consistent with the null hypothesis
• Values of t0 that are very different from zero are consistent with the alternative hypothesis
• t0 is a "distance" measure: how far apart the averages are, expressed in standard deviation units
• Notice the interpretation of t0 as a signal-to-noise ratio (see the sketch below)
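A minimal Python sketch of the whole computation from summary statistics (assuming SciPy; the helper name pooled_t is my own):

    from math import sqrt
    from scipy import stats

    def pooled_t(ybar1, ybar2, s1_sq, s2_sq, n1, n2):
        """Pooled two-sample t statistic, DOF, and two-sided P-value."""
        sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
        t0 = (ybar1 - ybar2) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
        df = n1 + n2 - 2
        return t0, df, 2 * stats.t.sf(abs(t0), df)

    # With the cement summary statistics this reproduces t0 of about -2.20
    print(pooled_t(16.76, 17.04, 0.100, 0.061, 10, 10))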
The Two-Sample (Pooled) t-Test
  $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$
  $S_p = 0.284$
  $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20$
The two sample means are a little over two standard deviations apart.
Is this a "large" difference?
The Two-Sample (Pooled) t-Test
• So far, we haven't really done any "statistics"
• We need an objective basis for deciding how large the test statistic t0 really is
• In 1908, W. S. Gosset derived the reference distribution for t0 … called the t distribution
• Tables of the t distribution: text, page 606
t0 = -2.20
The Two-Sample (Pooled) t-Test
• A value of t0 between -2.101 and 2.101 is consistent with equality of means
• It is possible for the means to be equal and t0 to exceed either 2.101 or -2.101, but it would be a "rare event" … this leads to the conclusion that the means are different
• Could also use the P-value approach
t0 = -2.20
Use of the P-value in Hypothesis Testing
• P-value: the smallest level of significance that would lead to rejection of the null hypothesis H0
• It is customary to call the test statistic significant when H0 is rejected. Therefore, the P-value is the smallest level α at which the data are significant.
The Two-Sample (Pooled) t-Test
t0 = -2.20
• The P-value is the risk of wrongly rejecting the null hypothesis of equal means (it measures the rareness of the event)
• The P-value in our problem is P = 0.042 (see the check below)
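As a quick check (assuming SciPy), the reported P-value follows from the t distribution with n1 + n2 - 2 = 18 DOF:

    from scipy import stats

    # Two-sided P-value for |t0| = 2.20 with 18 DOF
    print(2 * stats.t.sf(2.20, df=18))  # about 0.041; matches P = 0.042 up to rounding of t0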
Minitab Two-Sample t-Test Results
Checking Assumptions –
The Normal Probability Plot
• Assumptions:
  1. Equal variance
  2. Normality
• Procedure (a plotting sketch follows):
  1. Rank the observations in the sample in ascending order.
  2. Plot the ordered observations versus the observed cumulative frequency (j − 0.5)/n.
  3. If the plotted points deviate significantly from a straight line, the hypothesized model is not appropriate.
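This is the kind of plot that scipy.stats.probplot automates; a minimal sketch assuming SciPy and Matplotlib, with placeholder data (probplot's plotting positions are essentially the (j − 0.5)/n positions described above):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    y = np.array([16.85, 16.40, 17.21, 16.35, 16.52,
                  17.04, 16.96, 17.15, 16.59, 16.57])  # hypothetical sample

    # Ranks the observations and plots them against normal quantiles;
    # near-linearity supports the normality assumption
    fig, ax = plt.subplots()
    stats.probplot(y, dist="norm", plot=ax)
    plt.show()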
Checking Assumptions –
The Normal Probability Plot
[Figure: normal probability plots of the two samples, Fig. 2-11]
Checking Assumptions –
The Normal Probability Plot
• The mean is estimated as the 50th percentile on the probability plot.
• The standard deviation is estimated as the difference between the 84th and 50th percentiles.
• The assumption of equal population variances is simply verified by comparing the slopes of the two straight lines in Fig. 2-11.
• We will use t-tests without extensive concern about the normality assumption.
Importance of the t-Test
• Provides an objective framework for simple comparative experiments
• Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one "side" of the cube versus the mean response at the opposite "side" of the cube
Confidence Intervals (See pg. 43)
• Hypothesis testing gives an objective statement concerning the difference in means, but it doesn't specify "how different" they are
• General form of a confidence interval:
  $L \le \theta \le U$ where $P(L \le \theta \le U) = 1 - \alpha$
• The 100(1 − α)% confidence interval on the difference in two means (see the sketch below):
  $\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{(1/n_1) + (1/n_2)} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{(1/n_1) + (1/n_2)}$
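A short sketch of this interval for the cement data, assuming SciPy for the t percentage point:

    from math import sqrt
    from scipy import stats

    ybar1, ybar2, sp, n1, n2, alpha = 16.76, 17.04, 0.284, 10, 10, 0.05

    half = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2) * sp * sqrt(1 / n1 + 1 / n2)
    diff = ybar1 - ybar2
    print(diff - half, diff + half)  # roughly (-0.55, -0.01)

Because zero lies outside the interval, the conclusion agrees with rejecting H0 at α = 0.05.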
Hypothesis testing
$\sigma_1^2 \ne \sigma_2^2$
• The test statistic becomes
  $t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
• This statistic is not distributed exactly as t.
• The distribution of t0 is well approximated by t if we use
  $v = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$
  as the DOF (see the sketch below).
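A sketch of this approximation (the helper name is my own; SciPy's scipy.stats.ttest_ind with equal_var=False applies the same idea to raw data):

    def welch_df(s1_sq, n1, s2_sq, n2):
        """Satterthwaite-style approximate DOF for t0 when variances differ."""
        a, b = s1_sq / n1, s2_sq / n2
        return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

    # Cement summary statistics, treating the variances as unequal
    print(welch_df(0.100, 10, 0.061, 10))  # about 17, close to the pooled 18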
Hypothesis testing
$\sigma_1^2$ and $\sigma_2^2$ are known
• The test statistic becomes
  $z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• If both populations are normal, or if the sample sizes are large enough, the distribution of z0 is N(0, 1) if the null hypothesis is true. Thus, the critical region would be found using the normal distribution rather than the t.
• We would reject H0 if $|z_0| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
Hypothesis testing
$\sigma_1^2$ and $\sigma_2^2$ are known
• The 100(1 − α) percent confidence interval:
  $\bar{y}_1 - \bar{y}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Hypothesis testing
Comparing a single mean to a specified value
• The hypotheses are: H0: μ = μ0 and H1: μ ≠ μ0
• If the population is normal with known variance, or if the population is non-normal but the sample size is large enough, then the hypothesis may be tested by direct application of the normal distribution.
• Test statistic:
  $z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$
• If H0 is true, then the distribution of z0 is N(0, 1). Therefore, H0 is rejected if $|z_0| > z_{\alpha/2}$.
Hypothesis testing
Comparing a single mean to a specified value
The value of μ0 is usually determined in one of three ways:
1. From past evidence, knowledge, or experimentation
2. The result of some theory or model describing the situation under study
3. The result of contractual specifications
Hypothesis testing
Comparing a single mean to a specified value
• If the variance of the population is unknown, we must assume that the population is normally distributed.
• Test statistic (see the sketch below):
  $t_0 = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$
• H0 is rejected if $|t_0| > t_{\alpha/2,\,n-1}$
• The 100(1 − α) percent confidence interval:
  $\bar{y} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{y} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$
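A minimal sketch of this one-sample test with SciPy; the data and μ0 are hypothetical:

    import numpy as np
    from scipy import stats

    y = np.array([16.85, 16.40, 17.21, 16.35, 16.52])  # hypothetical observations
    mu0 = 16.0                                          # specified value under H0

    t0 = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))
    res = stats.ttest_1samp(y, popmean=mu0)  # same statistic and two-sided P-value
    print(t0, res.statistic, res.pvalue)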
The paired comparison problem
The two tip hardness experiment
• Statistical model:
  $y_{ij} = \mu_i + \beta_j + \epsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, 10$
• jth paired difference: $d_j = y_{1j} - y_{2j}$
• Expected value of the paired difference: $\mu_d = \mu_1 - \mu_2$
• Testing hypotheses: H0: μd = 0 and H1: μd ≠ 0
• Test statistic (see the sketch below):
  $t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}, \qquad \bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j, \qquad S_d = \left[\frac{\sum_{j=1}^{n} (d_j - \bar{d})^2}{n - 1}\right]^{1/2}$
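A sketch of the paired analysis with SciPy; the two samples below are illustrative values chosen to be consistent with the d-bar = -0.1 and Sd = 1.2 summary on the next slide:

    import numpy as np
    from scipy import stats

    tip1 = np.array([7, 3, 3, 4, 8, 3, 2, 9, 5, 4], dtype=float)
    tip2 = np.array([6, 3, 5, 3, 8, 2, 4, 9, 4, 5], dtype=float)

    d = tip1 - tip2
    t0 = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # paired t statistic
    res = stats.ttest_rel(tip1, tip2)                   # same computation
    print(t0, res.statistic, res.pvalue)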
The paired comparison problem
The two tip hardness experiment
• Randomized block design
• Block: a homogeneous experimental unit
• The block represents a restriction on complete randomization because the treatment combinations are only randomized within the block

                                    Randomized Block    Complete Randomization
  DOF                               n − 1 = 9           2n − 2 = 18
  Standard deviation                Sd = 1.2            Sp = 2.32
  Confidence interval on μ1 − μ2    −0.1 ± 0.86         −0.1 ± 2.18
Inferences about the variability of normal distributions
• H0: σ² = σ0² and H1: σ² ≠ σ0²
• Test statistic:
  $\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{\sigma_0^2}$
• H0 is rejected if $\chi_0^2 > \chi_{\alpha/2,\,n-1}^2$ or $\chi_0^2 < \chi_{1-\alpha/2,\,n-1}^2$
• The 100(1 − α) percent confidence interval:
  $\frac{(n-1)S^2}{\chi_{\alpha/2,\,n-1}^2} \;\le\; \sigma^2 \;\le\; \frac{(n-1)S^2}{\chi_{1-\alpha/2,\,n-1}^2}$
Inferences about the variability of normal distributions
• $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$
• Test statistic:
  $F_0 = \frac{S_1^2}{S_2^2}$
• H0 is rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$ (see the sketch below)
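A sketch of this F-test using the cement sample variances as example inputs, assuming SciPy for the F percentage points:

    from scipy import stats

    s1_sq, s2_sq, n1, n2, alpha = 0.100, 0.061, 10, 10, 0.05

    F0 = s1_sq / s2_sq                              # about 1.64
    lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)     # lower critical value
    hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1) # upper critical value
    print(F0, lo, hi)  # F0 falls between the critical values: do not reject H0

Here F0 lies inside the acceptance region, which is consistent with the equal-variance assumption used in the pooled t-test earlier.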