chpter2 - UniMAP Portal

CHAPTER 2
Building Empirical Model
Basic Statistical Concepts
Consider this situation:
The tension bond strength of portland cement
mortar is an important characteristic of the product.
An engineer is interested in comparing the strength
of a modified formulation, in which polymer latex
emulsions have been added during mixing, to the
strength of unmodified mortar. The experimenter has
collected 10 observations on the strength of each
mortar. The data are shown in Table 2.1.
• Each observation $y_j$ is called a run.
• Fluctuation (noise) in the observations is called experimental error.
• The presence of error implies that the response variable is a random variable (which can
be discrete or continuous).
Dot diagram for data in Table 2.1
What can you conclude from the dot diagram?
Where is the general location or central tendency?
Other graphical methods…
Histogram
• For fairly numerous data
Other graphical methods…
Box plot (or box and whisker plot)
Box plot labels: upper quartile (75%), median, lower quartile (25%)
Probability Distributions
• The probability structure of a random
variable, y, is described by its probability
distribution.
• If y is discrete – the probability function of y,
p(y)
• If y is continuous – the probability density
function, f(y)
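Mean, Variance and Expected Value
• The mean, μ, of a probability distribution is a measure of its central tendency or
location:
$$\mu = \begin{cases} \int_{-\infty}^{\infty} y\, f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} y\, p(y) & y \text{ discrete} \end{cases}$$
• We may also express the mean in terms of the expected value of the random
variable y:
$$\mu = E(y)$$
where E denotes the expected value operator.
• The variability or dispersion of a probability distribution can be measured
by the variance, defined as
$$\sigma^2 = \begin{cases} \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\, dy & y \text{ continuous} \\ \sum_{\text{all } y} (y - \mu)^2 p(y) & y \text{ discrete} \end{cases}$$
• Note that the variance can be expressed entirely in terms of expectation
because
$$\sigma^2 = E[(y - \mu)^2]$$
• Finally, the variance is used so extensively that it is convenient to define a
variance operator, V, such that
$$V(y) = E[(y - \mu)^2] = \sigma^2$$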
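As a small illustration of the expectation and variance operators for a discrete random variable, the sketch below computes μ = E(y) and σ² = E[(y − μ)²] directly from a probability function p(y). The function name `mean_and_variance` and the fair-die example are assumptions made for illustration; they do not come from the slides.

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """Return (mu, sigma^2) for a discrete distribution given as {y: p(y)}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mu = sum(y * p for y, p in pmf.items())               # mu = E(y)
    var = sum((y - mu) ** 2 * p for y, p in pmf.items())  # sigma^2 = E[(y - mu)^2]
    return mu, var


# Hypothetical example: a fair six-sided die, p(y) = 1/6 for y = 1..6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu, var = mean_and_variance(die)
print(mu, var)  # 7/2 35/12
```

Exact fractions are used so that the result can be compared with the hand calculation without rounding error.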
Elementary results
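• If y is a random variable with mean μ and variance σ², and c is
a constant, then:
  E(c) = c
  E(y) = μ
  E(cy) = cE(y) = cμ
  V(c) = 0
  V(y) = σ²
  V(cy) = c²V(y) = c²σ²
• Covariance, Cov(y1, y2) = E[(y1 − μ1)(y2 − μ2)], is a measure of the linear association between y1 and y2. If y1
and y2 are independent, then Cov(y1, y2) = 0.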
We may also show that,
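The results usually stated at this point are the standard identities for sums and differences of two random variables. As a sketch, assuming y1 and y2 have means μ1, μ2 and variances σ1², σ2² (this notation is an assumption here, not defined on the slide):

```latex
\begin{aligned}
E(y_1 + y_2) &= \mu_1 + \mu_2 \\
V(y_1 + y_2) &= V(y_1) + V(y_2) + 2\,\mathrm{Cov}(y_1, y_2) \\
V(y_1 - y_2) &= V(y_1) + V(y_2) - 2\,\mathrm{Cov}(y_1, y_2) \\
V(y_1 \pm y_2) &= \sigma_1^2 + \sigma_2^2 \quad \text{if } y_1 \text{ and } y_2 \text{ are independent}
\end{aligned}
```

The last line follows from the two before it because Cov(y1, y2) = 0 under independence.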
Inferences About Differences In
Means, Randomized Design
• Hypothesis testing
• Choice of sample size
• Confidence intervals
• The case where σ₁² ≠ σ₂²
• The case where σ₁² and σ₂² are known
• Comparing a single mean to a specified value
Hypothesis testing
• Let's reconsider the portland cement experiment.
• In general, we can regard the two formulations (unmodified and
modified mortar) as two levels of the factor
"formulation".
• Let y11, y12, y13, …, y1n1 represent the n1 observations from the
first factor level, and y21, y22, y23, …, y2n2 represent the n2
observations from the second factor level.
• We describe the results of the experiment with a model. A simple
statistical model is:
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$
where
y_ij = jth observation from factor level i
μ_i = mean of the response at the ith factor level
ε_ij = normal random variable associated with the ijth observation
(the random error)
1) Statistical hypothesis
• A statistical hypothesis is a statement either about the parameters of a probability
distribution or about the parameters of a model.
• The decision-making procedure about the hypothesis is called hypothesis
testing.
• For example, in the portland cement experiment, we may think
that the mean tension bond strengths of the two mortar formulations
are equal. This may be stated formally as:
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
where H0 is the null hypothesis and H1 is the alternative hypothesis.
• Power = the probability of rejecting the null hypothesis, H0, when the
alternative hypothesis, H1, is true.
2) The two-sample t-test
• The appropriate test statistic for comparing two
treatment means in a completely randomized design is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
where:
$\bar{y}_1$ and $\bar{y}_2$ are the sample means
$n_1$ and $n_2$ are the sample sizes
$S_p^2$ is the estimate of the common variance
$S_1^2$ and $S_2^2$ are the individual sample variances
• To determine whether to reject H0: μ1 = μ2, we
compare t0 to the t distribution with n1 + n2 − 2 degrees
of freedom.
• If $|t_0| > t_{\alpha/2,\, n_1 + n_2 - 2}$, where $t_{\alpha/2,\, n_1 + n_2 - 2}$ is the upper α/2
percentage point of the t distribution with n1 + n2 − 2
degrees of freedom, we reject H0 and
conclude that the mean strengths of the two formulations
of portland cement mortar differ.
• This test procedure is called the two-sample t-test.
• For the one-sided alternative hypothesis H1: μ1 > μ2, H0
would be rejected if $t_0 > t_{\alpha,\, n_1 + n_2 - 2}$.
• For H1: μ1 < μ2, H0 would be rejected if $t_0 < -t_{\alpha,\, n_1 + n_2 - 2}$.
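Example:
From the portland cement data, with $\bar{y}_1 = 16.76$, $\bar{y}_2 = 17.92$, $S_p = 0.284$ and $n_1 = n_2 = 10$,
$$t_0 = \frac{16.76 - 17.92}{0.284 \sqrt{\frac{1}{10} + \frac{1}{10}}} = -9.13$$
Because $|t_0| = 9.13 > t_{0.025,\,18} = 2.101$, H0 is rejected at α = 0.05.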
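The calculation for this example can be sketched in a few lines. The summary values (16.76, 17.92, 0.284, n = 10 per group, and the table value 2.101) are the ones quoted in this chapter; the function names are assumptions made for illustration, and `pooled_variance` is included only to show how Sp² would be obtained when the individual sample variances are available.

```python
import math


def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of the common variance, Sp^2."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def two_sample_t(ybar1, ybar2, sp, n1, n2):
    """Two-sample t statistic t0 using the pooled standard deviation sp."""
    return (ybar1 - ybar2) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Summary statistics quoted in the chapter for the portland cement data.
t0 = two_sample_t(16.76, 17.92, 0.284, 10, 10)
df = 10 + 10 - 2
t_crit = 2.101               # upper 0.025 point of t with 18 df, from the text
reject_h0 = abs(t0) > t_crit
print(round(t0, 2), df, reject_h0)  # -9.13 18 True
```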
3) P-values
• One way to report the results of a hypothesis test is
to state that the null hypothesis was or was not
rejected at a specified α-value or level of significance.
• For example, in the portland cement mortar formulation,
we can say that H0: μ1 = μ2 was rejected at the 0.05 level of
significance.
• This is often an inadequate conclusion because it gives no idea of where
the computed value lies in the rejection region.
Moreover, some decision makers might be
uncomfortable with α = 0.05.
• To overcome these difficulties, the P-value approach is used.
• The P-value is the smallest level of significance that
would lead to rejection of the null hypothesis.
• P-value: the smallest level α at which the data are significant.
It therefore shows how significant the data are.
• It is not always easy to compute the exact P-value by hand. However,
it can be approximated. For the portland cement
mortar example, the degrees of freedom = 18. From the t-distribution table, the smallest tail area probability is
0.0005, for which t0.0005,18 = 3.922.
• Now |t0| = 9.13 > 3.922 (H0 is rejected), so because the
alternative hypothesis is two-sided, the P-value must be
less than 2(0.0005) = 0.001.
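The table bound can be checked numerically. The sketch below integrates the t density with Simpson's rule to get the upper-tail area; this is one simple way to approximate a P-value using only the Python standard library, not a method taken from the slides. The function names, the integration width and the step count are all assumptions, and the truncation of the tail is only reasonable for moderate degrees of freedom such as the 18 used here.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, width=200.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, t + width].

    Assumption: the area beyond t + width is negligible, which holds for
    moderate df (for very small df the tails are too heavy for this cutoff).
    """
    h = width / steps  # steps must be even for Simpson's rule
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3


def two_sided_p_value(t0, df):
    """P-value for a two-sided alternative."""
    return 2 * t_upper_tail(abs(t0), df)


p = two_sided_p_value(-9.13, 18)
print(p < 0.001)  # True, consistent with the bound from the t table
```

Evaluating `t_upper_tail(3.922, 18)` reproduces the table entry of about 0.0005, which is a quick sanity check on the integration.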
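4) Normal probability plot
A graphical technique for determining whether sample data conform to a
hypothesized distribution, based on a subjective visual examination of the data.
How to interpret? If the hypothesized distribution adequately describes the data, the plotted points fall approximately along a straight line.
How to construct? Rank the observations from smallest to largest and plot the jth ordered observation against its cumulative frequency (j − 0.5)/n, where j = 1, 2, 3, …, n.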
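The plotting positions can be sketched as follows. The sample values are hypothetical, and pairing each ordered observation with a standard normal score (via `statistics.NormalDist().inv_cdf`) is one common way to draw the plot on ordinary axes; it is an assumption here rather than the construction shown on the slide.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (ordered value, plotting position, standard normal score) triples."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for j, y in enumerate(ordered, start=1):
        position = (j - 0.5) / n             # cumulative frequency (j - 0.5)/n
        z = NormalDist().inv_cdf(position)   # normal score for that frequency
        points.append((y, position, z))
    return points


# Hypothetical sample of n = 10 observations (not the data of Table 2.1).
sample = [4.1, 3.7, 5.0, 4.4, 3.9, 4.8, 4.3, 4.6, 4.0, 4.5]
points = normal_plot_points(sample)
for y, position, z in points:
    print(f"{y:5.2f}  {position:4.2f}  {z:6.3f}")
```

Plotting the first column against the third and judging whether the points lie near a straight line gives the visual check described above.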
Choice of sample size
• The choice of sample size and the probability of type II error, β, are
closely related.
• Suppose we are testing
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
and that the means are not equal. Because H0: μ1 = μ2 is not
true, we are concerned about wrongly failing to reject H0.
• β depends on the true difference in means, δ = μ1 − μ2.
• A graph of β versus δ is called the operating characteristic curve or
O.C. curve.
• Generally, the β error decreases as the sample size increases, so
a given difference δ is easier to detect with a larger sample.
Example of an O.C. curve for the case where σ1 and σ2 are
unknown but equal, and α = 0.05:
$$d = \frac{|\mu_1 - \mu_2|}{2\sigma} = \frac{|\delta|}{2\sigma}, \qquad n^* = 2n - 1$$
[Figure: O.C. curves; horizontal axis d]
From the curve:
• The greater the difference in means, the smaller the β error.
• As the sample size increases, β gets smaller.
• How can the O.C. curve be used to calculate the sample
size?
• Suppose that δ = 0.1; therefore,
$$d = \frac{|\mu_1 - \mu_2|}{2\sigma} = \frac{|\delta|}{2\sigma} = \frac{0.1}{2\sigma}$$
• If σ = 0.25, then d = 0.2.
• If we want to reject the null hypothesis 95% of
the time when μ1 − μ2 = 0.1, then β = 0.05, and
d = 0.2 yields n* = 15.
• Since n* = 2n − 1, n = 8.
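The arithmetic around the O.C. curve is short enough to sketch. The value n* = 15 still has to be read from the curve itself, so it is passed in as an input; the function names are assumptions made for illustration.

```python
import math


def oc_curve_d(delta, sigma):
    """Abscissa d = |delta| / (2 sigma) used to enter the O.C. curve."""
    return abs(delta) / (2 * sigma)


def per_group_sample_size(n_star):
    """Solve n* = 2n - 1 for n, rounding up to a whole observation."""
    return math.ceil((n_star + 1) / 2)


d = oc_curve_d(delta=0.1, sigma=0.25)
n = per_group_sample_size(15)   # n* = 15 is read from the curve at d = 0.2, beta = 0.05
print(d, n)  # 0.2 8
```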
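Confidence intervals
• A confidence interval is an interval within which the value of the parameter or parameters in question
would be expected to lie. For a parameter θ, we find two statistics L and U such that
$$P(L \le \theta \le U) = 1 - \alpha$$
L and U are called the lower and upper confidence limits.
1 − α is called the confidence coefficient. If α = 0.05, the interval is called a 95%
confidence interval for the parameter.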
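How to calculate a confidence interval?
$$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
is a 100(1 − α) percent confidence interval for μ1 − μ2.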
Example
From the portland cement mortar example discussed earlier, the 95%
confidence interval estimate for the difference in mean tension bond strength is
$$16.76 - 17.92 - (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}} \;\le\; \mu_1 - \mu_2 \;\le\; 16.76 - 17.92 + (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}}$$
$$-1.16 - 0.27 \;\le\; \mu_1 - \mu_2 \;\le\; -1.16 + 0.27$$
$$-1.43 \;\le\; \mu_1 - \mu_2 \;\le\; -0.89$$
Thus the confidence interval is μ1 − μ2 = −1.16 kgf/cm² ± 0.27 kgf/cm².
In other words, the estimated difference in mean strength is −1.16 kgf/cm² and the accuracy of this
estimate is ±0.27 kgf/cm².
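The interval can be reproduced from the summary statistics quoted in this chapter. The sketch below takes the t-table value as an input rather than computing it; the function name is an assumption made for illustration.

```python
import math


def diff_means_ci(ybar1, ybar2, sp, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled standard deviation sp."""
    center = ybar1 - ybar2
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return center - half_width, center + half_width


# Values from the portland cement example; 2.101 is t(0.025, 18) from the text.
lower, upper = diff_means_ci(16.76, 17.92, 0.284, 10, 10, t_crit=2.101)
print(round(lower, 2), round(upper, 2))  # -1.43 -0.89
```

Because the interval does not contain zero, it agrees with the rejection of H0: μ1 = μ2 at α = 0.05.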
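The case where σ₁² ≠ σ₂²
If we are testing
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
and cannot assume the variances are equal, the test statistic becomes
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
with the degrees of freedom calculated as follows:
$$\nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2 / n_1)^2}{n_1 - 1} + \frac{(S_2^2 / n_2)^2}{n_2 - 1}}$$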
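A minimal sketch of the unequal-variance calculation is given below, using the standard form of the statistic and the approximate degrees of freedom. The function name and the input values are hypothetical and serve only to show the mechanics.

```python
import math


def unequal_variance_t(ybar1, s1_sq, n1, ybar2, s2_sq, n2):
    """Return (t0, nu) for the two-sample t-test without assuming equal variances."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    t0 = (ybar1 - ybar2) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu


# Hypothetical summary statistics (not from the slides).
t0, nu = unequal_variance_t(10.0, 4.0, 12, 8.5, 1.0, 15)
print(round(t0, 3), round(nu, 1))
```

When the two sample variances and sample sizes are equal, ν reduces to n1 + n2 − 2, the degrees of freedom of the pooled test. In general ν is not an integer and is usually rounded down before consulting the t table.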
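The case where σ₁² and σ₂² are known
If both variances are known, then the hypothesis H0: μ1 = μ2 may be tested using the statistic
$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
which is compared with the standard normal distribution.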
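With known variances the test statistic is referred to the standard normal distribution. The sketch below assumes the usual two-sample Z statistic and a two-sided alternative; the function name and the inputs are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(ybar1, ybar2, var1, var2, n1, n2):
    """Return (z0, two-sided P-value) when both population variances are known."""
    z0 = (ybar1 - ybar2) / math.sqrt(var1 / n1 + var2 / n2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical inputs chosen only to show the call.
z0, p_value = two_sample_z(12.0, 11.4, 0.5, 0.8, 20, 25)
print(round(z0, 3), round(p_value, 4))
```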
Comparing a single mean to a specified value
If we are testing
$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0$$
The test statistic,
The confidence interval,
SUMMARY
Regression model & Empirical model
• Suppose there is a single dependent variable or
response, y, that depends on k independent or
regressor variables, for example x1, x2, x3, …, xk.
• The relationship between y and these regressor variables is characterized by
a mathematical model called a regression model.
• The regression model is the basis of the empirical model
(created from experimental observations).
Linear Regression Model
• Suppose we wish to develop an empirical model that relates the
viscosity of a polymer, y, to the temperature, x1, and the catalyst feed
rate, x2:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
• This is a multiple linear regression model. Why?
• β = regression coefficients
• x = predictor variables or regressors
• In general, any regression model that is linear in the parameters is
a linear regression model, regardless of the shape of the surface that it
generates (this normally concerns models with interaction terms).
• Estimating the parameters in a multiple linear
regression model is called model fitting.
• The typical method is the method of least squares.
Least squares estimation of the parameters
Matrix Approach To Multiple Linear Regression
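In matrix form, least squares fitting amounts to solving the normal equations X'X b = X'y for the coefficient vector b, where X has a leading column of ones for the intercept. The sketch below does this in plain Python for a two-regressor model. The helper names and the noise-free data (generated from y = 1 + 2x1 + 3x2) are assumptions chosen so the fitted coefficients are easy to check.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the regressors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(x_rows, y):
    """Return [b0, b1, ..., bk] solving the normal equations X'X b = X'y."""
    X = [[1.0] + list(row) for row in x_rows]   # leading 1 gives the intercept
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = least_squares(x_rows, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

In practice a numerical library would be used in place of the hand-written solver; the plain-Python version is kept here so that each matrix step is visible.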
Properties of the least squares estimators and
estimation of σ2
Hypothesis Testing In Multiple
Regression
• Test for significance of regression
• Test on individual regression coefficients and
groups of coefficients
Test for significance of regression
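The test for significance of regression is commonly based on the statistic F0 = (SSR / k) / (SSE / (n − k − 1)), where SSR is the regression sum of squares and SSE is the error sum of squares. The sketch below assumes that standard form and a model with an intercept fitted by least squares; the function name and the small data set are hypothetical.

```python
def regression_f(y, y_hat, k):
    """F0 for significance of regression with k regressors.

    Assumes y_hat are least squares fitted values from a model with an
    intercept, so that SSR can be computed as sum((y_hat - y_bar)^2).
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_r = sum((f - y_bar) ** 2 for f in y_hat)            # regression sum of squares
    ss_e = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))   # error sum of squares
    return (ss_r / k) / (ss_e / (n - k - 1))


# Hypothetical data: y observed at x = 0, 1, 2, 3; the fitted values come
# from the least squares line 1.3 + 0.8 x for these four points.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
f0 = regression_f(y, y_hat, k=1)
print(round(f0, 3))
```

F0 is then compared with the upper α percentage point of the F distribution with k and n − k − 1 degrees of freedom.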
Test on individual regression coefficients
and groups of coefficients
Confidence interval in multiple
regression
• On individual regression coefficient
• On the mean response
Confidence interval in multiple regression: on individual regression coefficients
Confidence interval in multiple regression: on the mean response
Thank you…