Transcript First Wave

Introduction to design
Olav M. Kvalheim
Content
• Making your data work twice
• Effect of correlation on data interpretation
• Effect of interaction on data interpretation
Chemometrics/Infometrics
Design of information-rich
experiments and use of
multivariate methods for extraction
of maximum relevant information
from data
Making your data work twice
What is Information?
• A - mean value, no standard deviation given
• B - mean value with standard deviation given, large value of standard deviation
• C - mean value, low standard deviation
Measurement strategy?
[Figure: the unknowns A and B and the calibration weights]
Hotelling (1944) Ann. Math. Statistics 15, 297-306
The univariate weighing design
Weigh A and B separately
mA ± σA
mB ± σB
σA = σB = σ
Precision is σ for both A and B
The multivariate design
Weigh A and B jointly to determine sum and difference:
mA + mB = S
mA - mB = D

mA = ½S + ½D
mB = ½S - ½D

σA = σB = √(¼σ² + ¼σ²) = σ/√2 ≈ 0.7σ
(the first term under the root comes from the precision for S, the second from the precision for D)
Precision is 0.7σ for both A and B
Univariate vs Bivariate strategy
Precision in mA and mB
Univariate Design: σ
Bivariate Design: 0.7σ
Precision is improved by 30% by using a
multivariate design with the same number of
measurements as for the univariate!
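The 30% gain can be checked numerically. The sketch below is a small simulation, not part of the original slides: the masses (10 and 7), the value σ = 1 and the assumption of independent Gaussian weighing errors are all illustrative choices.

```python
import random
import statistics

def simulate(m_a=10.0, m_b=7.0, sigma=1.0, trials=20000, seed=0):
    """Return the spread of the estimate of mA under both designs."""
    rng = random.Random(seed)
    uni_a, multi_a = [], []
    for _ in range(trials):
        # Univariate design: A is weighed alone (B gets its own weighing).
        uni_a.append(m_a + rng.gauss(0.0, sigma))
        # Bivariate design: one weighing of the sum, one of the difference.
        s = (m_a + m_b) + rng.gauss(0.0, sigma)
        d = (m_a - m_b) + rng.gauss(0.0, sigma)
        multi_a.append(0.5 * s + 0.5 * d)
    return statistics.stdev(uni_a), statistics.stdev(multi_a)

sd_uni, sd_multi = simulate()
print(f"univariate: {sd_uni:.3f}  bivariate: {sd_multi:.3f}  "
      f"ratio: {sd_multi / sd_uni:.3f}")
```

Both designs use two weighings in total; the ratio of the two standard deviations should come out close to 0.7.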
Univariate vs Multivariate
weighing
With N masses to weigh, a multivariate design provides an
estimate of each mass with a precision σ/√N
The larger the number of unknowns, the larger the gain in
precision using a multivariate weighing design.
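A precision of σ/√N can be illustrated with a design matrix of +1/-1 entries whose columns are orthogonal. The sketch below assumes a two-pan balance (+1 and -1 mark the pan an object is placed on) and uses the Sylvester construction, which only covers N equal to a power of two. The function names and the four example masses are hypothetical.

```python
import math

def hadamard(n):
    """Sylvester construction of an n x n matrix of +1/-1 (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two for this construction")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def estimate_masses(design, readings):
    """Least-squares estimate; for orthogonal +1/-1 columns it is X^T y / N."""
    n = len(design)
    return [sum(design[i][j] * readings[i] for i in range(n)) / n
            for j in range(n)]

def precision_factor(design):
    """Standard deviation of each estimate, in units of sigma."""
    n = len(design)
    # Each estimate is a sum of n independent readings weighted by +1/n or -1/n.
    return [math.sqrt(sum((design[i][j] / n) ** 2 for i in range(n)))
            for j in range(n)]

design = hadamard(4)
true_masses = [5.0, 3.0, 2.0, 1.0]  # illustrative values
# Noise-free readings: every weighing involves all four objects.
readings = [sum(x * m for x, m in zip(row, true_masses)) for row in design]
print(estimate_masses(design, readings))  # recovers the true masses
print(precision_factor(design))           # each entry is 1/sqrt(4)
```

For N = 2 the same construction gives the rows (+1, +1) and (+1, -1), which is the sum-and-difference design above.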
Effect of correlation on data
interpretation
X1  X 2
Example
• Process output is function of temperature
and amount of catalyst
Correlation between amount of
catalyst and amount produced
• Strong positive
correspondence
Correlation between Temperature
and Produced amount
• Weak positive
correspondence
Conclusion from correlation
analysis
• Increase amount of catalyst and temperature
to increase production
Result of test
• Produced amount was lowered!
Bivariate Regression Model
• Produced amount = 300 + 2.0 * Catalyst - 0.5 * Temperature
Correlation between temperature
and amount of catalyst
• Strong positive
correspondence
Solution to correlation problem
• Multivariate Design - Change many process
variables simultaneously according to
experimental designs
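The example can be reproduced with simulated data. The sketch below takes the regression model from the slides and assumes a hypothetical process history in which temperature was raised together with the amount of catalyst. The distributions, noise levels and sample size are illustrative assumptions, not data from the lecture.

```python
import random

def cov(x, y):
    """Sample covariance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    """Pearson correlation coefficient."""
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

rng = random.Random(1)
n = 5000
catalyst = [rng.gauss(50.0, 10.0) for _ in range(n)]
# Assumed history: temperature follows the amount of catalyst.
temperature = [150.0 + 2.0 * (c - 50.0) + rng.gauss(0.0, 14.0) for c in catalyst]
# Model from the slides plus a small assumed measurement noise.
produced = [300.0 + 2.0 * c - 0.5 * t + rng.gauss(0.0, 3.0)
            for c, t in zip(catalyst, temperature)]

r_cp = corr(catalyst, produced)     # strong positive
r_tp = corr(temperature, produced)  # weak positive, despite the -0.5 coefficient
r_ct = corr(catalyst, temperature)  # strong positive: the source of the problem

# Two-predictor least squares from the centred normal equations.
s11, s22 = cov(catalyst, catalyst), cov(temperature, temperature)
s12 = cov(catalyst, temperature)
s1y, s2y = cov(catalyst, produced), cov(temperature, produced)
det = s11 * s22 - s12 ** 2
b_catalyst = (s22 * s1y - s12 * s2y) / det
b_temperature = (s11 * s2y - s12 * s1y) / det

print(f"r(catalyst, produced)    = {r_cp:.2f}")
print(f"r(temperature, produced) = {r_tp:.2f}")
print(f"r(catalyst, temperature) = {r_ct:.2f}")
print(f"regression coefficients: {b_catalyst:.2f}, {b_temperature:.2f}")
```

Temperature correlates positively with the produced amount only because it moves together with the catalyst. The regression recovers the negative temperature coefficient. Under the slides' model, raising the catalyst by 5 units and the temperature by 30 units would change the output by 2.0 * 5 - 0.5 * 30 = -5.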
Effect of interaction on data
interpretation
X1X2
The task
The yield of a chemical reaction is a function
of temperature (t) and concentration (c).
y = f (t,c)
Optimise the yield for the reaction!
Response surface in the presence
of interaction
[Figure: contour plot of yield against temperature (140 to 170 ºC) and concentration (0.1 to 0.2 M), with contour levels 40, 45, 50, 60, 70 and 75]
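Why interaction matters for the optimisation task can be shown with a two-level factorial design in t and c. The four yields below are hypothetical numbers chosen for illustration, not values read from the response surface. The comparison with a one-variable-at-a-time strategy is a sketch under that assumption.

```python
# Hypothetical yields (%) at the four corners of a 2x2 factorial design.
t_lo, t_hi, c_lo, c_hi = 140, 170, 0.1, 0.2
runs = {
    (t_lo, c_lo): 50.0,
    (t_hi, c_lo): 60.0,
    (t_lo, c_hi): 60.0,
    (t_hi, c_hi): 45.0,
}

# Effect of raising the temperature, evaluated at each concentration level.
effect_t_at_c_lo = runs[(t_hi, c_lo)] - runs[(t_lo, c_lo)]
effect_t_at_c_hi = runs[(t_hi, c_hi)] - runs[(t_lo, c_hi)]
# Effect of raising the concentration, evaluated at each temperature level.
effect_c_at_t_lo = runs[(t_lo, c_hi)] - runs[(t_lo, c_lo)]
effect_c_at_t_hi = runs[(t_hi, c_hi)] - runs[(t_hi, c_lo)]

main_t = (effect_t_at_c_lo + effect_t_at_c_hi) / 2
main_c = (effect_c_at_t_lo + effect_c_at_t_hi) / 2
# Interaction: how much the temperature effect depends on concentration.
interaction_tc = (effect_t_at_c_hi - effect_t_at_c_lo) / 2

# One variable at a time: three runs only, with an additive guess
# for the corner that was never measured.
ovat_prediction = runs[(t_lo, c_lo)] + effect_t_at_c_lo + effect_c_at_t_lo

print(f"main effects: t = {main_t}, c = {main_c}")
print(f"interaction t x c = {interaction_tc}")
print(f"additive guess for (170, 0.2): {ovat_prediction}, "
      f"measured: {runs[(t_hi, c_hi)]}")
```

With these numbers, changing one variable at a time suggests that raising both t and c helps, while the fourth run shows the opposite. Only the design that varies both variables together exposes the interaction term.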
Efficiency of information
extraction
[Figure: information gained against number of experiments (cost), with one curve for multivariate design and one for univariate design]
Multivariate Design
vs.
Univariate Design
• Correct Models Possible (Interactions)
• Efficient Experimentation
• Improved Precision/Information quality