Statistics for Experimenters
Tom Horgan
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means – electronic, mechanical, photocopying,
recording or otherwise – without the permission of Plug Power Inc. COMPANY CONFIDENTIAL Copyright 2003 by Plug Power Inc.
Statistics for Experimenters
Goals: At the end of the course you should be able to...
 Understand the real purpose of statistical analysis
 Appreciate the importance of data & experiment
integrity in an analysis
 Recognize what types of analyses are appropriate for
a particular situation
 Recognize and interpret basic statistical parameters
 Recognize and avoid common decision traps
 Use statistics to support data-based decisions
Statistics for Experimenters
Outline
 The Statistical Analysis Paradigm
 Assumptions, Data Integrity and Experimental Design
 Data Types and Effects
 Gage R&R
 Errors, replication and sample size
 Decision Traps
 Hypothesis Testing & probability distributions
 Statistical Situations with Examples
The Statistical Analysis Paradigm
Why do we experiment?
The Statistical Analysis Paradigm
Probability and Statistics are
used to draw inferences about
populations
The Statistical Analysis Paradigm
 Probability and Statistics are used to draw inferences about
populations
These inferences are made by creating hypotheses, sampling the
population(s) and calculating the necessary statistics
The inferences may be about a single population, comparisons between two
populations, or relationships between multiple populations
A statistical inference is always qualified with a probability that expresses the
degree of confidence we have in it
The probabilities are based on the theoretical behavior of the error inherent in
the population variable (probability distribution)
Increasing the sample size improves the quality of the inference by decreasing
the error of the estimates
The Statistical Analysis Paradigm
 Population: A collection, real or theoretical, of all
individual objects of a particular type.
• The collection of all stacks run at 5 degrees subsaturation
• The collection of all plates made from a prototype mold,
which was subsequently destroyed
• The collection of all SU1 systems manufactured by Plug
Power
 When reviewing data or analyses, try to define the
population that is being evaluated.
Assumptions, Data Integrity and Experimental Design:
The Fundamental Statistical Assumption
We have drawn unbiased, random samples.
Assumptions, Data Integrity and Experimental Design
 When analysis is confusing or experiments are inconclusive,
most often it is because this assumption is violated
 How is it violated? Many ways...
‘Untreated’ Noises, Gage Issues, Confounded Variables,
Uncalibrated gages, Poorly tagged data, etc, etc.
 Examples of Bias (Sampling and Measurement):
“That gage can’t read past 250, so I just wrote down the max”
“I don’t think the gage is accurate at low readings”
“Some time during the run Joe turned up the gas temp”
“MEA1 was tested on station 3 and MEA2 was tested on station 6”
“Most were made with lot 1 but a few were made with lot 2, I think”
Assumptions, Data Integrity and Experimental Design
 Experiment Design Integrity:
• Have we unwittingly confounded variables?
• Have we randomized the runs and blocked out any nuisance
factors?
• Have we spaced the levels sufficiently to detect the effects
we’re interested in?
• Do we suspect that some factors interact, and will the experiment be
able to detect it?
• If we are model building, have we chosen an experiment that will
minimize error and allow us to predict with equal accuracy
in all directions (orthogonal/rotatable)?
Assumptions, Data Integrity and Experimental Design
 Experiment Design Integrity: Orthogonality
• Orthogonal Array for a Screening Experiment (2k)
[Table: 2^k orthogonal design matrix with response values; not legible in this transcript]
Factorial and Fractional Factorial Screening
Experiments Are Orthogonal up to Two-Way Interactions
or Specified Combinations of Them
Assumptions, Data Integrity and Experimental Design
• Orthogonality in Taguchi Orthogonal Arrays
[Table: Taguchi L8 orthogonal array - Design Number 4, 7 factors (columns 2-7)]
Annotation: Effect of B when A = 1 - is it B or is it C?
Taguchi Orthogonal Arrays in General are not
Orthogonal for Two-Way Interactions
Assumptions, Data Integrity and Experimental Design
Interactions
• If interactions exist between the factors of an orthogonal array, interpretation
of the results can be misleading and the results may not confirm.
• The factors should be examined before the RD (robust design) experiment, and
judgments should be made as to which pairs are likely to interact (this often
requires extra testing).
• The interactions should be characterized so that appropriate level
combinations can be selected.
[Figure: interaction plot of Acid Uptake (25-75) vs. Time for factor levels 1 and 2; annotations include 3hr and 25C]
Data Types and Effects
 Variables Data:
• Normal Random Variables: Variables whose population is normally, or
approximately normally distributed.
– Most variables can safely be assumed to be normally distributed (population)
– Cell Voltage, Temperature, Pressure, Thickness, Tensile Strength, etc.
• Time-To-Failure: These data are not normally distributed and require different
analysis methods.
 Attributes Data:
• Binomial Random Variables: This is Pass/Fail data. Frequently expressed in
terms of ‘percent acceptable‘ .
• Categorical Data: Data with a discrete number of possible responses, for
example survey data.
Error, Replication and Sample Size
 Replication:
• No matter how expensive or difficult, we need replicates to determine
confidence levels in the inferences we want to make
• Time-series points are not replicates
• Better none than one
• In a pinch, replicates can be simulated using a historical estimate of the error
• Good news: increasing replicates tends to mitigate the effects of
violating the assumptions (Central Limit Theorem).
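The Central Limit Theorem point above can be illustrated with a small sketch (not from the course material): averaging replicates drawn from even a heavily skewed distribution produces means that cluster near the true value and look approximately normal.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def replicate_means(n_groups=200, n_reps=30):
    """Mean of each group of n_reps draws from an exponential(mean=1)."""
    return [statistics.mean(random.expovariate(1.0) for _ in range(n_reps))
            for _ in range(n_groups)]

means = replicate_means()
# The raw exponential data are strongly skewed, but the replicate means
# sit close to the population mean of 1.0 and their spread shrinks as
# the number of replicates per group grows.
print(round(statistics.mean(means), 2))
```

This is why larger replicate counts make analyses more forgiving of mildly non-normal data.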
Error, Replication and Sample Size
 Sample Size Determination:
• In analyzing an experiment, there are two possibilities:
– The effects are small but statistically significant (too many samples)
– The effects are large but statistically not significant (not enough samples)
• With a rough estimate of the population standard deviation and some idea
of size of the effect you are interested in, you can estimate the sample size
required to detect that effect with some probability
• Moral: Spend some time thinking about how large of an effect you are
interested in detecting before running the experiment, and estimate an
appropriate sample size
Error, Replication and Sample Size
 Sample Size Determination:
• Variables data requires far fewer samples for significance than attributes
data. Example:
– I’m testing a new membrane and I will start life testing if there is an increase
of at least 15 mV at 0.6 A/cm2 over the specification. Historically, the standard
deviation is around 0.08 mV. How many samples (at 95% ‘Confidence’)?
6 Samples
– My meter isn't working and all I have is a light that goes on if the
reading is more than 15 mV over the spec. How many stacks would I have to
test to be 95% sure that I’ll exceed the specification 90% of the time (no
defects)?
58 Samples
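The two calculations above can be sketched in Python with standard formulas. This is a hedged illustration: the exact answers depend on the alpha/power conventions assumed, so these formulas may not reproduce the slide's 6 and 58 exactly.

```python
import math
from statistics import NormalDist

def n_variables(sigma, delta, alpha=0.05, power=0.80):
    """Samples needed to detect a mean shift of size delta (one-sided
    test) given population standard deviation sigma."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # e.g. 1.645 for 95% confidence
    z_b = NormalDist().inv_cdf(power)      # e.g. 0.842 for 80% power
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

def n_attributes(confidence, reliability):
    """Success-run formula: zero-failure pass/fail tests needed to
    claim `reliability` with `confidence`."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

print(n_attributes(0.95, 0.90))  # ~29 zero-failure tests for 95%/90%
```

Either way, the attributes calculation demands several times the samples of a comparable variables test, which is the slide's moral.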
Error, Replication and Sample Size
 Going forward we are talking about random error. We cannot
quantify error associated with violations of the assumptions.
 Type I Error:
• The probability you are wrong if you claim the effect is real.
• Aka: P-Value, P, significance level, consumer's risk, alpha
• 1 - P-Value = Confidence
• You will find this somewhere in most statistical analyses. It is the first thing to look for.
• Two ways to present Type I Error:
– Calculate the effect and present the probability
– Fix the probability and calculate the range within which the population value will fall
with that probability (confidence interval)
Error, Replication and Sample Size
 Type II Error:
• The probability you are wrong if you claim the effect is not real
• Aka: Beta, producer's risk
• 1 - Beta = Power
• Used as the basis for sample size determination. Not usually listed
unless specifically requested.
 Alpha error is considered the more serious of the two because it is
more likely to lead to a bad decision. Beta errors generally lead to
no decision.
 Increasing sample sizes reduces both errors (or increases
confidence and power)
Gage Repeatability and Reproducibility
 Why?
• How else can you know if the effects you observe are real?
 What?
• It’s the error of the estimate. All observed effects must be larger than that
error to be considered significant
• Repeatability: The error associated with repeat measurements of the same
object(s), with the same device and operator.
• Reproducibility: The error associated with measurements taken across
multiple devices and operators.
 What do I get out of it?
• The ability to estimate the sample size required to detect effects of the desired
size
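As a simplified illustration of the two error components (hypothetical data, not the full ANOVA gage R&R method): have each operator measure the same part several times, then separate within-operator scatter from between-operator scatter.

```python
import statistics as st

def simple_grr(readings_by_operator):
    """readings_by_operator: dict of operator -> repeat readings on one part.
    Returns (repeatability, reproducibility) as standard deviations."""
    # Repeatability: pooled within-operator variance (same part, same device)
    within = [st.variance(r) for r in readings_by_operator.values()]
    repeatability = (sum(within) / len(within)) ** 0.5
    # Reproducibility: scatter between operator averages
    op_means = [st.mean(r) for r in readings_by_operator.values()]
    reproducibility = st.stdev(op_means)
    return repeatability, reproducibility

# Hypothetical cell-voltage readings (V) by two operators
data = {"op1": [0.610, 0.612, 0.611], "op2": [0.615, 0.617, 0.616]}
rep, repro = simple_grr(data)
```

A full study would cross operators, parts, and devices and partition the variance with ANOVA; this sketch only shows where the two numbers come from.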
Decision Traps
 Interpreting inferential statistical analyses is often not intuitive
and there are several ‘traps’ that lead to poor decisions. A few
are listed here*
- Probabilistic Certainty: Are you certain that you are 95% confident?
- "Bush leads Gore in Ohio, 57% to 43%. The margin of error in this poll is +/- 3%.”
- Statistical significance is not causal significance: storks and sunspots
- Small Sample Bias: People assume that small samples should exhibit
average behavior and assume something is wrong when they don’t.
- Linear Thinking: People do not conceptualize non-linearities and
interactions well and tend to ignore them.
*Studies by Miller, Tversky, and Kahneman - from Decision Sciences, Kleindorfer, Kunreuther, Schoemaker, Cambridge University Press, 1993
Decision Traps
 More traps...
- Cue Learning: People have difficulty learning new relationships when previous
knowledge exists.
- Perceptual Limitations: People can manage on average 7 +/- 2 discrete bits of
information simultaneously.
- Noise: People's ability to evaluate data deteriorates as noise is introduced.
- Ignoring base rates: Try to answer this quickly...
About 1% of the population of a small city (100,000) has AIDS. There is a new test that detects AIDS with
about 5% false negatives and 10% false positives. Suppose a randomly selected person tests positive for
AIDS. What is the probability that he actually has it?
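The base-rate question above surprises most people; a quick Bayes'-rule calculation (not in the original slides) shows why ignoring the 1% prevalence is so costly:

```python
# P(AIDS | positive) = P(positive | AIDS) * P(AIDS) / P(positive)
prevalence = 0.01            # 1% of the population has the disease
sensitivity = 0.95           # 5% false negatives
false_positive_rate = 0.10   # 10% false positives

# Total probability of a positive test: true positives + false positives
p_pos = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_aids_given_pos = sensitivity * prevalence / p_pos
print(round(p_aids_given_pos, 3))  # ~0.088 -- under 9%, far below most intuitions
```

Because the healthy population is 99 times larger, false positives swamp true positives even with a fairly accurate test.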
Decision Traps
 Just a few more...
• Overweighting Concrete Data: People give more weight to speculative concrete
data and less weight to accurate statistical data
• Familiarity/Availability: recent or familiar events are judged more likely (recency tendency)
• Misconceptions of Chance:
– Statistical Runs - “Had four heads in a row, a tail is due.”
– Sample Size Insensitivity - small-sample bias; deviations in large
samples deserve more weight than the same deviations in small ones
– Regression Towards the Mean - An unusual result is more likely to be
followed by a result closer to the mean; Sophomore Jinx
– Combinatorial Bias: In a group of 50 people, what is the probability that
at least two will have the same birthday?
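The birthday question above can be answered directly: the chance that no two of n people share a birthday is a product of shrinking fractions, and the answer is its complement.

```python
import math

def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday
    (365 equally likely birthdays assumed)."""
    p_all_distinct = math.prod((365 - i) / 365 for i in range(n))
    return 1 - p_all_distinct

print(round(p_shared_birthday(50), 2))  # ~0.97 -- near certainty for 50 people
```

Most people guess far lower; the number of *pairs* (50 × 49 / 2 = 1225) is what drives the probability, not the number of people.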
Statistics for Experimenters
 Day 1 Summary
Consider the Statistical Analysis Paradigm when evaluating data
Drawing Inferences about populations
Don’t violate the assumptions - Bad data is worse than no data
Data/Experiment Integrity (orthogonality) is by far the most important
factor in assuring successful experiments
Replicate: Larger Samples = Higher Confidence,
1 Sample = No Confidence
Calculate the sample size ahead of time
Type I Error: Probability I’m wrong if I say it’s real - Look for it
P-Value, P, Significance, alpha, consumer's risk: 1 - P-Value = Confidence
Decision Traps: Easy to get it wrong
Hypothesis Testing:
 Nomenclature:
• Averages:
– Population: μ    Sample: x̄
• Standard Deviations:
– Population: σ    Sample: S
• Variances:
– Population: σ²    Sample: S²
• Regression Coefficients:
– Population: β    Sample: b
Hypothesis Testing:
 We set up hypotheses to answer the question, “Is the effect
statistically significant?” …or… “Is the effect I observe real or is it
just noise?”
 Though it may not be obvious, all statistics can be described in
terms of a test of some hypothesis.
 By convention the hypothesis is always “The effect is not
statistically significant.” This is called the ‘null’ hypothesis and the
statistical test is set up to reject it.
 The alternative hypothesis may be different depending on the
purpose of the test.
 What effect have we made the hypothesis about?
Setting Up and Testing a Hypothesis
 Example 1: Comparing Two Population Means - “Is there a
significant difference between the two means?”
• The MEA manufacturing process has been changed and five stacks
were tested both before and after the change. Is the mean-cell-voltage
affected?
• Null Hypothesis: μbefore = μafter (the process change has no effect on mean cell voltage)
• Alternative Hypotheses: μbefore ≠ μafter (two-sided), or μbefore < μafter / μbefore > μafter (one-sided)
Setting Up and Testing a Hypothesis
 Example 2: Evaluating an Experiment - “Are there significant
effects?”
• A gage repeatability and reproducibility study was conducted where
three operators tested three identical stacks on each of three different
test stations (27 total tests). Were there any significant effects?
• Null Hypotheses: the operator, stack and station effects are all zero
• Alternative Hypotheses: at least one effect is nonzero
Setting Up and Testing a Hypothesis
 Example 3: Evaluating a Regression Analysis - “Is there a
significant degradation rate?”
• A module was tested on a test station for two months. Is there a
significant degradation rate?
• Null Hypothesis: the slope (degradation rate) is zero
• Alternative Hypothesis: the slope is nonzero
Setting Up and Testing a Hypothesis
 Example 4: Evaluating a Multiple Regression Analysis - “Is there a
significant relationship between the factors and the response?”
• A response surface experiment was conducted to model the
relationship between O2Ch4 ratio, H2 Stoichs and ATO Temp and
Reformer Efficiency resulting in the following model. Is there a
significant relationship?
RefEff = -2.24 + 13.3(O2CH4) - 3.48(H2Stoich) + 2.97E-03(ATOTemp) - 9.42(O2CH4)² +
0.439(H2Stoich)² - 1.24E-06(ATOTemp)² + 1.76(O2CH4)(H2Stoich) - 5.69E-03(O2CH4)(ATOTemp)
+ 1.81E-03(H2Stoich)(ATOTemp)
• Null Hypotheses: all of the model coefficients are zero
• Alternative Hypotheses: at least one coefficient is nonzero
Probability Distributions
 Probability distributions are functions that describe the random
behavior of variables.
 Both random variables (raw data) and resultant statistics (means,
variances, etc) have probability distributions.
 Most variables behave according to the normal distribution but not
all.
 Sample means tend toward a normal distribution as the sample size
grows, whatever the underlying distribution - the central limit theorem.
Probability Distributions
 Probability Distributions have key parameters that define them. If
we know what distribution we are dealing with, we can get
estimates of the key parameters, define the distribution and use it to
draw inferences about population behavior.
 We do this by setting the total area under the distribution curve
equal to 1 and calculating the portion of the area associated with
whatever inference we are trying to make.
 The portion of the area is the probability associated with the
inference.
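The area-under-the-curve idea can be sketched with Python's standard library (the voltage numbers here are made up for illustration): once a normal distribution is fitted, any inference is just an area.

```python
from statistics import NormalDist

# Hypothetical example: cell voltage is normal with mean 0.609 V and
# standard deviation 0.001 V. What fraction of cells falls below a
# 0.607 V spec limit? That fraction is the area under the curve to the
# left of 0.607 (total area = 1).
cell_voltage = NormalDist(mu=0.609, sigma=0.001)
p_below_spec = cell_voltage.cdf(0.607)
print(round(p_below_spec, 3))  # ~0.023, i.e. about 2.3% of cells
```

The same cdf call, applied to sample means instead of raw readings, is what produces the p-values and confidence intervals in the tests that follow.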
[Figure: histogram of 20 cell-voltage readings (approx. 0.6082-0.6102 V) with a fitted normal curve, N=20]
Probability Distributions
 Normal Distribution estimates. Same data, different sample sizes:
[Figures: histograms of the same data with fitted normal curves at N=50 and N=101]
Probability Distributions
 Normal Distribution estimates. ‘Normalized’ Curves:
[Figure: fitted normal curves for N=20, N=50 and N=101 overlaid; x-axis 0.6080-0.6110 V]
Probability Distributions
 Comparing two populations (Means and 95% Confidence Intervals):
[Figure: two cell-voltage distributions (approx. 0.607-0.611 V) shown with their means and 95% confidence intervals]
Probability Distributions
 Probability Distribution Functions

Normal: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

t: $f(x) = \dfrac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1+\dfrac{x^2}{n}\right)^{-(n+1)/2}$

F: $f(x) = \dfrac{\Gamma\!\left(\frac{n_1+n_2}{2}\right)\, n_1^{n_1/2}\, n_2^{n_2/2}}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)} \cdot \dfrac{x^{(n_1/2)-1}}{(n_1 x + n_2)^{(n_1+n_2)/2}}$

Weibull: $f(x) = \dfrac{\beta}{\eta}\left(\dfrac{x}{\eta}\right)^{\beta-1} e^{-(x/\eta)^\beta}$
Excel Examples
 Example 1, Situation 1: Comparing Two Population Means
• Example: You have tested stacks from two different manufacturing
processes and you want to know if the process differences affect
performance.
• Method: Calculate the two sample means and take the difference. Test
the hypothesis that the difference is zero.
• Analyses Techniques:
– (1) Use a t-test for sample sizes less than 30. Use a z-test (normal
distribution) for sample sizes greater than 30.
– (2) Calculate the 95% confidence interval of each mean and plot to
compare the differences.
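The large-sample branch of the method above can be sketched with the standard library (the data here are hypothetical; for small samples you would use a t-test, e.g. scipy.stats.ttest_ind):

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z(a, b):
    """z statistic and two-sided p-value for H0: mean(a) == mean(b)."""
    # Standard error of the difference between the two sample means
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail area
    return z, p

# Hypothetical mean-cell-voltage readings before and after the change
before = [0.611, 0.609, 0.612, 0.610, 0.608]
after = [0.613, 0.615, 0.612, 0.616, 0.614]
z, p = two_sample_z(before, after)
# A p-value below the chosen alpha (e.g. 0.05) rejects the null hypothesis.
```

The confidence-interval technique from the slide is equivalent: if the two 95% intervals around the means are well separated, the difference is significant.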
Excel Examples
 Example 1, Situation 2: Comparing Two Population Standard
Deviations
• Example: For the same process change you want to know if stack-to-stack variability is affected.
• Method: Calculate the variance of each group and take the ratio of the
two. Test the hypothesis that the ratio is equal to 1.
– Memory Jogger: Variance = (Standard Deviation)2
• Analyses Techniques:
– Use the F-test, (confidence intervals)
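The test statistic itself is just the variance ratio (hypothetical data below); judging its significance requires an F critical value, which comes from Excel (F.TEST / F.INV) or scipy.stats.f rather than the Python standard library.

```python
import statistics as st

# Hypothetical mean-cell-voltage readings from the old and new process
old_process = [0.610, 0.614, 0.608, 0.612, 0.616]
new_process = [0.611, 0.613, 0.612, 0.612, 0.612]

# H0: the ratio of the two population variances is 1
f_ratio = st.variance(old_process) / st.variance(new_process)
# Ratios far from 1 (relative to the F distribution with n1-1, n2-1
# degrees of freedom) suggest the change affected variability.
```

Remember the memory jogger from the slide: the test works on variances, so square any standard deviations first.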
Statistical Situations
 Example 3: Comparing Multiple Population Means - “Is there a
significant difference between the means?”
• Does kilowatt level affect the amount of CH4 (%) remaining in the gas
stream after the ATR catalyst?
• Method: We compare the variability between the experimental group
means to the variability within the groups (the error). We are testing
the hypothesis that all of the effects are zero.
• Analyses: Analysis of variance (ANOVA), f-test, confidence intervals,
(residuals)
• Conclusions/Comments:
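The between-vs-within comparison described above reduces to a single F statistic, sketched here from first principles with made-up CH4 numbers:

```python
import statistics as st

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variability."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total observations
    grand = st.mean(x for g in groups for x in g)
    # Between-group sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: the error, scatter inside each group
    ss_within = sum((x - st.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)     # k-1 degrees of freedom
    ms_within = ss_within / (n - k)       # n-k degrees of freedom
    return ms_between / ms_within

# Hypothetical CH4 % remaining at three kilowatt levels, 3 runs each
f = one_way_f([[1.1, 1.2, 1.0], [1.4, 1.5, 1.6], [2.0, 1.9, 2.1]])
# A large F (judged against the F distribution with k-1, n-k df) means
# kilowatt level explains far more variation than the run-to-run error.
```

If all the group means were equal, ss_between (and hence F) would be zero: that is exactly the null hypothesis "all of the effects are zero."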
Statistical Situations
 Example 4: Evaluating a Regression Analysis - “Is there a
significant linear relationship between the variables?”
• Does kilowatt level affect the amount of CH4 (%) remaining in the gas
stream after the ATR catalyst?
• Method: We evaluate the slope of the line. We are testing the
hypothesis that the slope is zero.
• Analyses: Analysis of variance (ANOVA), f-test, confidence intervals,
(residuals)
• Conclusions/Comments:
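The slope being evaluated comes from a least-squares fit, which can be computed from first principles (hypothetical kW vs. CH4 data; Python 3.10+ also offers statistics.linear_regression):

```python
import statistics as st

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    mx, my = st.mean(x), st.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

kw = [1, 2, 3, 4, 5]
ch4 = [1.2, 1.4, 1.6, 1.8, 2.0]  # perfectly linear for the demo
slope, intercept = fit_line(kw, ch4)
# The hypothesis test asks whether the population slope differs from 0,
# using the slope's standard error (from the residual scatter) in a t-test.
```

With real, noisy data the residuals analysis listed above checks that the straight-line model is adequate before the slope's p-value is trusted.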
Statistical Situations
 Example 5: Evaluating a Screening Experiment - “Are the effects
significant?”
• What are the effects of changes in Anode Stoichs, Cathode Stoichs and
Anode Tin on Mean Cell Voltage in this experiment?
• Method: We use ANOVA to evaluate the significance of the effects and
test the hypothesis that all of the factor effects are zero.
• Analyses: Analysis of variance (ANOVA), f-test, effect plots, (residuals)
• Conclusions/Comments:
Statistical Situations
 Example 6: Evaluating a Multiple Regression Analysis - “What is
the relationship between these factors and that response?”
• What is the relationship between O2/CH4 , H2 Stoichs and ATO Temp on
reformer hydrogen utilization?
• Method: We do a least-squares fit of the factors to the response data to
come up with coefficients to use in a function. We use t-tests to test the
hypotheses that individual coefficients are equal to zero. We use ANOVA to
test the hypothesis that all of the coefficients are equal to zero.
• Analyses: ANOVA, f-test, t-tests, response plots, (residuals analysis)
• Conclusions/Comments:
Statistical Decision Support
 Summary
Consider the Statistical Analysis Paradigm when evaluating data
Drawing Inferences about populations
Don’t violate the assumptions - Bad data is worse than no data
Data/Experiment Integrity is by far the most important factor in
assuring successful experiments
Replicate: Larger Samples = Higher Confidence,
1 Sample = No Confidence
Calculate the sample size ahead of time
Type I Error: Probability I’m wrong if I say it’s real - Look for it
P-Value, P, Significance, alpha, consumer's risk: 1 - P-Value = Confidence
Decision Traps: Easy to get it wrong