Transcript Chapter 4

Chapter 4
Statistics

“There are three kinds of lies: lies, damn lies
and statistics”


Benjamin Disraeli
“only if the statistics are used improperly”

Tiny Grant, 1978
Statistics


Is really just about being able to tell whether
analyses or methods differ once error,
or regular variation, is taken into account.
Error can be in the sample
(differences in a process or in a sample)
or in the data collection (random error).
Let's Look at a Difference in a
Manufacturing Process

Variation in the life of light bulbs.


What can cause differences?
Let's carry out a thought experiment.
Light Bulb Lifetime


Take a large set of light bulbs and measure
their lifetimes. The variation in lifetime will come
from differences between the bulbs, because we can
measure time to much greater accuracy than
needed here.
We will look at an example data set.
Let’s describe this data numerically

Taking this data we can calculate:

A mean

A standard deviation

This data will usually present as a normal
distribution.


$\bar{x} = 845.2$ hrs
$s = 94.2$ hrs
Mean
$$\bar{x} = \frac{\sum_i x_i}{n}$$
Standard Deviation
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
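A minimal Python sketch of the mean and sample standard deviation calculations. The lifetimes are made-up illustration values, not the lecture's data set, and the function names are hypothetical.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, with n - 1 in the denominator."""
    x_bar = mean(values)
    squared_devs = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_devs / (len(values) - 1))

# Hypothetical bulb lifetimes in hours (illustration only).
lifetimes = [800.0, 850.0, 900.0, 750.0, 950.0]
print(mean(lifetimes), sample_std(lifetimes))
```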
Standard Deviation (Alternate formula)
What is gained by this formula??
$$s = \sqrt{\frac{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n - 1)}}$$
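One possible answer, sketched in Python: the alternate formula needs only running totals of x and x squared, so the data can be processed in a single pass without first knowing the mean. The data and the function name are hypothetical.

```python
import math

def std_one_pass(values):
    """Sample standard deviation from running sums of x and x**2."""
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:          # one pass; the mean is never needed
        n += 1
        sum_x += x
        sum_x2 += x * x
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Hypothetical bulb lifetimes in hours (illustration only).
lifetimes = [800.0, 850.0, 900.0, 750.0, 950.0]
print(std_one_pass(lifetimes))
```

The price is a subtraction of two large, nearly equal numbers, which can lose precision when the spread is small compared with the mean.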
Variance



The variance is the standard deviation squared.
For this data set it would be 8873.64 hr$^2$
Why bother??
When examining error, the variances of all the steps
in a process are additive.
$$s_{\text{total}}^2 = \sum_i s_i^2$$
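A short sketch of adding variances, assuming the error sources in each step are independent of one another. The step values and the function name are hypothetical.

```python
import math

def combined_std(step_stds):
    """Total standard deviation: add the variances, then take the square root."""
    total_variance = sum(s ** 2 for s in step_stds)
    return math.sqrt(total_variance)

# Hypothetical standard deviations for two steps, in the same units.
print(combined_std([3.0, 4.0]))  # 5.0
```

Note that the standard deviations themselves do not add: 3 and 4 combine to 5, not 7.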
Why, then, would we not always use the variance?

The units on standard deviation are the same
as our unit of measure and make physical
sense.
The greater the variability of the data set
the larger the standard deviation will be.
This Normal Curve can tell us a lot.

The area under the curve is normalized.




That is, made to equal 1.
Now a fraction of the area can represent a
probability.
For example, half the sample will have a life
less than the mean (area = 0.500).
These areas are available in charts.
Math expression for Normal Curve
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
Where x is the x axis position.
$\sigma$ is the population standard deviation
$\mu$ is the population average
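A sketch that evaluates the normal curve and checks numerically that the total area is 1 and the area below the mean is 0.5. It treats the lecture's sample values (845.2 and 94.2 hrs) as if they were population values, and the function names and the trapezoid-rule integration are illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, lo, hi, steps=10000):
    """Trapezoid-rule estimate of the area under the curve between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for k in range(1, steps):
        total += normal_pdf(lo + k * width, mu, sigma)
    return total * width

mu, sigma = 845.2, 94.2  # assumption: sample values used as population values
total_area = area_under_curve(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
left_half = area_under_curve(mu, sigma, mu - 8 * sigma, mu)
print(round(total_area, 4), round(left_half, 4))
```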
Express the x position in terms of the mean and
standard deviation and it is denoted z
$$z = \frac{x - \mu}{\sigma} \approx \frac{x - \bar{x}}{s}$$
How much area falls outside $\pm 3\sigma$?
1 - area(left) - area(right), where each area runs from the mean out to z = 3
Area when z = 3 is 0.49865
So 1 - 0.49865 - 0.49865 = 0.0027
Or in our example only about 0.3% of light bulbs have
a lifetime outside $\bar{x} \pm 3s$
For our mean and standard deviation this means that
99.7% of these bulbs would last from 563 hours
to 1128 hours
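The same tail-area arithmetic as a sketch, using `statistics.NormalDist` from the Python standard library (3.8 or later) in place of a printed chart. The helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

x_bar, s = 845.2, 94.2  # lecture values, in hours

def z_score(x, mean, std):
    """Distance of x from the mean in units of the standard deviation."""
    return (x - mean) / std

standard = NormalDist()  # mean 0, standard deviation 1
# Area outside z = -3 .. +3: both tails added together.
outside = standard.cdf(-3.0) + (1.0 - standard.cdf(3.0))
low, high = x_bar - 3 * s, x_bar + 3 * s

print(round(outside, 4))               # 0.0027
print(round(low, 1), round(high, 1))   # lifetime range in hours
print(round(z_score(high, x_bar, s), 6))
```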
How much area between ranges
$\mu \pm 1\sigma$    68.3%
$\mu \pm 2\sigma$    95.5%
$\mu \pm 3\sigma$    99.7%
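These percentages can be reproduced from the error function, since the area within k standard deviations of the mean of a normal distribution is erf(k / sqrt(2)). A minimal sketch; the function name is hypothetical.

```python
import math

def area_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"mean +/- {k} sigma: {100 * area_within(k):.2f}%")
```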
Applications that we will use.


If we were to change our light bulb
manufacturing process in some way, then how
could we tell if it improved the bulbs
manufactured?
Perhaps the goal is to shorten life; you can sell
more that way!
How do we tell if there is a difference?


Since we know there is a spread of bulb
lifetimes, we can expect that we would
need to do more than just check one bulb.
We would repeat the entire experiment.
What statistical tools are needed for this?
Compare two means
[Figure: plot for comparing two means (y axis 0 to 1.2, x axis 0 to 2.5)]
Compare two means (2)
[Figure: plot for comparing two means (y axis 0 to 1.2, x axis 0 to 3)]
We have a problem.



We often do not know the population mean
and standard deviation, $\mu$ and $\sigma$.
Finding them requires that we do many analyses.
We work with the sample mean $\bar{x}$ and the
sample standard deviation s
We use the Student’s t Statistic

W. S. Gosset published this work under the
assumed name of Student in 1908 in
Biometrika

For a biography of Gosset, see

http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Gosset.html
The Confidence Interval
This allows us to express the results in a quantitative
way
$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$$
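A sketch of the confidence interval calculation. The small lookup of two-sided 95% Student's t values is copied from standard tables and covers only 1 to 10 degrees of freedom. The data and the function name are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student's t values, keyed by degrees of freedom (standard tables).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval_95(values):
    """Return (mean, half_width); the interval is mean +/- half_width."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)       # sample standard deviation (n - 1)
    t = T_95[n - 1]                    # degrees of freedom = n - 1
    return x_bar, t * s / math.sqrt(n)

# Hypothetical bulb lifetimes in hours (illustration only).
lifetimes = [800.0, 850.0, 900.0, 750.0, 950.0]
center, half_width = confidence_interval_95(lifetimes)
print(f"{center:.1f} +/- {half_width:.1f} hours")
```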
Tools for the first lab.



Case 1 - Is my data consistent with the
known or required analysis value?
Case 2 - Are two sets of data the same?
Case 3 - Are two methods equivalent, given a limited
sample?
Case 1
$$t_{\text{calculated}} = \frac{|\bar{x} - \mu|}{s}\sqrt{n}$$
If $t_{\text{calculated}}$ is larger than the table value, then
the known value is out of bounds: the data are not consistent with it.
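A sketch of the Case 1 comparison. The replicate results and the known value are hypothetical, the function name is made up, and the table value is the standard two-sided 95% entry for 3 degrees of freedom.

```python
import math
import statistics

def t_calculated(values, known_value):
    """t statistic for comparing a sample mean with a known value."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)       # sample standard deviation (n - 1)
    return abs(x_bar - known_value) / s * math.sqrt(n)

# Hypothetical replicate results and a hypothetical known value.
results = [3.29, 3.22, 3.30, 3.23]
t_calc = t_calculated(results, known_value=3.19)
t_table = 3.182   # two-sided 95%, n - 1 = 3 degrees of freedom
verdict = "not consistent" if t_calc > t_table else "consistent"
print(round(t_calc, 2), verdict)
```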
Case 2
How do we determine if these are
different?
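$$t_{\text{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
$s_{\text{pooled}}$
$$s_{\text{pooled}} = \sqrt{\frac{\sum_i (x_i - \bar{x}_1)^2 + \sum_j (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$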
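A sketch of the Case 2 calculation with a pooled standard deviation, assuming the two data sets have similar spread. Both data sets and the function name are hypothetical.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t_calc, degrees_of_freedom) for two samples of similar spread."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    ss1 = sum((x - mean1) ** 2 for x in sample1)   # squared deviations, set 1
    ss2 = sum((x - mean2) ** 2 for x in sample2)   # squared deviations, set 2
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t_calc = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_calc, n1 + n2 - 2

# Hypothetical lifetimes (hours) before and after a process change.
old_process = [800.0, 850.0, 900.0, 750.0, 950.0]
new_process = [900.0, 950.0, 1000.0, 850.0, 1050.0]
t_calc, dof = pooled_t(old_process, new_process)
print(round(t_calc, 3), dof)
```

With 8 degrees of freedom the two-sided 95% table value is 2.306, so a calculated t below that would not show a difference at that confidence level.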
Don't worry about 4-8a and 4-9a


They are for when the standard deviations are not
equal.
The F test is used to sort this out.
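A sketch of the quantity the F test compares, assuming the usual convention of putting the larger variance in the numerator. The data and the function name are hypothetical, and the result still has to be compared with a value from an F table.

```python
import statistics

def f_calculated(sample1, sample2):
    """Ratio of the sample variances, larger over smaller, so F >= 1."""
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    return max(var1, var2) / min(var1, var2)

a = [10.0, 12.0, 14.0]   # sample variance 4
b = [10.0, 11.0, 12.0]   # sample variance 1
print(f_calculated(a, b))
```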
Case 3

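Calculate average
deviation (keep signs)
$$\bar{d} = \frac{\sum_i d_i}{n}$$
Calculate $s_d$
$$s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$$
Calculate t
$$t_{\text{calc}} = \frac{|\bar{d}|}{s_d}\sqrt{n}$$
Then compare $t_{\text{calc}}$ to $t_{\text{table}}$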
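A sketch of the Case 3 steps for paired results, where each sample is measured by both methods. The data and the function name are hypothetical.

```python
import math
import statistics

def paired_t(method_a, method_b):
    """t statistic from the signed differences between paired results."""
    diffs = [a - b for a, b in zip(method_a, method_b)]  # keep signs
    n = len(diffs)
    d_bar = statistics.mean(diffs)     # average difference
    s_d = statistics.stdev(diffs)      # standard deviation of the differences
    return abs(d_bar) / s_d * math.sqrt(n)

# Hypothetical results for four samples, each measured by both methods.
method_a = [10.0, 12.0, 9.0, 11.0]
method_b = [9.0, 10.0, 9.0, 8.0]
t_calc = paired_t(method_a, method_b)
print(round(t_calc, 3))
```

Here n - 1 = 3 degrees of freedom, so the two-sided 95% table value to compare against is 3.182.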
Q Test

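Arrange data in numerical order
Calculate Qcalc
Qcalc = gap/range
(gap = difference between the questionable value and its nearest neighbor; range = total spread of the data)

If Qcalc > Qtable then reject the questionable value.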

Used for small data sets.
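A sketch of the Q test that checks whichever end of the sorted data has the larger gap. The data and the function name are hypothetical, and the table value 0.64 is the commonly tabulated 90% confidence entry for five observations. The sketch assumes the values are not all identical, since the range would then be zero.

```python
def q_calculated(values):
    """Q = gap / range, using the larger of the two end gaps."""
    data = sorted(values)
    data_range = data[-1] - data[0]
    gap_low = data[1] - data[0]        # gap if the lowest value is suspect
    gap_high = data[-1] - data[-2]     # gap if the highest value is suspect
    return max(gap_low, gap_high) / data_range

# Hypothetical results; 12.67 is the questionable value.
data = [12.47, 12.48, 12.53, 12.56, 12.67]
q_calc = q_calculated(data)
q_table = 0.64   # 90% confidence, 5 observations
print(round(q_calc, 2), "reject" if q_calc > q_table else "retain")
```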

