File - freesixsigmasite.com

Basic Statistics
Six Sigma Foundations
Continuous Improvement Training
Six Sigma Simplicity
Key Learning Points

Simple statistics can:
- Increase your understanding of process behavior
- Help identify improvement opportunities for 6S
Statistics

Common statistics:
- Miles per gallon (liter); mpg (mpl)
- Median home prices
- Consumer price index
- Inflation rate
- Stock market average
- Airline on-time arrival rate
Statistics are computed using data.
Statistics summarize the data and help us to predict future performance.
Basic Statistics
- Serve as a means to analyze data collected in the Measure phase.
- Allow us to numerically describe the data that characterizes our process's Xs and Ys.
- Use past process and performance data to make inferences about the future.
- Serve as a foundation for advanced statistical problem-solving methodologies.
- Create a universal language based on numerical facts rather than intuition.
Data Visualization

Before any statistical tools are applied, visually
display and look at your data.
A histogram allows us to look at how the data
is distributed across our Y scale of measure.
[Figure: histogram, Number of Wins for National Football League Teams (1998). X-axis: Number of Games Won (0 to 15); Y-axis: Number of Teams (0 to 5). Callout: five teams won eight games. Source: AOLSports]
Building a Histogram
The following data came from our bicycle test facility: stopping
distances required to bring a 150 lb weight to a complete stop
with the rear brake applied from a 10 mph cruising speed.
Trial (sample #):       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Stop Distance (Feet):  14   6  13   7  10  10  11   9  11   9  11   9  10  10  10
[Figure: blank histogram grid to fill in. X-Axis: Feet (6 to 14); Y-Axis: left unlabeled]
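Building the histogram amounts to counting how often each stopping distance occurs. The sketch below is one possible way to do that tally, assuming the fifteen distances are read from the table in trial order; the names `stop_distances` and `text_histogram` are illustrative only and not part of the course material.

```python
from collections import Counter

# Stopping distances in feet for trials 1-15, transcribed from the table above.
stop_distances = [14, 6, 13, 7, 10, 10, 11, 9, 11, 9, 11, 9, 10, 10, 10]

# Count how many trials produced each distance.
counts = Counter(stop_distances)


def text_histogram(counts, low, high):
    """Return one line per X-axis value, with one 'X' per occurrence."""
    lines = []
    for feet in range(low, high + 1):
        lines.append(f"{feet:>2} ft | " + "X" * counts.get(feet, 0))
    return lines


# Print the histogram sideways: distances run down the page, counts run across.
for line in text_histogram(counts, 6, 14):
    print(line)
```

Each row of X marks corresponds to one bar of the histogram, so the tallest bar shows the most frequent stopping distance.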
Measures of Central Tendency
In addition to counting occurrences and graphing the results, we can describe processes in terms of central tendency and dispersion.

Mean (μ, Xbar): The arithmetic average of a set of values
- Uses the quantitative value of each data point
- Is strongly influenced by extreme values
Median (M): The number that reflects the middle of a set of values
- Is the 50th percentile
- Is identified as the middle number after all the values are sorted from high to low
- Is not affected by extreme values
Mode: The most frequently occurring value in a data set
Central Tendency Exercise

Determine the mean, median, and mode for the bicycle stopping distances used to create the histograms.
Mean   = ________
Median = ________
Mode   = ________

Trial:                  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Stop Distance (Feet):  14   6  13   7  10  10  11   9  11   9  11   9  10  10  10
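After working the exercise by hand, the answers can be checked with a few lines of code. This is a minimal sketch using Python's standard `statistics` module, assuming the fifteen distances are transcribed from the table in trial order; the variable names are illustrative.

```python
import statistics

# Stopping distances in feet for trials 1-15, transcribed from the exercise table.
stop_distances = [14, 6, 13, 7, 10, 10, 11, 9, 11, 9, 11, 9, 10, 10, 10]

# Mean: sum of the values divided by how many there are.
mean_value = statistics.mean(stop_distances)

# Median: the middle value once the data are sorted.
median_value = statistics.median(stop_distances)

# Mode: the value that occurs most often.
mode_value = statistics.mode(stop_distances)

print("Mean   =", mean_value)
print("Median =", median_value)
print("Mode   =", mode_value)
```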
Mean, Median, Mode
[Figure: three frequency histograms titled Positive Skew, Normal, and Negative Skew, each marking the positions of the mode, median, and mean]
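To see how skew pulls the three measures apart, the sketch below uses a small made-up data set (hypothetical values, not taken from the slides) with a long right tail, then mirrors it to get a long left tail. In data like this, the mean is dragged furthest toward the tail, the median less so, and the mode stays at the peak.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
positive_skew = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

# Mirror the data around 7 to get the same shape with a long left tail.
negative_skew = [14 - x for x in positive_skew]


def summarize(data):
    """Return (mode, median, mean) for a list of numbers."""
    return (statistics.mode(data), statistics.median(data), statistics.mean(data))


pos_mode, pos_median, pos_mean = summarize(positive_skew)
neg_mode, neg_median, neg_mean = summarize(negative_skew)

print("Positive skew: mode", pos_mode, "median", pos_median, "mean", pos_mean)
print("Negative skew: mode", neg_mode, "median", neg_median, "mean", neg_mean)
```

With these values the positive-skew set gives mode < median < mean, and the mirrored set gives the reverse order.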
Measures of Dispersion
Range (R): The difference between the highest and lowest values
$R = x_{max} - x_{min}$
Sample Variance ($s^2$): The average squared distance of each point from the average (Xbar)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$
Sample Standard Deviation (s): The square root of the variance
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Example of Measures of Dispersion
[Figure: histogram, Number of Wins for National Football League Teams (1998). X-axis ticks: 0 to 15; Y-axis: Frequency (0 to 5). Source: AOLSports]
Xbar = 8
Range = 12
$s^2$ = 11.72
s = 3.42
Dispersion Exercise
Find measures of dispersion for the stopping distance data. Fill in the table at the right.
Range (R) =
Variance ($s^2$) =
Std Dev (s) =
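The hand calculation can be checked step by step in code. The sketch below follows the sample formulas (dividing by n - 1) and assumes the fifteen stopping distances are transcribed from the earlier table in trial order; the variable names are illustrative.

```python
import math

# Stopping distances in feet for trials 1-15, transcribed from the earlier table.
stop_distances = [14, 6, 13, 7, 10, 10, 11, 9, 11, 9, 11, 9, 10, 10, 10]

n = len(stop_distances)
x_bar = sum(stop_distances) / n

# Range: highest value minus lowest value.
data_range = max(stop_distances) - min(stop_distances)

# Sample variance: sum of squared distances from the mean, divided by n - 1.
squared_deviations = [(x - x_bar) ** 2 for x in stop_distances]
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_std_dev = math.sqrt(sample_variance)

print("Range (R)      =", data_range)
print("Variance (s^2) =", sample_variance)
print("Std Dev (s)    =", sample_std_dev)
```

Writing out `squared_deviations` as its own list mirrors the column of a hand-worked table, which makes it easy to compare line by line.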
Population vs. Sample
(Certainty vs. Uncertainty)
A sample is just a subset of all possible values.
[Figure: diagram labeled Population and Sample]
Since the sample does not contain all the possible
values, there is some uncertainty about the population.
Hence any statistics, such as mean and standard
deviation, are just estimates of the true population
parameters.
Symbols
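Sample (n = # of samples)
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Standard Deviation (little “s”): $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population (N = # of values in the population)
Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$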
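The only difference between the two standard deviation formulas is the divisor: n - 1 for a sample, N for a population. As a rough illustration, the sketch below applies both to the fifteen bicycle stopping distances. Treating those fifteen values as a complete population is an assumption made purely for comparison. `statistics.stdev` and `statistics.pstdev` are the standard-library functions for the sample and population versions.

```python
import statistics

# Stopping distances in feet for trials 1-15, transcribed from the earlier table.
stop_distances = [14, 6, 13, 7, 10, 10, 11, 9, 11, 9, 11, 9, 10, 10, 10]

# Sample standard deviation: divides the sum of squares by n - 1.
sample_sd = statistics.stdev(stop_distances)

# Population standard deviation: divides the sum of squares by N.
# (Assumes, for illustration only, that these 15 values are the whole population.)
population_sd = statistics.pstdev(stop_distances)

print("Sample s         =", round(sample_sd, 4))
print("Population sigma =", round(population_sd, 4))
```

The sample value comes out slightly larger, which reflects the extra uncertainty of estimating from a subset.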
The Normal Curve
- In 80 to 90% of problems worked, data will follow a normal bell curve or can be transformed to look like a normal curve.
- This curve is described by the Xbar and s statistics.
- The area under this curve is 1, or 100%.
- For the normal curve, mean = median = mode.
[Figure: normal bell curve marked with Xbar and s]
Normal Bell Curve Properties
- Histograms (bar charts) are developed from samples.
- Sample statistics (Xbar and s) are calculated from representatives of the population.
- From the histogram and sample statistics, we form a curve that represents the population from which these samples were drawn.
Xbar ± 1sd: 68.26% of the data falls within ±1 standard deviation from the mean
Xbar ± 3sd: 99.73% of the data falls within ±3 standard deviations from the mean
Xbar ± 6sd: 99.9999998% of the data falls within ±6 standard deviations from the mean
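These percentages come from the area under the normal curve between -k and +k standard deviations, which can be written with the error function as erf(k / √2). The sketch below is one way to reproduce them with Python's standard `math.erf`; the helper name `fraction_within` is illustrative. Printed values are rounded, so the last digit can differ from a figure that was truncated instead.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Report the coverage for the three bands used on the slide.
for k in (1, 3, 6):
    percent = fraction_within(k) * 100
    print(f"within +/-{k} sd: {percent:.7f}%")
```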
Other Data Distributions
[Figure: four frequency histograms showing Normal, Uniform, Log Normal, and Exponential distributions]
Normal Curve Exercise
- Here is a histogram of the bike stopping distance data (Xbar = 10, s = 2).
- Does the histogram appear normal?
- Draw vertical lines at ±1sd, ±2sd, ±4sd.
- Discuss.
[Figure: histogram of the bike stopping distance data. X-axis: 1 to 20; Y-axis: Frequency (0 to 5)]
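As a starting point for the discussion, the positions of the vertical lines and the share of trials falling inside each band can be worked out in code. The sketch below uses the stated Xbar = 10 and s = 2, and assumes the fifteen stopping distances are transcribed from the earlier table in trial order; `band` and `share_within` are illustrative helper names.

```python
# Stopping distances in feet for trials 1-15, transcribed from the earlier table.
stop_distances = [14, 6, 13, 7, 10, 10, 11, 9, 11, 9, 11, 9, 10, 10, 10]

x_bar = 10  # mean stated in the exercise
s = 2       # standard deviation stated in the exercise


def band(k):
    """Return the (lower, upper) positions of the lines at Xbar -/+ k standard deviations."""
    return (x_bar - k * s, x_bar + k * s)


def share_within(k):
    """Return the fraction of trials whose distance lies inside the band, limits included."""
    low, high = band(k)
    inside = [x for x in stop_distances if low <= x <= high]
    return len(inside) / len(stop_distances)


for k in (1, 2, 4):
    low, high = band(k)
    print(f"+/-{k}sd: lines at {low} and {high} ft, "
          f"{share_within(k) * 100:.1f}% of trials inside")
```

Comparing these shares with the normal-curve percentages is one rough way to judge how bell-shaped such a small sample looks.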