Analysis of Process Capability

SMU
EMIS 7364
NTU
TO-570-N
Statistical Quality Control
Dr. Jerrell T. Stracener,
SAE Fellow
Analysis of Process Capability
Statistical Analysis
Updated: 1/28/04
The Situation
In many situations, our knowledge is limited to the
information that can be obtained from data that have
already been collected or that will be collected
The Problem
The challenge is to obtain the maximum information
from the data and to arrive at the most accurate
conclusions
Nature of Data
Most data are characterized by variation, as opposed
to being deterministic, due to variation in:
• Processes and materials
• Product/Manufacturing
• Inspection & Measurement
• Operation
• Environment
• etc.
Need
Methods and techniques are needed for the analysis of
data that account for
• Variation in the data
• Uncertainty in conclusions
Statistics
• Statistics is the science of analyzing data and
drawing conclusions
• Statistical methods and techniques provide
tools for:
- experimental design
- analysis of data
- making inferences
Example
The number of defects per inspected PC-X, based on a
random sample of 15 from a day's production, is:
1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1
(i.) Analyze these data and present your results.
(ii.) Estimate the probability that a randomly selected
PC will have at least 3 defects.
Example
Twenty-five customers selected at random were asked to rate
their overall satisfaction with the PC-X, a measure of quality.
Each selected customer rated five factors. Each factor was
assigned a rating between 1 and 10, with 10 indicating the
highest level of customer satisfaction. The ratings were
averaged over the five factors for each customer. The results are:
7.7, 5.5, 9.3, 6.5, 7.5, 5.2, 7.7, 8.5, 6.0, 8.8, 6.2, 8.6, 7.1,
8.0, 7.9, 7.8, 5.9, 9.6, 8.3, 6.7, 7.6, 6.9, 7.3, 9.1, 7.8
(i.) Analyze the survey data and present your results.
(ii.) Estimate the proportion of the customer population
whose average satisfaction rating is at least 9.
Example Solution
a. X = number of defects per…
Since X represents a count, it is a discrete random
variable.
(i)
Sample mean = $\bar{x}$ = 15/15 = 1
Sample mode = 1
Sample median = $x_{0.5}$ = 1
Ratio of mean to median = 1/1 = 1
Sample range = $x_{max} - x_{min}$ = 3 - 0 = 3
Sample variance = $s^2$ = 0.714286
Sample standard deviation = s = 0.845154
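As a quick check, the summary statistics in (i) can be reproduced with Python's standard-library `statistics` module. This is a minimal sketch; the variable names are illustrative, and `statistics.variance` and `statistics.stdev` use the n - 1 divisor, matching $s^2$ above.

```python
import statistics

# Number of defects per inspected PC-X (random sample of n = 15)
defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]

mean = statistics.mean(defects)          # sample mean, x-bar
mode = statistics.mode(defects)          # most frequently occurring value
median = statistics.median(defects)      # x_0.5
ratio = mean / median                    # ratio of mean to median
value_range = max(defects) - min(defects)
variance = statistics.variance(defects)  # s^2 with divisor n - 1
std_dev = statistics.stdev(defects)      # s

print(f"mean={mean}, mode={mode}, median={median}, ratio={ratio}")
print(f"range={value_range}, s^2={variance:.6f}, s={std_dev:.6f}")
```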
Example Solution
[Figure: Histogram of the number of defects; frequency (0 to 10) versus x (0 to 5)]
Example Solution
The sample could be from a Poisson distribution
with probability mass function
$$p(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$$
Here we estimate $\mu$ as
$$\hat{\mu} = \bar{x} = 1$$
so that
$$\hat{p}(x) = \frac{e^{-1}}{x!}, \quad x = 0, 1, 2, \ldots$$
Example Solution
Then we compare $\hat{p}(x)$ with the discrete relative
frequency distribution as follows:
x    Frequency, f    f/n        $\hat{p}(x)$
0    4               0.26667    0.36788
1    8               0.53333    0.36788
2    2               0.13333    0.18394
3    1               0.06667    0.06131
4    0               0          0.01533
Example Solution
(ii) We can estimate $P(X \ge 3)$ directly from the sample as
$$\hat{P}(X \ge 3) = \frac{1}{15} \approx 0.067$$
or by using the Poisson distribution:
$$\hat{P}(X \ge 3) = 1 - \hat{P}(X \le 2) = 1 - [\hat{p}(0) + \hat{p}(1) + \hat{p}(2)] \approx 0.0803$$
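The Poisson comparison and the two estimates of $P(X \ge 3)$ can be sketched in Python as follows. This assumes, as the text does, that $\hat{\mu} = \bar{x}$; `poisson_pmf` is a hypothetical helper written out from the probability mass function rather than a library call.

```python
import math
from collections import Counter

defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]
n = len(defects)
mu_hat = sum(defects) / n  # estimate of the Poisson mean: mu-hat = x-bar


def poisson_pmf(x, mu):
    """Poisson probability mass function p(x) = mu**x * e**(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Compare observed relative frequencies f/n with the fitted Poisson p-hat(x)
counts = Counter(defects)
for x in range(5):
    f = counts[x]  # Counter returns 0 for values that never occurred
    print(f"x={x}  f={f}  f/n={f / n:.5f}  p_hat={poisson_pmf(x, mu_hat):.5f}")

# (ii) Two estimates of P(X >= 3)
p_empirical = sum(1 for d in defects if d >= 3) / n
p_poisson = 1 - sum(poisson_pmf(x, mu_hat) for x in range(3))
print(f"empirical: {p_empirical:.3f}, Poisson: {p_poisson:.4f}")
```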
Example Solution
b.
(i) Sample data analysis indicates that the sample
may be from a normal distribution, $N(\mu, \sigma)$.
Estimates of $\mu$ and $\sigma$ are
$$\hat{\mu} = \bar{x} = 7.5$$
and
$$\hat{\sigma} = \sqrt{\frac{n-1}{n}}\, s = 1.1562$$
Example Solution
Data analysis:
$\bar{x}$ = 7.5
$x_{0.5}$ = 7.7
$\bar{x}/x_{0.5}$ = 0.974
R = 4.4
$s^2$ = 1.3925
s = 1.18004
Example Solution
[Figure: Histogram of the satisfaction ratings; frequency (0 to 12) versus x (5 to 9)]
Example Solution
(ii) By using the normal distribution,
$$\hat{P}(X \ge 9) = P\left(Z \ge \frac{9 - 7.5}{1.1562}\right) = P(Z \ge 1.2973) = 1 - \Phi(1.2973) \approx 1 - 0.9027 = 0.0973$$
Or, by using the binomial distribution,
$$\hat{P}(X \ge 9) = \frac{\#\{\text{values} \ge 9\}}{n} = \frac{3}{25} = 0.12$$
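A Python sketch of part b. It estimates $\sigma$ with the n-divisor form $\hat{\sigma} = \sqrt{(n-1)/n}\, s$ used in this example and evaluates the standard normal upper tail through `math.erfc`, using the identity $P(Z \ge z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$. The helper name `normal_upper_tail` is illustrative.

```python
import math
import statistics

# Average satisfaction ratings for the 25 customers
ratings = [7.7, 5.5, 9.3, 6.5, 7.5, 5.2, 7.7, 8.5, 6.0, 8.8,
           6.2, 8.6, 7.1, 8.0, 7.9, 7.8, 5.9, 9.6, 8.3, 6.7,
           7.6, 6.9, 7.3, 9.1, 7.8]
n = len(ratings)

mu_hat = statistics.mean(ratings)        # x-bar
s = statistics.stdev(ratings)            # divisor n - 1
sigma_hat = s * math.sqrt((n - 1) / n)   # sigma-hat as defined in part b (i)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = (9 - mu_hat) / sigma_hat
p_normal = normal_upper_tail(z)
p_empirical = sum(r >= 9 for r in ratings) / n   # 3 of the 25 ratings

print(f"z = {z:.4f}, normal estimate = {p_normal:.4f}, "
      f"empirical estimate = {p_empirical:.2f}")
```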
• Basic Concepts
• Analysis of Location, or Central Tendency
• Analysis of Variability
• Analysis of Shape
Population vs. Sample
Population
the total of all possible values (measurements,
counts, etc.) of a particular characteristic for a
specific group of objects.
Sample
a part of a population selected according to
some rule or plan.
Why sample?
- The population does not exist
- Sampling and testing are destructive
Sampling
Characteristics that distinguish one type of sample
from another:
• the manner in which the sample was obtained
• the purpose for which the sample was obtained
Types of Samples
Simple Random Sample
The sample X1, X2, ... ,Xn is a random sample if
X1, X2, ... , Xn are independent identically
distributed random variables.
Remark: Each value in the population has an
equal and independent chance of being included
in the sample.
Analysis of Data
• Data represent the entire population:
statistical analysis is primarily descriptive.
• Data represent a sample from the population:
statistical analysis
- describes the sample
- provides information about the population
Analysis of Location or Central Tendency
• Sample (Arithmetic) Mean
• Sample Midrange
• Sample Mode
• Sample Median
• Sample Percentiles
Sample Mean
• Formula:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Remarks:
Most frequently used statistic
Easy to understand
May be misleading due to extreme values
Sample Mode
• Definition:
Most frequently occurring value in the sample
• Remarks:
A sample may have more than one mode
The mode may not be a central value
Not well understood, nor frequently used
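The remark that a sample may have more than one mode can be illustrated with `statistics.multimode` (Python 3.8 and later), which returns every most frequent value in order of first appearance. The second sample is made up purely for illustration.

```python
import statistics

defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]
bimodal = [2, 2, 5, 7, 7, 9]   # hypothetical sample with two modes

print(statistics.multimode(defects))  # [1]
print(statistics.multimode(bimodal))  # [2, 7]
```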
Sample Median
• Formula:
$$x_{0.5} = \begin{cases} x_{(k)}, & \text{if } n \text{ is odd and } k = (n+1)/2 \\ \dfrac{x_{(k)} + x_{(k+1)}}{2}, & \text{if } n \text{ is even and } k = n/2 \end{cases}$$
where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the sample values
arranged in numerical order
• Remarks:
Not well understood, nor accepted
Does not appear to use all of the sample data
Not affected by extreme values
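The odd/even rule above translates directly into code. This hypothetical `sample_median` helper uses the 1-based index k from the formula; in practice, `statistics.median` from Python's standard library applies the same rule.

```python
def sample_median(values):
    """Sample median x_0.5 from the odd/even rule on the ordered sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        k = (n + 1) // 2                    # 1-based position of the middle value
        return ordered[k - 1]
    k = n // 2                              # average the k-th and (k+1)-th values
    return (ordered[k - 1] + ordered[k]) / 2


print(sample_median([1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]))  # odd n
print(sample_median([22, 25, 1, 5]))                                   # even n
```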
Analysis of Variability
• Sample Range
• Sample Variance
• Sample Standard Deviation
• Sample Coefficient of Variation
Sample Range
• Formula: R = Xmax - Xmin
where Xmax is the largest value in the sample
and Xmin is the smallest sample value
• Remarks:
Easy to determine
Easily understood
Determined by extreme values
Does not use all sample data
Sample Variance & Standard Deviation
• Sample Variance
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$$
• Sample Standard Deviation
$$s = \sqrt{s^2}$$
• Remarks
Most frequently used measure of variability
Not well understood
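Both forms of $s^2$ give the same value in exact arithmetic, as this sketch checks on the defect data from the earlier example. The function names are illustrative. The second (computational) form subtracts two large, nearly equal quantities when the data are large relative to their spread, so in floating point the first form is usually the safer choice.

```python
import math


def variance_definitional(x):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)


def variance_computational(x):
    """s^2 = (n * sum(x_i^2) - (sum(x_i))^2) / (n * (n - 1))."""
    n = len(x)
    return (n * sum(xi * xi for xi in x) - sum(x) ** 2) / (n * (n - 1))


defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]
s2_def = variance_definitional(defects)
s2_comp = variance_computational(defects)
s = math.sqrt(s2_def)   # sample standard deviation
print(f"{s2_def:.6f} {s2_comp:.6f} {s:.6f}")
```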
Sample Coefficient of Variation
• Formula
$$CV_s = \frac{s}{\bar{x}}$$
• Remarks
Relative measure of variation
Used for comparing the variation in two samples of
data that are measured in different units
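A minimal sketch of $CV_s$, applied to the two example samples from this section (satisfaction ratings and defect counts). The helper name is hypothetical; `statistics.stdev` uses the n - 1 divisor, so this is s divided by $\bar{x}$.

```python
import statistics


def coefficient_of_variation(sample):
    """CV_s = s / x-bar (s uses the n - 1 divisor); the result is unit-free."""
    return statistics.stdev(sample) / statistics.mean(sample)


ratings = [7.7, 5.5, 9.3, 6.5, 7.5, 5.2, 7.7, 8.5, 6.0, 8.8,
           6.2, 8.6, 7.1, 8.0, 7.9, 7.8, 5.9, 9.6, 8.3, 6.7,
           7.6, 6.9, 7.3, 9.1, 7.8]
defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]

cv_ratings = coefficient_of_variation(ratings)
cv_defects = coefficient_of_variation(defects)
print(f"CV of ratings: {cv_ratings:.4f}, CV of defect counts: {cv_defects:.4f}")
```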
Analysis of Shape
• Skewness
• Kurtosis
Estimate of Skewness
$$x_r = \frac{\bar{x}}{x_{0.5}}$$
For a unimodal distribution, $x_r$ is an indicator of
distribution shape:
$x_r \ll 1$ indicates skewed to the left
$x_r \approx 1$ indicates symmetric
$x_r \gg 1$ indicates skewed to the right
Comparison of Distribution Skewness
• Normal: $\sqrt{\beta_1} = 0$
• Exponential: $\sqrt{\beta_1} = 2$
Estimation of Skewness
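• Estimate of skewness of a distribution from a
random sample
$$\widehat{\sqrt{\beta_1}} = \sqrt{b_1} = \frac{m_3}{(m_2)^{3/2}}$$
where
$$m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$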
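The moment-based skewness estimate can be computed directly from its definition. This sketch uses the n divisor for $m_2$ and $m_3$, as above; the helper names and the small symmetric sample are illustrative assumptions. For the defect counts it gives a positive value (skewed to the right), and for the symmetric sample it gives 0.

```python
def central_moment(sample, k):
    """m_k = (1/n) * sum((x_i - x_bar)**k), using the n divisor."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** k for x in sample) / n


def sqrt_b1(sample):
    """Moment estimate of skewness: m3 / m2**(3/2)."""
    return central_moment(sample, 3) / central_moment(sample, 2) ** 1.5


defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]
symmetric = [1, 2, 3, 4, 5]   # illustrative symmetric sample
print(f"defects: {sqrt_b1(defects):.4f}, symmetric: {sqrt_b1(symmetric):.4f}")
```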
Estimation of Kurtosis
• Estimate of kurtosis of a distribution ($\beta_2$) from a
random sample
$$\hat{\beta}_2 = b_2 = \frac{m_4}{(m_2)^2}$$
where
$$m_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad m_4 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
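Similarly, $b_2$ follows from the fourth and second central moments. In this sketch (the helper names and the evenly spread sample are illustrative), the defect counts give $b_2 = 3.3$, while the five equally spaced values 1 to 5 give 1.7, close to the uniform value of 1.8. Spreadsheet functions such as Excel's KURT report a bias-adjusted excess kurtosis (about 0 for normal data) rather than $b_2$, so their output is not directly comparable.

```python
def central_moment(sample, k):
    """m_k = (1/n) * sum((x_i - x_bar)**k), using the n divisor."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** k for x in sample) / n


def b2(sample):
    """Moment estimate of kurtosis: m4 / m2**2 (about 3 for normal data)."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2


defects = [1, 3, 1, 0, 2, 0, 0, 1, 1, 1, 0, 1, 2, 1, 1]
flat = [1, 2, 3, 4, 5]   # illustrative evenly spread sample
print(f"defects: {b2(defects):.3f}, evenly spread: {b2(flat):.3f}")
```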
Comparison of Kurtosis
[Figure: density curves f(x), for x from -0.5 to 1.5, comparing $\beta_2 = 3$ (normal distribution) with $\beta_2 = 1.8$ (uniform distribution)]
Presentation of Data
• Time Series Graph or Run Chart
• Stem-and-Leaf Plot
• Digidot Plot
• Box Plot
• Frequency Distribution
• Histogram and Relative Frequency
Time Series Graph or Run Chart
• A plot of the data set x1, x2, …, xn in the order
in which the data were obtained
• Used to detect trends or patterns in the data
over time
Stem-and-Leaf Plots
• A quick way to obtain an informative visual
representation of the set of data x1, x2, …, xn
for which each xi consists of at least two digits
• Steps for constructing a stem-and-leaf display
(1) Select one or more leading digits for the stem
values. The trailing digits become the leaves.
(2) List possible stem values in a vertical column.
(3) Record the leaf for every observation beside the
corresponding stem value.
(4) Indicate the units for stems and leaves
somewhere in the display
• The stem-and-leaf display does not take the time
order of the observed data into account
Stem-and-Leaf Plot - Example
Here are test scores for 25 students:
69, 55, 80, 95, 94, 98, 51, 70, 93, 57, 62, 52, 52,
58, 61, 51, 64, 67, 78, 68, 69, 68, 96, 73, 71
The first step is to place the numbers in order from
least to greatest:
51, 51, 52, 52, 55, 57, 58, 61, 62, 64, 67, 68, 68,
69, 69, 70, 71, 73, 78, 80, 93, 94, 95, 96, 98
Stem-and-Leaf Plot - Example
Now create the graph:
Test Scores
5 | 1122578
6 | 12478899
7 | 0138
8 | 0
9 | 34568
The numbers on the left side of the vertical line are
the stems. The numbers on the right side are the
leaves. In this graph, the stems are the tens digits
and the leaves are the units digits. In this case, 9|3
represents a score of 93.
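The four construction steps can be sketched in Python for two-digit integer data such as the test scores. The function `stem_and_leaf` is a hypothetical helper that uses the tens digit as the stem and the units digit as the leaf, and lists every stem between the smallest and largest, even one with no leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative two-digit integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in increasing order
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # list every possible stem
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


scores = [69, 55, 80, 95, 94, 98, 51, 70, 93, 57, 62, 52, 52,
          58, 61, 51, 64, 67, 78, 68, 69, 68, 96, 73, 71]
print("Test Scores (stem = tens, leaf = units)")
for line in stem_and_leaf(scores):
    print(line)
```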
Digidot Plot
A combination of the time series graph with the
stem and leaf display
Box Plot
• A pictorial summary used to describe the
most prominent statistical features of the data
set, x1, x2, …, xn, including its:
- Center or location
- Spread or variability
- Extent and nature of any deviation from
symmetry
- Identification of ‘outliers’
Box Plot
• Shows only certain statistics rather than all the
data, namely
- median
- quartiles
- smallest and greatest values in the
distribution
• Immediate visuals of a box plot are the center,
the spread, and the overall range of the distribution
Box Plot
Given the following random sample of size 25:
38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98
Arranged in order from least to greatest:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36, 37,
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
Box Plot
• First, find the median, the value exactly in the
middle of an ordered set of numbers.
The median is 37
• Next, we consider only the values to the left of
the median:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36
• We now find the median of this set of numbers.
The median for this group is (22 + 25)/2 = 23.5,
which is the lower quartile.
Box Plot
• Now consider the values to the right of the
median.
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
The median for this set is (60 + 86)/2 = 73, which
is the upper quartile.
• We are now ready to find the interquartile range
(IQR), which is the difference between the upper
and lower quartiles, 73 - 23.5 = 49.5
49.5 is the interquartile range
Box Plot
The lower quartile is 23.5
The median is 37
The upper quartile is 73
The interquartile range is 49.5
[Figure: box plot on a 0 to 100 scale marking the lower extreme, lower quartile, median, upper quartile, and upper extreme]
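The quartile procedure used above (split the ordered data at the median, leaving the median out when n is odd, then take the median of each half) can be sketched as follows. The function names are illustrative. Other quartile conventions, such as the interpolated percentiles used by many software packages, can give slightly different values.

```python
def median_of_sorted(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def box_plot_summary(sample):
    """Extremes, quartiles, median, and IQR using the median-of-halves rule."""
    s = sorted(sample)
    n = len(s)
    lower_half = s[: n // 2]          # values left of the median
    upper_half = s[(n + 1) // 2:]     # values right of the median
    q1 = median_of_sorted(lower_half)
    q3 = median_of_sorted(upper_half)
    return {"lower extreme": s[0], "lower quartile": q1,
            "median": median_of_sorted(s), "upper quartile": q3,
            "upper extreme": s[-1], "IQR": q3 - q1}


sample = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
          22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98]
for name, value in box_plot_summary(sample).items():
    print(f"{name}: {value}")
```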
Histogram
A graph of the observed frequencies in the data
set, x1, x2, …, xn versus data magnitude to
visually indicate its statistical properties, including
- shape
- location or central tendency
- scatter or variability
Guidelines for Constructing Histograms
• If the data x1, x2, …, xn are from a discrete
random variable with possible values y1, y2, …, yk,
count the number of occurrences of each value
of y and associate the frequency fi with yi,
for i = 1, …, k
Guidelines for Constructing Histograms
• If the data x1, x2, …, xn are from a continuous
random variable
- select the number of intervals or cells, r,
to be a number between 3 and 20; as an
initial value, use r = √n, where n is the
number of observations
- establish r intervals of equal width, starting
just below the smallest value of x
- count the number of values of x within
each interval to obtain the frequency fi
associated with each interval
- construct the graph by plotting (fi, i) for
i = 1, 2, …, r
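The continuous-data guidelines can be sketched as a small Python function. The starting offset below the smallest value (half the 0.1 measurement resolution of the ratings) and the rule that the largest value falls into the last interval are assumptions; the guideline only says to start just below the smallest value. Applied to the 25 satisfaction ratings, r = √25 = 5.

```python
import math


def histogram_counts(data, r=None, offset=0.05):
    """Equal-width interval counts following the guidelines above.

    r defaults to round(sqrt(n)), clamped to [3, 20]; the first interval starts
    `offset` below the smallest value (offset is an assumed choice).
    """
    n = len(data)
    if r is None:
        r = min(20, max(3, round(math.sqrt(n))))
    start = min(data) - offset
    width = (max(data) - start) / r
    counts = [0] * r
    for x in data:
        i = int((x - start) / width)
        counts[min(i, r - 1)] += 1   # the largest value goes in the last interval
    edges = [start + k * width for k in range(r + 1)]
    return edges, counts


ratings = [7.7, 5.5, 9.3, 6.5, 7.5, 5.2, 7.7, 8.5, 6.0, 8.8,
           6.2, 8.6, 7.1, 8.0, 7.9, 7.8, 5.9, 9.6, 8.3, 6.7,
           7.6, 6.9, 7.3, 9.1, 7.8]
edges, counts = histogram_counts(ratings)
for i, f in enumerate(counts, start=1):
    print(f"interval {i}: [{edges[i - 1]:.2f}, {edges[i]:.2f})  f = {f}")
```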
Statistical Process Control - Histograms
Possible answers for a Cliff-like histogram
• Hiding data that should be outside the specification
• Supplier is screening the product before shipment
• Lower specification is a physical limit, such as zero
thickness, although this is not normally the case
[Figure: cliff-like histogram with lower and upper specification limits]
Statistical Process Control - Histograms
Possible answers for a Bimodal histogram
• Two primary sources of process variation
• The process is stable, but it has experienced
a large shift during the time the data were collected
[Figure: bimodal histogram with lower and upper specification limits]
Statistical Process Control - Histograms
Possible answers for a Comb-like histogram
• Insufficient data collected
• Too many classes displayed
• Process is unstable
• Process is stable but is multimodal
[Figure: comb-like histogram with lower and upper specification limits]
Statistical Process Control - Histograms
Possible answers for a Skewed histogram
• May be the natural result of the process
• For a machined part, the equipment may be losing
tolerance or tools may be wearing out
• The process is shifting slowly to the side with the
long tail
[Figure: skewed histogram with lower and upper specification limits]
Statistical Process Control - Histograms
By including specification limits on a histogram, the
amount of data that falls outside of the specification
limits can be easily seen
[Figure: histogram of frequency versus the measured characteristic, with lower and upper specification limits marked]
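Counting the observations outside the specification limits is a one-line computation once the limits are known. This sketch uses the 40 tensile strengths listed later in this document together with hypothetical limits of 45 and 60; the document does not give specification limits for those data.

```python
# Tensile strengths of the 40 specimens (as listed, rounded to 0.1)
strengths = [48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
             55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
             53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
             54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2]

LSL, USL = 45.0, 60.0   # hypothetical lower and upper specification limits

below = [x for x in strengths if x < LSL]
above = [x for x in strengths if x > USL]
fraction_outside = (len(below) + len(above)) / len(strengths)
print(f"below LSL: {len(below)}, above USL: {len(above)}, "
      f"fraction outside: {fraction_outside:.3f}")
```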
Probability Plotting
• Data are plotted on special graph paper
designed for a particular distribution
- Normal
- Weibull
- Lognormal
- Exponential
• If the assumed model is adequate, the plotted
points will tend to fall in a straight line
• If the model is inadequate, the plot will not
be linear and the type & extent of departures
can be seen
• Once a model appears to fit the data
reasonably well, percentiles and parameters can
be estimated from the plot
Probability Plotting Procedure
• Step 1: Obtain special graph paper, known as
probability paper, designed for the distribution under
examination. Weibull, Lognormal and Normal paper
are available at:
http://www.weibull.com/GPaper/index.htm
• Step 2: Rank the sample values from smallest
to largest in magnitude, i.e., $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.
Probability Plotting General Procedure
• Step 3: Plot the Xi’s on the paper versus
$$\hat{F}(x_i) = 100\left(\frac{i - 0.3}{n + 0.4}\right) \quad \text{or} \quad \hat{F}(x_i) = \frac{i - 0.3}{n + 0.4},$$
depending on whether the marked axis on the paper refers to the
percentage or the proportion of observations. The axis of the graph
paper on which the Xi’s are plotted will be referred to as the
observational scale, and the axis for $\hat{F}(x_i)$ as the cumulative scale.
• Step 4: If a straight line appears to fit the data,
draw a line on the graph ‘by eye’.
• Step 5: Estimate the model parameters from
the graph.
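Step 3 can be sketched in Python for the normal case. Normal probability paper spaces its cumulative scale by standard normal quantiles, so pairing each ordered value with `NormalDist().inv_cdf` of its plotting position (Python 3.8 and later) reproduces the same straight-line check on ordinary axes. The helper names are illustrative, and the sample 10, 20, 30, 40, 50, 80 is simply a small test set.

```python
from statistics import NormalDist


def benard_positions(n):
    """Plotting positions (i - 0.3) / (n + 0.4) for i = 1, ..., n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def normal_plot_coordinates(sample):
    """Return (x_(i), F-hat_i, z_i) for each ordered value x_(i).

    z_i is the standard normal quantile of F-hat_i; plotting x_(i) against z_i
    on ordinary axes mimics plotting on normal probability paper.
    """
    ordered = sorted(sample)
    positions = benard_positions(len(ordered))
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, p, standard_normal.inv_cdf(p)) for x, p in zip(ordered, positions)]


coords = normal_plot_coordinates([10, 20, 30, 40, 50, 80])
for x, p, z in coords:
    print(f"x = {x:>3}  F-hat = {100 * p:5.1f}%  z = {z:+.3f}")
```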
Weibull Probability Plotting Paper
If $T \sim W(\beta, \theta)$,
the cumulative probability distribution function is
$$F(t) = 1 - e^{-(t/\theta)^{\beta}}$$
We now need to linearize this function into the
form y = ax + b:
Weibull Probability Plotting Paper
Then
$$\ln[1 - F(t)] = \ln\left[e^{-(t/\theta)^{\beta}}\right] = -\left(\frac{t}{\theta}\right)^{\beta}$$
$$\ln\{-\ln[1 - F(t)]\} = \beta \ln\left(\frac{t}{\theta}\right)$$
$$\ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\} = \beta \ln t - \beta \ln \theta$$
which is the equation of a straight line of the form
y = ax + b,
Weibull Probability Plotting Paper
where
$$y = \ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\}, \quad a = \beta, \quad x = \ln t,$$
and
$$b = -\beta \ln \theta,$$
i.e.,
Weibull Probability Plotting Paper
$$y = \beta x - \beta \ln \theta,$$
which is a linear equation with a slope of $\beta$ and an
intercept of $-\beta \ln \theta$. Now the x- and y-axes of the
Weibull probability plotting paper can be
constructed. The x-axis is simply logarithmic,
since x = ln(t), and
$$y = \ln\left\{\ln\left[\frac{1}{1 - F(t)}\right]\right\}$$
Weibull Probability Plotting Paper
[Figure: Weibull probability plotting paper; cumulative probability (in %) versus x]
Probability Plotting - example
To illustrate the process let 10, 20, 30, 40, 50, and
80 be a random sample of size n = 6.
Probability Plotting - example
We need estimates of F(x) corresponding to each of
the sample values in order to plot the data on
Weibull probability paper. These estimates are
obtained with what are called median ranks.
Probability Plotting - example
Median ranks represent the 50% confidence level
(“best guess”) estimate for the true value of F(t),
based on the total sample size and the order
number (first, second, etc.) of the data.
Probability Plotting - example
There is an approximation that can be used to
estimate median ranks, called Benard’s
approximation. It has the form:
$$\hat{F}(x_i) = MR_i \approx \frac{i - 0.3}{n + 0.4}$$
where n is the sample size and i is the sample
order number. Tables of median ranks can be
found in many statistics and reliability texts.
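The exact median rank of the i-th of n ordered values is the median of a Beta(i, n - i + 1) distribution, equivalently the p for which $P(\text{Bin}(n, p) \ge i) = 0.5$. This sketch solves that equation by bisection and compares the result with Benard’s approximation for n = 6; the function names are hypothetical. For i = 1 the exact value is $1 - 0.5^{1/6} \approx 0.1091$, against 0.1094 from Benard’s formula.

```python
from math import comb


def binomial_upper_tail(i, n, p):
    """P(at least i successes in n Bernoulli(p) trials)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, n + 1))


def exact_median_rank(i, n, tol=1e-12):
    """Solve P(Bin(n, p) >= i) = 0.5 for p by bisection.

    The solution is the median of a Beta(i, n - i + 1) distribution,
    i.e. the exact median rank of the i-th ordered observation.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_upper_tail(i, n, mid) < 0.5:   # tail probability increases with p
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def benard(i, n):
    """Benard's approximation (i - 0.3) / (n + 0.4)."""
    return (i - 0.3) / (n + 0.4)


n = 6
for i in range(1, n + 1):
    print(f"i = {i}  exact MR = {exact_median_rank(i, n):.4f}  Benard = {benard(i, n):.4f}")
```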
Probability Plotting - example
Based on Benard’s approximation, we can now
calculate $\hat{F}(x)$ for each observed value of x. These
are shown in the following table:
x     $\hat{F}(x)$
10    10.9%
20    26.6%
30    42.2%
40    57.8%
50    73.4%
80    89.1%
For example, for x = 20,
$$\hat{F}(20) = \frac{2 - 0.3}{6 + 0.4} \times 100\% = 26.6\%$$
Probability Plotting- example
Now that we have y-coordinate values to go with
the x-coordinate sample values, we can plot the
$(x, \hat{F}(x))$ points on Weibull probability paper.
[Figure: the sample points plotted on Weibull probability paper; $\hat{F}(x)$ (in %) versus x]
Probability Plotting- example
The line represents the estimated relationship
between x and F(x):
[Figure: straight line fitted through the points on Weibull probability paper; $\hat{F}(x)$ (in %) versus x]
Probability Plotting - example
In this example, the points on Weibull probability
paper fall in a fairly linear fashion, indicating that
the Weibull distribution provides a good fit to the
data. If the points did not seem to follow a
straight line, we might want to consider using
another probability distribution to analyze the
data.
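Instead of drawing the line ‘by eye’, a least-squares line through the Weibull-paper coordinates gives numerical estimates: the slope is $\hat{\beta}$ and, since the intercept is $-\beta \ln \theta$, $\hat{\theta} = \exp(-b/a)$. This is a sketch of one common substitute (regressing y on x), not the procedure used in the slides; the function name is hypothetical, and the plotting positions come from Benard’s approximation. For the sample 10, 20, 30, 40, 50, 80 it gives roughly $\hat{\beta} \approx 1.45$ and $\hat{\theta} \approx 44$, with a correlation close to 1.

```python
import math


def weibull_rank_regression(sample):
    """Fit y = a*x + b to the Weibull-paper coordinates by least squares.

    x = ln(t), y = ln(ln(1 / (1 - F-hat))), with Benard's approximation for
    F-hat. Returns (beta_hat, theta_hat, r), using beta = a and
    b = -beta * ln(theta) from the linearization of the Weibull cdf.
    """
    t = sorted(sample)
    n = len(t)
    f_hat = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    xs = [math.log(ti) for ti in t]
    ys = [math.log(math.log(1 / (1 - f))) for f in f_hat]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                     # slope = beta-hat
    b = y_bar - a * x_bar             # intercept = -beta-hat * ln(theta-hat)
    theta_hat = math.exp(-b / a)
    r = sxy / math.sqrt(sxx * syy)    # correlation: closeness to a straight line
    return a, theta_hat, r


beta_hat, theta_hat, r = weibull_rank_regression([10, 20, 30, 40, 50, 80])
print(f"beta-hat = {beta_hat:.3f}, theta-hat = {theta_hat:.2f}, r = {r:.4f}")
```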
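Probability Plotting - example
[Figure: additional probability plots for the example]
Probability Paper - Normal
[Figure: normal probability paper]
Probability Paper - Lognormal
[Figure: lognormal probability paper]
Probability Paper - Exponential
[Figure: exponential probability paper]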
Probability Plotting Exercise
Given the following random sample of size n = 8,
which probability distribution provides the best fit?
i     1     2     3     4     5      6      7      8
xi    79.4  88.1  91.1  98.7  104.2  105.1  106.5  112.0
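One way to approach the exercise numerically is to compute, for each candidate paper, the correlation between the linearized plot coordinates; the paper whose correlation is closest to 1 gives the straightest plot. This sketch assumes Benard’s plotting positions and uses an ordinary correlation (with intercept) for every paper, which is a simplification for the exponential case, whose line should pass through the origin. The helper names are hypothetical, and the output is meant to be compared with hand-drawn plots rather than to replace them.

```python
import math
from statistics import NormalDist


def pearson_r(u, v):
    """Ordinary (Pearson) correlation coefficient of two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def plot_correlations(sample):
    """Correlation of the linearized probability-plot coordinates for each paper."""
    x = sorted(sample)
    n = len(x)
    f = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in f]          # normal paper scale
    ln_x = [math.log(v) for v in x]                   # logarithmic data scale
    return {
        "normal": pearson_r(x, z),
        "lognormal": pearson_r(ln_x, z),
        "weibull": pearson_r(ln_x, [math.log(-math.log(1 - p)) for p in f]),
        "exponential": pearson_r(x, [-math.log(1 - p) for p in f]),
    }


data = [79.4, 88.1, 91.1, 98.7, 104.2, 105.1, 106.5, 112.0]
results = plot_correlations(data)
for name, r in results.items():
    print(f"{name:12s} r = {r:.4f}")
```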
40 Specimens
Forty specimens are cut from a plate for tensile tests. The
tests produced the following tensile strengths, x:
i    1     2     3     4     5     6     7     8     9     10
x    48.5  54.7  47.8  56.9  54.8  57.9  44.9  53.0  54.7  46.7
i    11    12    13    14    15    16    17    18    19    20
x    55.0  55.7  49.9  54.8  49.7  58.9  52.7  57.8  46.8  49.2
i    21    22    23    24    25    26    27    28    29    30
x    53.1  49.1  55.6  46.2  52.0  56.6  52.9  52.2  54.1  42.3
i    31    32    33    34    35    36    37    38    39    40
x    54.6  49.9  44.5  52.9  54.4  60.2  50.2  57.4  54.8  61.2
Perform a statistical analysis of the tensile strength data.
4/7/2016
40 Specimens
Time series plot:
[Figure: tensile strength x (30 to 65) versus specimen number i (0 to 40)]
By visual inspection of the time series plot, there appears to be no
trend.
40 Specimens
Using the descriptive statistics function in Excel, the following
were calculated:
Descriptive Statistics
Count                 40
Sum                   2104.82
Mean                  52.62
Standard Error        0.70
Median                53.03
Standard Deviation    4.45
Sample Variance       19.83
Kurtosis              -0.39
Skewness              -0.34
Range                 18.84
Minimum               42.35
Maximum               61.18
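The Excel summary can be reproduced approximately in Python. The tensile strengths listed above are rounded to one decimal place, while the Excel summary was evidently computed from unrounded values (for example, its minimum is 42.35 rather than 42.3), so this sketch gives slightly different results, such as a mean of about 52.615 and a median of about 53.05.

```python
import math
import statistics

# Tensile strengths of the 40 specimens, as listed (rounded to 0.1)
x = [48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
     55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
     53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
     54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2]

n = len(x)
s = statistics.stdev(x)   # sample standard deviation, divisor n - 1
summary = {
    "Count": n,
    "Sum": sum(x),
    "Mean": statistics.mean(x),
    "Standard Error": s / math.sqrt(n),
    "Median": statistics.median(x),
    "Standard Deviation": s,
    "Sample Variance": statistics.variance(x),
    "Range": max(x) - min(x),
    "Minimum": min(x),
    "Maximum": max(x),
}
for name, value in summary.items():
    print(f"{name:20s} {value:.2f}")
```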
40 Specimens
Using the histogram feature of Excel, the following frequencies were
calculated:
Bin     Frequency
40      0
45      3
50      10
55      16
60      9
More    2
and the graph:
[Figure: Histogram of Tensile Strengths; frequency (0 to 18) versus bin (40, 45, 50, 55, 60, More)]
40 Specimens
Box Plot
The lower quartile is 49.45
The median is 53.03
The upper quartile is 55.3
The interquartile range is 5.86
[Figure: box plot on a 40 to 65 scale marking the lower extreme, lower quartile, median, average, upper quartile, and upper extreme]
40 Specimens
Normal Probability Plot
[Figure: normal probability plot of tensile strength; cumulative percent (0.1% to 99.9%) versus x (40 to 65)]
40 Specimens
LogNormal Probability Plot
[Figure: lognormal probability plot of tensile strength; cumulative percent (0.1% to 99.9%) versus x on a logarithmic scale (10 to 100)]
40 Specimens
Weibull Probability Plot
[Figure: Weibull probability plot of tensile strength; cumulative percent (0.1% to 99.9%) versus x (41 to 61)]
40 Specimens
From the histogram and the normal probability plot, we see
that the tensile strength can be described by a normal
distribution.
The tensile strength distribution can be estimated by
$$X \sim N(\hat{\mu} = 52.62,\ \hat{\sigma} = 4.45)$$
[Figure: fitted cumulative distribution $\hat{F}(x)$ and density $\hat{f}(x)$ for x from 49 to 55]
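The fitted model can be evaluated with `statistics.NormalDist` (Python 3.8 and later), which provides `cdf` and `pdf` methods. This sketch simply tabulates $\hat{F}(x)$ and $\hat{f}(x)$ over the plotted range of 49 to 55 using the estimates above.

```python
from statistics import NormalDist

# Fitted model: X ~ N(mu-hat = 52.62, sigma-hat = 4.45)
fitted = NormalDist(mu=52.62, sigma=4.45)

for x in range(49, 56):
    print(f"x = {x}  F-hat(x) = {fitted.cdf(x):.4f}  f-hat(x) = {fitted.pdf(x):.4f}")
```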