Fundamentals of Statistics

Quality Control
Chapter 4 - Fundamentals of Statistics
PowerPoint presentation to accompany
Besterfield
Quality Control, 8e
PowerPoints created by Rosida Coowar
Besterfield: Quality Control, 8th ed.
© 2009 Pearson Education, Upper Saddle River, NJ 07458.
All rights reserved
Outline
• Introduction
• Frequency Distribution
• Measures of Central Tendency
• Measures of Dispersion
Outline-Continued
• Other Measures
• Concept of a Population and Sample
• The Normal Curve
• Tests for Normality
Introduction
Definition of Statistics:
1. A collection of quantitative data pertaining to a subject or group, for example blood pressure statistics.
2. The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
Introduction
Two phases of statistics:
• Descriptive statistics: describes the characteristics of a product or process using information collected on it.
• Inferential (inductive) statistics: draws conclusions about unknown process parameters based on information contained in a sample. Uses probability.
Collection of Data
Types of Data:
Attribute:
Discrete (counted) data. Data values can only be integers.
Examples include:
• How many of the products are defective?
• How often are the machines repaired?
• How many people are absent each day?
Collection of Data – Cont’d.
Types of Data:
Attribute (discrete, counted data), further examples:
• How many days did it rain last month?
• What kind of performance was achieved?
• Number of defects, defectives
Collection of Data
Types of Data:
Variable:
Continuous (measured) data. Data values can be any real number.
Examples include:
• How long is each item?
• How long did it take to complete the task?
• What is the weight of the product?
• Length, volume, time
Precision and Accuracy
Precision
The precision of a measurement is determined by how reproducible that measurement value is.
For example, if one student weighs a sample as 42.58 g and another student then weighs it five times, obtaining 42.09 g, 42.15 g, 42.1 g, 42.16 g, and 42.12 g, the original measurement is not very precise, since it cannot be reproduced.
Precision and Accuracy
Accuracy
• The accuracy of a measurement is determined by how close a measured value is to its “true” value.
• For example, suppose a sample is known to weigh 3.182 g and a student weighs it five times, obtaining 3.200 g, 3.180 g, 3.152 g, 3.168 g, and 3.189 g.
• The most accurate measurement is 3.180 g, because it is closest to the true weight of the sample.
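The two weighing examples can be checked numerically. The sketch below is only an illustration: it uses the readings quoted on the slides, takes the sample standard deviation of the repeat readings as one possible measure of their scatter (an assumption, the slides do not prescribe a measure), and picks the most accurate reading as the one with the smallest absolute difference from the true weight.

```python
from statistics import mean, stdev

# Precision: one reading by the first student, five repeats by a second student.
first_reading = 42.58
repeat_readings = [42.09, 42.15, 42.10, 42.16, 42.12]

repeat_mean = mean(repeat_readings)
repeat_spread = stdev(repeat_readings)       # scatter among the repeats (assumed measure)
gap = abs(first_reading - repeat_mean)       # distance of the first reading from the repeats

# Accuracy: five readings of a sample whose true weight is known.
true_weight = 3.182
readings = [3.200, 3.180, 3.152, 3.168, 3.189]
most_accurate = min(readings, key=lambda r: abs(r - true_weight))

print(f"repeat mean = {repeat_mean:.3f} g, spread = {repeat_spread:.3f} g, gap = {gap:.3f} g")
print(f"most accurate reading = {most_accurate:.3f} g")
```

The gap between the first reading and the repeats is many times larger than the scatter among the repeats, which is why the first reading cannot be called reproducible.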
Precision and Accuracy
Figure 4-1 Difference between accuracy and precision
Describing Data
• Frequency Distribution
• Measures of Central Tendency
• Measures of Dispersion
Frequency Distribution
• Ungrouped Data
• Grouped Data
Frequency Distribution
There are three types of frequency distributions:
• Categorical frequency distributions
• Ungrouped frequency distributions
• Grouped frequency distributions
Categorical
Categorical frequency distributions
• Can be used for data that can be placed in specific categories, such as nominal- or ordinal-level data.
• Examples: political affiliation, religious affiliation, blood type, etc.
Categorical
Example: Blood Type Frequency Distribution
Class   Frequency   Percent
A       5           20
B       7           28
O       9           36
AB      4           16
Ungrouped
Ungrouped frequency distributions
• Can be used for data that can be enumerated and when the range of values in the data set is not large.
• Examples: number of miles your instructors have to travel from home to campus, number of girls in a 4-child family, etc.
Ungrouped
Example: Number of Miles Traveled
Class   Frequency
5       24
10      16
15      10
Grouped
Grouped frequency distributions
• Can be used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.
• Example: the life of boat batteries in hours.
Grouped
Example: Lifetimes of Boat Batteries
Class limits   Class boundaries   Frequency   Cumulative frequency
24 - 37        23.5 - 37.5        4           4
38 - 51        37.5 - 51.5        14          18
52 - 65        51.5 - 65.5        7           25
Frequency Distributions
Number          Frequency   Relative    Cumulative   Relative cumulative
nonconforming               frequency   frequency    frequency
0               15          0.29        15           0.29
1               20          0.38        35           0.67
2               8           0.15        43           0.83
3               5           0.10        48           0.92
4               3           0.06        51           0.98
5               1           0.02        52           1.00
Table 4-3 Different Frequency Distributions of Data Given in Table 4-1
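The columns of Table 4-3 all follow from the frequency column. As a minimal sketch (the variable names are chosen here for illustration), the relative, cumulative and relative cumulative frequencies can be derived like this:

```python
from itertools import accumulate

# Frequency column of Table 4-3.
nonconforming = [0, 1, 2, 3, 4, 5]
frequency = [15, 20, 8, 5, 3, 1]

n = sum(frequency)                                   # total number of observations
relative = [f / n for f in frequency]                # share of each value
cumulative = list(accumulate(frequency))             # running total of frequencies
relative_cumulative = [c / n for c in cumulative]    # running share

print("x   f   rel   cum  rel cum")
for x, f, r, c, rc in zip(nonconforming, frequency, relative,
                          cumulative, relative_cumulative):
    print(f"{x}  {f:>2}  {r:.2f}  {c:>3}  {rc:.2f}")
```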
Frequency Histogram
[Figure: frequency histogram; x-axis: Number Nonconforming, y-axis: Frequency]
Relative Frequency Histogram
[Figure: relative frequency histogram; x-axis: Number Nonconforming, y-axis: Relative Frequency]
Cumulative Frequency Histogram
[Figure: cumulative frequency histogram; x-axis: Number Nonconforming, y-axis: Cumulative Frequency]
The Histogram
The histogram is the most important graphical tool for exploring the shape of data distributions.
For the construction, analysis, and understanding of histograms, see:
http://quarknet.fnal.gov/toolkits/ati/histograms.html
Constructing a Histogram
The Fast Way
Step 1: Find the range of the distribution (largest value - smallest value)
Step 2: Choose the number of classes, 5 to 20 (rule of thumb: number of classes ≈ √n)
Step 3: Determine the class width, to one decimal place more than the data (class width = range / number of classes)
Step 4: Determine the class boundaries
Step 5: Draw the frequency histogram
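The five quick steps can be sketched in a few lines of code. This is an illustration under stated assumptions, not the textbook's procedure: the function name `fast_histogram` and the sample data are invented here, the number of classes is taken as the square root of n clamped to the 5 to 20 range, the class width is left unrounded, and the data are assumed not to be all identical (otherwise the width would be zero).

```python
import math

def fast_histogram(data, min_classes=5, max_classes=20):
    """Return (boundaries, counts) for an equal-width histogram."""
    low, high = min(data), max(data)
    data_range = high - low                                  # Step 1: range
    k = round(math.sqrt(len(data)))                          # Step 2: rule of thumb
    k = max(min_classes, min(max_classes, k))                # keep k within 5..20
    width = data_range / k                                   # Step 3: class width
    boundaries = [low + j * width for j in range(k + 1)]     # Step 4: boundaries
    counts = [0] * k
    for x in data:
        # Class index of x; the largest value is placed in the last class.
        j = min(int((x - low) / width), k - 1)
        counts[j] += 1
    return boundaries, counts

# Hypothetical measurements, invented for illustration only.
sample = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 10.3, 9.7, 10.0, 10.1,
          10.5, 9.6, 10.2, 9.9, 10.0, 10.1, 10.3, 9.8, 10.0, 10.2]
boundaries, counts = fast_histogram(sample)
for lo, hi, c in zip(boundaries, boundaries[1:], counts):    # Step 5: text histogram
    print(f"{lo:5.2f} to {hi:5.2f} | {'*' * c}")
```

A plotting library would normally replace the text output in Step 5; the counting logic stays the same.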
Constructing a Histogram
Number of groups or cells
• Fewer than 100 observations: 5 to 9 cells
• Between 100 and 500 observations: 8 to 17 cells
• More than 500 observations: 15 to 20 cells
Constructing a Histogram
For a more accurate way of drawing a
histogram see the section on grouped data
in your textbook
Other Types of
Frequency Distribution Graphs
• Bar Graph
• Polygon of Data
• Cumulative Frequency Distribution or Ogive
Bar Graph and Polygon of Data
Cumulative Frequency
Characteristics of Frequency
Distribution Graphs
Figure 4-6 Characteristics of frequency distributions
Analysis of Histograms
Figure 4-7 Differences due to location, spread, and shape
Analysis of Histograms
Figure 4-8 Histogram of Wash Concentration
Measures of Central Tendency
The three measures in common use are the:
• Average
• Median
• Mode
Average
There are three different techniques available for calculating the average:
• Ungrouped data
• Grouped data
• Weighted average
Average-Ungrouped Data
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Average-Grouped Data
$$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_h X_h}{f_1 + f_2 + \cdots + f_h}$$
$h$ = number of cells
$X_i$ = midpoint
$f_i$ = frequency
Average-Weighted Average
Used when a number of averages are combined with different frequencies
$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
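A short sketch of the weighted average follows. The helper name `weighted_average` and the numbers are hypothetical; the weights are assumed to be the number of readings behind each subgroup average.

```python
def weighted_average(averages, weights):
    """Combine several averages X_i, each weighted by w_i."""
    if len(averages) != len(weights):
        raise ValueError("averages and weights must have the same length")
    return sum(w * x for w, x in zip(weights, averages)) / sum(weights)

# Hypothetical figures: three subgroup averages based on 10, 30 and 60 readings.
averages = [50.0, 52.0, 49.0]
weights = [10, 30, 60]
result = weighted_average(averages, weights)
print(result)  # (500 + 1560 + 2940) / 100 = 50.0
```

Note that the simple mean of the three averages would be about 50.33; the weighted value differs because the third subgroup carries most of the readings.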
Median-Grouped Data
$$M_d = L_m + \left( \frac{\frac{n}{2} - cf_m}{f_m} \right) i$$
$L_m$ = lower boundary of the cell with the median
$n$ = total number of observations
$cf_m$ = cumulative frequency of all cells below $L_m$
$f_m$ = frequency of median cell
$i$ = cell interval
Example Problem
Boundaries   Midpoint   Frequency   Computation
23.6-26.5    25.0       4           100
26.6-29.5    28.0       36          1008
29.6-32.5    31.0       51          1581
32.6-35.5    34.0       63          2142
35.6-38.5    37.0       58          2146
38.6-41.5    40.0       52          2080
41.6-44.5    43.0       34          1462
44.6-47.5    46.0       16          736
47.6-50.5    49.0       6           294
Total                   320         11549
Table 4-7 Frequency Distribution of the Life of 320 tires in 1000 km
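The Computation column of Table 4-7 is the product of each midpoint and its frequency, and its total divided by n gives the grouped-data average. A minimal sketch using the table's values (the variable names are chosen here for illustration):

```python
# Midpoints and frequencies from Table 4-7 (tire life in 1000 km).
midpoints = [25.0, 28.0, 31.0, 34.0, 37.0, 40.0, 43.0, 46.0, 49.0]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]

n = sum(frequencies)                                          # number of tires
total = sum(f * x for f, x in zip(frequencies, midpoints))    # sum of f_i * X_i
grouped_mean = total / n                                      # grouped-data average

print(n, total, round(grouped_mean, 1))
```

The totals reproduce the table's bottom row (320 and 11549), so the average works out to roughly 36.1 thousand km.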
Median-Grouped Data
$$M_d = L_m + \left( \frac{\frac{n}{2} - cf_m}{f_m} \right) i$$
Using data from Table 4-7:
$$M_d = 35.6 + \left( \frac{\frac{320}{2} - 154}{58} \right) 3 = 35.9$$
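The same calculation can be written as a small function that locates the median cell and then interpolates within it. This is a sketch: the name `grouped_median` is invented here, and the lower boundaries are taken as listed in Table 4-7.

```python
def grouped_median(lower_bounds, frequencies, interval):
    """Median of grouped data: L_m + ((n/2 - cf_m) / f_m) * i."""
    n = sum(frequencies)
    cumulative = 0                      # cf_m: total frequency below the current cell
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= n / 2:
            # This cell contains the median; interpolate inside it.
            return lower + (n / 2 - cumulative) / f * interval
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")

# Lower boundaries and frequencies from Table 4-7; the cell interval is 3.
lower_bounds = [23.6, 26.6, 29.6, 32.6, 35.6, 38.6, 41.6, 44.6, 47.6]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]
median = grouped_median(lower_bounds, frequencies, interval=3)
print(round(median, 1))
```

The loop stops at the fifth cell, where the cumulative frequency below is 154 and the cell frequency is 58, matching the values substituted on the slide.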
Mode
The mode is the value that occurs with the greatest frequency.
It is possible to have no mode in a series of numbers, or to have more than one mode.
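The three cases (no mode, one mode, several modes) can be illustrated with the standard library. The sketch below assumes the convention that a series in which no value repeats has no mode; the helper name `modes` and the sample series are invented for illustration.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts or max(counts.values()) == 1:
        return []                       # assumed convention: no repeats means no mode
    return multimode(values)

print(modes([3, 4, 5, 6]))        # no value repeats
print(modes([1, 2, 2, 3]))        # a single mode
print(modes([1, 1, 2, 2, 3]))     # two modes (bimodal)
```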
Relationship Among the
Measures of Central Tendency
Figure 4-9 Relationship among average, median and mode
Measures of Dispersion
• Range
• Standard Deviation
• Variance
Measures of Dispersion-Range
The range is the simplest and easiest to calculate of the measures of dispersion.
$$R = X_h - X_l$$
that is, the largest value minus the smallest value in the data set.
Measures of Dispersion-Standard
Deviation
Sample Standard Deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 / n}{n - 1}}$$
Standard Deviation
Ungrouped Technique
$$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}}$$
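The ungrouped formula needs only the sum of the values and the sum of their squares. As a hedged sketch, the function below (the name `sample_std` is invented here) applies it to the five weighings from the accuracy example and compares the result with the standard library's `statistics.stdev`, which uses the deviation-from-the-mean form.

```python
import math
from statistics import stdev

def sample_std(values):
    """Sample standard deviation via sqrt((n*sum(X^2) - (sum X)^2) / (n(n-1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

data = [3.200, 3.180, 3.152, 3.168, 3.189]   # readings from the accuracy example
s_shortcut = sample_std(data)
s_library = stdev(data)
print(f"{s_shortcut:.4f} {s_library:.4f}")
```

Both forms are algebraically identical; the shortcut form can lose numerical precision when the values are large relative to their spread, which is one reason library routines prefer the deviation form.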
Standard Deviation
Grouped Technique
$$s = \sqrt{\frac{n \sum_{i=1}^{h} f_i X_i^2 - \left( \sum_{i=1}^{h} f_i X_i \right)^2}{n(n - 1)}}$$
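As an illustration, the grouped formula can be applied to the tire-life data of Table 4-7. The slides do not report this result, so the value printed here is a computed illustration rather than a figure from the text.

```python
import math

# Midpoints and frequencies from Table 4-7 (tire life in 1000 km).
midpoints = [25.0, 28.0, 31.0, 34.0, 37.0, 40.0, 43.0, 46.0, 49.0]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))        # sum of f_i * X_i
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))   # sum of f_i * X_i^2

s_grouped = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
print(round(s_grouped, 2))
```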
Relationship Between the
Measures of Dispersion
• As n increases, the accuracy of R decreases
• Use R when there is a small amount of data or the data are too scattered
• If n > 10, use the standard deviation
• A smaller standard deviation means better quality
Relationship Between the
Measures of Dispersion
Figure 4-10 Comparison of two distributions with equal average and range
Other Measures
There are three other measures that are frequently used to analyze a collection of data:
• Skewness
• Kurtosis
• Coefficient of Variation
Skewness
Skewness is the lack of symmetry of the data.
For grouped data:
$$a_3 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^3 / n}{s^3}$$
Skewness
Figure 4-11 Left (negative) and right (positive) skewness distributions
Kurtosis
Kurtosis provides information regarding the shape of the population distribution (the peakedness or heaviness of the tails of a distribution).
For grouped data:
$$a_4 = \frac{\sum_{i=1}^{h} f_i (X_i - \bar{X})^4 / n}{s^4}$$
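The grouped-data measures of skewness (a3) and kurtosis (a4) share the same structure, so one helper can return both. This is a sketch under stated assumptions: the function name `grouped_shape` and the tiny data set are hypothetical, and s is taken to be the sample standard deviation with n - 1 in the denominator, as on the standard deviation slides.

```python
import math

def grouped_shape(midpoints, frequencies):
    """Return (a3, a4) for grouped data given cell midpoints and frequencies."""
    n = sum(frequencies)
    pairs = list(zip(frequencies, midpoints))
    mean = sum(f * x for f, x in pairs) / n
    # Sample standard deviation of the grouped data (n - 1 denominator).
    s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in pairs) / (n - 1))
    a3 = (sum(f * (x - mean) ** 3 for f, x in pairs) / n) / s ** 3
    a4 = (sum(f * (x - mean) ** 4 for f, x in pairs) / n) / s ** 4
    return a3, a4

# Hypothetical symmetric data: midpoints 1, 2, 3 with frequencies 1, 2, 1.
a3, a4 = grouped_shape([1.0, 2.0, 3.0], [1, 2, 1])
print(a3, a4)
```

For this symmetric example the cubed deviations cancel, so a3 is zero; a data set with a long right tail would give a positive a3, and a long left tail a negative one.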
Kurtosis
Figure 4-11 Leptokurtic and Platykurtic distributions
Coefficient of Variation
The coefficient of variation (CV) is a measure of how much variation exists in relation to the mean.
$$CV = \frac{s \, (100\%)}{\bar{X}}$$
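A minimal sketch of the coefficient of variation, using the sample standard deviation and mean from Python's standard library. The function name and the three data values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = s * 100% / X-bar, using the sample standard deviation s."""
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average * 100.0

# Hypothetical data: mean 12, sample standard deviation 2.
cv = coefficient_of_variation([10, 12, 14])
print(f"CV = {cv:.1f}%")
```

Because the CV is a ratio, it allows the variation of data sets with different units or very different means to be compared.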
Population and Sample
• Population: set of all items that possess a characteristic of interest
• Sample: subset of a population
Parameter and Statistic
• A parameter is a characteristic of a population; in other words, it describes a population.
  Example: the average weight of the population, e.g. 50,000 cans made in a month.
• A statistic is a characteristic of a sample. It is used to make inferences about population parameters, which are typically unknown, and is called an estimator.
  Example: the average weight of a sample of 500 cans from that month’s output, an estimate of the average weight of the 50,000 cans.
The Normal Curve
Characteristics of the normal curve:
• It is symmetrical: half the cases are to one side of the center; the other half are on the other side.
• The distribution is single-peaked, not bimodal or multimodal.
• It is also known as the Gaussian distribution.
The Normal Curve
Characteristics:
Most of the cases will fall in the center portion of
the curve and as values of the variable become
more extreme they become less frequent, with
"outliers" at the "tail" of the distribution few in
number. It is one of many frequency distributions.
Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normal distributions can be transformed to standard normal distributions by the formula:
$$Z = \frac{X_i - \mu}{\sigma}$$
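The transformation is a one-line calculation. The sketch below uses hypothetical process values (mean 50, standard deviation 2) and, as an extra illustration beyond the slide, looks up the area under the standard normal curve to the left of Z with `statistics.NormalDist` (available in Python 3.8 and later).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: Z = (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical process: mean 50, standard deviation 2, observed value 53.
z = z_score(53.0, mu=50.0, sigma=2.0)
area_below = NormalDist().cdf(z)     # fraction of the population below the value
print(z, round(area_below, 4))
```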
Relationship between the Mean
and Standard Deviation
Mean and Standard Deviation
Same mean but different standard deviation
Normal Distribution
If the distribution is normal:
• The mean is the best measure of central tendency
• Most scores are “bunched up” in the middle
• Extreme scores are less frequent, and therefore less probable
Normal Distribution
Percent of items included between certain values of the std. deviation
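The percentages shown in such a figure can be reproduced from the error function: for a normal distribution, the fraction of items within k standard deviations of the mean is erf(k / √2). The helper name `percent_within` is chosen here for illustration.

```python
import math

def percent_within(k):
    """Percent of a normal population lying within k standard deviations of the mean."""
    return 100.0 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {percent_within(k):.2f}%")
```

The three values are the familiar 68.27%, 95.45% and 99.73%.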
Tests for Normality
• Histogram
• Skewness
• Kurtosis
Tests for Normality
Histogram:
Shape
• Symmetrical
The larger the sample size, the better the judgment of normality. A minimum sample size of 50 is recommended.
Tests for Normality
Skewness (a3) and Kurtosis (a4)
• Check whether the data are skewed to the left or to the right (a3 = 0 for a normal distribution)
• Check whether the data are peaked like the normal distribution (a4 = 3 for a normal distribution)
• The larger the sample size, the better the judgment of normality (a sample size of 100 is recommended)
Tests for Normality
Probability Plots
• Order the data from the smallest to the largest
• Rank the observations (starting from 1 for the lowest observation)
• Calculate the plotting position:
$$PP = \frac{100 (i - 0.5)}{n}$$
where $i$ = rank, $PP$ = plotting position, $n$ = sample size
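The first three steps (order, rank, plotting position) can be sketched as follows. The function name and the five observations are hypothetical and only illustrate the arithmetic.

```python
def plotting_positions(data):
    """Return (value, rank, PP) tuples with PP = 100 * (i - 0.5) / n."""
    ordered = sorted(data)                       # order from smallest to largest
    n = len(ordered)
    return [(x, i, 100 * (i - 0.5) / n)          # rank i starts at 1
            for i, x in enumerate(ordered, start=1)]

# Hypothetical observations, invented for illustration only.
observations = [32, 28, 41, 36, 30]
table = plotting_positions(observations)
for value, rank, pp in table:
    print(f"{value:>3}  rank {rank}  PP = {pp:.0f}")
```

With five observations the plotting positions are 10, 30, 50, 70 and 90; each point would then be plotted against its data value on normal probability paper.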
Probability Plots
Procedure:
1. Order the data
2. Rank the observations
3. Calculate the plotting position
4. Label the data scale
5. Plot the points
6. Attempt to fit by eye a “best line”
7. Determine normality
Chi-Square Goodness of Fit Test
Chi-Square Test
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where
$\chi^2$ = chi-squared
$O_i$ = observed value in a cell
$E_i$ = expected value for a cell
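The statistic itself is a simple sum over cells. The sketch below uses hypothetical observed and expected counts; comparing the result with a critical value from a chi-square table (for the appropriate degrees of freedom) is a separate step not shown here.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O_i - E_i)^2 / E_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts in three cells, invented for illustration only.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
statistic = chi_square(observed, expected)
print(round(statistic, 2))  # (4 + 4 + 0) / 20 = 0.4
```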
Scatter Diagram
The simplest way to determine if a cause-and-effect relationship exists between two variables
Figure 4-19 Scatter Diagram
Scatter Diagram
• Supplies the data to confirm a hypothesis that two variables are related
• Provides both a visual and statistical means to test the strength of a relationship
• Provides a good follow-up to cause-and-effect diagrams
Straight Line Fit
$$m = \frac{\sum xy - \left[ (\sum x)(\sum y) / n \right]}{\sum x^2 - \left[ (\sum x)^2 / n \right]}$$
$$a = \sum y / n - m \left( \sum x / n \right)$$
$$y = a + mx$$
where $m$ = slope of the line and $a$ = intercept on the y axis
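The slope and intercept formulas translate directly into code. In this sketch the function name `fit_line` and the data points are hypothetical; the points are chosen to lie exactly on y = 1 + 2x so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + m*x; returns (a, m)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # slope
    a = sum_y / n - m * (sum_x / n)                                # intercept
    return a, m

# Hypothetical points lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, m = fit_line(xs, ys)
print(a, m)  # 1.0 2.0
```

The denominator is zero when all x values are equal, in which case no unique line exists; real data would also scatter around the fitted line rather than fall exactly on it.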