Statistical Analysis - Graphical Techniques
Systems Engineering Program
Department of Engineering Management, Information and Systems
EMIS 7370/5370 STAT 5340 :
PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS
Statistical Analysis - Graphical Techniques
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering
Stracener_EMIS 7370/STAT 5340_Sum 08_07.03.08
•Time Series Graph or Run Chart
• Box Plot
• Histogram and Relative Frequency Histogram
• Frequency Distribution
• Probability Plotting
Time Series Graph or Run Chart
• A plot of the data set x1, x2, …, xn in the order
in which the data were obtained
•Used to detect trends or patterns in the data
over time
Box Plot
• A pictorial summary used to describe the
most prominent statistical features of the data
set, x1, x2, …, xn, including its:
- Center or location
- Spread or variability
- Extent and nature of any deviation from symmetry
- Identification of ‘outliers’
Box Plot
• Shows only certain statistics rather than all the
data, namely
- median
- quartiles
- smallest and greatest values in the sample
• Immediate visuals of a box plot are the center,
the spread, and the overall range of the data
Box Plot
Given the following random sample of size 25:
38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98
Arranged in order from least to greatest:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36, 37,
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
Box Plot
•First, find the median, the value exactly in the
middle of an ordered set of numbers.
The median is 37
• Next, we consider only the values to the left of
the median:
1, 5, 10, 14, 16, 22, 25, 29, 30, 34, 36, 36
We now find the median of this set of numbers.
The median for this group is (22 + 25)/2 = 23.5,
which is the lower quartile.
Box Plot
• Now consider the values to the right of the
median.
38, 41, 43, 47, 55, 60, 86, 88, 90, 91, 96, 98
The median for this set is (60 + 86)/2 = 73, which
is the upper quartile.
We are now ready to find the interquartile range
(IQR), which is the difference between the upper
and lower quartiles, 73 - 23.5 = 49.5
49.5 is the interquartile range
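The quartile steps above can be sketched in Python. This is a minimal illustration, assuming the median-of-halves convention used on these slides (when n is odd, the median itself is left out of both halves). The function name `five_number_summary` is hypothetical, and other quartile conventions give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values to the left of the median
    upper = xs[half + (n % 2):]     # values to the right of the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

sample = [38, 10, 60, 90, 88, 96, 1, 41, 86, 14, 25, 5, 16,
          22, 29, 34, 55, 36, 37, 36, 91, 47, 43, 30, 98]

lo, q1, med, q3, hi = five_number_summary(sample)
iqr = q3 - q1                       # interquartile range
print(lo, q1, med, q3, hi, iqr)     # 1 23.5 37 73.0 98 49.5
```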
Box Plot
The lower quartile is 23.5
The median is 37
The upper quartile is 73
The interquartile range is 49.5
The mean is 45.1
[Figure: box plot drawn on a 0 to 100 scale, marking the lower extreme, lower quartile, median, mean, upper quartile, and upper extreme]
Histogram
A graph of the observed frequencies in the data
set, x1, x2, …, xn versus data magnitude to
visually indicate its statistical properties, including
- shape
- location or central tendency
- scatter or variability
Guidelines for Constructing Histograms – Discrete Data
• If the data x1, x2, …, xn are from a discrete
random variable with possible values y1, y2, …, yk
count the number of occurrences of each value
of y and associate the frequency fi with yi,
for i = 1, …, k.
Note that $\sum_{i=1}^{k} f_i = n$.
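As a minimal sketch of the discrete case, the frequencies can be tallied with Python's `collections.Counter`. The sample values below are hypothetical and serve only to illustrate the counting step and the check that the frequencies sum to n.

```python
from collections import Counter

# Hypothetical sample from a discrete random variable (not from the slides).
x = [0, 2, 1, 1, 3, 0, 1, 2, 1, 0]
n = len(x)

freq = Counter(x)                        # f_i for each observed value y_i
for y in sorted(freq):
    print(y, freq[y], freq[y] / n)       # value, frequency, relative frequency

assert sum(freq.values()) == n           # the frequencies add up to n
```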
Guidelines for Constructing Histograms – Continuous Data
• If the data x1, x2, …, xn are from a continuous
random variable
- select the number of intervals or cells, r,
to be a number between 3 and 20; as an
initial value use $r = \sqrt{n}$, where n is the
number of observations
- establish r intervals of equal width, starting
just below the smallest value of x
- count the number of values of x within
each interval to obtain the frequency
associated with each interval
- construct the graph by plotting the frequency fi
against interval i for i = 1, 2, …, r
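The guidelines for continuous data can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the function name `histogram_counts`, the 1% margin used to start "just below" the smallest value, and the sample data are all hypothetical.

```python
import math

def histogram_counts(data, r=None):
    """Count observations in r equal-width intervals (a sketch of the guidelines)."""
    n = len(data)
    if r is None:
        r = min(20, max(3, round(math.sqrt(n))))   # initial choice r = sqrt(n), kept in 3..20
    lo, hi = min(data), max(data)
    margin = 0.01 * (hi - lo) or 0.5               # assumed margin: start just below the minimum
    start = lo - margin
    width = (hi + margin - start) / r              # equal interval width
    counts = [0] * r
    for v in data:
        i = min(int((v - start) / width), r - 1)   # guard against the top edge
        counts[i] += 1
    edges = [start + j * width for j in range(r + 1)]
    return edges, counts

# Hypothetical measurements, for illustration only.
data = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 7.7,
        8.1, 8.5, 9.0, 9.6, 10.2, 11.5, 12.8, 13.1]
edges, counts = histogram_counts(data)
print(len(counts), counts, sum(counts))
```

The value of r from the square-root rule is only a starting point; in practice it is adjusted until the class boundaries are convenient.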
Histogram and Relative Frequency Example
To illustrate the construction of a relative frequency distribution,
consider the following data which represent the lives of 40 car
batteries of a given type recorded to the nearest tenth of a year.
The batteries were guaranteed to last 3 years.
Car Battery Lives (years)
2.2  3.4  2.5  3.3  4.7  4.1  1.6  4.3  3.1  3.8
3.5  3.1  3.4  3.7  3.2  4.5  3.2  3.3  3.8  3.6
2.9  4.4  3.2  2.6  3.9  3.7  3.1  3.3  4.1  3.0
3.0  4.7  3.9  1.9  4.2  2.6  3.7  3.1  3.4  3.5
Histogram and Relative Frequency Example
For this example, using the guidelines for constructing a histogram,
the number of classes selected is 7 with a class width of 0.5. The
frequency and relative frequency distribution for the data are shown
in the following table.
Relative Frequency Distribution of Battery Lives
Class interval   Class midpoint   Frequency f   Relative frequency
1.5-1.9          1.7               2            0.050
2.0-2.4          2.2               1            0.025
2.5-2.9          2.7               4            0.100
3.0-3.4          3.2              15            0.375
3.5-3.9          3.7              10            0.250
4.0-4.4          4.2               5            0.125
4.5-4.9          4.7               3            0.075
Total                             40            1.000
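The frequency table for the battery data can be reproduced with a short script. This is a sketch that assumes the 7 classes of width 0.5 starting at 1.5 stated on the slide. Working in tenths of a year is an implementation choice made here so that class boundaries are exact integers.

```python
lives = [2.2, 3.4, 2.5, 3.3, 4.7, 4.1, 1.6, 4.3, 3.1, 3.8,
         3.5, 3.1, 3.4, 3.7, 3.2, 4.5, 3.2, 3.3, 3.8, 3.6,
         2.9, 4.4, 3.2, 2.6, 3.9, 3.7, 3.1, 3.3, 4.1, 3.0,
         3.0, 4.7, 3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5]

# Convert to integer tenths of a year to avoid floating-point edge effects.
tenths = [round(10 * v) for v in lives]
first_lower, width, n_classes = 15, 5, 7     # classes 1.5-1.9, 2.0-2.4, ..., 4.5-4.9

counts = [0] * n_classes
for t in tenths:
    counts[(t - first_lower) // width] += 1

n = len(lives)
for k, f in enumerate(counts):
    lower = (first_lower + k * width) / 10
    upper = lower + 0.4
    midpoint = (lower + upper) / 2
    print(f"{lower:.1f}-{upper:.1f}  {midpoint:.1f}  {f:2d}  {f / n:.3f}")
```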
Histogram and Relative Frequency
The following diagram is a relative frequency histogram of the battery
lives with an approximate estimate of the probability density function
superimposed.
[Figure: Relative frequency histogram. Horizontal axis: Battery Lives (years), class midpoints 1.7 to 4.7; vertical axis: Relative Frequency, 0.000 to 0.400]
Probability Plotting
• Data are plotted on special graph paper
designed for a particular distribution
- Normal
- Weibull
- Lognormal
- Exponential
• If the assumed model is adequate, the plotted
points will tend to fall in a straight line
• If the model is inadequate, the plot will not
be linear and the type & extent of departures
can be seen
• Once a model appears to fit the data
reasonably well, percentiles and parameters can
be estimated from the plot
Probability Plotting Procedure
• Step 1: Obtain special graph paper, known as
probability paper, designed for the distribution under
examination. Weibull, Lognormal and Normal paper
are available at:
http://www.weibull.com/GPaper/index.htm
• Step 2: Rank the sample values from smallest
to largest in magnitude, i.e., X1 ≤ X2 ≤ … ≤ Xn.
Probability Plotting General Procedure
• Step 3: Plot the Xi's on the paper versus
$\hat{F}(x_i) = 100 \cdot \frac{i - 0.3}{n + 0.4}$ or $\hat{F}(x_i) = \frac{i - 0.3}{n + 0.4}$,
depending on whether the marked axis
on the paper refers to the % or the proportion
of observations. The axis of the graph paper on
which the Xi's are plotted will be referred to as
the observational scale, and the axis for
$\hat{F}(x_i) = 100 \cdot \frac{i - 0.3}{n + 0.4}$ as the cumulative scale.
• Step 4: If a straight line appears to fit the data,
draw a line on the graph, ‘by eye’.
• Step 5: Estimate the model parameters from
the graph.
Weibull Probability Plotting Paper
If $T \sim W(\beta, \theta)$,
the cumulative probability distribution function is
$$F(t) = 1 - e^{-(t/\theta)^{\beta}}$$
We now need to linearize this function into the form
y = ax + b
Weibull Probability Plotting Paper
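Then
$$\ln\left[1 - F(t)\right] = \ln\left[e^{-(t/\theta)^{\beta}}\right]$$
$$\ln\left[1 - F(t)\right] = -\left(\frac{t}{\theta}\right)^{\beta}$$
$$\ln\left\{-\ln\left[1 - F(t)\right]\right\} = \beta \ln\left(\frac{t}{\theta}\right)$$
$$\ln \ln\left[\frac{1}{1 - F(t)}\right] = \beta \ln t - \beta \ln \theta$$
which is the equation of a straight line of the form
y = ax + b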
Weibull Probability Plotting Paper
where
$y = \ln \ln\left[\frac{1}{1 - F(t)}\right]$, $a = \beta$, $x = \ln t$,
and $b = -\beta \ln \theta$, i.e.,
Weibull Probability Plotting Paper
$$y = \beta x - \beta \ln \theta$$
which is a linear equation with a slope of β and an
intercept of −β ln θ. Now the x- and y-axes of the Weibull
probability plotting paper can be constructed. The x-axis
is simply logarithmic, since x = ln t, and the y-axis is scaled as
$$y = \ln \ln\left[\frac{1}{1 - F(t)}\right]$$
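The linearization can be checked numerically. The sketch below evaluates the Weibull CDF at a few times, maps each point to the plotting coordinates (x, y), and confirms that the points lie on a line with slope β and intercept −β ln θ. The parameter values β = 1.5 and θ = 40 and the function names are hypothetical, chosen only for illustration.

```python
import math

def weibull_cdf(t, beta, theta):
    """F(t) = 1 - exp(-(t/theta)**beta)."""
    return 1.0 - math.exp(-((t / theta) ** beta))

def to_plot_coords(t, F):
    """Map (t, F(t)) to the linearized coordinates x = ln t, y = ln ln(1/(1 - F))."""
    x = math.log(t)
    y = math.log(math.log(1.0 / (1.0 - F)))
    return x, y

beta, theta = 1.5, 40.0   # hypothetical parameter values, for illustration only
points = [to_plot_coords(t, weibull_cdf(t, beta, theta)) for t in (10, 20, 40, 80)]

# Every transformed point should satisfy y = beta * x - beta * ln(theta).
for x, y in points:
    assert abs(y - (beta * x - beta * math.log(theta))) < 1e-9
```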
Weibull Probability Plotting Paper
[Figure: Weibull probability plotting paper. Vertical axis: cumulative probability (in %); horizontal axis: x]
Probability Plotting - Example
To illustrate the process let 10, 20, 30, 40, 50, and 80 be a
random sample of size n = 6.
Probability Plotting - Example
We need estimates of the cumulative probability F(x) corresponding
to each of the sample values in order to plot the data on the Weibull
probability paper. These estimates are obtained using
what are called median ranks.
Probability Plotting - Example
Median ranks represent the 50% confidence level (“best
guess”) estimate for the true value of F(t), based on the
total sample size and the order number (first, second,
etc.) of the data.
Probability Plotting - Example
There is an approximation that can be used to estimate
median ranks, called Benard's approximation. It has the
form:
$$\hat{F}(x_i) = MR_i \approx \frac{i - 0.3}{n + 0.4}\,(100\%)$$
where n is the sample size and i is the sample order
number. Tables of median ranks can be found in many
statistics and reliability texts.
Probability Plotting - Example
Based on Benard's approximation, we can now calculate
$\hat{F}(x)$ for each observed value of X. These are shown in the
following table:
i    xi    F̂(xi)
1    10    10.9%
2    20    26.6%
3    30    42.2%
4    40    57.8%
5    50    73.4%
6    80    89.1%
For example, for x2 = 20,
$$\hat{F}(20) = \frac{2 - 0.3}{6 + 0.4} \times 100\% = 26.6\%$$
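The median-rank table can be reproduced with a few lines of Python. This is a minimal sketch of Benard's approximation applied to the sample of size n = 6; the function name `benard_median_rank` is hypothetical.

```python
def benard_median_rank(i, n):
    """Benard's approximation to the median rank of the i-th ordered value."""
    return (i - 0.3) / (n + 0.4)

sample = sorted([10, 20, 30, 40, 50, 80])
n = len(sample)

# One row per ordered observation: order number, value, median rank in percent.
table = [(i, x, 100 * benard_median_rank(i, n))
         for i, x in enumerate(sample, start=1)]
for i, x, pct in table:
    print(f"{i}  {x:3d}  {pct:.1f}%")
```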
Weibull Probability Plotting Paper
[Figure: Weibull probability plotting paper. Vertical axis: cumulative probability (in %); horizontal axis: x]
Probability Plotting - Example
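Now that we have y-coordinate values to go with the x-coordinate sample values, we can plot the
$(x_i, \hat{F}(x_i))$ points on Weibull probability paper.
[Figure: sample points plotted on Weibull probability paper. Vertical axis: F̂(x) (in %); horizontal axis: x]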
Probability Plotting - Example
The line represents the estimated relationship between x
and F(x):
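[Figure: the plotted points on Weibull probability paper with a straight line drawn through them. Vertical axis: F̂(x) (in %); horizontal axis: x]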
Probability Plotting - Example
In this example, the points on Weibull probability paper fall
in a fairly linear fashion, indicating that the Weibull
distribution provides a good fit to the data. If the points
did not seem to follow a straight line, we might want to
consider using another probability distribution to analyze
the data.
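The slides fit the line by eye and read the parameters off the paper. As a hedged numerical alternative, the sketch below swaps in an ordinary least-squares line through the transformed points (ln x, ln ln[1/(1 − F̂)]) and converts its slope and intercept into estimates of β and θ. The least-squares step, the correlation coefficient used as a rough measure of straightness, and the function name `fit_weibull_line` are assumptions added here, not part of the original procedure.

```python
import math

def fit_weibull_line(sample):
    """Least-squares line through the Weibull-plot points; returns (beta, theta, r)."""
    data = sorted(sample)
    n = len(data)
    pts = []
    for i, t in enumerate(data, start=1):
        F = (i - 0.3) / (n + 0.4)                  # Benard's median rank (proportion)
        pts.append((math.log(t), math.log(math.log(1.0 / (1.0 - F)))))
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx                              # estimate of beta
    intercept = my - slope * mx                    # estimate of -beta * ln(theta)
    theta = math.exp(-intercept / slope)
    r = sxy / math.sqrt(sxx * syy)                 # near 1 when the points are nearly linear
    return slope, theta, r

beta_hat, theta_hat, r = fit_weibull_line([10, 20, 30, 40, 50, 80])
print(f"beta ~ {beta_hat:.2f}, theta ~ {theta_hat:.1f}, r = {r:.3f}")
```

A line drawn by eye on probability paper will generally give somewhat different estimates, so these values should be read as one reasonable fit rather than the answer.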
Probability Plotting - Example
Probability Plotting - Example
Probability Paper - Normal
Probability Paper - Lognormal
Probability Paper - Exponential
Example - Probability Plotting
Given the following random sample of size n=8, which
probability distribution provides the best fit?
i    xi
1    79.40968
2    88.12093
3    91.06394
4    98.73094
5    104.1536
6    105.1019
7    106.5036
8    112.0354
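One way to approach this kind of question without graph paper is to compute, for each candidate distribution, how close the probability-plot points are to a straight line. The sketch below uses the correlation coefficient of the plotted points as that measure, which is an assumption added here (the slides judge linearity by eye), and it covers only the normal, lognormal, and Weibull scales. The function names are hypothetical; `statistics.NormalDist().inv_cdf` supplies the normal quantiles.

```python
import math
from statistics import NormalDist

def plot_correlation(xs, ys):
    """Pearson correlation of plotted points; values near 1 suggest a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_fits(sample):
    """Return the plot correlation for each candidate probability paper."""
    data = sorted(sample)
    n = len(data)
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]    # Benard's median ranks
    z = [NormalDist().inv_cdf(p) for p in F]                # normal-paper scale
    w = [math.log(math.log(1.0 / (1.0 - p))) for p in F]    # Weibull-paper scale
    logs = [math.log(v) for v in data]
    return {
        "normal": plot_correlation(data, z),
        "lognormal": plot_correlation(logs, z),
        "weibull": plot_correlation(logs, w),
    }

sample = [79.40968, 88.12093, 91.06394, 98.73094,
          104.1536, 105.1019, 106.5036, 112.0354]
results = compare_fits(sample)
for name, r in results.items():
    print(f"{name:10s} r = {r:.4f}")
```

With only eight observations the correlations for different distributions can be close together, so the plots themselves should still be inspected before choosing a model.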
40 Specimens
40 specimens are cut from a plate for tensile tests. The tensile tests
were made, resulting in Tensile Strength, x, as follows:
No.  x       No.  x       No.  x       No.  x
 1   48.5    11   55.0    21   53.1    31   54.6
 2   54.7    12   55.7    22   49.1    32   49.9
 3   47.8    13   49.9    23   55.6    33   44.5
 4   56.9    14   54.8    24   46.2    34   52.9
 5   54.8    15   49.7    25   52.0    35   54.4
 6   57.9    16   58.9    26   56.6    36   60.2
 7   44.9    17   52.7    27   52.9    37   50.2
 8   53.0    18   57.8    28   52.2    38   57.4
 9   54.7    19   46.8    29   54.1    39   54.8
10   46.7    20   49.2    30   42.3    40   61.2
Perform a statistical analysis of the tensile strength data.
40 Specimens
Time Series plot:
[Figure: time series plot of the 40 tensile strengths; vertical axis ticks 30.0 to 65.0, horizontal axis ticks 0 to 40]
By visual inspection of the time series plot, there seems to be no trend.
40 Specimens
Using the descriptive statistics function in Excel, the following
were calculated:
Descriptive Statistics
Count                 40
Minimum               42.35
Maximum               61.18
Range                 18.84
Sum                   2104.82
Mean                  52.62
Median                53.03
Sample Variance       19.83
Standard Deviation    4.45
Kurtosis              2.51
Skewness              -0.34
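Most of these summary values can also be obtained with Python's standard `statistics` module, as sketched below. The input is the tensile data as listed on the slide, rounded to one decimal, so the results can differ from the Excel output in the second decimal place. Kurtosis and skewness are omitted because the standard library has no functions for them.

```python
import statistics

tensile = [48.5, 54.7, 47.8, 56.9, 54.8, 57.9, 44.9, 53.0, 54.7, 46.7,
           55.0, 55.7, 49.9, 54.8, 49.7, 58.9, 52.7, 57.8, 46.8, 49.2,
           53.1, 49.1, 55.6, 46.2, 52.0, 56.6, 52.9, 52.2, 54.1, 42.3,
           54.6, 49.9, 44.5, 52.9, 54.4, 60.2, 50.2, 57.4, 54.8, 61.2]

summary = {
    "count": len(tensile),
    "minimum": min(tensile),
    "maximum": max(tensile),
    "range": max(tensile) - min(tensile),
    "sum": sum(tensile),
    "mean": statistics.mean(tensile),
    "median": statistics.median(tensile),
    "sample variance": statistics.variance(tensile),   # divides by n - 1
    "standard deviation": statistics.stdev(tensile),   # square root of the sample variance
}
for name, value in summary.items():
    print(f"{name:20s} {value:8.2f}")
```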
40 Specimens
Using the histogram feature of Excel, the following data were calculated:
Bin     Frequency
40      0
45      3
50      10
55      16
60      9
More    2
and the graph:
[Figure: Histogram of Tensile Strengths; horizontal axis bins 40, 45, 50, 55, 60, More; vertical axis frequency 0 to 18]
From looking at the Histogram and the
Normal Probability Plot, we see that
the tensile strength can be estimated
by a normal distribution.
40 Specimens
Box Plot
The lower quartile is 49.45
The median is 53.03
The mean is 52.6
The upper quartile is 55.3
The interquartile range is 5.86
[Figure: box plot drawn on a 40 to 65 scale, marking the lower extreme, lower quartile, mean, median, upper quartile, and upper extreme]
40 Specimens
Normal Probability Plot
[Figure: normal probability plot of the tensile strengths; vertical axis 0.10% to 99.90%, horizontal axis 40 to 65]
40 Specimens
LogNormal Probability Plot
[Figure: lognormal probability plot of the tensile strengths; vertical axis 0.10% to 99.90%, horizontal axis logarithmic from 10 to 100]
40 Specimens
Weibull Probability Plot
[Figure: Weibull probability plot of the tensile strengths; vertical axis 0.10% to 99.90%, horizontal axis 41 to 61]
40 Specimens
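The tensile strength distribution can be
estimated by $X \sim N(\hat{\mu} = 52.62,\ \hat{\sigma} = 4.45)$
[Figure: estimated cumulative distribution function F̂(x) and probability density function f̂(x); horizontal axis 49 to 55, vertical axis 0 to 1]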