PPT 03 - halsnarr


Measures of Location
The population mean of a data set is the average of all the data values:
$$\mu = \frac{\sum x_i}{N}$$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
The sample mean is the point estimator of the population mean $\mu$:
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.
Measures of Location
Example: Recall the Hudson Auto Repair example
The manager of Hudson Auto would like to have a better understanding
of the cost of parts used in the engine tune-ups performed in the shop.
She examines 50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
 91  78  93  57  75  52  99  80  97  62
 71  69  72  89  66  75  79  75  72  76
104  74  62  68  97 105  77  65  80 109
 85  97  88  68  83  68  71  69  67  74
 62  82  98 101  79 105  79  69  62  73
$$\bar{x} = \frac{\sum x_i}{n} = \frac{3949}{50} = 78.98$$
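The sample mean above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names (`costs`, `total`, `sample_mean`) are illustrative and do not come from the slides.

```python
# Parts costs from the 50 Hudson Auto Repair invoices, in the order listed above.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

n = len(costs)            # number of observations in the sample
total = sum(costs)        # sum of the values of the n observations
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))  # 50 3949 78.98
```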
Measures of Location
For an odd number of observations:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
the median is the middle value.
Median = 19
Measures of Location
For an even number of observations:
26 18 27 12 14 27 19 30 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
the median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
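The odd and even cases can be written as one small function. The sketch below is illustrative only; the function name `median` and the two list names are assumptions, and Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)      # arrange the data in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of observations


odd_data = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_data = [26, 18, 27, 12, 14, 27, 19, 30]   # 8 observations

print(median(odd_data))    # 19
print(median(even_data))   # 22.5
```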
Measures of Location
Example: Hudson Auto Repair
Averaging the 25th and 26th data values:
Median = (75 + 76)/2 = 75.5
 52  57  62  62  62  62  65  66  67  68
 68  68  69  69  69  71  71  72  72  73
 74  74  75  75  75  76  77  78  79  79
 79  80  80  82  83  85  88  89  91  93
 97  97  97  98  99 101 104 105 105 109
Note: Data is in ascending order.
Measures of Location
Example: Hudson Auto Repair
Mode = 62 (the value that occurs most often; 62 appears four times in the 50 sorted data values)
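One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the names `sorted_costs`, `counts`, `mode_value` and `mode_count` are illustrative.

```python
from collections import Counter

# The 50 Hudson Auto Repair parts costs in ascending order.
sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

counts = Counter(sorted_costs)                      # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]   # most frequent value and its count

print(mode_value, mode_count)  # 62 4
```

If several values tied for the highest count, `most_common(1)` would report only one of them, so a tie check would be needed for multimodal data.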
Measures of Location
Example: Hudson Auto Repair
First quartile = 25th percentile
i = (p/100)n = (25/100)50 = 12.5
Because i is not an integer, round up: the first quartile is the 13th data value.
First quartile = 69
Measures of Location
Example: Hudson Auto Repair
80th percentile
i = (p/100)n = (80/100)50 = 40
Because i is an integer, average the 40th and 41st data values:
80th Percentile = (93 + 97)/2 = 95
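The position rule used in the two percentile examples (round up when i is not an integer, average positions i and i + 1 when it is) can be sketched as follows. The function name `percentile` is an assumption. Statistical packages often default to other interpolation rules, so their answers may differ slightly from the slide values.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]


def percentile(values, p):
    """p-th percentile (0 < p < 100) using the position i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i != int(i):                        # i is not an integer: round up
        return ordered[math.ceil(i) - 1]   # 1-based position -> 0-based index
    i = int(i)                             # i is an integer: average positions i and i+1
    return (ordered[i - 1] + ordered[i]) / 2


print(percentile(sorted_costs, 25))  # 69
print(percentile(sorted_costs, 80))  # 95.0
```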
Pelican Stores -- continued
Pelican Stores is a chain of women’s apparel stores. It recently ran a promotion in which
discount coupons were sent to customers of other National Clothing stores. Data collected
for a sample of 100 in-store credit card transactions at Pelican Stores during one day
while the promotion was running are shown in Table 2.18. Customers who made a
purchase using a discount coupon are referred to as promotional customers and
customers who made a purchase but did not use a discount coupon are referred to as
regular customers. Because the promotional coupons were not sent to regular Pelican
Stores customers, management considers the sales made to people presenting the
promotional coupons as sales it would not otherwise make.
Pelican’s management would like to use this sample data to learn about its
customer base and to evaluate the promotion involving discounts.
Managerial Report
1. Using graphs and tables, summarize the qualitative variables.
2. Using graphs and tables, summarize the quantitative variables.
3. Using pivot tables and scatter plots, summarize the variables.
4. Compute the mean, mode, median, and the 25th and 75th percentiles.
data_pelican.xls
Measures of Variability
Example: Hudson Auto Repair
Range = maximum – minimum
Range = 109 – 52 = 57
Measures of Variability
Example: Hudson Auto Repair
3rd Quartile (Q3) = 89
1st Quartile (Q1) = 69
Interquartile Range = Q3 – Q1 = 89 – 69 = 20
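The range and interquartile range can be reproduced with the same percentile position rule used for the quartiles. This is a sketch under that assumption; the helper name `percentile` and the variable names are illustrative.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]


def percentile(values, p):
    """p-th percentile via i = (p/100) * n: round up, or average positions i and i+1."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


data_range = max(sorted_costs) - min(sorted_costs)   # maximum - minimum
q1 = percentile(sorted_costs, 25)                    # 1st quartile
q3 = percentile(sorted_costs, 75)                    # 3rd quartile
iqr = q3 - q1                                        # interquartile range

print(data_range, q1, q3, iqr)  # 57 69 89 20
```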
Measures of Variability
The population variance is the average variation:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
where $\mu$ is the population mean, $x_i - \mu$ is the i-th deviation from the population mean, $(x_i - \mu)^2$ is the i-th squared deviation from the population mean, $\sum (x_i - \mu)^2$ is the sum of squared deviations from the population mean (the total variation of x), and N is the number of observations in the population.
The sample variance is an unbiased estimator of $\sigma^2$:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \cdot \frac{n}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
where n is the number of observations in the sample and n - 1 is the degrees of freedom.
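Measures of Variability
The standard deviation is the positive square root of the variance:
$$s = \sqrt{s^2} \qquad \sigma = \sqrt{\sigma^2}$$
The coefficient of variation expresses the standard deviation as a percentage of the mean:
$$\left(\frac{s}{\bar{x}} \times 100\right)\% \qquad \left(\frac{\sigma}{\mu} \times 100\right)\%$$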
Measures of Variability
$\bar{x} = 78.98$
Sorted invoices   Observed value   Sqrd Dev from the mean
       1                52               727.92
       2                57               483.12
       3                62               288.32
       4                62               288.32
       5                62               288.32
       6                62               288.32
       7                65               195.44
      ...              ...                ...
      49               105               677.04
      50               109               901.20
     Sum              3949              9592.98
Measures of Variability
Example: Hudson Auto Repair
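Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{9592.98}{50 - 1} = 195.78$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{195.78} = 13.992$$
Coefficient of variation
$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{13.992}{78.98} \times 100\right)\% = 17.72\%$$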
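The variance, standard deviation and coefficient of variation for the Hudson data can be recomputed directly from the definitions. A minimal sketch with illustrative variable names; it divides by n - 1 because the 50 invoices are a sample.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

n = len(sorted_costs)
mean = sum(sorted_costs) / n
sum_sq_dev = sum((x - mean) ** 2 for x in sorted_costs)  # sum of squared deviations
variance = sum_sq_dev / (n - 1)                          # sample variance s^2
std_dev = math.sqrt(variance)                            # sample standard deviation s
cv = std_dev / mean * 100                                # coefficient of variation, in percent

print(round(sum_sq_dev, 2), round(variance, 2), round(std_dev, 3), round(cv, 2))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and should agree with these values.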
Pelican Stores -- continued
Managerial Report
5. Compute the range, IQR, variance, and standard deviations.
data_pelican.xls
Measures of Shape
Example: Hudson Auto Repair
z-Score of Smallest Value
$$z = \frac{x_i - \bar{x}}{s} = \frac{52 - 78.98}{13.992} = -1.93$$
Measures of Shape
$\bar{x} = 78.98$, $s = 13.992$
Observed value   Dev from the mean   z-score
      52              -26.98          -1.93
      57              -21.98          -1.57
      62              -16.98          -1.21
      62              -16.98          -1.21
      62              -16.98          -1.21
      62              -16.98          -1.21
      65              -13.98          -1.00
     ...                ...            ...
     105               26.02           1.86
     109               30.02           2.15
Sum 3949                0               0
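The z-score column can be generated for all 50 observations at once. A short sketch with illustrative names, using the sample standard deviation (n - 1 divisor) as in the slides.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

n = len(sorted_costs)
mean = sum(sorted_costs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sorted_costs) / (n - 1))

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / s for x in sorted_costs]

print(round(z_scores[0], 2))    # smallest value, 52
print(round(z_scores[-1], 2))   # largest value, 109
```

As in the table, the deviations and the z-scores each sum to zero, up to floating-point rounding.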
Measures of Shape
An important measure of the shape of a distribution is called skewness.
$$\text{skew} = \frac{n \sum z_i^3}{(n - 1)(n - 2)}$$
It is just the average of the n cubed z-scores when n is “large”:
$$\text{skew} \approx \frac{\sum z_i^3}{n}$$
Measures of Shape
Observed value   z-score   cubed z-score
      52          -1.93       -7.17
      57          -1.57       -3.88
      62          -1.21       -1.79
      62          -1.21       -1.79
      62          -1.21       -1.79
      62          -1.21       -1.79
      65          -1.00       -1.00
     ...           ...         ...
     105           1.86        6.43
     109           2.15        9.88
Sum 3949            0         22.567
Measures of Shape
$$\text{skew} = \frac{n \sum z_i^3}{(n - 1)(n - 2)} = \frac{(50)(22.567)}{(49)(48)} = 0.4797$$
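The skewness calculation can be reproduced from the cubed z-scores. The sketch below is illustrative, with assumed variable names; it also prints the large-n approximation (the plain average of the cubed z-scores) for comparison.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

n = len(sorted_costs)
mean = sum(sorted_costs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sorted_costs) / (n - 1))

sum_z3 = sum(((x - mean) / s) ** 3 for x in sorted_costs)  # sum of cubed z-scores
skew = n * sum_z3 / ((n - 1) * (n - 2))                    # sample skewness
skew_large_n = sum_z3 / n                                  # approximation for large n

print(round(sum_z3, 3), round(skew, 4), round(skew_large_n, 4))
```

The positive value indicates a distribution that is skewed to the right, consistent with the mean (78.98) exceeding the median (75.5).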
[Figure: histogram of Tune-up Parts Cost. Horizontal axis: Parts Cost ($), 50 to 110; vertical axis: Frequency, 2 to 18. Marked values: mode $62, median $75.50, mean $78.98.]
Measures of Shape
Symmetric: skew = 0
Moderately Skewed Left: skew = -.31
Highly Skewed Right: skew = 1.25
Measures of Shape
Chebyshev's Theorem:
At least (1 - 1/z^2) of the data values are within z standard deviations of the mean.
At least 0% of the data values are within 1 standard deviation of the mean
At least 75% of the data values are within 2 standard deviations of the mean
At least 89% of the data values are within 3 standard deviations of the mean
At least 94% of the data values are within 4 standard deviations of the mean
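The four percentages follow directly from the bound 1 - 1/z^2. A small sketch; the function name `chebyshev_bound` is an assumption made for illustration.

```python
def chebyshev_bound(z):
    """Minimum fraction of data values within z standard deviations of the mean."""
    if z <= 0:
        raise ValueError("z must be positive")
    return 1 - 1 / z ** 2   # only informative for z > 1; at z = 1 the bound is 0


for z in (1, 2, 3, 4):
    print(z, f"{chebyshev_bound(z):.2%}")
```

The bound holds for any data set regardless of shape, which is why it is weaker than the empirical rule.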
Measures of Shape
Empirical Rule:
For data with a bell-shaped distribution, approximately
68.26% of the data values are within 1 standard deviation of the mean
95.44% of the data values are within 2 standard deviations of the mean
99.74% of the data values are within 3 standard deviations of the mean
99.99% of the data values are within 4 standard deviations of the mean
Measures of Shape
z-score   Is the observation within 2 std dev?
 -1.93    Yes
 -1.57    Yes
 -1.21    Yes
 -1.21    Yes
 -1.21    Yes
 -1.21    Yes
 -1.00    Yes
  ...     ...
  1.86    Yes
  2.15    No
49 of the 50 data values are within 2 s of the mean = 98%
50 of the 50 data values are within 3 s of the mean = 100%
None of the values are outliers
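The counts above can be checked by testing each z-score against a cutoff. The sketch below assumes the common convention that an outlier has a z-score below -3 or above +3; the slides do not state their threshold explicitly, and the helper name `count_within` is illustrative.

```python
import math

sorted_costs = [
    52, 57, 62, 62, 62, 62, 65, 66, 67, 68,
    68, 68, 69, 69, 69, 71, 71, 72, 72, 73,
    74, 74, 75, 75, 75, 76, 77, 78, 79, 79,
    79, 80, 80, 82, 83, 85, 88, 89, 91, 93,
    97, 97, 97, 98, 99, 101, 104, 105, 105, 109,
]

n = len(sorted_costs)
mean = sum(sorted_costs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sorted_costs) / (n - 1))
z_scores = [(x - mean) / s for x in sorted_costs]


def count_within(z_values, k):
    """Number of observations whose z-score is within k standard deviations."""
    return sum(1 for z in z_values if abs(z) <= k)


within_2 = count_within(z_scores, 2)
within_3 = count_within(z_scores, 3)
# Assumed outlier rule: |z| > 3.
outliers = [x for x, z in zip(sorted_costs, z_scores) if abs(z) > 3]

print(within_2, within_3, outliers)  # 49 50 []
```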
Pelican Stores -- continued
Managerial Report
6. Compute the z-scores and skew, find the outliers, and count the observations
that are within 1, 2, & 3 standard deviations of the mean.
data_pelican.xls
Measures of the relationship between 2 variables
The covariance is computed as follows:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$ (for samples)
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$ (for populations)
where $x_i - \bar{x}$ and $x_i - \mu_x$ are the i-th deviations from x’s means, $y_i - \bar{y}$ and $y_i - \mu_y$ are the i-th deviations from y’s means, n and N are the sizes of the sample and population, and n - 1 is the degrees of freedom.
The correlation coefficient is computed as follows:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$ (for samples)
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$ (for populations)
Measures of the relationship between 2 variables
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As
part of the advertising campaign Reed runs one or more
television commercials during the weekend preceding the sale.
Data from a sample of 5 previous sales are shown below.
Number of TV Ads (x)   Number of Cars Sold (y)
         1                       14
         3                       24
         2                       18
         1                       17
         3                       27
Measures of the relationship between 2 variables
Example: Reed Auto Sales
[Figure: scatter plot of Cars sold (vertical axis, 0 to 35) against TV Ads (horizontal axis, 0 to 4).]
Measures of the relationship between 2 variables
Example: Reed Auto Sales
  x     y    x - x̄    (x - x̄)^2    y - ȳ     (y - ȳ)^2   (x - x̄)(y - ȳ)
  1    14    1 - 2        1       14 - 20       36            6
  3    24    3 - 2        1       24 - 20       16            4
  2    18    2 - 2        0       18 - 20        4            0
  1    17    1 - 2        1       17 - 20        9            3
  3    27    3 - 2        1       27 - 20       49            7
Sum
 10   100                 4                    114           20
x̄ = 10/5 = 2 (ads)                  ȳ = 100/5 = 20 (cars)
sxx = 4/4 = 1 (ads squared)         sx = 1 (ads)
syy = 114/4 = 28.5 (cars squared)   sy = 5.34 (cars)
sxy = 20/4 = 5 (ads-cars)
Measures of the relationship between 2 variables
Example: Reed Auto Sales
sxy = 5 (ads-cars)
sx = 1 (ads)
sy = 5.34 (cars)
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{5 \text{ (ads-cars)}}{1 \text{ (ads)} \times 5.34 \text{ (cars)}} = .9363$$
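The Reed Auto covariance and correlation can be recomputed from the five data pairs. A minimal sketch with illustrative variable names. Because it keeps sy unrounded (about 5.3385 rather than 5.34), the correlation comes out near 0.9366 instead of the slide's .9363.

```python
import math

ads = [1, 3, 2, 1, 3]          # number of TV ads (x)
cars = [14, 24, 18, 17, 27]    # number of cars sold (y)

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Sample covariance: sum of cross-products of deviations over n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in ads) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in cars) / (n - 1))

r_xy = s_xy / (s_x * s_y)      # sample correlation coefficient

print(s_xy, s_x, round(s_y, 2), round(r_xy, 4))
```

The units cancel in the correlation (ads-cars divided by ads times cars), so r is unit-free and always lies between -1 and +1.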
Pelican Stores -- continued
Managerial Report
7. Compute the covariances and correlations.
data_pelican.xls