Chapters 4, 5 (cont.) Symmetric Data
Chapter 4 (cont.)
Numerical Summaries of Symmetric Data
Measure of Center: Mean
Measure of Variability: Standard Deviation
Symmetric Data
[Figure: histogram of body temperatures of 93 adults]
Recall: 2 characteristics of a data set to measure
center: measures where the “middle” of the data is located
variability: measures how “spread out” the data is
Measure of Center When Data Approx. Symmetric
mean (arithmetic mean)
notation
$x_i$: ith measurement in a set of observations $x_1, x_2, x_3, \ldots, x_n$
n: number of measurements in data set; sample size
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$
Sample mean: $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean (value typically not known); N = population size:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
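As a minimal Python sketch of the sample-mean formula (the helper name `sample_mean` is hypothetical, not from these notes), the loop below mirrors the summation notation directly, with i running from 1 to n:

```python
def sample_mean(values):
    """Return x-bar = (x_1 + x_2 + ... + x_n) / n for a non-empty list."""
    n = len(values)  # n: number of measurements (sample size)
    if n == 0:
        raise ValueError("the sample mean needs at least one measurement")
    total = 0
    for i in range(1, n + 1):   # i = 1, ..., n as in the summation
        total += values[i - 1]  # x_i sits at index i - 1 (Python lists start at 0)
    return total / n


print(sample_mean([2, 4, 6, 8]))  # 5.0
```

In practice `sum(values) / len(values)` or `statistics.mean(values)` gives the same result; the explicit loop is only there to match the notation.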
Connection Between Mean and Histogram
A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6
[Figure: histogram of absences from work; x-axis: absences from work (bins 118.5 to 160.5, and More); y-axis: frequency]
Mean: the balance point of the histogram
Median: splits the histogram area in half (50% of the area on each side)
Right-hand histogram: mean = 55.26 yrs, median = 57.7 yrs
Properties of Mean, Median
1. The mean and median are unique; that is, a
data set has only 1 mean and 1 median (the
mean and median are not necessarily equal).
2. The mean uses the value of every number in
the data set; the median does not.
Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4 + 6}{2} = 5$
Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$; $m = \frac{4 + 6}{2} = 5$
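Python's standard `statistics` module can confirm property 2 on the two examples: changing the largest value from 8 to 9 moves the mean from 5 to 5.25 but leaves the median at 5. This is a small illustrative sketch, not part of the original notes.

```python
import statistics

# The two data sets from the examples: only the largest value differs.
first = [2, 4, 6, 8]
second = [2, 4, 6, 9]

for data in (first, second):
    mean = statistics.mean(data)      # uses every value in the data set
    median = statistics.median(data)  # even n: average of the two middle values
    print(data, "mean =", mean, "median =", median)
```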
Example: class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85
89 90 90 90 90 91 96 98 103 140
n = 23
$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$
m: location is the (23 + 1)/2 = 12th obs.; m = 85
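The sketch below reproduces these numbers in Python. It locates the median at position (n + 1)/2 of the sorted data, which only works as written when n is odd (as here, n = 23); for even n the two middle values would have to be averaged.

```python
# Class pulse rates (beats per minute) from the example above.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
          89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulses)
mean = sum(pulses) / n

# For odd n, the median is the observation at position (n + 1) / 2 of the sorted data.
ordered = sorted(pulses)
position = (n + 1) // 2         # 12th observation when n = 23
median = ordered[position - 1]  # convert the 1-based position to a 0-based index

print(f"n = {n}, mean = {mean:.2f}, median = {position}th obs. = {median}")
```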
2012-13 NFL salaries, 2014 MLB salaries
2012-2013 NFL
n = 1532
mean = $1,579,693
median = $615,000
max = $18,000,000
2014 MLB
n = 856
mean = $3,932,912
median = $1,456,250
max = $28,000,000
Disadvantage of the mean
Can be greatly influenced by just a few
observations that are much greater or
much smaller than the rest of the data
Mean, Median, Maximum
[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014; x-axis: year; separate axes for mean/median salary and maximum salary]
Skewness: comparing the mean and median
Skewed to the right (positively skewed): mean > median
[Figure: histogram of 2013 MLB salaries; x-axis: 2013 salary ($1,000); y-axis: frequency]
Skewed to the left (negatively skewed): mean < median
Example: exam scores, mean = 78, median = 87
[Figure: histogram of exam scores; x-axis: exam scores (20 to 100); y-axis: frequency]
Symmetric data: mean and median approx. equal
[Figure: histogram of number of bank customers, 10:00-11:00 am; x-axis: number of customers; y-axis: frequency]
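A rough rule-of-thumb check in Python, following the mean-versus-median comparisons above. The function name `describe_skew`, its `tolerance` cutoff (5% of the range), and the sample data are all hypothetical choices for illustration; real data are rarely this clear-cut, so a histogram is still the better guide.

```python
import statistics


def describe_skew(data, tolerance=0.05):
    """Classify shape by comparing mean and median (a rule of thumb, not a test).

    tolerance is a hypothetical cutoff: if |mean - median| is at most
    tolerance * range, the data are called approximately symmetric.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = max(data) - min(data)  # range, used only to scale the cutoff
    if abs(mean - median) <= tolerance * spread:
        return "approximately symmetric"
    return "skewed to the right" if mean > median else "skewed to the left"


print(describe_skew([1, 2, 2, 3, 3, 3, 4, 20]))        # one large value pulls the mean up
print(describe_skew([1, 17, 18, 18, 19, 19, 19, 20]))  # one small value pulls the mean down
print(describe_skew([2, 4, 6, 8]))
```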
DESCRIBING VARIABILITY OF SYMMETRIC DATA
Describing Symmetric Data (cont.)
Measure of center for symmetric data:
Sample mean $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Measure of variability for symmetric data?
Example
2 data sets:
$x_1 = 49, x_2 = 51$; $\bar{x} = 50$
$y_1 = 0, y_2 = 100$; $\bar{y} = 50$
On average, they’re both comfortable.
[Figure: number lines marking 49 and 51, and 0 and 100]
Ways to measure variability
1. range = largest - smallest
OK sometimes; in general, too crude, because it is sensitive to one large or small obs.
2. Measure spread from the middle, where the middle is the mean $\bar{x}$:
deviation of $x_i$ from the mean: $x_i - \bar{x}$
sum the deviations of all the $x_i$'s from $\bar{x}$: $\sum_{i=1}^{n} (x_i - \bar{x})$
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ always; tells us nothing
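Why the deviations always sum to zero follows in one line from the definition of the sample mean, since $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```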
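Previous Example
sum of deviations from mean:
$x_1 = 49, x_2 = 51$; $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$
$y_1 = 0, y_2 = 100$; $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$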
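A quick Python check on the two data sets from this example (the helper name `summarize` is hypothetical): the deviations sum to zero for both, while the range (method 1) does tell them apart.

```python
def summarize(data):
    """Return (mean, sum of deviations from the mean, range) for a list of numbers."""
    mean = sum(data) / len(data)
    deviation_sum = sum(value - mean for value in data)  # always 0, up to rounding error
    data_range = max(data) - min(data)                   # largest - smallest
    return mean, deviation_sum, data_range


print(summarize([49, 51]))  # (50.0, 0.0, 2)
print(summarize([0, 100]))  # (50.0, 0.0, 100)
```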
The Sample Standard Deviation, a measure of spread around the mean
Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations:
take the deviations $(x_i - \bar{x})$, square them to get $(x_i - \bar{x})^2$, and find the “average” $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$; then take the square root of the average:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
called the sample standard deviation
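A minimal Python sketch of this formula, written out step by step (the name `sample_sd` is a hypothetical helper, not a library function). Applied to the earlier example, it separates the two data sets that have the same mean.

```python
import math


def sample_sd(values):
    """Sample standard deviation s, using the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations (n - 1 must be positive)")
    mean = sum(values) / n
    squared_devs = [(v - mean) ** 2 for v in values]  # (x_i - x-bar)^2
    variance = sum(squared_devs) / (n - 1)            # the "average" with divisor n - 1
    return math.sqrt(variance)                        # square root of the average


print(sample_sd([49, 51]))  # about 1.41
print(sample_sd([0, 100]))  # about 70.71
```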
Calculations
Women height (inches)

i     x_i   x̄      (x_i - x̄)   (x_i - x̄)²
1     59    63.4    -4.4         19.0
2     60    63.4    -3.4         11.3
3     61    63.4    -2.4          5.6
4     62    63.4    -1.4          1.8
5     62    63.4    -1.4          1.8
6     63    63.4    -0.4          0.1
7     63    63.4    -0.4          0.1
8     63    63.4    -0.4          0.1
9     64    63.4     0.6          0.4
10    64    63.4     0.6          0.4
11    65    63.4     1.6          2.7
12    66    63.4     2.6          7.0
13    67    63.4     3.6         13.3
14    68    63.4     4.6         21.6
Sum                  0.0         85.2

Mean $\bar{x}$ = 63.4
Sum of squared deviations from mean = 85.2
$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)
$s^2$ = variance = 85.2/13 = 6.55 inches squared
$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches
We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
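Following that advice, here is one way to get these values from software, using Python's standard `statistics` module on the women's heights from the table; `statistics.variance` and `statistics.stdev` use the n - 1 divisor, matching $s^2$ and $s$ (in Excel, STDEV.S is the corresponding sample function).

```python
import statistics

# Women's heights (inches) from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)
variance = statistics.variance(heights)  # sample variance s^2, divisor n - 1
sd = statistics.stdev(heights)           # sample standard deviation s

print(f"mean = {mean:.1f} in, s^2 = {variance:.2f} sq in, s = {sd:.2f} in")
```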
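1. First calculate the variance $s^2$:
$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
2. Then take the square root to get the standard deviation $s$:
$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
[Figure: data with the mean $\bar{x}$ and ±1 s.d. marked]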
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
The value of the population standard deviation $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
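For comparison, Python's `statistics.pstdev` uses the population formula (divide by N, around the mean of the values supplied), while `statistics.stdev` uses n - 1. Treating the 14 heights as a whole population is purely an assumption for illustration:

```python
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

sigma = statistics.pstdev(heights)  # population formula: divide by N = 14
s = statistics.stdev(heights)       # sample formula: divide by n - 1 = 13

print(f"sigma = {sigma:.2f}, s = {s:.2f}")  # s is slightly larger because n - 1 < n
```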
Remarks
1. The standard deviation of a set of
measurements is an estimate of the
likely size of the chance error in a
single measurement
Remarks (cont.)
2. Note that $s$ and $\sigma$ are always greater than or equal to zero.
3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
When does $s = 0$? When does $\sigma = 0$?
When all data values are the same.
Remarks (cont.)
4. The standard deviation is the most
commonly used measure of risk in
finance and business
– Stocks, Mutual Funds, etc.
5. Variance
$s^2$: sample variance
$\sigma^2$: population variance
Units are the squares of the original data's units, e.g., square dollars or square gallons, which are hard to interpret.
Remarks 6): Why divide by n - 1 instead of n?
degrees of freedom
each observation has 1 degree of freedom
however, when you estimate an unknown population parameter like $\mu$, you lose 1 degree of freedom
In the formula for $s$, we use $\bar{x}$ to estimate the unknown value of $\mu$:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Remarks 6) (cont.): Why divide by n - 1 instead of n? Example
Suppose we have 3 numbers whose average is 9.
Choose ANY values for $x_1$ and $x_2$: $x_1$ = ____, $x_2$ = ____
Since the average (mean) is 9, $x_1 + x_2 + x_3$ must equal 9 × 3 = 27, so $x_3$ must be $27 - (x_1 + x_2)$.
Once we selected $x_1$ and $x_2$, $x_3$ was determined, since the average was 9.
3 numbers but only 2 “degrees of freedom”
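The same argument in a few lines of Python; `third_value` and the chosen values 5 and 10 are hypothetical, and any other choices for $x_1$ and $x_2$ would work the same way:

```python
def third_value(x1, x2, mean=9, n=3):
    """With the mean of n = 3 numbers fixed, x3 is forced once x1 and x2 are chosen."""
    return mean * n - (x1 + x2)  # x3 = 27 - (x1 + x2)


x1, x2 = 5, 10                 # free choices: 2 degrees of freedom
x3 = third_value(x1, x2)       # no freedom left for x3
print(x3, (x1 + x2 + x3) / 3)  # 12 9.0
```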
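Computational Example
observations 1, 3, 5, 9; $\bar{x} = \frac{18}{4} = 4.5$
$s = \sqrt{\frac{(1 - 4.5)^2 + (3 - 4.5)^2 + (5 - 4.5)^2 + (9 - 4.5)^2}{4 - 1}}$
$= \sqrt{\frac{(-3.5)^2 + (-1.5)^2 + (0.5)^2 + (4.5)^2}{3}}$
$= \sqrt{\frac{12.25 + 2.25 + 0.25 + 20.25}{3}} = \sqrt{\frac{35}{3}} = \sqrt{11.67} = 3.42$
$s^2 = 11.67$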
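The same arithmetic, step by step in Python, so each intermediate line of the calculation can be checked:

```python
import math

obs = [1, 3, 5, 9]
n = len(obs)
mean = sum(obs) / n                     # 18 / 4 = 4.5
deviations = [x - mean for x in obs]    # -3.5, -1.5, 0.5, 4.5
squares = [d ** 2 for d in deviations]  # 12.25, 2.25, 0.25, 20.25
variance = sum(squares) / (n - 1)       # 35 / 3
sd = math.sqrt(variance)

print(f"s^2 = {variance:.2f}, s = {sd:.2f}")  # s^2 = 11.67, s = 3.42
```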
class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
n = 23, $\bar{x}$ = 84.48, m = 85
$s^2$ = 290.26 (beats per minute)$^2$
$s$ = 17.037 beats per minute
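These values can be reproduced with Python's `statistics` module, which uses the sample formulas (divisor n - 1):

```python
import statistics

pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
          89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

variance = statistics.variance(pulses)  # s^2, in (beats per minute)^2
sd = statistics.stdev(pulses)           # s, in beats per minute

print(f"s^2 = {variance:.2f}, s = {sd:.3f}")
```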
Review: Properties of $s$ and $\sigma$
$s$ and $\sigma$ are always greater than or equal to 0
when does $s = 0$? $\sigma = 0$?
The larger the value of $s$ (or $\sigma$), the greater the spread of the data
the standard deviation of a set of
measurements is an estimate of the
likely size of the chance error in a single
measurement
Summary of Notation
SAMPLE                          POPULATION
$\bar{x}$: sample mean           $\mu$: population mean
m: sample median                m: population median
$s^2$: sample variance           $\sigma^2$: population variance
$s$: sample stand. dev.          $\sigma$: population stand. dev.
End of Chapters 4 and 5