Chapters 4, 5 (cont.) Symmetric Data


Chapter 4 (cont.)
Numerical Summaries of Symmetric Data
Measure of Center: Mean
Measure of Variability: Standard Deviation
Symmetric Data
Body temp. of 93 adults
Recall: 2 characteristics of a data set to measure
- center: measures where the “middle” of the data is located
- variability: measures how “spread out” the data is

Measure of Center When Data Approx. Symmetric
mean (arithmetic mean)
notation
- $x_i$: $i$th measurement in a set of observations $x_1, x_2, x_3, \ldots, x_n$
- $n$: number of measurements in data set; sample size
- summation notation:
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]
Sample mean $\bar{x}$
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population mean $\mu$ (value typically not known)
$N$ = population size
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
Connection Between Mean and Histogram
A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6
[Figure: histogram "Absences from Work"; vertical axis: Frequency (0 to 70); bins from 118.5 to 160.5, plus "More"]

Mean: balance point
Median: 50% area each half
Right histogram: mean 55.26 yrs, median 57.7 yrs
Properties of Mean, Median
1. The mean and median are unique; that is, a
data set has only 1 mean and 1 median (the
mean and median are not necessarily equal).
2. The mean uses the value of every number in
the data set; the median does not.
Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4 + 6}{2} = 5$
Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$; $m = \frac{4 + 6}{2} = 5$
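As a quick check of property 2, the two examples above (2, 4, 6, 8 and 2, 4, 6, 9) can be recomputed in software. This is a minimal sketch using Python's standard-library `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median

# The two example data sets: only the largest value differs (8 vs. 9).
first = [2, 4, 6, 8]
second = [2, 4, 6, 9]

# The mean uses every value, so it changes when 8 becomes 9.
mean_first, mean_second = mean(first), mean(second)

# The median depends only on the two middle values (4 and 6), so it does not change.
median_first, median_second = median(first), median(second)

print("means:  ", mean_first, mean_second)
print("medians:", median_first, median_second)
```

Changing a value in the tail moves the mean from 5 to 5.25 while the median stays at 5.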
Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85
89 90 90 90 90 91 96 98 103 140
$n = 23$
\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]
$m$: location: 12th obs.; $m = 85$
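The pulse-rate numbers can be reproduced with a few lines of Python. This is a sketch only: the list is typed in from the slide, and the names `pulse`, `x_bar`, `position`, and `m` are chosen here for illustration.

```python
from statistics import median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
         89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)                    # number of observations
x_bar = sum(pulse) / n            # sample mean: sum of the x_i divided by n
position = (n + 1) // 2           # for odd n, the median is the ((n + 1) / 2)th sorted value
m = sorted(pulse)[position - 1]   # Python lists are 0-indexed, so subtract 1

print(n, round(x_bar, 2), position, m)

# Cross-check the hand-located median against the library function.
assert m == median(pulse)
```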
2012-13 NFL salaries, 2014 MLB salaries
2012-2013 NFL
n = 1532
mean = $1,579,693
median = $615,000
max = $18,000,000

2014 MLB
n = 856
mean = $3,932,912
median = $1,456,250
max = $28,000,000

Disadvantage of the mean

Can be greatly influenced by just a few
observations that are much greater or
much smaller than the rest of the data
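To see this sensitivity on a small scale, the sketch below reuses the class pulse rates from earlier in these slides and compares the mean and median with and without the single largest observation (140). Dropping that value is a hypothetical what-if added here for illustration; the slides do not remove it.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
         89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical what-if: drop the one unusually large observation.
without_max = [x for x in pulse if x != 140]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(without_max), median(without_max)

print("with 140:   ", round(mean_all, 2), median_all)
print("without 140:", round(mean_trim, 2), median_trim)
```

One observation out of 23 shifts the mean by about 2.5 beats per minute, while the median moves by only 0.5.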
Mean, Median, Maximum Baseball Salaries 1985 - 2014
[Figure: line chart "Baseball Salaries: Mean, Median and Maximum 1985-2014"; series: Mean, Median, Maximum; horizontal axis: Year (1985 to 2013); one vertical axis: Mean, Median Salary (200,000 to 3,700,000); other vertical axis: Maximum Salary (0 to 35,000,000)]
Skewness: comparing the mean and median
Skewed to the right (positively skewed)
- mean > median

[Figure: histogram "2013 MLB Salaries"; horizontal axis: 2013 Salary ($1,000); vertical axis: Frequency (0 to 450); bar counts range from 419 down to 1]
Skewed to the left; negatively skewed
- mean < median
- mean = 78; median = 87

[Figure: "Histogram of Exam Scores"; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30)]
Symmetric data
mean, median approx. equal
[Figure: histogram "Bank Customers: 10:00-11:00 am"; horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20)]
DESCRIBING VARIABILITY OF SYMMETRIC DATA
Describing Symmetric Data (cont.)

Measure of center for symmetric data:
Sample mean $\bar{x}$
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Measure of variability for symmetric data?
Example

2 data sets:
$x_1 = 49$, $x_2 = 51$; $\bar{x} = 50$
$y_1 = 0$, $y_2 = 100$; $\bar{y} = 50$
On average, they’re both comfortable
Ways to measure variability
1. range = largest - smallest
   ok sometimes; in general, too crude; sensitive to one large or small obs.
2. measure spread from the middle, where the middle is the mean $\bar{x}$
   - deviation of $x_i$ from the mean: $x_i - \bar{x}$
   - sum the deviations of all the $x_i$'s from $\bar{x}$: $\sum_{i=1}^{n} (x_i - \bar{x})$
   - $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ always; tells us nothing
Previous Example
sum of deviations from mean:
$x_1 = 49$, $x_2 = 51$; $\bar{x} = 50$:
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$
$y_1 = 0$, $y_2 = 100$; $\bar{y} = 50$:
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$
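The same check can be run in code for the two data sets in this example. The helper name `sum_of_deviations` is made up for this sketch.

```python
def sum_of_deviations(data):
    """Return the sum of (x_i - mean) over all observations in data."""
    x_bar = sum(data) / len(data)
    return sum(x - x_bar for x in data)

x = [49, 51]
y = [0, 100]

# Both sums are zero even though y is far more spread out than x,
# so the raw sum of deviations cannot distinguish the two data sets.
print(sum_of_deviations(x), sum_of_deviations(y))
```

With other data the result may come out as a tiny nonzero number (on the order of 1e-15) because of floating-point rounding; mathematically the sum is always 0.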
The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations:
form each $(x_i - \bar{x})^2$, sum them to get $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and find the "average", then take the square root of the average
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
called the sample standard deviation
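A minimal sketch of this definition in Python, applied to the two small data sets from the earlier example (49, 51 and 0, 100). The function name `sample_sd` is illustrative; the result is cross-checked against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

def sample_sd(data):
    """Sample standard deviation computed directly from the definition."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    # "Average" the squared deviations using n - 1, then take the square root.
    return sqrt(sum(squared_deviations) / (n - 1))

x = [49, 51]
y = [0, 100]

s_x, s_y = sample_sd(x), sample_sd(y)
print(round(s_x, 3), round(s_y, 3))

# The hand-rolled version should agree with the library function.
assert abs(s_x - stdev(x)) < 1e-9 and abs(s_y - stdev(y)) < 1e-9
```

Unlike the sum of deviations, s separates the two data sets: both have mean 50, but s is about 1.4 for x and about 70.7 for y.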
Calculations …
Women height (inches)

$i$    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
1      59      63.4        -4.4                19.0
2      60      63.4        -3.4                11.3
3      61      63.4        -2.4                5.6
4      62      63.4        -1.4                1.8
5      62      63.4        -1.4                1.8
6      63      63.4        -0.4                0.1
7      63      63.4        -0.4                0.1
8      63      63.4        -0.4                0.1
9      64      63.4        0.6                 0.4
10     64      63.4        0.6                 0.4
11     65      63.4        1.6                 2.7
12     66      63.4        2.6                 7.0
13     67      63.4        3.6                 13.3
14     68      63.4        4.6                 21.6
Sum                        0.0                 85.2

Mean = 63.4
Sum of squared deviations from mean = 85.2
$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)
$s^2$ = variance = 85.2/13 = 6.55 inches squared
$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches
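The table's arithmetic can be reproduced step by step in Python. This is a sketch with variable names chosen for illustration. It uses the unrounded mean (about 63.357), which appears to be what the table does as well: for instance, the first squared deviation is listed as 19.0 rather than 4.4 squared = 19.36.

```python
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                          # sample mean (unrounded)
deviations = [x - x_bar for x in heights]         # x_i - x_bar
squared = [d ** 2 for d in deviations]            # (x_i - x_bar)^2
sum_sq = sum(squared)                             # sum of squared deviations
df = n - 1                                        # degrees of freedom
variance = sum_sq / df                            # s^2, in inches squared
sd = variance ** 0.5                              # s, in inches

print(round(x_bar, 1), round(sum_sq, 1), df, round(variance, 2), round(sd, 2))
```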
We’ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
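1. First calculate the variance $s^2$.
\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
2. Then take the square root to get the standard deviation $s$.
\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
[Figure: data with the mean $\bar{x}$ and mean ± 1 s.d. marked]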
Population Standard Deviation
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
value of $\sigma$, the population standard deviation, typically not known;
use $s$ to estimate value of $\sigma$
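Most software offers both versions, so it is worth knowing which one a given function computes. In Python's standard library, `statistics.stdev` divides by n - 1 (the sample formula) and `statistics.pstdev` divides by N (the population formula). The four numbers below are illustrative only; treating the same values first as a sample and then as a whole population is an assumption made just to show the difference.

```python
from statistics import pstdev, stdev

data = [1, 3, 5, 9]   # illustrative values only

s = stdev(data)        # sample standard deviation: divides by n - 1 = 3
sigma = pstdev(data)   # population standard deviation: divides by N = 4

print(round(s, 3), round(sigma, 3))
```

For the same numbers, the sample version is always at least as large as the population version, because it divides the same sum of squares by a smaller number.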
Remarks
1. The standard deviation of a set of
measurements is an estimate of the
likely size of the chance error in a
single measurement
Remarks (cont.)
2. Note that $s$ and $\sigma$ are always greater than or equal to zero.
3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
When does $s = 0$? When does $\sigma = 0$?
When all data values are the same.
Remarks (cont.)
4. The standard deviation is the most
commonly used measure of risk in
finance and business
– Stocks, Mutual Funds, etc.
5. Variance
- $s^2$ sample variance
- $\sigma^2$ population variance
- Units are squared units of the original data
- square dollars, square gallons ??
Remarks 6): Why divide by n-1 instead of n?
degrees of freedom
- each observation has 1 degree of freedom
- however, when you estimate an unknown population parameter like $\mu$, you lose 1 degree of freedom

In the formula for $s$, we use $\bar{x}$ to estimate the unknown value of $\mu$:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
Remarks 6) (cont.): Why divide by n-1 instead of n? Example
Suppose we have 3 numbers whose average is 9
- Choose ANY values for $x_1$ and $x_2$: $x_1$ = ___, $x_2$ = ___
- Since the average (mean) is 9, $x_1 + x_2 + x_3$ must equal 9*3 = 27, so $x_3 = 27 - (x_1 + x_2)$
- then $x_3$ must be ___
- once we selected $x_1$ and $x_2$, $x_3$ was determined since the average was 9
- 3 numbers but only 2 “degrees of freedom”
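The "3 numbers, 2 degrees of freedom" idea can be tried out directly. In this sketch the function name `third_value` and the particular choices of the first two numbers are hypothetical; any two values work.

```python
def third_value(x1, x2, target_mean=9, n=3):
    """Given two freely chosen values, return the third value forced by the mean."""
    return target_mean * n - (x1 + x2)

# Pick the first two numbers however we like; the third is then fixed.
for x1, x2 in [(5, 10), (0, 0), (-4, 20.5)]:
    x3 = third_value(x1, x2)
    print(x1, x2, x3, "mean:", (x1 + x2 + x3) / 3)
```

Whatever the first two choices are, the printed mean is 9, so only two of the three values were ever free.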
Computational Example
observations 1, 3, 5, 9; $\bar{x} = \frac{18}{4} = 4.5$
\[ s = \sqrt{\frac{(1 - 4.5)^2 + (3 - 4.5)^2 + (5 - 4.5)^2 + (9 - 4.5)^2}{4 - 1}} \]
\[ = \sqrt{\frac{(-3.5)^2 + (-1.5)^2 + (0.5)^2 + (4.5)^2}{3}} = \sqrt{\frac{12.25 + 2.25 + 0.25 + 20.25}{3}} = \sqrt{\frac{35}{3}} = \sqrt{11.67} = 3.42 \]
$s^2 = 11.67$
class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90
90 90 90 91 96 98 103 140
$n = 23$, $\bar{x} = 84.48$, $m = 85$
$s^2 = 290.26$ (beats per minute)$^2$
$s = 17.037$ beats per minute
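Since these values are meant to come from software rather than hand calculation, here is one way to get them in Python. This is a sketch using the standard-library `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor; the list is typed in from the slide.

```python
from statistics import stdev, variance

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
         89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

s2 = variance(pulse)   # sample variance, in (beats per minute) squared
s = stdev(pulse)       # sample standard deviation, in beats per minute

print(round(s2, 2), round(s, 3))
```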
Review: Properties of $s$ and $\sigma$
- $s$ and $\sigma$ are always greater than or equal to 0
  when does $s = 0$? $\sigma = 0$?
- The larger the value of $s$ (or $\sigma$), the greater the spread of the data
- the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation
SAMPLE                        POPULATION
$\bar{x}$ sample mean         $\mu$ population mean
$m$ sample median             $m$ population median
$s^2$ sample variance         $\sigma^2$ population variance
$s$ sample stand. dev.        $\sigma$ population stand. dev.
End of Chapters 4 and 5