Review: Numerical Summaries



Numerical Summaries of
Quantitative Data.
Means, Standard Deviations,
z-scores
Warmup



Six people in a room have a median
age of 45 years and mean age of 45
years.
One person who is 40 years old leaves
the room.
Questions:
1. What is the median age of the 5 people
remaining in the room?
2. What is the mean age of the 5 people
remaining in the room?
2 characteristics of a data set
to measure


center
measures where the “middle” of the
data is located
variability
measures how “spread out” the data is
Measure of the “middle”
Sample mean $\bar{x}$
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean $\mu$ (value typically not known)
N = population size
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Recall: Warmup
45 × 6 = 270; 270 − 40 = 230; 230/5 = 46



Six people in a room have a median
age of 45 years and mean age of 45
years.
One person who is 40 years old leaves
the room.
Questions:
1. What is the median age of the 5 people
remaining in the room? Can’t answer
2. What is the mean age of the 5 people
remaining in the room? 46
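The warmup answers can be checked with a short Python sketch. The two rooms below are hypothetical data sets (not from the slides), chosen so that each has six people, median 45, mean 45, and one 40-year-old. They suggest why the new mean is forced to be 46 while the new median depends on the other five ages.

```python
from statistics import mean, median

# Hypothetical rooms (illustrative values only): each has 6 people,
# median age 45, mean age 45, and one person aged 40.
room_a = [40, 45, 45, 45, 45, 50]
room_b = [30, 40, 44, 46, 50, 60]

def after_leaving(ages, leaver=40):
    """Return the ages left after one person of age `leaver` walks out."""
    remaining = list(ages)
    remaining.remove(leaver)  # removes the first matching age only
    return remaining

for room in (room_a, room_b):
    rest = after_leaving(room)
    # Same mean in both rooms, but the medians differ.
    print("mean:", mean(rest), "median:", median(rest))
```

Both rooms give the same mean for the remaining five people, but different medians, so question 1 cannot be answered from the information given.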
Connection Between Mean
and Histogram
A histogram balances when supported
at the mean. Mean $\bar{x}$ = 140.6
[Figure: histogram, "Absences from Work"; vertical axis: Frequency]

Mean: balance point of the histogram
Median: splits the area in half (50% on each side)
Right histogram: mean 55.26 yrs, median 57.7 yrs
Mean, Median, Maximum
Baseball Salaries 1985 - 2014
[Figure: line chart, "Baseball Salaries: Mean, Median and Maximum 1985-2014"; series: Mean, Median, Maximum; horizontal axis: Year; left axis: Mean, Median Salary; right axis: Maximum Salary]
DESCRIBING VARIABILITY OF
QUANTITATIVE DATA
The Sample Standard Deviation, a
measure of spread around the mean

Square the deviation of each
observation from the mean; find the
square root of the “average” of these
squared deviations
$(x_i - \bar{x})^2$; $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and find the "average",
then take the square root of the average
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
called the sample standard deviation
Calculations …
Women height (inches)
 i    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
 1    59      63.4        -4.4                19.0
 2    60      63.4        -3.4                11.3
 3    61      63.4        -2.4                 5.6
 4    62      63.4        -1.4                 1.8
 5    62      63.4        -1.4                 1.8
 6    63      63.4        -0.4                 0.1
 7    63      63.4        -0.4                 0.1
 8    63      63.4        -0.4                 0.1
 9    64      63.4         0.6                 0.4
10    64      63.4         0.6                 0.4
11    65      63.4         1.6                 2.7
12    66      63.4         2.6                 7.0
13    67      63.4         3.6                13.3
14    68      63.4         4.6                21.6
Sum                        0.0                85.2
Mean  63.4
Mean $\bar{x}$ = 63.4
Sum of squared deviations from
mean = 85.2
(n − 1) = 13; (n − 1) is called the degrees
of freedom (df)
$s^2$ = variance = 85.2/13 = 6.55
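The table calculation can be reproduced step by step in Python. This is a minimal sketch that follows the columns of the table (deviation, squared deviation, sum, divide by n − 1); the variable names are illustrative choices, not part of the slides.

```python
import math

# Women's heights in inches, copied from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                    # sample mean
deviations = [x - x_bar for x in heights]   # (x_i - x_bar)
squared = [d ** 2 for d in deviations]      # (x_i - x_bar)^2

sum_sq = sum(squared)                       # sum of squared deviations
variance = sum_sq / (n - 1)                 # divide by the degrees of freedom
std_dev = math.sqrt(variance)               # sample standard deviation

print(f"mean = {x_bar:.1f}, sum of squares = {sum_sq:.1f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Up to floating-point rounding, the deviations sum to zero, which is the "Sum 0.0" entry in the table.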
We’ll never calculate these by hand, so make sure to
know how to get the standard deviation using your
calculator or software.
Mean ± 1 s.d.
1. First calculate the variance $s^2$.
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
2. Then take the square root to get the
standard deviation s.
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
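In software the two steps are usually one function call each. The sketch below assumes Python's standard `statistics` module as the "software"; any calculator or package that divides by n − 1 should give the same values for the women's height data.

```python
import statistics

# Women's heights in inches (the 14 observations from the calculation table).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

s2 = statistics.variance(heights)  # step 1: sample variance (divides by n - 1)
s = statistics.stdev(heights)      # step 2: square root of the sample variance

print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```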
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
$\sigma$: population standard deviation
value of $\sigma$ typically not known;
use s to estimate value of $\sigma$
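To see the effect of dividing by N rather than n − 1, the sketch below applies both formulas to the same numbers. Treating the 14 heights as an entire population is an assumption made purely for illustration; `statistics.pstdev` and `statistics.stdev` are the standard-library functions for the two divisors.

```python
import statistics

# Women's heights in inches (illustration only: these are really a sample).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

sigma = statistics.pstdev(heights)  # divides by N: data treated as the whole population
s = statistics.stdev(heights)       # divides by n - 1: data treated as a sample

print(f"divide by N:     {sigma:.3f}")
print(f"divide by n - 1: {s:.3f}")
```

Because the numerator is the same in both formulas, the n − 1 version is always at least as large as the N version.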
Remarks
1. The standard deviation of a set of
measurements is an estimate of the
likely size of the chance error in a
single measurement
Remarks (cont.)
2. Note that s and $\sigma$ are always greater
than or equal to zero.
3. The larger the value of s (or $\sigma$), the
greater the spread of the data.
When does s = 0? When does $\sigma$ = 0?
When all data values are
the same.
Remarks (cont.)
4. The standard deviation is the most
commonly used measure of risk in
finance and business
– Stocks, Mutual Funds, etc.
5. Variance
$s^2$: sample variance
$\sigma^2$: population variance
Units are squared units of the original data
(square dollars, square gallons??)
Remarks 6): Why divide by n − 1
instead of n?
degrees of freedom
each observation has 1 degree of
freedom
however, when you estimate an unknown
population parameter like $\mu$, you lose 1
degree of freedom
In the formula for s, we use $\bar{x}$ to estimate the unknown
value of $\mu$:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Remarks 6) (cont.): Why divide by
n − 1 instead of n? Example
Suppose we have 3 numbers whose
average is 9
Choose ANY values for $x_1$ and $x_2$
$x_1$ = ___, $x_2$ = ___; then $x_3$ must be ___
Since the average (mean) is 9, $x_1 + x_2 + x_3$ must
equal 9 × 3 = 27, so $x_3 = 27 - (x_1 + x_2)$
once we selected $x_1$ and $x_2$, $x_3$ was
determined since the average was 9
3 numbers but only 2 “degrees of
freedom”
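The degrees-of-freedom example can be tried out directly. In this sketch the function name `third_value` and the pairs chosen for x1 and x2 are hypothetical; the point is only that the third number is never a free choice once the mean is fixed at 9.

```python
def third_value(x1, x2, mean=9, n=3):
    """Value x3 is forced to take once x1, x2 and the mean are fixed."""
    return mean * n - (x1 + x2)

# Arbitrary free choices for x1 and x2 (illustrative pairs only).
for x1, x2 in [(5, 10), (0, 0), (20, -3)]:
    x3 = third_value(x1, x2)
    # Whatever x1 and x2 are, the three numbers average to 9.
    print(x1, x2, x3, (x1 + x2 + x3) / 3)
```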
class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90
90 90 90 91 96 98 103 140
n = 23, $\bar{x}$ = 84.48, m = 85
$s^2$ = 290.26 (beats per minute)$^2$
s = 17.037 beats per minute
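The pulse-rate summaries can be reproduced with a few library calls. This sketch assumes Python's standard `statistics` module; the variable names mirror the notation on the slide (n, mean, median m, variance, standard deviation).

```python
import statistics

# Class pulse rates in beats per minute, as listed above.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90,
         90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)
x_bar = statistics.mean(pulse)    # sample mean
m = statistics.median(pulse)      # sample median
s2 = statistics.variance(pulse)   # sample variance, (beats per minute)^2
s = statistics.stdev(pulse)       # sample standard deviation, beats per minute

print(f"n = {n}, mean = {x_bar:.2f}, median = {m}")
print(f"s^2 = {s2:.2f}, s = {s:.3f}")
```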
Review: Properties of s and $\sigma$
s and $\sigma$ are always greater than or
equal to 0
when does s = 0? $\sigma$ = 0?
The larger the value of s (or $\sigma$), the
greater the spread of the data
the standard deviation of a set of
measurements is an estimate of the
likely size of the chance error in a single
measurement
Summary of Notation
SAMPLE                          POPULATION
$\bar{y}$  sample mean          $\mu$  population mean
m  sample median                m  population median
$s^2$  sample variance          $\sigma^2$  population variance
s  sample stand. dev.           $\sigma$  population stand. dev.
Using the Mean and Standard
Deviation Together.
Z-scores: Standardized Data
Values
Measures the distance of a
number from the mean in units of
the standard deviation
z-score corresponding to y
$$z = \frac{y - \bar{y}}{s}$$
where
y = original data value
$\bar{y}$ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y
If data has mean $\bar{y}$ and standard deviation s,
then standardizing a particular value of y
indicates how many standard deviations y
is above or below the mean $\bar{y}$.

Exam 1: $\bar{y}_1$ = 88, $s_1$ = 6; exam 1 score: 91
Exam 2: $\bar{y}_2$ = 88, $s_2$ = 10; exam 2 score: 92
Which score is better?
$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = .5$$
$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = .4$$
91 on exam 1 is better than 92 on exam 2
Comparing SAT and ACT
Scores
SAT Math: Eleanor’s score 680
SAT mean = 500, sd = 100
ACT Math: Gerald’s score 27
ACT mean = 18, sd = 6
Eleanor’s z-score: z = (680 − 500)/100 = 1.8
Gerald’s z-score: z = (27 − 18)/6 = 1.5
Eleanor’s score is better.
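Both comparisons (the two exams, and SAT versus ACT) use the same one-line calculation. The helper `z_score` below is a hypothetical name introduced for this sketch; the means, standard deviations and scores are the ones quoted on the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam comparison: same mean, different spread.
z_exam1 = z_score(91, mean=88, sd=6)
z_exam2 = z_score(92, mean=88, sd=10)
print("exam 1:", z_exam1, "exam 2:", z_exam2)

# SAT vs ACT: different scales, so compare the standardized scores.
z_eleanor = z_score(680, mean=500, sd=100)
z_gerald = z_score(27, mean=18, sd=6)
print("Eleanor:", z_eleanor, "Gerald:", z_gerald)
```

In each pair the larger z-score marks the score that is further above its own mean, measured in standard deviations.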

Z-scores add to zero
Student/Institutional Support to Athletic Depts For the 9 Public ACC
Schools: 2013 ($ millions)
School        Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.48
                        Sum = 0    Sum = 0
Mean = 9.1000, s = 3.5697
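The claim that z-scores add to zero can be checked on the ACC support figures. This is a sketch using Python's standard `statistics` module; the dictionary layout is an illustrative choice, and the values are the Support column in $ millions.

```python
import statistics

# Support in $ millions for the 9 public ACC schools (2013), from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = statistics.mean(values)   # sample mean
s = statistics.stdev(values)      # sample standard deviation

# Standardize each school's support figure.
z = {school: (y - y_bar) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11}{score:6.2f}")
print("sum of z-scores (zero up to rounding):", round(sum(z.values()), 10))
```

The deviations from the mean sum to zero, and dividing each one by the same s keeps the sum at zero.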
In a recent year the mean tuition at 4-yr public
colleges/universities in the U.S. was $6185 with a
standard deviation of $1804. In NC the tuition
was $4320. What is NC’s z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
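One way to check an answer to the tuition question is to standardize NC's tuition with the national mean and standard deviation. The variable names in this sketch are illustrative; the three figures are the ones quoted in the question.

```python
# Figures quoted in the question above (dollars).
us_mean = 6185      # mean tuition at 4-yr public colleges/universities
us_sd = 1804        # standard deviation of tuition
nc_tuition = 4320   # tuition in NC

deviation = nc_tuition - us_mean   # dollars below (-) or above (+) the mean
z_nc = deviation / us_sd           # the same distance in standard deviations

print(f"deviation = {deviation}, z = {z_nc:.2f}")
```

The deviation is measured in dollars, while the z-score has no units; a negative value means NC's tuition is below the national mean.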
End of Numerical Summaries