Measures of Central Tendency


Descriptive Statistics
the everyday notions of
central tendency









Usual
Customary
Most
Standard
Expected
normal
Ordinary
Medium
commonplace
NY Times, 10/24/2010
Stories vs. Statistics
By JOHN ALLEN PAULOS
Overview

What are descriptive statistics?
A bit of terminology/notation
Measures of Central Tendency
  Mean, Mode, Median
Measures of Variability
  Ranges, Standard Deviations
The Normal Curve
Terminology/Notation

A data distribution = A set of data/scores (the whole thing)
  1, 2, 4, 7
X = A raw, single score (e.g., 2 from above)
∑ = Summation (added up)
  ∑X = 14 (each individual score added up)
n = sample size (distribution size, or number of scores)
  n = 4 (from above)
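The notation above maps directly onto two built-in operations. A minimal sketch in Python, using the slide's example scores (the variable names are illustrative, not part of the lecture):

```python
# Scores from the example distribution on the slide.
scores = [1, 2, 4, 7]

sum_x = sum(scores)  # ∑X: each individual score added up
n = len(scores)      # n: sample size (number of scores)

print(f"sum of X = {sum_x}, n = {n}")
```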
Descriptive Statistics


Descriptive statistics are the side of
statistics we most often use in our
everyday lives
Realize that most observations/data are
too “large” for a human to take in and
comprehend – we must “reduce” them


How can we summarize what we see?
Example – Grades/Registrar
Descriptive Statistics

Descriptive statistics = describing the
data

n = 50, a test score of 83%

Where does it fit in the class??
Making sense out of chaos
Descriptive Statistics

Transform a set of numbers or
observations into indices that describe
or characterize the data


“Summary statistics”
A large group of statistics that are used in
all research manuscripts

Even the most complex statistical tests and
studies start with descriptive statistics
Descriptive Statistics
Measurement Scales
• Nominal
• Ordinal
• Interval
• Ratio
Graphic Portrayals
• Frequencies
• Histograms
• Bar graphs
• Normal distribution
Central Tendency
• Mean
• Median
• Mode
Relationship
• Scatterplot
• Correlation
• Regression
Variability
• Range
• Standard deviation
• Standardized scores
Descriptive Statistics

Descriptive statistics usually accomplish
two major goals:



1) Describe the central location of the data
2) Describe how the data are dispersed
about that point
In other words, they provide:


1) Measures of Central Tendency
2) Measures of Variability
Measure of Central Tendency

What SINGLE summary value best
describes the CENTRAL location of an
entire distribution?



Mode: which value occurs most often
Median: the value above and below which
50% of the cases fall (the middle; 50th
percentile)
Mean: mathematical balance point;
arithmetic/mathematical average
Mode


Most frequent occurrence
What if the data were...?
17, 19, 20, 20, 22, 23, 25, 28 (mode = 20)
17, 19, 20, 20, 22, 23, 23, 28 (bimodal: 20 and 23)
Problem: a set of numbers can be
bimodal, or trimodal, depending on the
scores
Not a stable measure
Ex. 17, 19, 20, 22, 23, 28, 28 (mode = 28, the highest score)
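The three data sets on this slide can be checked in a few lines. A small sketch, assuming Python 3.8 or later, where `statistics.multimode` returns every value tied for most frequent (the variable names are illustrative):

```python
from statistics import multimode

one_mode = [17, 19, 20, 20, 22, 23, 25, 28]
two_modes = [17, 19, 20, 20, 22, 23, 23, 28]
edge_mode = [17, 19, 20, 22, 23, 28, 28]

# multimode lists all values that share the highest frequency,
# so a bimodal set comes back with two entries.
for label, data in [("one mode", one_mode),
                    ("two modes", two_modes),
                    ("mode at the edge", edge_mode)]:
    print(label, multimode(data))
```

Changing a single score (25 to 23) adds a second mode, and in the last set the mode sits at the highest score, which is why the mode is described as unstable.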
Median


Rank the numbers, pick the middle one
What if the data were…?
17, 19, 20, 23, 23, 28
Solution: add up the two middle scores, divide
by 2 (= 21.5)
Best measure in an asymmetrical (i.e. skewed)
distribution; not sensitive to extreme scores
Ex. 17, 19, 20, 23, 23, 428 (median is still 21.5)
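To see why the median resists extreme scores, the two data sets above can be compared side by side. A brief sketch using Python's standard `statistics` module (variable names are illustrative):

```python
from statistics import mean, median

scores = [17, 19, 20, 23, 23, 28]
with_outlier = [17, 19, 20, 23, 23, 428]

# With an even number of scores, median() averages the two middle values.
median_plain = median(scores)
median_outlier = median(with_outlier)

print("median:", median_plain, "->", median_outlier)
print("mean:  ", round(mean(scores), 2), "->", round(mean(with_outlier), 2))
```

The median is unchanged by the extreme score, while the mean moves well away from the bulk of the data.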
Mean = $\bar{X}$
Add up the numbers and divide by the
sample size (the number of numbers!)
$\bar{X} = \frac{\sum X}{n}$
Try this one…



2, 3, 5, 6, 9
(2 + 3 + 5 + 6 + 9) / 5 = 25 / 5 = 5
(Usually) best measure of the three –
uses the most information (all values
from distribution contribute)
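The "add up and divide" rule can be written as a one-line function. A minimal sketch (the function name `arithmetic_mean` is hypothetical, chosen for illustration):

```python
def arithmetic_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

practice = [2, 3, 5, 6, 9]
print(arithmetic_mean(practice))
```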
Characteristics of the Mean

Balance point

Point around which deviations sum to zero



Deviation = $X - \bar{X}$
For instance, if scores are 2, 3, 5, 6, 9
Mean is 5
Sum of deviations: (-3) + (-2) + 0 + 1 + 4 = 0
$\sum (X - \bar{X}) = 0$
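The balance-point property can be verified directly for the example scores. A short sketch (variable names are illustrative):

```python
scores = [2, 3, 5, 6, 9]
mean = sum(scores) / len(scores)

# Each deviation is the score minus the mean.
deviations = [x - mean for x in scores]

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

With other data the sum may come out as a tiny non-zero number because of floating-point rounding, so comparing against zero with a small tolerance is the safer habit.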
Characteristics of the Mean

Affected by extreme scores

Example 1



Scores 7, 11, 11, 14, 17
Mean = 12, Mode and Median = 11
Example 2


Scores 7, 11, 11, 14, 170
Mean = 42.6, Mode & Median = 11
Characteristics of the Mean




Balance point
Affected by extreme scores
Appropriate for use with interval or ratio
scales of measurement
More stable than Median or Mode when
multiple samples drawn from the same
population

Basis for inferential stats
Guidelines to Choose Measure
of Central Tendency


Mean is preferred because it is the basis
of inferential statistics
Median may be better for skewed data


Distribution of wealth in the US – ex.
annual household income in Washington
state for 2000: mean=$76,818;
median=$42,024
Mode to describe average of nominal
data (eye color, hair color, etc…)
Normal Distribution
[Figure: histogram of MLB batting averages over a 3-year span (min. 100 AB); x-axis: Scores; y-axis: Frequency (how often a score occurs); Mean = 0.267, n = 1291]
Normal Distribution
[Figure: normal curve with Mode, Median, and Mean marked; x-axis: Scores]
A “normal” distribution indicates the data are
perfectly symmetrical
Positively skewed distribution
[Figure: positively skewed distribution of NFL salaries, 2011, with Mode, Median, and Mean marked; x-axis: Scores]
Negatively skewed distribution
[Figure: negatively skewed distribution with Mode, Median, and Mean marked; x-axis: Scores]
Relationship among the MCT &
shape of distribution
Alaska’s average elevation of
1900 feet is less than that of Kansas.
Nothing in that average suggests
the 16 highest mountains in
the United States are in Alaska.
Averages mislead, don’t they?
Grab Bag, Pantagraph, 08/03/2000
Variability
Measures of dispersion or spread
The only thing
constant is variation.
the notions of variability
•Unusual
•Peculiar
•Strange
•Original
•Extreme
•Special
•Unlike
•Deviant
•Dissimilar
•different
NY Times, 10/24/2010
Stories vs. Statistics
By JOHN ALLEN PAULOS
Variability defined


Measures of Central Tendency provide a
summary level of the data
Variability recognizes that scores vary across
individual cases
  i.e., the mean or median may not be an actual
  score in your distribution
Variability quantifies the spread of
performance
  How scores vary around the mean/mode/median
To describe a distribution

1) Measure of Central Tendency


Mean, Mode, Median
2) Measure of Variability

Multiple measures


Range, Interquartile range, Semi-Interquartile
Range
Standard Deviation
Range


Range = Difference between low/high score
Range = (Max - Min) Score
Ex. # of hours spent watching TV/week:
2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20
20 - 2 = 18
Very susceptible to outliers
Doesn’t indicate anything about variability
around the mean/central point
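The range calculation, and its sensitivity to a single outlier, can be sketched as follows. The extra value of 60 hours is a hypothetical score added only to illustrate the point; it is not in the slide's data.

```python
tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

score_range = max(tv_hours) - min(tv_hours)
print("range:", score_range)

# Hypothetical outlier (not from the slide): one person watching 60 hours.
with_outlier = tv_hours + [60]
outlier_range = max(with_outlier) - min(with_outlier)
print("range with one outlier:", outlier_range)
```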
Semi-Interquartile range

What is a quartile?
  Divide sample into 4 parts of equal size
  Q1, Q2, Q3 = Quartile Points
Interquartile Range = Q3 - Q1
  Difference between the highest and lowest
  quartile points
SIQR = IQR / 2
Related to the Median…prevents
outliers from overly skewing the measure
  For ordinal data or skewed interval/ratio data
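As an illustration, the quartile points, IQR, and SIQR can be computed for the TV-hours data from the Range slide. This is a sketch assuming Python 3.8 or later and the default "exclusive" method of `statistics.quantiles`; textbooks and other software use slightly different quartile conventions, so Q1 and Q3 may differ a little for small samples.

```python
from statistics import median, quantiles

tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(tv_hours, n=4)

iqr = q3 - q1   # interquartile range
siqr = iqr / 2  # semi-interquartile range

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr, "SIQR:", siqr)
```

Q2 is the median, which is why the SIQR is described as related to the median.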
BMD and walking
Quartiles based on miles
walked/week
Krall et al, 1994, Walking is related to
bone density and rates of bone loss.
AJSM, 96:20-26
Notes:
Skewed Distribution?
95th Percentile?
50th Percentile vs Median?
Variation itself is nature's only irreducible essence.
Stephen Jay Gould
Standard Deviation

Most commonly accepted measure of
spread
1. Compute the deviations of all numbers from
the mean
2. Square each of the deviations and THEN sum them
3. Divide by the number of deviations
4. Finally, take the square root
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Standard Deviation







Distribution = 1, 3, 5, 7
$\bar{X}$ = 16 / 4 = 4
1) Compute Deviations = -3, -1, 1, 3
2) Square Deviations = 9, 1, 1, 9
3) Sum Squared Deviations = 20
4) Divide by n = 20 / 4 = 5
5) Take square root = √5 ≈ 2.24
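The five steps above can be followed line by line in code. A minimal sketch that divides by n, as the slide does; this matches `statistics.pstdev` (the population form), whereas `statistics.stdev` divides by n - 1 and would give a larger value.

```python
from math import sqrt
from statistics import pstdev

scores = [1, 3, 5, 7]
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]  # 1) deviations from the mean
squared = [d ** 2 for d in deviations]   # 2) square each deviation
sum_sq = sum(squared)                    # 3) sum the squared deviations
variance = sum_sq / n                    # 4) divide by n
sd = sqrt(variance)                      # 5) take the square root

print("SD by hand:", round(sd, 2))
print("pstdev:    ", round(pstdev(scores), 2))
```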
Key points about SD




SD small → data clustered around the mean
SD large → data scattered from the mean
Affected by extreme scores (just like
mean)…oftentimes called “outliers”
Consistent (more stable) across samples from
the same population

Just like the mean - so it works well with
inferential stats (where repeated samples are
taken)
SD Example

Three NFL quarterbacks with similar QB
ratings in 2006:





Matt Hasselbeck (SEA) = 76.0
Rex Grossman (CHI) = 73.9
Brett Favre (GB) = 72.7
Note: QB rating involves a complex formula accounting for passing
attempts, completions, yards, touchdowns, and
interceptions…100+ is considered outstanding & 70-80 is average
All appear to have had very similar,
somewhat mediocre seasons as QBs
SD Example

Let’s look at the SD of their game-by-game QB ratings:




Matt Hasselbeck (SEA) = 29.97
Rex Grossman (CHI) = 47.60
Brett Favre (GB) = 27.81
Grossman had, by far, the most
variability (i.e. inconsistency) in his
game-by-game performances…is this
good or bad?
Clinical Use of SD
SD and the normal curve


The following concepts are critical to
your understanding of how descriptive
statistics work
Remember – a “normal” curve is
perfectly symmetrical. Real data are rarely
perfectly normal, but they are often close to
normal…
SD and the normal curve
About 68% of scores fall within 1 SD of the mean
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 60, 70, 80; 34.1% of the area on each side of the mean]
The standard deviation and
the normal curve
About 68% of scores fall between 60 and 80
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 60, 70, 80; 34% of the area on each side of the mean]
The standard deviation and
the normal curve
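About 95% of scores fall within 2 SD of the mean
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 50, 60, 70, 80, 90; 34.1% of the area in each band within 1 SD of the mean, 13.6% in each band between 1 and 2 SD]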
The standard deviation and
the normal curve
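About 95% of scores fall between 50 and 90
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 50, 60, 70, 80, 90; 34.1% of the area in each band within 1 SD of the mean, 13.6% in each band between 1 and 2 SD]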
The standard deviation and
the normal curve
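About 99.7% of scores fall within 3 SD of the mean
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 40, 50, 60, 70, 80, 90, 100; 34.1% of the area in each band within 1 SD of the mean, 13.6% in each band between 1 and 2 SD, 2.3% labeled in each outer band]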
The standard deviation and
the normal curve
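About 99.7% of scores fall between 40 and 100
[Figure: normal curve with $\bar{X}$ = 70, SD = 10; x-axis marked at 40, 50, 60, 70, 80, 90, 100; 34.1% of the area in each band within 1 SD of the mean, 13.6% in each band between 1 and 2 SD, 2.3% labeled in each outer band]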
What about $\bar{X}$ = 70, SD = 5?




What approximate percentage of scores
fall between 65 & 75?
…1SD below + 1SD above = 68%
What range includes about 99.7% of all
scores?
…3SD below to 3SD above = 55 to 85
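The two answers above can be checked numerically. A minimal sketch, assuming the scores follow an exactly normal distribution and using `statistics.NormalDist` from Python 3.8 or later; the helper name `within_k_sd` is hypothetical.

```python
from statistics import NormalDist

def within_k_sd(mean, sd, k):
    """Return (low, high, proportion) for the band mean ± k standard deviations."""
    dist = NormalDist(mu=mean, sigma=sd)
    low, high = mean - k * sd, mean + k * sd
    # Area under the normal curve between low and high.
    proportion = dist.cdf(high) - dist.cdf(low)
    return low, high, proportion

for k in (1, 2, 3):
    low, high, p = within_k_sd(70, 5, k)
    print(f"{k} SD: {low} to {high} covers {p:.1%} of scores")
```

Real data are only approximately normal, so observed percentages will differ somewhat from these theoretical values.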
Interpreting The Normal Table

Area under Normal Curve


Specific SD values (z) include certain
percentages of the scores
Values of Special Interest



1.96 SD = 47.5% of scores on each side of the mean (47.5 + 47.5 = 95%)
2.58 SD = 49.5% of scores on each side of the mean (49.5 + 49.5 = 99%)
i.e., 95% of scores fall within 1.96 standard deviations
of the mean (1.96 above and 1.96 below)
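The table values of special interest can be reproduced from the standard normal curve (mean 0, SD 1). A short sketch using `statistics.NormalDist`, assuming Python 3.8 or later; the helper name `area_mean_to_z` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def area_mean_to_z(z):
    """Proportion of scores between the mean and z SDs on one side."""
    return standard_normal.cdf(z) - 0.5

for z in (1.96, 2.58):
    one_side = area_mean_to_z(z)
    print(f"z = {z}: {one_side:.1%} on one side, {2 * one_side:.1%} on both")

# Going the other way: the z value that leaves 2.5% in the upper tail.
z_for_95 = standard_normal.inv_cdf(0.975)
print("z for the middle 95%:", round(z_for_95, 2))
```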
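IQ
68% have an IQ between 85 and 115
[Figure: normal curve with $\bar{X}$ = 100, SD = 15; x-axis marked at 55, 70, 85, 100, 115, 130, 145; 34.1% of the area in each band within 1 SD of the mean, 13.6% in each band between 1 and 2 SD, 2.3% labeled in each outer band]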
MLB players’
batting averages
over a 3-year
span (min. 100
at bats)
~95% of players
have an average
between 0.196
and 0.337
Next Week…




We will utilize our understanding of
descriptive statistics concepts, including
central tendency, variability, and the
normal curve, to examine standardized
scores
Homework = Cronk 3.1 – 3.4
Bring calculator to class
In-class activity 2…