Q 1 - for Dr. Jason P. Turner

Measures of Central Tendency
MARE 250
Dr. Jason Turner
Central Tendencies
A measure of central tendency indicates where
along the measurement scale the sample or
population is located; it can be determined via
various measures
Three most important:
Mean
Median
Mode
Mean Girls
Mean – most commonly used measure of center
sum of the observations divided by the number of observations
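A minimal sketch of this definition in Python. The name `lengths_cm` and its values are made up for illustration and are not course data.

```python
# Hypothetical sample of seven measurements (illustration only)
lengths_cm = [12, 15, 15, 18, 20, 22, 24]

# Mean = sum of the observations / number of observations
mean = sum(lengths_cm) / len(lengths_cm)
print(mean)  # 18.0
```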
The Median
"As we were driving, we saw a sign that said "Watch for Rocks." Martha said it should read "Watch for
Pretty Rocks." I told her she should write in her suggestion to the highway department, but she started
saying it was a joke - just to get out of writing a simple letter! And I thought I was lazy!" – Jack Handey
The median is typically defined as the middle
measurement in an ordered set of data
Separates the bottom 50% of the data from the top 50%
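The definition can be sketched as a small Python function. The helper name `median` and the sample values are hypothetical; averaging the two middle values when the number of observations is even is a common convention that is assumed here, not stated above.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

print(median([24, 12, 18, 15, 22, 15, 20]))  # 18
print(median([12, 15, 18, 20]))              # 16.5
```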
The Mode
“Oh, no way - where? Holy crap, he's with a girl! But he's the guy from
Depeche Mode! That's impossible! Come on, he's in Depeche Mode!”
- The Monarch
The mode is typically defined as the most frequently
occurring measurement in a set of data
The mode is useful if the distribution is skewed or bimodal (having two
very pronounced values around which data are concentrated)
[Figure: histogram; y-axis: Number of Individuals]
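A short sketch of finding the mode in Python. The helper name `modes` and the data are hypothetical; it returns every value tied for the highest count, so a bimodal sample yields two values.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most frequently, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([12, 15, 15, 18, 20, 22, 24]))         # [15]
print(modes([3, 4, 4, 4, 5, 9, 10, 10, 10, 11]))   # [4, 10]
```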
You are so totally skewed!
The mean is sensitive to extreme (very large or small)
observations and the median is not
Therefore – you can determine how skewed your data is by
looking at the relationship between median and mean
Mean is greater than the median: skewed right (long upper tail)
Mean and median are equal: roughly symmetric
Mean is less than the median: skewed left (long lower tail)
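The mean-versus-median comparison can be sketched in Python as a rough rule of thumb. The function name `skew_direction`, the `tolerance` parameter, and all data values are assumptions made for illustration.

```python
import statistics

def skew_direction(values, tolerance=0.0):
    """Rough skew check: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "right-skewed"       # mean pulled up by large values
    if median - mean > tolerance:
        return "left-skewed"        # mean pulled down by small values
    return "roughly symmetric"

print(skew_direction([12, 15, 15, 18, 20, 22, 24]))   # roughly symmetric
print(skew_direction([12, 15, 15, 18, 20, 22, 150]))  # right-skewed
print(skew_direction([1, 50, 52, 54, 55, 56, 58]))    # left-skewed
```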
Resistant Measures
A resistant measure is not sensitive to the influences of
a few extreme observations
Median – resistant measure of center
Mean – not
Resistance of the mean can be improved by using
trimmed means: a specified percentage of the smallest
and largest observations is removed before computing the
mean
We will do something like this later when exploring the data
and evaluating outliers (and their effects upon the mean)
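A minimal sketch of a trimmed mean in Python. The function name `trimmed_mean`, the sample, and the choice to round the number of dropped observations down are assumptions; software packages may round differently.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from EACH end."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be at least 0 and less than 50")
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # observations dropped per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical sample with one very small and one very large value
sample = [2, 12, 15, 15, 18, 20, 22, 24, 25, 150]
print(sum(sample) / len(sample))  # 30.3   (ordinary mean)
print(trimmed_mean(sample, 10))   # 18.875 (10% trimmed from each end)
```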
How To on Computer
On Minitab:
Your data must be in a single column
Go to the 'Stat' menu, and select 'Basic Statistics', then 'Display Descriptive Statistics'.
Select your data column in the 'variables' box.
The output will generally go to the session window, or if you select 'graphical
summary' in the 'graphs' options, it will be given in a separate window.
This will give you a number of basic descriptive stats, though not the mode.
Measures of Dispersion and Variability
MARE 250
Dr. Jason Turner
Please Disperse!
“Alright everyone, disperse immediately. We are prepared to use force a-- what, what?
We're not prepared, Eddie? Someone call 911!” – Chief Wiggum
Measure of Dispersion of the Data - an indication
of the spread of measurements around the center
of the distribution
Two of the most frequently used:
Range
Standard Deviation
The Range
Range - the difference between the highest and
lowest values in the observations
This is useful, but may be misleading when the
data has one or more outliers (single
measurements that are exceptionally large or small
relative to the other data)
It also says nothing about how the values are spread around the center
Range = Max - Min
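A quick Python illustration of the formula, and of how a single outlier can inflate the range. The values are made up.

```python
values = [12, 15, 15, 18, 20, 22, 24]   # hypothetical sample

# Range = Max - Min
data_range = max(values) - min(values)
print(data_range)  # 12

# One exceptionally large measurement changes the range dramatically
with_outlier = values + [150]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)  # 138
```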
The Variance
Variance - the average of the squared deviations
from the mean
The most widely used measure of spread, and
one that will be used often in various statistical
applications
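The definition can be worked through step by step in Python. This sketch uses made-up values and divides by n, taking "average" literally; dividing by n - 1 instead is the usual choice for a sample.

```python
values = [12, 15, 15, 18, 20, 22, 24]   # hypothetical sample

mean = sum(values) / len(values)                        # 18.0
squared_deviations = [(x - mean) ** 2 for x in values]  # 36, 9, 9, 0, 4, 16, 36
sum_of_squares = sum(squared_deviations)                # 110.0

# Average of the squared deviations (dividing by n)
average_squared_deviation = sum_of_squares / len(values)
print(sum_of_squares)             # 110.0
print(average_squared_deviation)  # 110 / 7, roughly 15.71
```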
The Variance
Degrees of Freedom - the quantity (n - 1)
Used instead of n to provide an unbiased
estimate of the population variance
As the sample size (n) increases
(and n approaches the population size N), the values of the
sample and population variance become more similar
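A small sketch of the n versus n - 1 divisor, using Python's standard `statistics` module: `pvariance` divides by n and `variance` divides by n - 1. The data are hypothetical. The loop shows only how the ratio between the two divisors shrinks toward 1 as n grows.

```python
import statistics

values = [12, 15, 15, 18, 20, 22, 24]   # hypothetical sample, n = 7

divide_by_n = statistics.pvariance(values)          # sum of squares / n
divide_by_n_minus_1 = statistics.variance(values)   # sum of squares / (n - 1)
print(divide_by_n, divide_by_n_minus_1)

# The two results differ by the factor n / (n - 1), which approaches 1
for n in (5, 50, 500):
    print(n, n / (n - 1))
```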
Standard Deviation
Standard Deviation – the positive square root of
the variance
Indicates how far (on average) the observations in
the sample are from the mean of the sample
The more variation in a data set, the larger its
standard deviation
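A brief Python sketch comparing two made-up samples with the same mean but different spread. `statistics.stdev` is the sample standard deviation (it uses the n - 1 divisor); the variable names are hypothetical.

```python
import math
import statistics

tight = [17, 18, 18, 18, 19]    # values close to the mean of 18
spread = [8, 13, 18, 23, 28]    # same mean, much more variation

sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)

# The standard deviation is the positive square root of the variance
assert math.isclose(sd_spread, math.sqrt(statistics.variance(spread)))

print(sd_tight < sd_spread)  # True
```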
Quartiles
Median divides data into 2 equal parts: 50% bottom, 50% top
Quartiles – into quarters – 4 equal parts
A dataset has 3 quartiles:
Q1 – is the number that divides the bottom 25% from
top 75%
Q2 – is the median; bottom 50% from top 50%
Q3 – is the number that divides the bottom 75% from
top 25%
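A minimal sketch of computing the three quartiles with Python's `statistics.quantiles`. The data are made up. This assumes the function's default "exclusive" method; software packages use slightly different quartile rules, so results may differ a little for other samples.

```python
import statistics

values = [12, 15, 15, 18, 20, 22, 24]   # hypothetical sample

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(values, n=4)
print(q1, q2, q3)  # 15.0 18.0 22.0
```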
Interquartile Range
Interquartile Range (IQR) – the difference
between the first and third quartiles
IQR = Q3 – Q1
The IQR gives you the range of the middle
50% of the data
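The IQR formula as a short Python sketch, again with hypothetical data and the default quartile method of `statistics.quantiles` (an assumption; other quartile rules exist).

```python
import statistics

values = [12, 15, 15, 18, 20, 22, 24]   # hypothetical sample

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1          # IQR = Q3 - Q1
print(iqr)  # 7.0
```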
Outlier, Outlier
Outliers – observations that fall well outside
the overall pattern of the data
Requires special attention
May be the result of:
Measurement or Recording Error
Observation from a different population
Unusual Extreme observation
Pants on Fire
Must deal with outliers: (Yes, really!)
If error – can delete; otherwise judgment call
Can use quartiles and IQR to identify potential
outliers
The Outer Limits
Lower and Upper Limits:
Lower limit – is the number that lies 1.5
IQRs below the first quartile
Lower Limit = Q1 - 1.5 * IQR
Upper limit – is the number that lies 1.5
IQRs above the third quartile
Upper Limit = Q3 + 1.5 * IQR
The Outer Limits
If a value is outside the “Outer Limits” of
a dataset, it is a potential outlier
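The limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR can be sketched in Python as follows. The function name `outer_limits` and the data are hypothetical, and the quartiles come from the default method of `statistics.quantiles`, which is an assumption.

```python
import statistics

def outer_limits(values):
    """Return (lower limit, upper limit) based on the quartiles and IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

values = [12, 15, 15, 18, 20, 22, 60]   # hypothetical sample with one large value
lower, upper = outer_limits(values)
potential_outliers = [x for x in values if x < lower or x > upper]

print(lower, upper)        # 4.5 32.5
print(potential_outliers)  # [60]
```

Values flagged this way are only candidates; deciding what to do with them remains a judgment call.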
Five-Number Summary
5-Number Summary:
Min, Q1, Q2, Q3, Max
Written in increasing order
Provides information on Center and Variation
Is used to construct boxplots
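A short Python sketch of the five-number summary. The function name `five_number_summary` and the data are hypothetical; the quartile method is the `statistics.quantiles` default, which is an assumption.

```python
import statistics

def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) in increasing order."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([12, 15, 15, 18, 20, 22, 24]))
# (12, 15.0, 18.0, 22.0, 24)
```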
Boxplots
Boxplot (Box-and-Whisker Diagram):
based on the 5-number summary
provides a graphic display of the center and variation
[Figure: boxplot on an axis from 0 to 70, with Min, Q1, Q2, Q3, and Max labeled]
Boxplots
Modified Boxplot – includes outliers
[Figure: modified boxplot on an axis from 0 to 70, with a potential outlier marked by *]
Note that Min & Max are determined after
outliers are removed!
Boxplots
Boxplots summarize information about the shape,
dispersion, and center of your data. They can also help
you spot outliers.
The left edge of the box represents the first quartile (Q1), while
the right edge represents the third quartile (Q3). Thus the box
portion of the plot represents the interquartile range (IQR), or
the middle 50% of the observations
Boxplots
The line drawn through the box represents the median of the data
The lines extending from the box are called whiskers. The
whiskers extend outward to indicate the lowest and highest
values in the data set (excluding outliers)
Extreme values, or outliers, are represented by dots. A value is
considered an outlier if it is outside of the box (greater than Q3 or
less than Q1) by more than 1.5 times the IQR
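The pieces of a boxplot described above can be computed without drawing anything. This is a sketch under stated assumptions: the function name `modified_boxplot_stats`, the dictionary keys, and the data are hypothetical, and the quartiles use the default method of `statistics.quantiles`.

```python
import statistics

def modified_boxplot_stats(values):
    """Box edges, median line, whisker ends, and outliers for a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if lower <= x <= upper]   # non-outliers
    return {
        "box": (q1, q3),                          # edges of the box
        "median": median,                         # line through the box
        "whiskers": (min(inside), max(inside)),   # lowest/highest non-outlier
        "outliers": sorted(x for x in values if x < lower or x > upper),
    }

stats = modified_boxplot_stats([12, 15, 15, 18, 20, 22, 60])
print(stats["whiskers"])  # (12, 22)
print(stats["outliers"])  # [60]
```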
Boxplots
Use the boxplot to assess the symmetry of the data:
If the data are fairly symmetric, the median line will
be roughly in the middle of the IQR box and the
whiskers will be similar in length
If the data are skewed, the median may not fall in the
middle of the IQR box, and one whisker will likely be
noticeably longer than the other
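The symmetry check can also be done numerically by comparing the two halves of the box and the two whiskers. In this sketch the function name `box_and_whisker_lengths`, its keys, and both samples are hypothetical. For simplicity the whiskers run to the overall minimum and maximum, so outliers are not excluded.

```python
import statistics

def box_and_whisker_lengths(values):
    """Lengths of the two whiskers and the two halves of the IQR box."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "lower_whisker": q1 - min(values),
        "lower_box": median - q1,
        "upper_box": q3 - median,
        "upper_whisker": max(values) - q3,
    }

# Fairly symmetric sample: matching box halves and whiskers
print(box_and_whisker_lengths([10, 12, 14, 16, 18, 20, 22]))
# Right-skewed sample: upper box half and upper whisker are much longer
print(box_and_whisker_lengths([10, 11, 12, 14, 18, 25, 40]))
```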