Describing Univariate Distributions

Learning Goals
• Use “level of measurement” to decide how to describe the variable distribution
• Understand frequency distributions
• Understand measures of central tendency
• Understand measures of dispersion, spread, and variability
Three Little Phrases — Stay Alert!
1. Units of measurement: Standardized and uniform quantities for expressing an amount (e.g., feet/inches or metres; miles or kilometres; years, months, and days; dollars or euros).
2. Units of analysis: The type of things on which a variable is defined (“cases” in the data file).
3. Level of measurement: Precision of measurement. The nominal (named categories) and ordinal (rank order) levels are qualitative; the interval-ratio level is quantitative (precise and meaningful numbers, either continuous or discrete).
Exercise: Describing “Our” Variables for a Class Data File [1]
HEIGHT
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [2]
DISTANCE FROM CAMPUS
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [3]
TIME OF COMMUTE
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [4]
MODE OF TRANSPORTATION
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [5]
GENDER
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [6]
NUMBER OF SIBLINGS
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [7]
EYE COLOUR
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [8]
BEER ATTITUDE
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
• Can we make this a Likert response scale (i.e., strongly agree, agree, uhhh, disagree, strongly disagree)? Why are Likert scales unlikeable?
Describing “Our” Variables [9]
HOW OFTEN DO YOU ATTEND MUSICAL EVENTS?
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Describing “Our” Variables [10]
MOVIE FAVES
• What is its level of measurement?
• Is a frequency distribution a good idea?
• What measures of central tendency are best?
• What measures of dispersion/variability are best?
Frequency Distributions
Used for nominal and ordinal data; interval-ratio data may need to be grouped into intervals first.
• Compute counts (frequencies) and relative frequencies (proportions expressed as %).
• Do not compute cumulative percentages for nominal data! The categories have no inherent order, so a running total means nothing.
• Don’t confuse whole-number variable values (e.g., number of pets) with frequencies!
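A frequency table of this kind can be sketched in a few lines of Python. The data and the function names (`frequency_table`, `cumulative_percents`) are made up for illustration and do not come from the class data file. Cumulative percentages are computed only for the ordinal variable, following an explicit category order.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, k, 100 * k / n) for cat, k in counts.most_common()]

def cumulative_percents(values, order):
    """Cumulative % following an explicit rank order (ordinal data only)."""
    counts = Counter(values)
    n = len(values)
    running, rows = 0, []
    for cat in order:
        running += counts[cat]
        rows.append((cat, 100 * running / n))
    return rows

# Hypothetical nominal variable: mode of transportation
transport = ["bus", "car", "bus", "walk", "bike",
             "bus", "car", "walk", "bus", "car"]
table = frequency_table(transport)
for cat, k, pct in table:
    print(f"{cat:<6}{k:>3}{pct:>7.1f}%")

# Hypothetical ordinal variable: attendance at musical events
attendance = ["rarely", "never", "often", "rarely",
              "rarely", "often", "never", "rarely"]
cum = cumulative_percents(attendance, order=["never", "rarely", "often"])
for cat, pct in cum:
    print(f"{cat:<7}{pct:>6.1f}%")
```

Note that the counts (how many cases fall in a category) are kept separate from the category values themselves, which is the distinction the last bullet warns about.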
Measures of Central Tendency
• For interval-ratio data: Mean, median, mode.
• For ordinal data: Median and mode.
• For nominal data: Mode.
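As a rough illustration of matching the measure to the level of measurement, the sketch below uses Python's standard `statistics` module on three made-up variables. The data and the numeric codes for the ordinal variable are assumptions for the example only. `median_low` is used for the ordinal variable so that the result is always an actual category code rather than an average of two codes.

```python
import statistics

# Hypothetical data, one variable per level of measurement
siblings = [0, 1, 1, 2, 2, 2, 3, 7]             # interval-ratio
attitude = [1, 2, 2, 3, 4, 4, 4, 5, 5]          # ordinal codes: 1 = strongly disagree ... 5 = strongly agree
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]  # nominal

# Interval-ratio: mean, median, and mode are all meaningful
mean_sib = statistics.mean(siblings)
median_sib = statistics.median(siblings)
mode_sib = statistics.mode(siblings)

# Ordinal: median and mode only (the codes are ranks, not amounts)
median_att = statistics.median_low(attitude)
mode_att = statistics.mode(attitude)

# Nominal: mode only
mode_eye = statistics.mode(eye_colour)

print(mean_sib, median_sib, mode_sib)
print(median_att, mode_att)
print(mode_eye)
```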
Mean vs. Median
If the variable distribution is skewed (long tail of extreme values on ONE side), the median may be preferable, because extreme values pull the mean toward the tail.
Mean and median are obtained in VERY DIFFERENT WAYS.
• Mean: See formula provided by Garner (2010, p. 59).
• Median: See algorithm and formula provided by Garner (2010, pp. 61–62).
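The two procedures can be sketched as follows, using the usual textbook definitions for ungrouped data; Garner's notation and any grouped-data formula may differ. The commute times are hypothetical, with one extreme value added to create a long right tail.

```python
def mean(values):
    """Arithmetic mean: add up all the values, divide by the number of cases."""
    return sum(values) / len(values)

def median(values):
    """Median of ungrouped data: sort, then take the middle value
    (or the average of the two middle values when n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical commute times in minutes; 120 is the extreme value
commute = [10, 15, 15, 20, 25, 30, 120]

print(f"mean   = {mean(commute):.1f}")
print(f"median = {median(commute)}")
```

The mean uses the size of every value, so the single 120-minute commute drags it well above the median, which depends only on position in the sorted list.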
Measures of Dispersion/Variability
• Range
• Standard deviation
• Percentile distributions (e.g., interquartile range)
For categorical data: Index of diversity and index of qualitative variation (optional). See Garner (2010, pp. 67–69).
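A minimal sketch of these measures, under stated assumptions: the quartiles come from `statistics.quantiles` with its default method (other quartile conventions give slightly different values), and the two categorical indices use one common definition, D = 1 minus the sum of squared proportions, and IQV = D rescaled by K/(K − 1). Garner's versions may differ in detail. All data and function names are hypothetical.

```python
import statistics
from collections import Counter

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR: third quartile minus first quartile (default quantile method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def index_of_diversity(categories):
    """D = 1 - sum of squared category proportions."""
    n = len(categories)
    return 1 - sum((k / n) ** 2 for k in Counter(categories).values())

def index_of_qualitative_variation(categories):
    """IQV = D * K / (K - 1); K is taken here as the number of observed categories."""
    k = len(set(categories))
    if k < 2:
        return 0.0  # a single category shows no variation
    return index_of_diversity(categories) * k / (k - 1)

# Hypothetical distances from campus (km)
distances = [2, 3, 5, 8, 10, 12, 30]
print(value_range(distances), interquartile_range(distances))

# Hypothetical eye colours
eyes = ["brown", "brown", "brown", "blue", "blue", "green"]
print(round(index_of_diversity(eyes), 3),
      round(index_of_qualitative_variation(eyes), 3))
```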
How to Compute the Standard Deviation
Very important!
The algorithm (summarized by the formula) needs to be memorized.
• See Garner (2010, p. 65).
• Work through a few simple examples.
• Don’t confuse the SD with the mean deviation or the mean absolute deviation.
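One simple example, worked in Python as a sketch of the usual steps: find the mean, subtract it from each value, square the deviations, add them up, divide, and take the square root. The slides do not say whether the divisor is N or N − 1, so the hypothetical `sample` flag below covers both; check Garner (2010, p. 65) for the version to memorize. The mean absolute deviation is included only to show that it is a different quantity.

```python
import math

def standard_deviation(values, sample=False):
    """SD: square root of the average squared deviation from the mean.
    Divides by N by default, or by N - 1 when sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_devs) / divisor)

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean (NOT the SD)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

# Hypothetical scores: mean is 5, squared deviations sum to 32
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(scores))               # 2.0 (32 / 8 = 4, square root = 2)
print(standard_deviation(scores, sample=True))  # divides by 7 instead of 8
print(mean_absolute_deviation(scores))          # 1.5
```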
Mean and SD of a Proportion
• Proportion for a variable with two categories (binary or dichotomous).
• Coded as 0/1 for the two categories.
• The mean = number of cases coded 1 divided by the total number of cases; that is, the mean equals the proportion p.
• The SD is the square root of [p × (1 − p)], where p is the proportion of cases coded 1.
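A quick check of both claims on a made-up 0/1 variable. The shortcut formula agrees with the SD computed directly from the data when the divisor is N (`statistics.pstdev`), not N − 1.

```python
import math
import statistics

# Hypothetical binary variable coded 0/1 (six cases coded 1 out of ten)
coded = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

p = sum(coded) / len(coded)            # proportion of cases coded 1
mean_direct = statistics.mean(coded)   # mean of the 0/1 codes

sd_formula = math.sqrt(p * (1 - p))    # shortcut: sqrt[p x (1 - p)]
sd_direct = statistics.pstdev(coded)   # SD from the data, dividing by N

print(p, mean_direct)
print(round(sd_formula, 4), round(sd_direct, 4))
```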