Descriptive Data Analysis - Gail Johnson's Research Demystified


Data Analysis for Description
Research Methods for Public Administrators
Dr. Gail Johnson
Dr. G. Johnson,
www.ResearchDemystified.org
Simple But Concrete
- The Children’s Defense Fund reports that each day in America:
  - Four children are killed by abuse or neglect
  - Five children or teens commit suicide
  - Eight children or teens are killed by firearms
  - Seventy-five babies die before their first birthday
- Source: http://www.childrensdefense.org/child-research-data-publications/each-day-inamerica.html
Simple But Concrete
- A million seconds = 11 ½ days
- A billion seconds = 32 years
- A trillion seconds = 32,000 years
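These conversions are easy to check. The sketch below divides each number of seconds by the seconds in a day and in a year. It assumes a 365.25-day average year, because the slide does not say how a year was counted.

```python
SECONDS_PER_DAY = 60 * 60 * 24                 # 86,400 seconds in a day
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25    # assumption: average year, leap days included


def to_days_and_years(seconds):
    """Convert a number of seconds to (days, years)."""
    return seconds / SECONDS_PER_DAY, seconds / SECONDS_PER_YEAR


for label, seconds in [("million", 10**6), ("billion", 10**9), ("trillion", 10**12)]:
    days, years = to_days_and_years(seconds)
    print(f"A {label} seconds = {days:,.1f} days = {years:,.1f} years")
```

The results are about 11.6 days, 31.7 years, and 31,700 years. These match the rounded figures on the slide.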
Simple But Concrete
- A $700 billion bailout translates into a $2,333 IOU from every person in the U.S.
- Or, using a different metric, it comes to $45 per week for each person in the U.S.
- Going one step further, it comes out to $6 a day.
- Framing: are you willing to pay $6 a day to have a functioning financial system?
Read more:
http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqek0mRZ
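The per-person, per-week, and per-day figures come from a chain of simple divisions. The slide does not state the population. This sketch assumes about 300 million people, which is the figure implied by $700 billion / $2,333.

```python
BAILOUT = 700e9        # dollars
POPULATION = 300e6     # assumption: population implied by the slide's $2,333 figure

per_person = BAILOUT / POPULATION   # about $2,333
per_week = per_person / 52          # about $45
per_day = per_person / 365          # about $6

print(f"Per person: ${per_person:,.0f}")
print(f"Per week:   ${per_week:,.2f}")
print(f"Per day:    ${per_day:,.2f}")
```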
Going Too Far?
- Six dollars a day is also 25 cents an hour, or less than half a penny a minute.
- Framing: would you be willing to pay less than half a penny a minute?
- Key point: does the comparison point make a difference in what you would be willing to pay?

Read more:
http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqf9HSQ9
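Breaking $6 a day down further takes two more divisions. This is a minimal sketch that starts from the rounded $6 figure.

```python
PER_DAY = 6.00               # dollars per person per day (rounded figure)
per_hour = PER_DAY / 24      # $0.25
per_minute = per_hour / 60   # about $0.0042, i.e. less than half a cent

print(f"Per hour:   ${per_hour:.2f}")
print(f"Per minute: {per_minute * 100:.2f} cents")
```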
Common Descriptive Analysis
- Counts: how many
  - Decennial census
- Percents
  - Women earned 77% of what men earned in 2006, up from 59% in 1970
- Parts of a whole
  - Percents (75%) and proportions (.75 or three-quarters)
Common Descriptive Analysis
- But be mindful of “bigger pie” distortions when working with percents and proportions
  - If the pie grows much faster than the slice, the slice will appear relatively smaller as a percent even though it still grew.
  - A classic example is the budget deficit as a percent of GDP: if GDP grows much faster than the budget deficit, the deficit will appear smaller even though it has also grown.
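A small worked example makes the "bigger pie" effect concrete. The figures below are hypothetical and were invented for illustration, not taken from any budget. The deficit grows, but GDP grows faster, so the deficit's share of GDP shrinks.

```python
def share(part, whole):
    """Return part as a percent of whole."""
    return part / whole * 100


# Hypothetical figures in billions of dollars, for illustration only.
deficit_before, gdp_before = 400, 10_000
deficit_after, gdp_after = 450, 14_000

print(f"Deficit grew by {share(deficit_after - deficit_before, deficit_before):.1f}%")
print(f"GDP grew by     {share(gdp_after - gdp_before, gdp_before):.1f}%")
print(f"Deficit as % of GDP: {share(deficit_before, gdp_before):.1f}% -> "
      f"{share(deficit_after, gdp_after):.1f}%")
```

The deficit rises 12.5%, yet as a percent of GDP it falls from 4.0% to about 3.2%.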
Common Descriptive Analysis
- Rates: numbers of occurrences standardized to a common base
  - Deaths of infants per 100,000 births
  - Crop yields per acre
  - Crime rates
- Rates provide an apples-to-apples comparison between places of different size or population
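Standardizing a count to a rate is a single division and multiplication. In this sketch the two places and their counts are hypothetical. The raw counts differ by a factor of 200, but the rates per 100,000 are identical.

```python
def rate_per(events, population, base=100_000):
    """Standardize a count of events to a rate per `base` people."""
    return events / population * base


# Hypothetical places of very different size (invented numbers).
places = {
    "Small town": (12, 8_000),
    "Large city": (2_400, 1_600_000),
}

for name, (events, population) in places.items():
    print(f"{name}: {events:,} events, {rate_per(events, population):.1f} per 100,000")
```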
Common Descriptive Analysis
- Ratio: numbers presented in relationship to each other
  - Student-to-teacher ratio: 15:1
  - Divide the number of students by the number of teachers
  - 1,500 students and 45 teachers equals a 33 to 1 student-to-teacher ratio (1,500 divided by 45 is about 33)
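The ratio calculation from the slide takes one line. This minimal sketch reports the result rounded to the nearest whole student.

```python
def students_per_teacher(students, teachers):
    """Student-to-teacher ratio, expressed as students per one teacher."""
    return students / teachers


ratio = students_per_teacher(1_500, 45)   # 33.33...
print(f"Student-to-teacher ratio: {ratio:.0f} to 1")
```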
Common Descriptive Analysis

- Rates of change
  - Percentage change from one time period to another
  - For example: the budget increased 23% from FY 2006 to FY 2007.
- Three steps:
  1. Divide the newest figure by the oldest figure
  2. Subtract 1
  3. Multiply by 100 to get the percentage change
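The three steps map directly onto three lines of code. The slide gives only the 23% result for the FY 2006 to FY 2007 example. The budgets of 100 and 123 used here are hypothetical values chosen to reproduce it.

```python
def percent_change(old, new):
    """Percentage change from an older value to a newer one."""
    ratio = new / old       # Step 1: divide the newest figure by the oldest
    change = ratio - 1      # Step 2: subtract 1
    return change * 100     # Step 3: multiply by 100


# Hypothetical budgets chosen to reproduce the 23% example.
print(f"{percent_change(100.0, 123.0):.1f}%")
```

A decrease comes out negative. For example, going from 50 to 25 gives -50%.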
Common Descriptive Analysis


Rates of change: applied
What was the rate of change in the 1992 budget deficit as compared to 1980?
1. Divide the 1992 budget deficit ($290 billion) by the 1980 budget deficit ($73.8 billion): 3.93
2. 3.93 - 1 = 2.93
3. 2.93 x 100 = 293 percent
The budget deficit in current dollars (that is, not adjusted for inflation) increased 293 percent.
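The same three steps can be applied to the deficit figures on the slide. This sketch keeps the ratio unrounded until the end.

```python
deficit_1980 = 73.8    # billions of current dollars
deficit_1992 = 290.0   # billions of current dollars

ratio = deficit_1992 / deficit_1980   # step 1: about 3.93
pct = (ratio - 1) * 100               # steps 2 and 3
print(f"ratio = {ratio:.2f}, change = {pct:.1f}%")
```

Carrying the full ratio gives about 292.95%, which rounds to the slide's 293 percent.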
Common Descriptive Analysis
- Frequency distributions
  - Numbers and percents of a single variable
In The News: Women Now Are Majority of College Graduates
Interpretation?
- How would you interpret these percentages in the comparative trend analysis?
- Are you surprised by the changes over time?
  - Why or why not?
Frequency and Percent Distributions
- Survey data: analyzed by distributions
  - How many men and women are in the program?

Distribution of Respondents by Gender
          Number   Percent
Male         100       33%
Female       200       67%
Total        300      100%
Frequency and Percent Distributions
- How many men and women are in the program?
- Write-up: Of the 300 people in this program, 67% are women and 33% are men.
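A frequency and percent distribution is a count per category divided by the total. This sketch builds one with `collections.Counter`. The list of responses is hypothetical and only reproduces the slide's totals of 100 men and 200 women.

```python
from collections import Counter

# Hypothetical raw responses matching the slide's totals: 100 men, 200 women.
responses = ["Male"] * 100 + ["Female"] * 200

counts = Counter(responses)
total = sum(counts.values())

print(f"{'':8}{'Number':>8}{'Percent':>9}")
for category in ("Male", "Female"):
    n = counts[category]
    print(f"{category:8}{n:>8}{n / total:>9.0%}")
print(f"{'Total':8}{total:>8}{1.0:>9.0%}")
```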
Different Analysis Tools for Different Situations
- Frequency/percent distributions make sense when working with nominal and ordinal data
- But frequency/percent distributions for interval/ratio data can produce a ridiculously long table that is impossible to interpret
  - If I ask 500 people how many years they have lived in an area, I can get a wide range of answers.
  - For this type of data, I would instead use the mean, median, and mode to describe the variable.
Describing Distributions
- Central tendency
  - Mean, median, mode
  - How similar are the characteristics?
  - Example: use when we want to describe the similarity of the ages of a group of people.
- Dispersion
  - Range, standard deviation
  - How dissimilar are the characteristics?
  - Example: how much variation is there in the ages?
Measures of Central Tendency
- The 3-Ms: Mode, Median, Mean
  - Mode: the most frequent response
  - Median: the midpoint of the distribution
  - Mean: the arithmetic average
Basic Concepts Revisited
- Levels of measurement
  - Nominal level data: names, categories
    - E.g., gender, religion, state, country
  - Ordinal level data: data with an order, going from low to high
    - E.g., highest educational degree, income categories, agree-disagree scales
  - Interval level data: numbers with equal intervals but no true zero point
    - E.g., IQ scores, GRE scores, temperature in Fahrenheit or Celsius
  - Ratio level data: numbers with a true zero point
    - E.g., age, weight, income
Which Measure of Central Tendency to Use?
It depends on the type of data you have:
- Nominal data: mode
- Ordinal data: mode and median
- Interval/ratio data: mode, median, and mean
For Interval or Ratio Data: Which One to Use?
- The concept of the normal distribution, also called the bell-shaped curve
  - In a perfectly normal distribution, the mean, median, and mode are equal; in real data that are roughly normal, they should be very similar
- Use the mean if the distribution is approximately normal
- Use the median if the distribution is not normal, for example when it is skewed
Normal Distribution: Bell-Shaped Curve
[Figure: bell-shaped curve centered on the mean. Source: http://en.wikipedia.org/wiki/Normal_distribution]
Office Contributions
- $10, $1, $0.50, $0.25, $0.25
- The mean is $2.40 (add them up and divide by 5)
- The median is $0.50 (the midpoint of this distribution)
- The mode is $0.25 (the most frequently reported contribution)
- The median gives the best description of the contributions, because the single $10 gift pulls the mean well above what most people gave.
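Python's standard `statistics` module computes all three measures directly. This sketch applies them to the five contributions on the slide.

```python
from statistics import mean, median, mode

contributions = [10.00, 1.00, 0.50, 0.25, 0.25]   # dollars, from the slide

print(f"Mean:   ${mean(contributions):.2f}")    # pulled up by the single $10 gift
print(f"Median: ${median(contributions):.2f}")  # middle value after sorting
print(f"Mode:   ${mode(contributions):.2f}")    # most frequent value
```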
Salaries
- Assume that you had 11 teachers: 10 teachers earned $21,000 per year and one earned $1,000,000.
- What would be the best measure to describe this data?
Salaries
- The mean salary would be $110,000.
- The median and mode are both $21,000.
- The distribution is positively skewed: the mean is higher than the mode and median.
- The median does the best job of describing the center of the salaries.
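The salary example can be checked the same way. The last lines use a rough rule of thumb: when the mean sits above the median, the data lean toward positive (right) skew. This is a quick check, not a formal skewness statistic.

```python
from statistics import mean, median, mode

salaries = [21_000] * 10 + [1_000_000]   # 10 teachers at $21,000, one at $1,000,000

avg, mid, most = mean(salaries), median(salaries), mode(salaries)
print(f"Mean ${avg:,.0f}, median ${mid:,.0f}, mode ${most:,.0f}")

# Rule of thumb: a mean above the median suggests positive (right) skew.
if avg > mid:
    print("Mean exceeds median: the distribution is positively skewed.")
```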
Skewed Data
1. Negative skew: The mass of the distribution is concentrated on the right of the figure. It has relatively few low values. The distribution is said to be left-skewed.
2. Positive skew: The mass of the distribution is concentrated on the left of the figure. It has relatively few high values. The distribution is said to be right-skewed. The $1,000,000 salary pulls the mean up.
Wikipedia: http://en.wikipedia.org/wiki/Skewness
Skewed Distributions: Negative and Positive
[Figure source: http://en.wikipedia.org/wiki/File:Skewness_Statistics.svg]
Using Means With Survey Data?
- Survey data is typically coded using numbers:
  - Gender: male is coded 1, female is coded 2
  - It is faster and less error-prone to code variables using numbers
- But the computer treats these codes as numbers and will compute a mean if asked
  - How would you interpret a mean for gender of 1.6? Or a mean for religion of 2.8?
Do Not Use Means With Nominal Data
- Gender (and religion) are nominal variables and should only be reported in terms of distributions:
  - Frequency distribution: 10 men and 12 women
  - Percentage distribution: 45% men and 55% women
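A mean of numeric codes can be calculated, but it is meaningless for a nominal variable. This sketch uses the slide's 10 men and 12 women under a hypothetical coding scheme (1 = men, 2 = women). It shows the mean of the codes next to the appropriate summary, the percentage distribution.

```python
from collections import Counter

# Hypothetical coding scheme: 1 = men, 2 = women (the numbers are arbitrary labels).
LABELS = {1: "Men", 2: "Women"}
codes = [1] * 10 + [2] * 12   # 10 men and 12 women, as on the slide

# Computable, but meaningless for a nominal variable.
print(f"Mean of codes: {sum(codes) / len(codes):.2f}")

# The appropriate summary: a frequency and percentage distribution.
counts = Counter(codes)
for code, label in LABELS.items():
    print(f"{label}: {counts[code]} ({counts[code] / len(codes):.0%})")
```

If the codes were reversed (1 = women), the mean would change even though nothing about the people changed. That is a sign the mean carries no information here.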
Using Means With Survey Data?
- Scales (very satisfied <-> very dissatisfied) are ordinal scales
  - But they are coded into the computer using numbers: 5 for very satisfied through 1 for very dissatisfied
- The computer will compute a mean if asked:
  - The mean was 3.8 for job satisfaction.
  - The mean satisfaction with faculty performance was 4.2 on a scale from 1 to 5.
- Grade-point averages are an example of means based on an ordinal scale (A to F, converted to a scale of 0 to 4)
Using Means With Ordinal Data?
- There is disagreement in the field, partly based on academic discipline, about whether to use means with ordinal data.
- Things like GPA or faculty ratings are often shown as means.
- It is often helpful for researchers to look at the means first when working with a lot of data: they are looking for unusually high or low means.
- It is also true that it is sometimes easier to show the means than the percentage distribution for every variable.
Washington Employee Survey
Question                                                          2006   2007   2009   Percent reporting 4 or 5 (positive)
I know what is expected of me at work                             4.28   4.25   4.31   87%
I receive recognition for a job well done.                        3.34   3.43   3.47   54%
I have the tools and resources I need to do my job effectively.   3.76   3.75   3.80   70%
Using Means With Ordinal Data?
- But most people are more familiar with polling results, which report percent distributions.
  - We tend to see something like "55% report supporting cap and trade legislation" rather than a mean of 3.4 on a scale of 5 (for) to 1 (against).
- The decision about whether to use means or percent distributions to report ordinal data should reflect audience preference and ease of audience understanding.
  - It is a practical choice, not an ideological stance.
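The same ordinal responses can be reported either way, as in the Washington survey table, which shows both a mean and a "percent reporting 4 or 5". The responses in this sketch are hypothetical, on a 1 (lowest) to 5 (highest) scale.

```python
from statistics import mean

# Hypothetical responses on a 1 (lowest) to 5 (highest) scale.
responses = [5, 4, 4, 3, 5, 2, 4, 1, 4, 5, 3, 4]

avg = mean(responses)
positive = sum(1 for r in responses if r >= 4) / len(responses)

print(f"Mean score: {avg:.2f} on a 1-5 scale")
print(f"Percent reporting 4 or 5 (positive): {positive:.0%}")
```

Both numbers describe the same answers. Which one to report depends on what the audience finds easier to understand.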
Measures of Dispersion
- Used with interval and ratio data
- Simple description: the range
  - Reported salaries ranged from $21,000 to $1,000,000
  - Ages in the group ranged from 18 to 32
- Standard deviation
  - Measures dispersion in terms of distance from the mean
  - Small standard deviation: not much dispersion
  - Large standard deviation: lots of dispersion
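The range and the standard deviation can both be computed with the standard library. The two groups of ages below are hypothetical. They have the same mean but very different spread, and group B spans the slide's 18 to 32. This sketch uses `pstdev` (population standard deviation, dividing by n). `statistics.stdev` would give the sample version, which divides by n - 1.

```python
from statistics import mean, pstdev

# Hypothetical ages: same mean, different dispersion.
group_a = [24, 25, 25, 26, 25]
group_b = [18, 20, 25, 30, 32]

for name, ages in (("A", group_a), ("B", group_b)):
    value_range = max(ages) - min(ages)
    print(f"Group {name}: mean {mean(ages):.1f}, range {min(ages)}-{max(ages)} "
          f"({value_range} years), SD {pstdev(ages):.2f}")
```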
Standard Deviation
- Normal distribution: bell-shaped curve
  - About 68% of the values fall within 1 standard deviation of the mean
  - About 95% of the values fall within 2 standard deviations of the mean
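The 68% and 95% figures come from the normal curve itself. They can be checked with `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()   # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2):
    print(f"Within {k} SD of the mean: {share_within(k):.1%}")
```

The exact values are about 68.3% and 95.4%, which the slide rounds to 68% and 95%. Strictly, 95% corresponds to about 1.96 standard deviations.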
Normal Distribution
[Figure: normal curve centered on the mean, with standard deviations marked on either side; 95% of the distribution lies within 2 standard deviations of the mean]
Applying the Standard Deviation
- Average test score = 60.
- The standard deviation is 10.
- Therefore, if the scores are normally distributed, about 95% of them fall between 40 and 80.
- Calculation (2 standard deviations = 2 x 10 = 20):
  - 60 + 20 = 80
  - 60 - 20 = 40
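The interval is just the mean plus and minus two standard deviations. This minimal sketch assumes the scores are roughly normal, as the slide implies.

```python
def interval(center, sd, k=2):
    """Return (low, high) = center minus/plus k standard deviations."""
    return center - k * sd, center + k * sd


low, high = interval(60, 10)   # mean test score 60, standard deviation 10
print(f"About 95% of scores fall between {low} and {high}")
```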
Standard Deviation with Means
- The standard deviation is used with interval/ratio level data
- Typically, standard deviations are presented with means so the reader can tell whether there is a lot or a little variation in the distribution.
- Note: the standard deviation is also used in other statistical calculations, such as z-scores and confidence intervals
Describing Two Variables Simultaneously
- Cross-tabulations (cross tabs, contingency tables)
  - Used when working with nominal and ordinal data
  - They provide great detail
Describing Two Variables Simultaneously
Detail about the race and gender of the 233 people in the workplace:
Race     Men   Women
White    21%     31%
Black    15%     11%
Other    14%      6%
Describing Race and Gender
- Write-up:
  Of the 233 employees, the greatest proportion are white women (31%), followed by white men (21%). Fifteen percent of the employees are black men and 11% are black women; 14% are men of another racial identity and 6% are women of another racial identity.
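A cross-tabulation counts each combination of two categorical variables. This sketch uses a handful of hypothetical employee records (not the slide's 233 employees) and, like the slide's table, shows each cell as a percent of all employees. Row or column percents are common alternatives.

```python
from collections import Counter

# Hypothetical (race, gender) records, for illustration only.
employees = [
    ("White", "Women"), ("White", "Men"), ("Black", "Men"), ("White", "Women"),
    ("Other", "Men"), ("Black", "Women"), ("White", "Men"), ("White", "Women"),
]

cells = Counter(employees)   # count of each (race, gender) combination
total = len(employees)
genders = ("Men", "Women")

print(f"{'Race':8}" + "".join(f"{g:>8}" for g in genders))
for race in ("White", "Black", "Other"):
    row = "".join(f"{cells[(race, g)] / total:>8.0%}" for g in genders)
    print(f"{race:8}{row}")
```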
Describing Two Variables Simultaneously
Comparison of Means
- Used when one variable is nominal or ordinal and the second variable is at the interval/ratio level of measurement.
- Examples:
  - Men in the MPA program have a mean GPA of 3.2, compared to 3.0 for women.
  - The mean overall citizen satisfaction score is 4.2 this year, compared to 3.5 last year.
  - Last year, the mean salary for women was $35,000, compared to $38,000 for men.
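A comparison of means groups the interval/ratio values by the categorical variable and averages each group. The salary records below are hypothetical, chosen so the group means match the slide's $35,000 and $38,000.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (gender, salary) records for illustration only.
records = [
    ("Women", 34_000), ("Women", 36_000), ("Women", 35_000),
    ("Men", 37_000), ("Men", 39_000), ("Men", 38_000),
]

by_group = defaultdict(list)   # gender -> list of salaries
for group, salary in records:
    by_group[group].append(salary)

group_means = {group: mean(values) for group, values in by_group.items()}
for group, value in group_means.items():
    print(f"Mean salary for {group.lower()}: ${value:,.0f}")
```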

Key Points
- These simple descriptive analysis techniques can be effective:
  - They illuminate, provide feedback, inform, and might persuade.
- The math is generally straightforward.
- Descriptive data is generally easier for many people to understand than more complex statistics (stay tuned).
- Complex statistics are not inherently better!
The Tough Question
- If descriptive data is distorted, the distortion tends to be in the way things are counted and measured.
  - The math is usually correct.
  - Example: the federal debt is often presented only in terms of debt held by the public, but the total debt also includes money borrowed from other government funds.
- As a result, the debt looks smaller than it actually is.
The Tough Question
- If descriptive data is distorted, the distortion tends to be in the way things are counted and measured. The math is usually correct.
- Example: health insurance profits look different when calculated as a percent of corporate revenue than when calculated as a percent of all spending on health care.
  - They will look smaller as a percent of all health care spending, because that denominator is larger than corporate insurance revenue alone.
The Tough Question
- Always ask: what exactly is being measured and counted?
- Consider whether there are other ways of counting and other ways of doing the analysis that might yield different results (or create different perceptions).
- Do the choices reflect a political agenda?
Creative Commons
- This PowerPoint is meant to be used and shared with attribution
- Please provide feedback
- If you make changes, please share freely and send me a copy of the changes:
  - [email protected]
- Visit www.creativecommons.org for more information