Types of data and how to present them - 47-269-203-spr2010
Types of data and how to present them
47:269: Research Methods I
Dr. Leonard
March 31, 2010
Scientific Theory
1. Formulate theories
2. Develop testable hypotheses (operational definitions)
3. Conduct research, gather data
4. Evaluate hypotheses based on data
5. Cautiously draw conclusions
Scales of Measurement
Nominal
Categories
Ordinal
Categories that can be ranked
Interval
Scores with equidistant intervals between them
Ratio
Scores with equidistant intervals and absolute zero
            Responses       Responses         Equal        Absolute
            are distinct    can be ranked     intervals    zero
Nominal     YES             NO                NO           NO
Ordinal     YES             YES               NO           NO
Interval    YES             YES               YES          NO
Ratio       YES             YES               YES          YES
Two major approaches to using data
Descriptive statistics
Describe or summarize data to characterize sample
Organize responses to show trends in data
Inferential statistics
Draw inferences about population from sample (is population distinct from sample?)
Significance tests
Capture impact of random error on responses
Margin of error
Note: Statistics describe responses from a sample; parameters describe responses from a population (e.g., a census)
Descriptive Statistics
N, total number of cases (responses) in a sample
Our class would be N = 33
f, or frequency, is the number of participants who gave a particular response, x
Can also be given as percentages or proportions
Can be univariate or bivariate
How participants vary on one variable (uni-)
How participants vary on two variables (bi-)
Descriptive statistics are a good first step for analyzing any data!
They are the only statistics appropriate for nominal data
Frequency distribution (nominal data)
x (response)    f (frequency)    %
Democrat        479              47.9
Republican      411              41.1
Independent     101              10.1
Green party     9                0.9
Total           n = 1,000        100%
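As a rough illustration of how a table like this is built, the sketch below tallies f and % for each response. The list of responses is rebuilt from the counts in the table; the variable names are illustrative only.

```python
from collections import Counter

# Rebuild the 1,000 individual responses from the counts in the table.
responses = (
    ["Democrat"] * 479
    + ["Republican"] * 411
    + ["Independent"] * 101
    + ["Green party"] * 9
)

n = len(responses)          # total number of cases
freq = Counter(responses)   # f for each response x

# Print x, f, and % for each category, most frequent first.
for party, f in freq.most_common():
    print(f"{party:<12} {f:>5} {100 * f / n:>6.1f}")
print(f"{'Total':<12} {n:>5} {100.0:>6.1f}")
```

Because the data are nominal, counts and percentages are all the arithmetic that is done here; the categories themselves are never added or averaged.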
Frequency distribution (interval or ratio data)
When you need to present a wide range of scores, show responses grouped in intervals to make it easier to grasp “big picture” of data
2.7 1.9 3.1 1.0 3.3 1.3 2.2 3.0
3.4 3.1 1.8 2.6 3.7 2.2 1.9 3.1
3.4 3.0 3.5 3.0 2.4 3.0 3.4 2.4
2.4 3.2 3.3 2.7 3.5 3.2 3.1 2.1
1.5 1.4 2.6 2.9 2.1 2.3 3.1 3.3
2.7 2.4 3.4 3.3 3.0 3.8 1.6 2.8
3.8 1.4 2.6 1.5 2.8 2.3 2.8 2.3
2.8 3.2 2.8 1.9 3.3 2.9 2.0 3.2

Interval     f
0.9 - 1.1    1
1.2 - 1.4    3
1.5 - 1.7    3
1.8 - 2.0    5
2.1 - 2.3    6
2.4 - 2.6    7
2.7 - 2.9    10
3.0 - 3.2    14
3.3 - 3.5    12
3.6 - 3.8    3
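Grouping scores into intervals can be sketched as follows. This is a minimal example that uses only the first ten raw scores listed on this slide and assumes the same interval scheme (starting at 0.9, three tenths wide); the helper name `interval_label` is made up for illustration.

```python
from collections import Counter

# First ten raw scores from the slide (a subset, for brevity).
scores = [2.7, 1.9, 3.1, 1.0, 3.3, 1.3, 2.2, 3.0, 3.4, 3.1]

LOWEST_TENTHS = 9   # the first interval starts at 0.9
WIDTH_TENTHS = 3    # each interval covers three tenths (e.g., 1.2, 1.3, 1.4)

def interval_label(score):
    """Return the class interval (as text) that a score falls into."""
    tenths = round(score * 10)  # work in whole tenths to avoid float drift
    index = (tenths - LOWEST_TENTHS) // WIDTH_TENTHS
    low = LOWEST_TENTHS + index * WIDTH_TENTHS
    high = low + WIDTH_TENTHS - 1
    return f"{low / 10:.1f} - {high / 10:.1f}"

# Count how many scores land in each interval.
grouped = Counter(interval_label(s) for s in scores)
for label in sorted(grouped):
    print(label, grouped[label])
```

Running the same function over all 64 scores would give a grouped frequency distribution of the kind tabulated on the slide.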
Frequency distributions can be depicted graphically in…
Bar graphs
Bars not touching because of discrete data
Nominal and ordinal data
Histograms
Bars touching because of continuous data
Interval and ratio data
Frequency polygons (single line)
Interval and ratio data
Shapes of Distributions
[Figure: three distribution curves plotted over X: normal, positive skew, negative skew]
Shapes of Distributions
[Figure: three distribution curves plotted over X: normal, platykurtic, leptokurtic]
What else can we do besides frequencies?
Measures of central tendency show the central or “typical” scores in a distribution
Mean - the average score
Median - the middle score
Mode - the most frequent score
The mean, median, and mode are related to the horizontal shape (skew) of the distribution.
In a normal distribution: Mean = Median = Mode
In a positively skewed distribution: Mode < Median < Mean
In a negatively skewed distribution: Mean < Median < Mode
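The ordering of the three measures under skew can be checked on small data sets. The scores below are hypothetical, chosen only to produce each shape; the sketch relies on Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores, chosen only to produce each shape.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positive_skew = [1, 1, 1, 2, 2, 3, 4, 6, 10]    # long tail toward high scores
negative_skew = [1, 5, 7, 8, 9, 9, 10, 10, 10]  # long tail toward low scores

def summarize(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
    )

for name, data in [
    ("symmetric", symmetric),
    ("positive skew", positive_skew),
    ("negative skew", negative_skew),
]:
    mean, median, mode = summarize(data)
    print(f"{name:<14} mean={mean:.2f} median={median} mode={mode}")
```

In the skewed sets the mean is pulled furthest toward the tail, the mode stays at the peak, and the median sits between them.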
Which measure of central tendency???
Different measures of central tendency are appropriate depending upon the level of measurement used:
Nominal: Mode
Ordinal: Mode, Median
Interval/Ratio: Mode, Median, Mean
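A short sketch of the same idea in code, using hypothetical responses at each level of measurement. Only the measures listed above for each level are computed.

```python
import statistics

# Hypothetical responses at three levels of measurement.
party = ["Democrat", "Republican", "Democrat", "Independent"]  # nominal
finishing_place = [1, 2, 2, 3, 5]                              # ordinal
temperature_f = [68, 70, 70, 72, 75]                           # interval

# Nominal: only the mode is meaningful.
nominal_summary = {"mode": statistics.mode(party)}

# Ordinal: the mode and the median are meaningful.
ordinal_summary = {
    "mode": statistics.mode(finishing_place),
    "median": statistics.median(finishing_place),
}

# Interval/ratio: the mode, median, and mean are all meaningful.
interval_summary = {
    "mode": statistics.mode(temperature_f),
    "median": statistics.median(temperature_f),
    "mean": statistics.mean(temperature_f),
}

print(nominal_summary, ordinal_summary, interval_summary)
```

Nothing stops software from averaging ordinal ranks; choosing the appropriate measure is the researcher's job.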
The Mean
The most informative and elegant measure of central tendency.
The average
The fulcrum point of the distribution
[Figure: two number lines, with scores 2, 4, 6, 8, 10 and scores 2, 4, 6, 8, 15]
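The "fulcrum" idea can be made concrete: distances below the mean exactly balance distances above it, so the deviations from the mean always sum to zero. A minimal sketch using the two sets of scores on this slide (function names are illustrative):

```python
scores_a = [2, 4, 6, 8, 10]
scores_b = [2, 4, 6, 8, 15]   # same scores, but the highest is more extreme

def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def deviations(scores):
    """Signed distance of each score from the mean."""
    m = mean(scores)
    return [x - m for x in scores]

for scores in (scores_a, scores_b):
    print(scores, "mean =", mean(scores), "deviations =", deviations(scores))
```

Changing the top score from 10 to 15 moves the mean from 6 to 7, which is what makes the mean sensitive to extreme scores.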
The Median
The middlemost score in a distribution.
The scale value below which and above which 50% of the distribution falls
Not the fulcrum: The halfway point
[Figure: two number lines, with scores 2, 4, 6, 8, 10 and scores 2, 4, 6, 8, 15]
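The Median
If N is odd, then median is the center score
[Figure: two number lines, with scores 2, 4, 6, 8, 10 and scores 2, 4, 6, 8, 15]
If N is even, then median is the average of the two centermost scores
[Figure: two number lines, with scores 2, 4, 6, 8, 10, 12 and scores 2, 4, 6, 8, 10, 15]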
The Median
If the median occurs at a value where there are tied scores, use the tied score as the median
[Figure: number lines of scores with tied values at the median]
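The rules for finding the median can be sketched as a small function. The odd and even examples reuse the scores from the slides; the tied example is hypothetical.

```python
def median(scores):
    """Middle score if N is odd; average of the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 4, 6, 8, 10]))      # N odd
print(median([2, 4, 6, 8, 15]))      # N odd, extreme high score
print(median([2, 4, 6, 8, 10, 12]))  # N even
print(median([2, 4, 8, 8, 10, 15]))  # N even, tied scores in the middle (hypothetical)
```

Replacing 10 with 15 leaves the median at 6, which is why the median is described as the halfway point rather than the fulcrum.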
The Mode
The most frequent score in the distribution
[Figure: example score distributions on number lines]
One more thing…
These measures of central tendency vary in their sampling stability = match between the sample mean (e.g., x̄) and the population mean (μ).
Mode (least sampling stability) • Median • Mean (most sampling stability)
Note: Roman (r, s, x̄) characters are used for sample statistics while Greek (ρ, σ, μ) characters are used for population statistics.
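Sampling stability can be illustrated with a small simulation: draw many samples from the same population and see how much each statistic varies from sample to sample. The population here (normal, mean 100, standard deviation 15), the sample size, and the number of samples are all assumptions made for the sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_sample(n=25):
    """Draw n scores from an assumed normal population (mean 100, SD 15)."""
    return [rng.gauss(100, 15) for _ in range(n)]

samples = [draw_sample() for _ in range(2000)]
sample_means = [statistics.mean(s) for s in samples]
sample_medians = [statistics.median(s) for s in samples]

# A smaller spread across samples means more sampling stability.
spread_of_means = statistics.stdev(sample_means)
spread_of_medians = statistics.stdev(sample_medians)

print(f"spread of sample means:   {spread_of_means:.2f}")
print(f"spread of sample medians: {spread_of_medians:.2f}")
```

Under these assumptions the sample means cluster more tightly around the population mean than the sample medians do.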
Review of central tendency
Which one is the only appropriate measure for nominal data?
The mode
How do you find the median when there is an odd number of scores?
Simply locate the score in the middle
…when there is an even number of scores?
Average the two middle scores
Which measure is most sensitive to extreme scores and why?
The mean because it takes all scores into account and can be swayed
by positive or negative skew
Which measure has the most sampling stability and why?
The mean because it is the most accurate representation of the
overall sample
Application of central tendency
In 2006, the median home price in Boston was $386,300. (San Francisco was $518,400; Washington, D.C. was $258,700).
How do you interpret these numbers?
Why are housing prices framed in terms of the median rather than the mean or the mode?
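One way to explore the question is to compare the mean and the median on a small set of prices containing one very expensive home. The prices below are hypothetical, not real sales data.

```python
import statistics

# Hypothetical sale prices (not real data): six ordinary homes and one mansion.
prices = [250_000, 300_000, 350_000, 380_000, 400_000, 450_000, 5_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"mean:   ${mean_price:,.0f}")
print(f"median: ${median_price:,.0f}")
```

Home prices tend to be positively skewed, so a single extreme sale can pull the mean well above what a typical buyer pays, while the median barely moves.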
Measures of variability
Measures of central tendency
…indicate the typical scores in a distribution
…are related to skew (horizontal)
Measures of variability
…show the dispersion of scores in a distribution
…are related to kurtosis (vertical)
Measures of variability
Range - the difference between the highest and lowest score
Variance - the average squared variation (distance) from the mean of all the scores
Standard deviation - the typical variation (distance) from the mean of all the scores; the square root of the variance
Measures of variability
Range = Highest Score – Lowest Score
[Figure: two number lines, with scores 2, 4, 6, 8, 10 and scores 2, 4, 6, 8, 15]
Most sensitive to extreme scores!
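A minimal sketch of the range, applied to the two sets of scores from this slide (the function name is illustrative):

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

print(score_range([2, 4, 6, 8, 10]))  # 10 - 2
print(score_range([2, 4, 6, 8, 15]))  # 15 - 2
```

Only the two end scores enter the calculation, so changing a single extreme score changes the range by the full amount.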
Measures of variability
Again, variance summarizes the overall distance of all scores from the mean (requires squaring the distance of each score from the mean, then averaging)
Not as useful as the standard deviation -- roughly the average distance scores fall from the mean
Measures of variability
Standard deviation, like the mean, is the most informative and elegant measure of variability.
The average distance of scores from the mean score -- deviation is distance!
[Figure: number line with scores 2, 4, 6, 8, 10]
Also like the mean, standard deviation has the most sampling stability
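Variance and standard deviation can be sketched in a few lines. The slides do not say whether to divide by N or by N - 1; this example assumes the population form (divide by N), and the function names are illustrative.

```python
import math

def variance(scores):
    """Average squared distance of the scores from their mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, back in the original units of the scores."""
    return math.sqrt(variance(scores))

for scores in ([2, 4, 6, 8, 10], [2, 4, 6, 8, 15]):
    print(scores,
          "variance =", variance(scores),
          "SD =", round(standard_deviation(scores), 2))
```

Squaring keeps negative and positive distances from cancelling out; taking the square root afterwards returns the result to the scale of the original scores, which is one reason the standard deviation is easier to interpret than the variance.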
How would these standard deviations differ?
[Figure: two distributions of scores, one with Mean = 6 and Range = 8, the other with Mean = 7.9 and Range = 10]
Standard deviation and shape of distribution
[Figure: three distributions, each with Mean = 15; recoverable labels include Std. Dev. = 10 and Std. Dev. = 0.9]
Properties of Normal Distributions
• All normal distributions are single peaked, symmetric, and
bell-shaped
• Normal distributions can have different values for mean and
standard deviation but…
• All normal distributions follow the 68-95-99.7 rule
68.3% of data within 1 standard deviation of the mean
95.4% of data within 2 standard deviations of the mean
99.7% of data within 3 standard deviations of the mean
[Figure: normal curve centered on the mean, with bands marked 68.3%, 95.4%, and 99.7%]
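The three percentages can be reproduced from the normal distribution itself. For a normal distribution, the proportion of scores within k standard deviations of the mean equals erf(k / √2); the sketch below uses `math.erf` from Python's standard library, and the function name is illustrative.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

The result does not depend on the particular mean or standard deviation, which is why the rule holds for all normal distributions.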