The Data Analysis Plan - Web Hosting at UMass Amherst

The Data Analysis Plan
The Overall Data Analysis Plan
Purpose: To tell a story.
To construct a coherent narrative that
explains findings, argues against
other interpretations, and supports
conclusions.
Three Steps of the Data Analysis Plan
1) Getting to know the data: A first step
is to examine the data set, the “raw
numbers”. “Play” with the raw
numbers.
2) Summarize the data: Use descriptive
statistics to “summarize” the data.
3) Confirm what the data reveal: Most
commonly, using null hypothesis
significance testing (NHST).
Step 1: Getting to know the data
1) Look at raw numbers and check for
errors and outliers.
• Errors are impossible numbers
(outside the possible range).
• Outliers are in the possible range, but
exceptional. Could be an error or a
true score from an unusual participant.
Decision Rule for Errors and Outliers
• You “fix” errors if you can
• You eliminate outliers (if appropriate).
Follow the rules of the journal or
organization where the results will be
presented/reported.
• Either way, you must specify the amount
of data eliminated and your reason or
“rule” for elimination.
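The screening step above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the 0-100 score range, the sample scores, and the 1.5 × IQR outlier rule are all assumptions chosen for the example, and the rule actually used should follow the journal or organization.

```python
from statistics import quantiles

def screen_scores(scores, low, high, k=1.5):
    """Separate errors (impossible values) from outliers (possible but exceptional).

    low/high: the possible range of the measure (assumed known).
    k: multiplier for the IQR fence (1.5 is a common convention, assumed here).
    """
    errors = [x for x in scores if x < low or x > high]
    valid = [x for x in scores if low <= x <= high]
    q1, _, q3 = quantiles(valid, n=4)          # quartiles of the valid scores
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in valid if x < lo_fence or x > hi_fence]
    kept = [x for x in valid if lo_fence <= x <= hi_fence]
    return errors, outliers, kept

# Hypothetical scores on a 0-100 scale
scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12, 48, 130]
errors, outliers, kept = screen_scores(scores, low=0, high=100)

# Report how much data was eliminated and under which rule
removed = len(errors) + len(outliers)
print(f"errors={errors}, outliers={outliers}")
print(f"removed {removed} of {len(scores)} scores (rule: outside 1.5 x IQR fences)")
```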
2) Look at a “picture” of the raw numbers
• Stem & Leaf Plots
• Histogram (frequency distribution)
• Examine underlying distribution of
raw scores looking for “unusual”
distribution (other than “normal”)
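As a rough sketch of such a “picture”, a stem-and-leaf plot can be printed as plain text. The helper name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole-number scores, with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return the lines of a stem-and-leaf plot (tens = stem, ones = leaf)."""
    stems = defaultdict(list)
    for x in sorted(scores):
        stems[x // 10].append(x % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Each row doubles as a histogram bar: the longer the row of leaves, the more scores fall in that interval.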
“Normal” Distribution
Skewed Distributions
• Skewed Distribution: If extremely skewed,
you may have to transform the scores
(for example, using logarithms or
changing the scale you use).
Positive Skew: Tail trails off to the
“positive” side
Negative Skew: Tail trails off to the
“negative” side
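To illustrate, the sketch below computes a simple skewness index (the mean of the cubed z-scores) before and after a logarithmic transformation. The scores are hypothetical, and this particular skewness formula is one common choice assumed for the example; statistical packages may apply a small-sample correction.

```python
import math
from statistics import fmean, pstdev

def skewness(scores):
    """Mean cubed z-score: > 0 for positive skew, < 0 for negative skew."""
    m = fmean(scores)
    s = pstdev(scores)
    return fmean([((x - m) / s) ** 3 for x in scores])

# Hypothetical positively skewed scores (one very large value)
raw = [1, 2, 2, 3, 3, 3, 4, 5, 9, 40]
logged = [math.log(x) for x in raw]   # log transform requires scores > 0

print(f"skew before: {skewness(raw):.2f}")
print(f"skew after log transform: {skewness(logged):.2f}")
```

The transformed scores are still positively skewed, but much less so than the raw scores.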
“Bi-Modal Distribution”
(or multi-modal)
Very problematic for further analysis: refer
to “experts” for appropriate data analysis.
Step 2: Summarize the Data
(Descriptive Statistics)
• Purpose – to describe the data
To indicate what is a typical score (central
tendency)
To assess the degree to which the scores in the
data set differ from one another
(variability or dispersion)
1) Measures of Central Tendency (tendency
toward the middle) typical score
• Mode- Most frequently occurring score
Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12
Mode= 10
• Median – the “middle” score (50% of scores
below and 50% of scores above)
Median (from above) = 8
• Mean - Arithmetic average or mean (sum of
scores divided by number of scores)
Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12
Mean= 81/11 = 7.364
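The three measures can be checked with Python's standard `statistics` module, using the example data set from this section. This is only a quick sketch for verifying hand calculations.

```python
from statistics import mean, median, mode

scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]

typical = {
    "mode": mode(scores),              # most frequently occurring score
    "median": median(scores),          # middle score of the sorted list
    "mean": round(mean(scores), 2),    # sum of scores / number of scores
}
print(typical)
```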
In a “Normal Distribution”:
Mean=Median=Mode
Skewed distribution
If a distribution is skewed, the mean may not
be the best descriptor of the typical score.
In this case, the median is a better estimate of
“typical score”.
Usually report BOTH mean and median if the
distribution is skewed.
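A small sketch shows why. Adding one extreme score (a hypothetical value of 120, not from the original example) to the data set pulls the mean far from most of the scores, while the median barely moves.

```python
from statistics import mean, median

scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]
skewed = scores + [120]   # hypothetical extreme score creating positive skew

skewed_mean = mean(skewed)
skewed_median = median(skewed)
print(f"mean={skewed_mean}, median={skewed_median}")
```

Here the mean is larger than every score but one, so the median is the better description of a “typical” score.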
2) Measures of Variability (dispersion, how
different the numbers are from each other)
• Range – Officially= (highest score – lowest
score)
Example: 2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12
Range= 12 – 2 = 10
Usually reported by citing the lowest and
highest score in the data set.
• Variance – The sum of squared deviations of
the scores around the mean (the sum of
squares, SS) divided by either “N” or “n-1”,
depending on what is being described.
Variance of a set of numbers, or the population
variance (SS divided by N):
σ² = SS / N
An estimate of the variance of a population
based on a sample (SS divided by n-1):
s² = SS / (n - 1)
• Standard Deviation –
The square root of the variance.
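The measures of variability above can be sketched with the same example data set. The code computes the sum of squares (SS) by hand and compares it with the standard library's `pvariance` (divide by N) and `variance` (divide by n-1).

```python
import math
from statistics import mean, pvariance, variance, pstdev, stdev

scores = [2, 4, 5, 5, 6, 8, 9, 10, 10, 10, 12]
n = len(scores)

score_range = max(scores) - min(scores)          # highest score - lowest score
m = mean(scores)
ss = sum((x - m) ** 2 for x in scores)           # sum of squared deviations (SS)

pop_var = ss / n            # variance of this set of numbers (divide by N)
est_var = ss / (n - 1)      # estimate of population variance (divide by n-1)

print(f"range={score_range}")
print(f"SS/N={pop_var:.2f}, SD={math.sqrt(pop_var):.2f}")
print(f"SS/(n-1)={est_var:.2f}, SD={math.sqrt(est_var):.2f}")
```

Dividing by n-1 always gives the larger value, and the difference shrinks as the sample size grows.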
Effect Size or Effect Magnitude
• An index of the strength of the relationship
between the IV and the DV that is independent
of sample size.
• How large an effect does the IV have on the
DV?
• Cohen’s d is one measure of “size of effect” or
effect magnitude.
• For d, a value of .20 indicates a small
magnitude effect, .50 a medium magnitude
effect, and .80 a large magnitude effect.
• d is the difference between the means at
two levels of an IV divided by the standard
deviation of the population (that is, the
difference between means divided by a
measure of variability or dispersion).
• For a given difference between means, as
variability increases (standard deviation
increases), d decreases (lower effect size).
Example
• Suppose you have two levels of an IV and the
means for these two levels are 8 and 5
• The difference between the two means is
8-5=3
• If there is moderate variability in the DV (say
population standard deviation=6), then:
d=3/6 = .50
a medium effect size or magnitude
• If the variability of DV is larger (say
population standard deviation=15), then:
d=3/15=1/5= .20
a small effect size or magnitude
• If the variability in DV is really small
(standard deviation=3.75) then:
d=3/3.75=.80
a large effect size or magnitude
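The worked example above can be reproduced in a few lines. The function name `cohens_d` is a hypothetical helper, and the sketch assumes the population standard deviation is known, as in the example.

```python
import math

def cohens_d(mean_1, mean_2, sd):
    """Difference between two means divided by the standard deviation."""
    return (mean_1 - mean_2) / sd

# Same means (8 and 5), three different amounts of variability in the DV
d_moderate = cohens_d(8, 5, sd=6)      # moderate variability
d_large_sd = cohens_d(8, 5, sd=15)     # larger variability
d_small_sd = cohens_d(8, 5, sd=3.75)   # small variability

print(d_moderate, d_large_sd, d_small_sd)
```

The difference between means is 3 in every case; only the standard deviation changes the size of d.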
• Effect size is one measure that affects the
“power” of a statistical analysis and it is used
in making decisions about how large a sample
size should be used in order to be sufficient to
produce a reasonable level of “power”.
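As a rough illustration of how effect size feeds into sample-size decisions, the sketch below uses the common normal-approximation formula for comparing two independent groups of equal size with a two-sided test. The alpha of .05 and target power of .80 are assumptions for the example, not values from these notes, and exact t-test calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.20, 0.50, 0.80):
    print(f"d={d:.2f}: about {n_per_group(d)} per group")
```

The smaller the expected effect, the larger the sample needed to reach the same level of power.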
• Because the standard deviation (rather than
the standard error) is used as the
“denominator” for this measure, d is
independent of (not affected by) sample size.
Thus, you can compare effect sizes across
research studies using various sample sizes.
This type of comparison is called a “meta-analysis”.