Introduction to Effect Size


Effect Size and Meta-Analysis
Effect size helps evaluate the size of a difference, such as the difference between two means.
Meta-analysis is used to combine results across diverse studies on a given topic.
Topic 58: Introduction to Effect Size (d)
Suppose that Experimenter A administered a new treatment for depression (Treatment X) to an experimental group, while the control group received a standard treatment. Furthermore, suppose that Experimenter A used a 20-item true-false depression scale (with possible raw scores from 0 to 20) and obtained the results on the posttest shown here. Note that the difference between the two means is 5 raw-score points.
Suppose that Experimenter B administered Treatment Y to an experimental group while treating the control group with the standard treatment. Furthermore, suppose Experimenter B used a 30-item scale with choices from “Strongly agree” to “Strongly disagree” (with possible scores from 0 to 120) and obtained the results shown here, which show a difference of 10 raw-score points in favor of the experimental group.
Which treatment is superior? Treatment X, which resulted in a 5-point raw-score difference between the two means, or Treatment Y, which resulted in a 10-point raw-score difference between the two means? Of course, the answer is not clear, because the two experimenters used different measurement scales (0 to 20 versus 0 to 120).
In Experiment A, one standard-deviation unit equals 4.00 raw-score points. Dividing the difference between the means (5.00) by the size of the standard-deviation unit for Experiment A (4.00 points) yields an answer of 1.25. This value is known as d and is obtained by applying the formula d = (m_e - m_c) / sd, in which m_e stands for the mean of the experimental group, m_c stands for the mean of the control group, and sd stands for the standard deviation.
Using the same formula for Experiment B, the difference between the means is divided by the standard deviation (10.00/14.00), yielding d = 0.71, which is almost three-quarters of one standard-deviation unit above 0.00 on the standardized scale (which effectively extends to 3.00). The following is what is now known about the differences in the two experiments when both are expressed on a common (i.e., standardized) scale called d.
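
As a minimal sketch in Python (using only the mean differences and standard deviations reported above), the two values of d can be reproduced as follows:

# d is the difference between the means divided by the standard deviation
d_a = 5.00 / 4.00     # Experiment A: difference of 5 raw-score points, sd = 4.00
d_b = 10.00 / 14.00   # Experiment B: difference of 10 raw-score points, sd = 14.00

print(f"Experiment A: d = {d_a:.2f}")  # 1.25
print(f"Experiment B: d = {d_b:.2f}")  # 0.71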
Remember that the two raw-score differences are not directly comparable because different measurement scales were used (0 to 20 points versus 0 to 120 points). By examining the standardized values of d, which range from 0.00 to 3.00, a meaningful comparison of the results of the two experiments can be made.
Important definition: Effect size refers to the magnitude (i.e., size) of a difference when it is expressed on a standardized scale. The statistic d is one of the most popular statistics for describing the effect size of the difference between two means. In the next topic, the interpretation of d is discussed in more detail. In Topic 60, an alternative statistic for expressing effect size is described.
Topic 59: Interpretation of Effect Size (d)
In the previous topic, effect size expressed as d was introduced. The two examples in that topic had values of d of 0.71 and 1.25. Obviously, the experiment with a value of 1.25 had a larger effect than the one with a value of 0.71.
While there are no universally accepted standards for describing values of d in words, many researchers use Cohen’s suggestions: (1) a value of d of about 0.20 (one-fifth of a standard deviation) is “small,” (2) a value of 0.50 (one-half of a standard deviation) is “medium,” and (3) a value of 0.80 (eight-tenths of a standard deviation) is “large.”
Keep in mind that, in terms of values of d, an experimental group can rarely exceed a control group by more than 3.00 because the effective range of standard-deviation units is only three on each side of the mean. Thus, for most practical purposes, 3.00 or -3.00 is the maximum value of d.
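
The Python sketch below shows one way these benchmark labels might be applied in practice; the cutoff midpoints used here (0.35 and 0.65) are an assumption made for illustration, not part of Cohen's suggestions.

def label_d(d):
    """Describe the magnitude of d using Cohen's commonly cited benchmarks
    (about 0.20 = small, 0.50 = medium, 0.80 = large)."""
    magnitude = abs(d)          # direction is ignored; only size matters
    if magnitude < 0.35:        # closer to 0.20 than to 0.50 (assumed midpoint)
        return "small"
    if magnitude < 0.65:        # closer to 0.50 than to 0.80 (assumed midpoint)
        return "medium"
    return "large"

print(label_d(0.71))   # large  (closer to 0.80 than to 0.50)
print(label_d(1.25))   # large  (well beyond the 0.80 benchmark)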
Using the labels in Table 1, the value of d of 0.71 in the previous topic would be described as being closer to “large” than “medium,” while the value of 1.25 would be described as being between “very large” and “extremely large.”
The labels being discussed should not be used arbitrarily, without consideration of the full context in which the values of d were obtained and the possible implications of the results. This leads to two principles: (1) a small effect size might represent an important result, and (2) a large effect size might represent an unimportant result.
Topic 60: Effect Size and Correlation (r)
Cohen’s d is so widely used as a measure of effect size that some researchers use the terms “effect size” and “d” interchangeably, as though they are synonyms. However, effect size refers to any statistic that describes the size of a difference on a standardized metric.
In addition to d, a number of other measures of effect size have been proposed. One that is very widely reported is “effect-size r,” which is simply the Pearson correlation coefficient (r) described in Topic 53. As outlined in that topic, r indicates the direction and strength of a relationship between two variables on a scale that ranges from -1.00 to 1.00, where 0.00 indicates no relationship. Values of r are interpreted by first squaring them (r²).
For example, when r = 0.50, r² = 0.25 (0.50 x 0.50 = 0.25). Then, the value of r² is multiplied by 100%. Thus, 0.25 x 100% = 25%. This indicates that the relationship of r = 0.50 is 25% greater than a relationship of 0.00, on a scale (r²) that extends up to a maximum possible value of 1.00 (i.e., 100%).
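
The interpretive step described above is only squaring and converting to a percentage; a minimal sketch in Python:

def r_squared_percent(r):
    """Square a Pearson r and express the result as a percentage."""
    return round((r ** 2) * 100, 2)

print(r_squared_percent(0.50))  # 25.0 -> "25% greater than a relationship of 0.00"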
In basic studies, the choice between reporting values of d (which can range from -3.00 to 3.00) and reporting correlation coefficients with their associated values of r² (which can range from 0.00 to 1.00) is usually quite straightforward. If a researcher wants to determine which of two groups is superior on average, a comparison of means using d is usually the preferred method of analysis.
On the other hand, if there is one group of participants with two scores per participant, and if the goal is to determine the degree of relationship between the two sets of scores, then r and r² should be used. For instance, if a vocabulary knowledge test and a reading comprehension test were administered to a group of students, it would not be surprising to obtain a correlation coefficient as high as 0.70, which indicates a substantial degree of relationship between the two variables (i.e., there is a strong tendency for students who score high on vocabulary knowledge to also score high on reading comprehension).
As described in Topic 53, for interpretive purposes, 0.70 squared equals 0.49, which is equivalent to 49%. Knowing this allows a researcher to say that the relationship between the two variables is 49% higher than a relationship of 0.00.
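
For the one-group, two-scores-per-participant design described above, r is computed from the paired scores and then squared for interpretation. The sketch below uses made-up illustrative scores (not data from any actual study):

from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up vocabulary and reading-comprehension scores for five students
vocab   = [10, 14, 18, 22, 30]
reading = [12, 15, 16, 25, 28]

r = pearson_r(vocab, reading)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")  # a strong positive relationship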
When reviewing a body of literature on a given topic, some studies present means and values of d while other studies on the same topic present values of r, depending on the specific research purposes and research designs. When interpreting such a set of studies, it can be useful to think in terms of the equivalence of d and r. Table 1 shows the equivalents for selected values.
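
Table 1 is not reproduced in this transcript. One commonly used conversion between the two metrics, assuming two groups of approximately equal size, is r = d / sqrt(d² + 4); the sketch below applies it to the benchmark values of d, and this particular formula is an assumption here rather than something stated in the topic.

from math import sqrt

def d_to_r(d):
    """Approximate effect-size r equivalent to a given d (equal-sized groups assumed)."""
    return d / sqrt(d ** 2 + 4)

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.2f}")   # roughly 0.10, 0.24, 0.37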
Topic 61: Introduction to Meta-Analysis
• Meta-analysis is a set of statistical methods for combining the results of previous studies.
• Meta-analysis provides a statistical method that can synthesize multiple studies on a given topic.
• The differences in the results of the individual studies included in a meta-analysis are subject to many types of errors, such as:
  • Random sampling errors
  • Random errors of measurement
  • Systematic errors known to one or more of the researchers
  • Systematic errors of which the researchers are unaware
• The results of any one experiment should be interpreted with caution.
• The main focus of a meta-analysis is a mathematical synthesis of the statistical results of the studies included in the analysis.
  • In the example below, the synthesis is obtained by averaging the four mean differences (see the sketch after the table).
Example: Results of a Meta-Analysis of Four Experiments

Researcher   Experimental Group   Control Group   Mean Difference
W            m = 22.00            m = 19.00       m diff = 3.00
X            m = 20.00            m = 18.00       m diff = 2.00
Y            m = 23.00            m = 17.00       m diff = 6.00
Z            m = 15.00            m = 16.00       m diff = -1.00

The best estimate of the effectiveness of the program is 2.50 points, based on a combined sample of 400 students.
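
The synthesis in this example is a simple (unweighted) average of the four mean differences; a minimal sketch in Python:

# Mean differences reported by Researchers W, X, Y, and Z
mean_differences = [3.00, 2.00, 6.00, -1.00]

estimate = sum(mean_differences) / len(mean_differences)
print(f"Combined estimate: {estimate:.2f} points")  # 2.50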
Two Important Characteristics of Meta-Analysis
1. Statistics based on larger samples yield more reliable results.
  • It is important to remember that more reliable results do not necessarily mean more valid results.
  • A systematic bias that skews the results will yield invalid outcomes no matter how big the sample size is.
2. Meta-analysis typically synthesizes the results of studies conducted by independent researchers.
  • Since the researchers are not working together, if one researcher makes an error, the effects of his or her erroneous results will be moderated when they are averaged with the other results.
Topic 62: Meta-Analysis and Effect Size
• In a meta-analysis, it is difficult to find even one perfectly strict replication of a study, because studies often differ in that various researchers frequently use different measures of the same variable.
• For example, Experimenter A used a test with possible scores from 200 to 800, while Experimenter B used a test with possible scores from 0 to 50.
Experiment        Experimental Group        Control Group             Mean Difference
Exp. A (N = 50)   m = 500.00, sd = 200.00   m = 400.00, sd = 200.00   m diff = 100.00
Exp. B (N = 50)   m = 24.00, sd = 3.00      m = 22.00, sd = 3.00      m diff = 2.00

d = the mean difference divided by the standard deviation (sd)
Exp. A: d = 100.00 / 200.00 = 0.50
Exp. B: d = 2.00 / 3.00 = 0.67 (a larger effect than Exp. A)
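
Because the two experiments used different scales, each mean difference is standardized before the results are combined; a minimal Python sketch of the two calculations above:

def cohen_d(mean_difference, sd):
    """Standardized mean difference: raw difference divided by the standard deviation."""
    return mean_difference / sd

d_a = cohen_d(100.00, 200.00)   # Experimenter A
d_b = cohen_d(2.00, 3.00)       # Experimenter B (the larger effect)
print(f"Exp. A: d = {d_a:.2f}   Exp. B: d = {d_b:.2f}")  # 0.50 and 0.67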
• In the previous example, the average of the mean differences lacks meaning because the results are expressed on different scales.
• The answer to this problem is to use a measure of effect size.
  • Cohen’s d is expressed on a standardized scale that ranges from -3.00 to +3.00.
• Calculating d for all of the studies and then averaging the values of d yields a meaningful combined result.
• Once this information is gathered, the strength of the results of the meta-analysis can be gauged by comparing them to Table 1 of Topic 59.
• r is also expressed on a standardized scale, from -1.00 to +1.00.
• Values of r can also be averaged while weighting the average to take varying sample sizes into account (see the sketch below).
• Consumers of research should look to see whether a meta-analysis is based on weighted averages, which is always desirable.
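
A sample-size-weighted average is one simple weighting scheme consistent with the note above (inverse-variance weights are another common choice); the sketch below uses hypothetical values of d and sample sizes purely for illustration.

# Hypothetical (d, sample size) pairs for the studies in a meta-analysis
studies = [(0.50, 100), (0.67, 50), (0.30, 250)]

total_n = sum(n for _, n in studies)
weighted_d = sum(d * n for d, n in studies) / total_n
unweighted_d = sum(d for d, _ in studies) / len(studies)

print(f"Unweighted average d: {unweighted_d:.2f}")
print(f"Sample-size-weighted average d: {weighted_d:.2f}")  # larger studies count more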
Topic 63: Meta-Analysis: Strengths and Weaknesses
• Strengths:
  • Meta-analyses produce results based on large combined samples, and such large samples yield very reliable results (although the results may lack validity if the meta-analysis contains serious methodological flaws).
  • Meta-analysis can be used to synthesize the results of studies conducted by independent researchers.
  • Meta-analysis results in objective conclusions, because the results are obtained mathematically.
  • Meta-analysis demonstrates what can be obtained “objectively,” which can be compared and contrasted with more subjective qualitative literature reviews on the same research topic.
• Weaknesses:
  • A researcher may not be careful in the selection of studies to include in a meta-analysis, which will lead to results that are difficult to interpret or even meaningless.
  • Moderator variable: a variable on which the studies in a meta-analysis are divided into subgroups, with separate analyses conducted for the various subgroups.
    • A moderator variable moderates the results, so that the results for the subgroups differ from the grand combined result.
  • “Publication bias”: the body of published research available on a topic for a meta-analysis might be biased toward studies that have statistically significant results.