
Ten Difference Score Myths
By Jeffrey R. Edwards
Presented by Chelsea Hutto
Difference Scores
Typically used to represent the similarity between two constructs
Widely used in studies of person-job fit, similarity between employee and organizational values, the match between employee expectations and experiences, and agreement between performance ratings
Suffer from many methodological problems
Polynomial Regression Analysis
These problems can be avoided by using PRA
Uses the components of difference scores along with higher-order terms to represent the relationships of interest in congruence research (see the sketch below)
Treats difference scores as statements of hypotheses to be tested empirically
Also supported by Cafri et al. (2009)
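As a rough illustration of PRA (not code from Edwards' paper; the variable names, simulated data, and use of statsmodels are assumptions for the sketch), an outcome is regressed on both components and their higher-order terms rather than on their difference:

```python
# Sketch of polynomial regression analysis (PRA) for congruence research.
# Rather than regressing an outcome z on the difference score (x - y),
# regress it on both components plus higher-order terms:
#   z = b0 + b1*x + b2*y + b3*x^2 + b4*x*y + b5*y^2 + e
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)   # component 1, e.g., actual job attribute
y = rng.normal(size=n)   # component 2, e.g., desired job attribute
z = -(x - y) ** 2 + rng.normal(scale=0.5, size=n)  # simulated outcome

X = sm.add_constant(np.column_stack([x, y, x**2, x*y, y**2]))
model = sm.OLS(z, X).fit()
print(model.params)  # b0, b1, b2, b3, b4, b5
```

A squared-difference model implies the constraints b1 = b2 = 0, b3 = b5, and b4 = -2*b3; PRA estimates the terms freely, so those constraints become hypotheses the data can confirm or reject.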
Misconceptions Regarding Problems with DS
Myth 1: The Problem with Difference Scores Is Low Reliability
Low internal consistency reliability has been viewed as the only serious problem with DS
The reliability of any measure is ultimately an empirical matter
The question is not whether DS are reliable in an absolute sense but whether they are more reliable than the alternatives
Even adequate reliability does not solve the other issues
Myth 2: Difference Scores Provide Conservative Statistical Tests
Statistical tests based on DS are often labeled conservative
Sometimes seen as appropriate for exploratory research
DS are also likely to invite conclusions that amount to false positives, such that statistical tests effectively become liberal
These conclusions have typically not been scrutinized with PRA
Conservatism usually corresponds to effect sizes that are biased downward and Type I error rates that are minimized at the cost of Type II error
A balance between liberal and conservative tests is needed
Alternatives to DS That Are Themselves Problematic
Myth 3: Measures That Elicit Direct Comparisons Avoid Problems with Difference Scores
Direct comparisons merely shift the responsibility for creating a DS from the researcher to the respondent, whose mental calculation of the difference introduces error
A direct comparison is also double-barreled, combining two distinct concepts into a single score
The construct validity of direct comparisons is questionable
Myth 4: Categorized Comparisons Avoid Problems with DS
Subgroups are created based on the congruence between two component measures in an attempt to avoid problems with DS
Some researchers even say this could solve the reliability issues
It creates only an illusion of a solution
Categorization accentuates the loss of information and the reduction in explained variance
It just makes things worse
Myth 5: Product Terms Are Viable Substitutes for DS
Some turn to product terms tested hierarchically in multiple regression analysis as a last resort
A product term captures the interaction between two variables
It does not represent the effects of congruence for continuous measures; the product term is only one of the higher-order terms used in PRA
Myth 6: Hierarchical Analysis Provides Conservative Tests of DS
Some studies statistically control for the component measures before estimating the effects of DS
This is characterized as conservative, paralleling how components are controlled when testing interactions with product terms
It does not yield conservative tests of DS; instead, it alters the relationships the DS are intended to capture
Misunderstandings or Misguided Criticisms of PRA
Myth 7: PRA Is an Exploratory, Empirically Driven Procedure
Critics claim that PRA capitalizes on sample-specific variance to maximize the amount of variance explained
The primary goal of PRA is actually to test hypotheses derived from theories of congruence
PRA provides an explicit test of the congruence hypothesis, whereas using an algebraic difference score incorporates this hypothesis as an untested assumption
DS allow congruence hypotheses to evade empirical scrutiny, leaving no evidence with which to confirm or reject the hypotheses
Myth 8: Polynomial Regression Suffers from Multicollinearity
Concerns about multicollinearity between lower-order and higher-order terms are unfounded
Myth 9: Higher-Order Terms Do Not Enhance the Understanding of Congruence
Interpretation of higher-order terms can be difficult, but such difficulties arise from attempts to interpret coefficients on higher-order terms individually
These difficulties can be avoided by using response surfaces as the intermediary between congruence hypotheses and PR coefficients (see the sketch below)
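Continuing the earlier sketch (the slope and curvature formulas are the standard response surface quantities from Edwards' approach; the code itself is illustrative), the fitted coefficients are combined into features of the surface rather than read one at a time:

```python
# Sketch: response surface features computed from the quadratic
# coefficients b1..b5 estimated in the earlier PRA sketch.
b0, b1, b2, b3, b4, b5 = model.params

a1 = b1 + b2        # slope along the congruence line (x = y)
a2 = b3 + b4 + b5   # curvature along the congruence line
a3 = b1 - b2        # slope along the incongruence line (x = -y)
a4 = b3 - b4 + b5   # curvature along the incongruence line; a clearly
                    # negative a4 means the outcome drops as x and y diverge
print(a1, a2, a3, a4)
```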
Myth 10: PR Eliminates the Concept of Congruence
This myth comes from the assumption that a DS represents a concept distinct from its components; it is argued that DS and their component measures are not conceptually interchangeable
Because a DS is calculated from its components, it cannot represent a construct that is conceptually or operationally distinct from those components
Assumptions
All can be tested empirically – so why argue?
PRA has its limitations
Still, it is more comprehensive and conclusive than the information obtained from difference scores
Things I Have Learned (So Far)
by Jacob Cohen
Some Things You Learn Aren't So
The "proper" sample size of 30 cases per group when comparing groups
Anything lower than 30 was said to require specialized handling with "small-sample statistics," versus the critical-ratio approach for larger samples
With typical effect sizes, n = 30 per group can yield only a fifty-fifty chance of getting significant results (see the power sketch below)
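A quick check of Cohen's fifty-fifty claim, assuming a medium effect (d = 0.5), alpha = .05, and a two-sided two-sample t test; the use of statsmodels here is an illustrative choice:

```python
# Sketch: power of a two-sample t test with 30 cases per group,
# a medium effect (Cohen's d = 0.5), and alpha = .05 (two-sided).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05, ratio=1.0)
print(round(power, 2))  # about 0.47 -- roughly a coin flip
```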
Less Is More
We should be studying fewer IVs and even fewer DVs
Otherwise it becomes unclear which DVs are real and which are due to chance
As the number of IVs increases, the chance of redundancy with regard to criterion relevance also increases
Reporting numerical results
What does r = .12345 really mean?
Excess digits serve as a distraction from the meaningful leading digits
Simple Is Better
Reporting and representation of data
Typical summaries do not make it possible for most of us, or for consumers of research, to actually see and understand the distribution
There is a need for graphic representation
Computers and statistical packages
They encourage a loss of contact with the data
And the idea that knowledge of statistics isn't necessary to use them
Compositing of Values
Beta weights vs. unit weights
Beta weights generate a higher correlation with the criterion than any other weights
THE CATCH: they are guaranteed to beat unit weights only in the sample on which they were determined
Circumstances in which beta weights are genuinely better are very rare
Unit weights are usually more practical (+1 for positively related predictors, -1 for negatively related predictors, and 0 otherwise)
They work well beyond multiple regression whenever we have criterion data
For our purposes, unit weights on standardized scores are usually better than the weights generated by the program (see the sketch below)
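A small simulation of Cohen's catch, with a made-up data-generating setup chosen only for illustration: beta weights are compared with unit weights in the sample they were estimated on and in a fresh sample:

```python
# Sketch: beta (least-squares) weights vs. unit weights on standardized
# predictors, compared in the estimation sample and a holdout sample.
import numpy as np

rng = np.random.default_rng(1)

def make_sample(n):
    x = rng.normal(size=(n, 4))  # four standardized predictors
    y = x @ np.array([0.4, 0.3, 0.3, 0.2]) + rng.normal(size=n)
    return x, y

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

x_tr, y_tr = make_sample(60)
x_te, y_te = make_sample(60)

beta, *_ = np.linalg.lstsq(x_tr, y_tr, rcond=None)  # sample beta weights
unit = np.ones(4)                                   # unit weights

print("train:", corr(x_tr @ beta, y_tr), corr(x_tr @ unit, y_tr))
print("test: ", corr(x_te @ beta, y_te), corr(x_te @ unit, y_te))
# Beta weights win by construction in the training sample; in the holdout
# sample the unit-weight composite is typically about as good.
```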
The Fisherian Legacy
Based on the principle that science proceeds only through inductive inference, which is achieved by rejecting the null hypothesis, usually at the .05 level
The yes/no decision feature is widely misinterpreted
Research is frequently designed to produce decisions, although things are not always so clearly decision oriented
Null hypothesis: any statement about a state of affairs in a population, usually the value of a parameter, frequently zero; it is called a null hypothesis because the strategy is to nullify it, or because it means "nothing doing"
The Dreaded .05 Level
The basis for the decision is a cutoff level
It can lead to anything from fudging the data, to massively altering it, to dropping cases where there "must have been errors"
The Null Hypothesis Tests Us
Results do not tell us the truth of the null hypothesis; for that we must turn to Bayesian statistics, in which probability is not a relative frequency but a degree of belief
What a test does tell us is the probability of the data given the truth of the null
NOT THE SAME THING
p Value
If the p value does not tell us the probability that the null is true, then it cannot tell us the probability that the research hypothesis is true
Rejection of the null gives us no basis for estimating the probability that a replication of the research will again result in rejecting the null
The true meaning of statistical significance:
The effect is not nil, and nothing more
Temptation
Problems with the NH
If the NH is almost always false, what's the big deal about rejecting it?
Also supported by Trafimow and Rice (2009)
If a test exceeded the critical value, you could conclude that the null is false, but if it fell short of that value you could not conclude the null is true
Reality: you can't conclude anything
If the null is false, it has to be false to some degree
Power Analysis
Based on four parameters (see the sketch after this list):
The alpha significance criterion
Sample size
Population effect size
Power of the test
Made it possible to "prove" null hypotheses
By showing that the effect is of no more than negligible or trivial size
We must consider the magnitude of effects
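Because the four parameters are mathematically linked, fixing any three determines the fourth. A brief sketch (statsmodels is an illustrative choice) solves for the sample size needed for 80% power at a medium effect:

```python
# Sketch: solve for the fourth power-analysis parameter given the other
# three (here: n per group for d = 0.5, alpha = .05, power = .80).
from statsmodels.stats.power import TTestIndPower

n_needed = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_needed))  # about 64 per group
```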
How To Use Statistics
Use graphic and numerical analyses in ways in which we can understand them
Plan the research
You must have a credible set of specifications, or discover that the research is not possible
Use effect size measures, which include mean differences, correlations, and squared correlations of all kinds; all of these lead you to a sample effect size
How To Use Statistics
After finding the sample effect size, attach a p value or, better, a confidence interval (see the sketch below)
The most important rule: the judgment of the scientist
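One way to follow that advice for a correlation, using the standard Fisher z transformation; the sample values r = .30 and n = 100 are made up for the example:

```python
# Sketch: 95% confidence interval for a sample correlation r via the
# Fisher z transformation (z = atanh(r), SE = 1/sqrt(n - 3)).
import numpy as np
from scipy import stats

r, n = 0.30, 100
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
crit = stats.norm.ppf(0.975)  # two-sided 95%
lo, hi = np.tanh(z - crit * se), np.tanh(z + crit * se)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # about [0.11, 0.47]
```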
Take Home Message
A single piece of research doesn't settle an issue once and for all; only successful future replication, in the same and different settings, provides an approach to settling the issue
.05 should not be a cliff but a reference point along the possibility-probability continuum
Things take time.
The Earth Is Round (p < .05)
By Jacob Cohen
Problems with the Null Hypothesis
It does not tell us what we want to know:
"Given these data, what is the probability that the NH is true?"
It really answers, "Given that the NH is true, what is the probability of these (or more extreme) data?"
The Permanent Illusion
A misapplication of deductive syllogistic reasoning
An invalid Bayesian interpretation: the belief that the level of significance at which the NH is rejected (.05) is the probability that it is correct, or at least that its probability is low
Why P(D|Ho) ≠ P(Ho|D)
P(D|Ho): when Ho is tested, we find the probability that the data could have arisen if Ho were true
The real issue is P(Ho|D), the inverse probability:
The probability that Ho is true given the data
This is the reason we conduct statistical tests: to be able to reject Ho because of its unlikelihood
Posterior Probability
Available only through Bayes's theorem
We have to know the probability of the NH before the experiment, the "prior" probability P(Ho)
Problem: we do not normally know this
It can be done in Bayesian statistics by positing a prior probability or a distribution of probabilities
But such priors are extremely unreliable, and different prior probabilities yield different posteriors (see the numeric sketch below)
See also Huysamen (2005)
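A numeric illustration of why the prior matters, with all values invented for the example: even when P(D|Ho) = .05, the posterior P(Ho|D) can be far from .05.

```python
# Sketch: posterior probability of Ho from Bayes's theorem,
#   P(Ho|D) = P(D|Ho)P(Ho) / [P(D|Ho)P(Ho) + P(D|H1)P(H1)].
prior_h0 = 0.5   # assumed prior probability of the null
p_d_h0 = 0.05    # probability of data this extreme if Ho is true
p_d_h1 = 0.30    # assumed probability of such data under the alternative

post_h0 = p_d_h0 * prior_h0 / (p_d_h0 * prior_h0 + p_d_h1 * (1 - prior_h0))
print(round(post_h0, 2))  # 0.14, not 0.05; a different prior gives a different answer
```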
Illusion of Attaining Improbability
Also known as the Bayesian Id's wishful thinking error
Extremely easy to make
Made by 68 of the 70 academic psychologists studied by Oakes (1986, pp. 79-82)
The problem: the belief that after a successful rejection of Ho, it is highly probable that replications will also result in rejection of Ho
This could not be further from the truth
Just because Ho is rejected does not mean that the theory is established
Remember: the aim of a scientific experiment is not to make decisions but to make adjustments to degrees of belief
The Nil Hypothesis
The null in Ho is taken to mean nil, or zero
This is mistakenly read as a claim that the effect size is 0: that the population mean difference, correlation, or raters' reliability is 0 (a Ho that can almost always be rejected, given a large enough sample)
Criticism: its use may be valid only for true experiments involving randomization (controlled clinical trials) or when any departure from pure chance is meaningful (laboratory experiments on clairvoyance)
What To Do
Do not look for an alternative to NHST
We must understand and improve our data before we can generalize from them
Report effect sizes with confidence intervals
Improve our measurement by reducing the unreliable and invalid parts of the variance in our measures
Use informed judgment when applying theories
Discussion Questions
Why do you think many researchers still support NHST as it stands?
Has psychology as a field become more focused on getting significant results than on completing the proper process of an experiment? Do you think this is more prominent in other fields?
How can we as psychologists eliminate the confusion around, and misuse of, NHST?
References
Cafri, G., Van den Berg, P., & Brannick, M. T. (2009). What have the difference scores not been telling us? A critique of the use of self-ideal discrepancy in the assessment of body image and evaluation of an alternative data-analytic framework. Assessment, 17(3), 361-376.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003.
Edwards, J. R. (2001). Ten difference score myths. Organizational Research Methods, 4(3), 265-287.
Huysamen, G. K. (2005). Null hypothesis significance testing: Ramifications, ruminations, and recommendations. South African Journal of Psychology, 35(1), 1-20.
Trafimow, D., & Rice, S. (2009). A test of the null hypothesis significance testing procedure correlation argument. The Journal of General Psychology, 136(3), 261-269.