PSY 525 - People Server at UNCW


Complications in the field of psychology
• Constructs are not well defined, nor are they directly observable (e.g., intelligence)
• Compare to the problem of measuring something like brain size (assuming that it has some relation to intelligence)
• How would this be done?
• Similar issues arise with every construct defined in the DSM (note even the lack of agreement between different versions of the DSM) and in other diagnostic criteria (such as the International Classification of Diseases-10)
Why should we assess?
• What do we gain?
• What is the cost?
• How should it be done?
• What should be assessed?
– First assignment is a 2 page paper on the first
organizing question (think about this)
Your assessments count!
• See Exhibit 1-1, pp. 2-3: Daniel Hoffman v. Board of Education of the City of New York
• Reports are used by mental health workers, administrators, courts, etc.
• Words can be misleading (be clear)
• IQs can change
• Different tests may provide different IQs
• Base decisions on multiple tests
• Use appropriate tests
• Review previous findings/testing
Four Pillars of Assessment
• 1. norm-referenced tests
• 2. interviews
• 3. observations
• 4. informal assessment procedures
Tests: a) same item content, b) same
administration procedure, and c) same
scoring criteria.
Steps in the assessment process
• 1. Review referral
• 2. Decide whether to accept it
• 3. Obtain relevant background info
• 4. Consider influence of relevant others
• 5. Observe client in multiple settings
• 6. Select/administer appropriate test battery
• 7. Interpret the assessment
• 8. Develop/select intervention strategies
• 9. Write report
• 10. Meet with examinee
• 11. Follow up and re-evaluate (see Dawes)
Why is assessment so important?
• In November of 2000, we tried to elect a president.
• What happens when the margin of victory is smaller than the margin of error in counting (the latter being 1 in 7,000)?
• Impossible to ever know who really won
• Issue of what constitutes a “vote” (removal of chad, depression of chad, intent to vote, etc.)
• What margin of victory is a real victory? (a significant difference is determined by the standard error of measurement)
Clinical assessment/judgment
• Although the literature is replete with criticisms of standardized assessments, how does the more informal version (i.e., clinical judgment) do?
• How is your clinical judgment?
• Do you expect that it will improve with training?
• How would you stack up compared to someone with no training who is just given instructions to follow?
– See Dawes et al., 1989 (Mon.)
Assessment patterns – what is assessed
• Most commonly used tests have varied
over time and setting
• Today, a wide variety of tests are
employed, representing many diverse
perspectives (from behavioral to
psychodynamic)
• Compare educational vs. inpatient vs.
counseling settings
Why do we use assessment?
• To provide a functional analysis of the patient (“what” they can and cannot do, with less emphasis on “why”)
• To direct treatment (one must know what is wrong in order to select an intervention)
• What interventions/therapies have you learned (or are you learning), and what are the indicators for implementing those interventions?
Psychometrics
• Teach individuals about 5 tests, or teach them how to evaluate tests?
– The “give someone a fish, or teach them to fish” idea.
• Psychometrics allow for an understanding of what makes a test effective, how to evaluate tests, how to create good ones, and how to extend this process to other methods of evaluation
• This is where science (the process of doing research) and clinical work overlap
Scaling
• Categorical (nominal) – named categories
• Ordinal – order, but unequal intervals
• Interval – equal intervals but no true 0 point
• Ratio – equal intervals and a true 0 point
– Only scale that supports ratio statements (e.g., “twice as much”); a mean, SD, and most parametric statistics technically require at least an interval scale
• The type of scale will dictate the type of statistics that can be used
• E.g., a nominal scale should only use the mode as a measure of central tendency
Tools of the trade
• You must be very familiar (for this class and ultimately, the licensing exam) with the following concepts – they will be reviewed in greater detail in the readings
• Measures of central tendency – mean, mode, median
– How to calculate them, the strengths and weaknesses of each, when to use them
• Measures of variability – SD & various ranges
– How to calculate them, the strengths and weaknesses of each, when to use them
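A minimal sketch of these calculations, using Python's standard statistics module. The scores are hypothetical and were chosen so that one extreme value shows how the mean and the median can disagree.

```python
import statistics

# Hypothetical test scores; 145 is a deliberate outlier.
scores = [85, 90, 100, 100, 105, 110, 145]

# Measures of central tendency
mean = statistics.mean(scores)      # pulled toward the outlier
median = statistics.median(scores)  # middle score, resistant to the outlier
mode = statistics.mode(scores)      # most frequent score

# Measures of variability
sd = statistics.stdev(scores)              # sample SD (n - 1 in the denominator)
score_range = max(scores) - min(scores)    # full range, very outlier-sensitive
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                              # interquartile range, outlier-resistant

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SD={sd:.2f}, range={score_range}, IQR={iqr}")
```

Here the mean (105) sits above the median (100) because of the single high score, which is one reason the median and an interquartile range are often reported together for skewed data.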
Tools – continued 1
• How do the measures of central tendency and variability relate to one another? Are there some that shouldn’t be used together?
• Understanding the normal curve and probability theory (see overhead of percentages)
– This information is necessary in order to interpret any assessment results
• Why is variability important? (between, not within, individuals)
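The normal-curve percentages can be reproduced with a short sketch. It assumes an IQ-style metric (mean 100, SD 15), which is an illustrative choice rather than something fixed by the slides.

```python
from statistics import NormalDist

# Assumed score metric: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

def percent_between(low, high, dist=iq):
    """Percentage of the population expected to score between low and high."""
    return 100 * (dist.cdf(high) - dist.cdf(low))

within_1sd = percent_between(85, 115)   # roughly 68%
within_2sd = percent_between(70, 130)   # roughly 95%
percentile_115 = 100 * iq.cdf(115)      # percentile rank of a score 1 SD above the mean

print(f"within 1 SD: {within_1sd:.2f}%")
print(f"within 2 SD: {within_2sd:.2f}%")
print(f"a score of 115 is at about the {percentile_115:.0f}th percentile")
```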
Reliability
• Reliability – consistency between raters (see Cohen’s kappa), between parallel versions of the same test, within the one test (split-half, Cronbach’s alpha), & from one administration to another
• How does this relate to standardization? How does this relate to variability? How does this relate to validity? How does this relate to the accuracy of measurement?
• Standard error of measurement (SEM) = SD × √(1 − the test’s reliability)
– Possible range of SEM is 0 to the test’s SD
– The smaller the SEM the better? Why?
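The SEM formula above can be sketched as follows. The SD of 15 and reliability of .91 are hypothetical values, and the band around the observed score is a common simplification (strictly, the band belongs around the estimated true score).

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z = 1.96)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_band(observed=100, sd=15, reliability=0.91)
print(f"SEM = {sem:.2f}; 95% band = {low:.1f} to {high:.1f}")

# The two limiting cases give the possible range of the SEM.
print(standard_error_of_measurement(15, 1.0))  # perfectly reliable test: 0.0
print(standard_error_of_measurement(15, 0.0))  # completely unreliable test: 15.0
```

A smaller SEM gives a narrower band, so two scores need to differ by less before the difference can be treated as real.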
Reliability and error
• Rating errors
– Constant (leniency, severity, tendency to the mean),
halo effects, contrast (with previous subject or
oneself), proximity (an item’s location on the printed
page can result in ratings similar to nearby items),
most-recent-performance, and/or inadequate
information errors
• These can be minimized with more raters, exact
instructions, intense training, frequent
evaluation and recalibration
p. 2 – reliability and error
• Scale calibration
– More items are typically needed to achieve high
reliability, but there are exceptions
– Guttman approach involves ordering items in terms of
their level of difficulty (ascending)
• How does one determine level of difficulty?
– This approach assumes that once a specified number
of items are missed, the more difficult items to follow
will also be missed, therefore no need to administer
them
– Cost of this approach? Problems?
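One way to picture the discontinue rule is the sketch below. It assumes items are already ordered by ascending difficulty and that testing stops after a set number of consecutive misses; the "consecutive" criterion, the function name, and the response pattern are all illustrative assumptions.

```python
def administer_with_discontinue(responses, max_consecutive_misses=3):
    """Score items in order and stop once the miss criterion is reached.

    responses: list of booleans (True = item passed), ordered from the
    easiest item to the hardest.
    Returns (raw_score, number_of_items_administered).
    """
    score = 0
    misses = 0
    administered = 0
    for passed in responses:
        administered += 1
        if passed:
            score += 1
            misses = 0          # a pass resets the run of misses
        else:
            misses += 1
            if misses >= max_consecutive_misses:
                break           # assume all harder items would be missed
    return score, administered

# Hypothetical examinee who would have passed the last two (hardest) items.
pattern = [True, True, True, False, True, False, False, False, True, True]
result = administer_with_discontinue(pattern)
print(result)        # (4, 8): testing stops after item 8
print(sum(pattern))  # 6 items would have been passed with full administration
```

The example shows one cost of the approach: time is saved, but an examinee whose performance does not follow the assumed difficulty order loses credit.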
p. 3 – reliability and error
• Coefficient of determination (R-squared) is used to determine the proportion of variance in one variable that can be accounted for by a second variable (the predictor)
• Criterion contamination – when one knows information that makes it impossible to do a fair test of criterion validity (e.g., race of the skulls)
– Must conduct blind ratings
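As a small worked example of the coefficient of determination, the sketch below computes Pearson's r by hand and squares it. The predictor and criterion values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores (predictor) and a later outcome (criterion).
predictor = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

r = pearson_r(predictor, criterion)
r_squared = r ** 2
print(f"r = {r:.3f}, R-squared = {r_squared:.2f}")
```

With these numbers R-squared is .60, so the predictor accounts for about 60% of the variance in the criterion and the remaining 40% is unexplained.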
p. 4 – reliability and error
• Base rates represent an extremely important source of information and are often ignored (e.g., Rosenhan’s famous study of pseudopatients who claimed to hear voices and were admitted to psychiatric hospitals)
– Why is it easier to predict behaviors or
outcomes that occur at a base rate near
50%?
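The effect of base rates can be illustrated with Bayes' rule. The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity and compares it with the strategy of always predicting the more common outcome.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a positive test result is a true positive."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

def base_rate_accuracy(base_rate):
    """Accuracy obtained by always predicting the more common outcome."""
    return max(base_rate, 1 - base_rate)

# Hypothetical test: 90% sensitivity, 90% specificity (overall accuracy 90%).
ppv_common = positive_predictive_value(0.50, 0.90, 0.90)
ppv_rare = positive_predictive_value(0.01, 0.90, 0.90)

print(f"PPV at a 50% base rate: {ppv_common:.2f}")
print(f"PPV at a 1% base rate:  {ppv_rare:.2f}")
print(f"accuracy of always guessing 'no' at a 1% base rate: {base_rate_accuracy(0.01):.2f}")
```

Near a 50% base rate, guessing alone is right only half the time, so a test has the most room to add information. With a rare outcome, simply predicting "no" for everyone is already 99% accurate, and most positive results from the same test are false positives.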
p. 5 – reliability and error
• All measurement represents an estimate of whatever is being assessed. Therefore statistics are needed to help make such estimates (inferences)
• Statistical power is crucial to decision making
– Alpha = probability of a Type I error, i.e., rejecting the null when it is true
– Beta = probability of a Type II error, i.e., failing to reject the null when it is in fact false (power = 1 − beta)
• Parametric vs. nonparametric (few or no distributional assumptions for the data)
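To make the link between alpha, beta, and power concrete, here is a sketch for the simplest case: a one-sided, one-sample z-test with a known SD. The effect size (d = 0.5) and sample size (n = 25) are hypothetical, and real studies usually need a t-based calculation instead.

```python
from statistics import NormalDist

def power_one_sided_z(effect_size_d, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test for a standardized effect d."""
    z = NormalDist()                       # standard normal distribution
    critical = z.inv_cdf(1 - alpha)        # cutoff that fixes the Type I error rate
    return 1 - z.cdf(critical - effect_size_d * n ** 0.5)

power = power_one_sided_z(effect_size_d=0.5, n=25)
beta = 1 - power                           # probability of a Type II error
print(f"power = {power:.2f}, beta = {beta:.2f}")

# With no true effect, the rejection rate is just alpha.
print(f"{power_one_sided_z(0.0, 25):.2f}")
```

Holding alpha fixed, a larger sample or a larger true effect raises power and lowers beta.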
Minimizing error
• Standardization – refers to the consistency in
applying methods
– Implications for testing
– Costs of violating standardization (weigh such a
decision very carefully, as there are major costs)
• Use of proper norms – when can the norm group
deviate from those to whom it is applied?
– What constitutes an effective norm group?
• Systematic and random error
– Difficulties in detection and correction
Validity
• Validity – is the test doing what you think
it’s doing?
– Face validity is important for lay people (for
them to believe the test is valid). Other
advantages/disadvantages of face validity?
– Content, construct, predictive, convergent,
discriminant.
– Internal/external validity (trade-off?)
Factor analysis: What is it?
Assignment: Using point form, briefly describe the key events of the television show “The Apprentice”
• What themes emerge? How many different themes? Are the
different themes related?
– Qualitative FA
• Factor analysis represents a method of organizing &
reducing data into latent (not directly assessed) constructs.
• This is a mathematical rather than a conceptual
(qualitative) organization of the data
• Can be exploratory (no a priori theory) or confirmatory
(compares data to a theory or previous data)
• FA in APA journals: 70s = 4%, 80s = 9%, 90s = 21%
Factor analysis: Why do it?
• Data reduction
– conserve df
– minimize problems of multicollinearity
– Models for computing composite scores & item parcels
• Scale construction & revision (improve psychometrics)
– empirical validation (or revision) of theoretical models
– arrangement of items (pos & neg loadings)
– relative importance of different items; how central each
is to the latent construct (how to do item selection?)
• Factor(s) must be replicable, generalizable & interpretable
• Mean comparisons are irrelevant if factor structures differ
e.g., typical study comparing males & females, patients to non-patients
Factor analysis: How to do it?
Start with multiple items (min. 3 per construct) on a ratio or interval scale
EFA (Exploratory Factor Analysis)
• Ns can range from a minimum of 5 subjects per item to the ideal 10:1 ratio, though this depends on loadings and communality (1 − uniqueness). Min. = 100
• EFA will determine the number of factors to extract
– Varies with the number of latent constructs and the number of items (under vs. over extraction)
– Simulated sets of random data will still result in the emergence of factor(s), so check the scree plot of factors and their eigenvalues to find the descending linear trend (see p. 291).
– Eigenvalue? The amount of the variance explained by each vector. Standardize items to z-scores (M = 0, Var = 1), so the sum of the variances = # items.
– Item loadings = the item’s correlation with the vector on which it loads.
– When do factors dip below eigenvalues for a random data set with same N? (p. 291)
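The eigenvalue idea can be checked numerically. The sketch below extracts the eigenvalues of a small, made-up correlation matrix (four standardized items forming two correlated pairs) with a plain-Python Jacobi rotation routine. The matrix and function name are illustrative assumptions, and in practice a statistics package would do this step.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that zeroes the (p, q) off-diagonal element.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlations among 4 standardized items (two pairs of items).
R = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]

eigenvalues = jacobi_eigenvalues(R)
proportions = [e / len(R) for e in eigenvalues]  # share of total variance

print([round(e, 3) for e in eigenvalues])
print(round(sum(eigenvalues), 3))  # equals the number of items
```

Because each standardized item contributes a variance of 1, the eigenvalues sum to the number of items (4 here), and each eigenvalue divided by 4 is the proportion of variance that vector explains. Comparing these values with eigenvalues from random data of the same size is the check described on the slide.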
How many latent factors should emerge?
• Scree plots, 50% variance rule & your theory (do not
use the 1.0 eigenvalue default)
• Low item loadings (.35 or lower) typically represent error
variance and will not replicate in an independent
sample, so avoid factors made up of such item
loadings
• As items are added, more factors emerge, but this is
not just an artifact of the number of items. It could
be that new factors are emerging…
• e.g., I often feel tired, I am rarely sad, I cry
often, I never smile, I rarely sleep, Often I am
not hungry
• 2 possible factors emerge assessing the latent
constructs of depression and timeframe
How to do it? Oblique vs. Orthogonal rotations
• You must determine the relation between all of the factors (assuming there is more than 1 factor)
– This should be based on a theoretical rationale
– Orthogonal (statistically independent/unrelated)
• Items with high loadings on F1 are near 0 loading on F2 = simple structure
– Oblique (stat. dependent); Fs allowed to intercorrelate
• Orthogonal rotations – advantage is that it is more easily interpretable, though it may not fit well with the data (if the latent constructs are not independent)
• Oblique rotations – advantage is that it can account for more of the data (especially if the latent constructs are not independent), though it is more difficult to interpret
How to interpret an EFA?
• Examine the number of factors and the
number of rotations needed for the factor
structure to converge
• Low loadings (< .35) are likely to be error
variance
• Factors with few items are likely to be
spurious factors
• Will the factor structure replicate in a second
independent sample?
– This is essential, especially when the initial EFA was truly
exploratory
– Emergent factors will capitalize on chance associations in the
data set (i.e., the same type I errors observed whenever
conducting numerous analyses)
• Now that you have a factor structure, what next?
What is CFA?
• CFA is a powerful statistical technique that
allows one to define a model and then
determine how well the data set matches the
predicted model (using chi-square and several
fit indices) – can test entire model
simultaneously
• The predicted model can be theoretically derived or empirically derived (see EFA findings), though if it is the latter it MUST be tested on a different sample to allow for cross-validation
• With large samples, randomly split the data (using random ID selection) into two equal sections
• Minimum N for CFA = 200. Findings are more robust (stable) as N increases
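A random split by participant ID might look like the sketch below. The ID range, the fixed seed, and the function name are hypothetical; the seed is only there so the split can be reproduced.

```python
import random

def split_half_sample(participant_ids, seed=525):
    """Randomly divide participant IDs into two halves of (nearly) equal size."""
    ids = list(participant_ids)
    rng = random.Random(seed)   # seeded so the same split can be recreated
    rng.shuffle(ids)
    midpoint = len(ids) // 2
    return sorted(ids[:midpoint]), sorted(ids[midpoint:])

# Hypothetical sample of 400 participants with IDs 1-400.
efa_ids, cfa_ids = split_half_sample(range(1, 401))
print(len(efa_ids), len(cfa_ids))
```

One half would be used for the exploratory analysis and the other held back for the confirmatory test, so the model is never evaluated on the data that suggested it.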
Assessing the fit of the model in CFA
• Compare predicted model to observed data using the chi-square statistic (the smaller the better = no sig. difference between observed and expected)
• Nested modeling – compare the fit of different models to each other
– Law of parsimony = all multifactor models must fit the data at a level that is sig. better than a one-factor model (calculate the chi-square difference)
• Indices of fit are also used to evaluate the fit of all models – based on model chi-square (χ²m), null model chi-square (χ²n), and df.
• Comparative fit index (CFI) = 1 − [(χ²m − dfm)/(χ²n − dfn)]
• Bentler-Bonett index (BBI) = (χ²n − χ²m)/χ²n
• Delta2 (small Ns), TLI (all require fit > .90), RMSR (0-.05).
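The two index formulas and the chi-square difference test can be sketched directly. All chi-square and df values below are invented for a hypothetical 12-item scale, and the functions follow the formulas literally (software typically also floors negative values at zero, which is omitted here).

```python
def comparative_fit_index(chi2_model, df_model, chi2_null, df_null):
    """CFI = 1 - (model chi-square - model df) / (null chi-square - null df)."""
    return 1 - (chi2_model - df_model) / (chi2_null - df_null)

def bentler_bonett_index(chi2_model, chi2_null):
    """BBI = (null chi-square - model chi-square) / null chi-square."""
    return (chi2_null - chi2_model) / chi2_null

def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Difference in chi-square and df between two nested models."""
    return chi2_restricted - chi2_full, df_restricted - df_full

# Hypothetical results for a 12-item scale.
null_chi2, null_df = 900.0, 66   # null (independence) model
one_chi2, one_df = 210.0, 54     # one-factor model
two_chi2, two_df = 85.0, 53      # two-factor model (factors allowed to correlate)

cfi_two = comparative_fit_index(two_chi2, two_df, null_chi2, null_df)
bbi_two = bentler_bonett_index(two_chi2, null_chi2)
diff, diff_df = chi_square_difference(one_chi2, one_df, two_chi2, two_df)

print(f"CFI = {cfi_two:.3f}, BBI = {bbi_two:.3f}")
print(f"chi-square difference = {diff} on {diff_df} df")
```

In this made-up case both indices exceed .90, and the chi-square difference of 125 on 1 df is far above the .05 critical value of 3.84, so the two-factor model would be retained over the one-factor model.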
Factor analysis and construct validity
• No longer acceptable to publish a scale without considering its factorial structure
• If your scale is supposed to assess one construct, then this issue can be empirically evaluated (EFAs & CFAs).
• Ultimately, one can only test the number of factors and how they relate to one another, not the actual content of the factors (inferred from item content)
• If the theory for the construct and the FA do not correspond, then there are two alternatives:
– 1) The underlying theoretical constructs may not be correctly specified (your theory is wrong)
– 2) The theory may be adequate, but the scale used to assess it is not
• Examples?
Organization of constructs
• Factor structures – factor analysis (FA)
and confirmatory factor analysis (CFA)
– Differences between these procedures
– Meaning of eigenvalues (extraction)
– Rotations (e.g., oblique vs. orthogonal)
• How are the constructs inter-related?
– Organizational and explanatory power
– Theoretical and (not vs) empirical decisions
What is the construct of intelligence?
• How do you define it?
• How have others defined it?
– This definition will determine how the tests are
constructed, administered, and interpreted
• What is intelligence and do modern IQ tests
measure it? – write paper on this
– Various conflicting views on this (e.g., Gould suggests
that we can’t measure it and don’t with our current
tests whereas Boring suggests that it IS what
intelligence tests measure)
– See handout of definitions
PSY 525
Intellectual Assessment
A few tests that we will focus on are those that
are commonly used by psychologists, those
that are psychometrically sound, and those
that you need to know in order to do your job.
Assessing LD with the WAIS-III
• Individuals with LD in reading and math generally exhibit IQ scores in the average range.
• Index scores are, however, noteworthy:
• VCI scores tend to be 7-13 points higher than WMI scores (e.g., VCI is at least 15 points higher than WMI for almost 42% of those with reading disabilities).
• POI scores are approx. 7 points higher than PSI scores for all LD individuals (e.g., POI scores are at least 15 points higher than PSI scores for almost 31% of those with LDs).
Intelligence testing in problem populations
• The Leiter was developed to evaluate cognitive functioning (i.e., intelligence) in individuals who are deaf-mute or otherwise nonverbal.
• It can also be used with clients from other cultures who do not verbalize (or do so only minimally) in English or in their native language
Neuropsychological Evaluation
• Head trauma is the primary cause of closed head injuries in the population (adolescents and adults)
• Closed head injuries cause more widespread injuries and usually result in a period of lost consciousness.
• Amnesia usually results (anterograde and retrograde)
• Duration of anterograde amnesia is the best predictor of degree of injury and probability of recovery