PowerPoint notes (for exam 1)


PSY 525
Complications in the field of psychology




Constructs are not well defined, nor are they directly observable (e.g., intelligence)
Compare to the problem of measuring something like brain size (assuming that it has some relation to intelligence)
How would this be done?
Similar issues arise with every construct defined in the DSM (note even the lack of agreement between different versions of the DSM) and with other diagnostic criteria such as the International Classification of Diseases (ICD-10)
Why should we assess?




What do we gain?
What is the cost?
How should it be done?
What should be assessed?

First assignment is a 2 page paper on the
first organizing question (think about this)
The functional role of assessment for
patients



To provide a functional analysis of the patient
(“what” they can and cannot do, with less
emphasis on “why”)
To direct treatment (one must know what is
wrong in order to select an intervention)
What interventions/therapies have you learned (or are you learning), and what are the indicators for implementing those interventions?
Your assessments count!








See Exhibit 1-1, pp. 2-3: Daniel Hoffman v. Board of Education of the City of New York
Reports are used by mental health workers,
administrators, courts, etc.
Words can be misleading (be clear)
IQs can change
Different tests may provide different IQs
Base decisions on multiple tests
Use appropriate tests
Review previous findings/testing
Types of Assessment (pp. 4-5)






Screening
Focused
Diagnostic
Counseling and Rehabilitation
Progress Evaluation
Problem-solving
Four Pillars of Assessment




1. norm-referenced tests*
2. interviews
3. observations
4. informal assessment procedures
*Tests: a) same item content, b) same
administration procedure, and c) same
scoring criteria (i.e., must be
standardized to be considered tests)
Steps in the assessment process











1. Review referral
2. Decide whether to accept it
3. Obtain relevant background info
4. Consider influence of relevant others
5. Observe client in multiple settings
6. Select/administer appropriate test battery
7. Interpret the assessment
8. Develop/select intervention strategies
9. Write report
10. Meet with examinee
11. Follow up and re-evaluate (see Dawes)
Clinical assessment/judgment
Dawes et al., 1989




Although the literature is replete with
criticisms of standardized assessments, how
does the more informal version (i.e., clinical
judgment) do relative to actuarial models?
How is your clinical judgment?
Do you expect that it will improve with
training?
How would you stack up compared to
someone with no training who is just given
instructions to follow?

See Dawes et al., 1989
Assessment patterns – what is assessed
- Lubin et al., 1985




Most commonly used tests have varied
over time and setting
Today, a wide variety of tests are
employed, representing many diverse
perspectives (from behavioral to
psychodynamic)
Compare educational vs. inpatient vs.
counseling settings
Tests employed are driven by the referral question
Psychometrics

Teach individuals about 5 tests, or teach
them how to evaluate tests?



The classic give-someone-a-fish vs. teach-them-to-fish idea.
Psychometrics allows for an understanding of what makes a test effective, how to evaluate tests, how to create good ones, and how to extend this process to other methods of evaluation
This is where science (the process of doing
research) and clinical work overlap
Scaling




Categorical (nominal) – named categories
Ordinal – order, but unequal intervals
Interval – equal intervals but no true 0 point
Ratio – a true 0 point



Only scale that technically allows for the calculation of a
mean, SD, and most parametric statistics
The type of scale will dictate the type of statistics
that can be used
E.g., a nominal scale should only use the mode as a
measure of central tendency
Tools of the trade


You must be very familiar (for this class and
ultimately, the licensing exam) with the
following concepts – they will be reviewed in
greater detail in the readings
Measures of central tendency – mean, mode,
median


How to calculate them, the strengths and
weaknesses of each, when to use them
Measures of variability – SD & various ranges

How to calculate them, the strengths and
weaknesses of each, when to use them
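
A minimal sketch of these computations (Python with numpy/scipy is an assumption, since the slides name no software; the scores are hypothetical):

```python
import numpy as np
from scipy import stats

scores = np.array([85, 90, 90, 100, 110, 115, 145])  # hypothetical test scores

# Central tendency: the mean uses every value (and is pulled by outliers
# like 145), the median uses only rank order, and the mode is the most
# frequent value (the only one a nominal scale supports).
mean = scores.mean()
median = np.median(scores)
mode = stats.mode(scores, keepdims=False).mode

# Variability: sample SD plus two kinds of range.
sd = scores.std(ddof=1)                  # ddof=1 gives the sample SD
full_range = scores.max() - scores.min()
iqr = stats.iqr(scores)                  # interquartile range, outlier-resistant

print(f"mean={mean:.1f}, median={median:.0f}, mode={mode}")
print(f"SD={sd:.1f}, range={full_range}, IQR={iqr:.1f}")
```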
Tools – continued 1


How do the measures of central tendency
and variability relate to one another? Are
there some that shouldn’t be used together?
Understanding the normal curve and
probability theory (see overhead of
percentages)


This information is necessary in order to interpret
any assessment results
Why is variability important? (between, not
within, individuals)
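
The percentages on that overhead are the standard areas under the normal curve; a short sketch (again assuming Python/scipy) reproduces them:

```python
from scipy.stats import norm

# Proportion of a normal distribution within 1, 2, and 3 SDs of the mean.
for k in (1, 2, 3):
    print(f"within ±{k} SD: {norm.cdf(k) - norm.cdf(-k):.1%}")
# -> ~68.3%, ~95.4%, ~99.7%

# Interpretation example: an IQ of 130 (M = 100, SD = 15) sits 2 SDs above
# the mean, exceeding roughly 97.7% of the population.
print(f"percentile for IQ 130: {norm.cdf((130 - 100) / 15):.1%}")
```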
Reliability



Reliability – consistency between raters (see
Cohen’s Kappa), between parallel versions of
the same test, within a single test (split-half, Cronbach's alpha), and from one administration to another
How does this relate to standardization? How
does this relate to variability? How does this
relate to validity? How does this relate to the
accuracy of measurement?
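
As one concrete instance of internal consistency, here is a minimal sketch of Cronbach's alpha; the formula is the standard one, but the environment (Python/numpy) and the data are assumptions:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = test items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses from 6 people to a 4-item scale (1-5 ratings).
data = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
])
print(f"alpha = {cronbach_alpha(data):.2f}")  # higher = more internally consistent
```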
Standard error of measurement (SEM) = SD × √(1 − reliability)


Possible range of SEM is 0 to the test’s SD
The smaller the SEM the better? Why?
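
A sketch of this formula in code (the SD of 15 follows the common IQ convention; the reliability values are hypothetical):

```python
import math

def sem(sd: float, reliability: float) -> float:
    # SEM = SD * sqrt(1 - reliability); ranges from 0 (perfect
    # reliability) up to the test's SD (zero reliability).
    return sd * math.sqrt(1 - reliability)

print(sem(15, 0.97))  # ~2.6 IQ points: a highly reliable test
print(sem(15, 0.00))  # 15.0: with zero reliability, SEM equals the SD
# A 95% confidence band around an observed score is roughly ±1.96 * SEM.
```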
Reliability and error
Rating errors


Constant (leniency, severity, tendency to the
mean), halo effects, contrast (with previous
subject or oneself), proximity (an item’s location
on the printed page can result in ratings similar to
nearby items), most-recent-performance, and/or
inadequate information errors
These can be minimized with more raters,
exact instructions, intense training, frequent
evaluation and recalibration
p. 2 – reliability and error

Scale calibration


More items are typically needed to achieve high
reliability, but there are exceptions
Guttman approach involves ordering items in
terms of their level of difficulty (ascending)



How does one determine level of difficulty?
This approach assumes that once a specified number of items are missed, the more difficult items that follow will also be missed, so there is no need to administer them
Cost of this approach? Problems?
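
A sketch of the administration logic described above, assuming a hypothetical discontinue rule of 3 consecutive misses (the slide does not specify one):

```python
def administer(items, respond, discontinue_after=3):
    """items: ordered easiest to hardest; respond(item) -> True if passed.
    Stops after `discontinue_after` consecutive misses, assuming (per the
    Guttman model) that all harder items would also be missed."""
    score, consecutive_misses = 0, 0
    for item in items:
        if respond(item):
            score += 1
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses >= discontinue_after:
                break  # remaining, harder items are not administered
    return score

# Hypothetical respondent who passes every item easier than difficulty 7.
print(administer(range(1, 13), lambda difficulty: difficulty < 7))  # -> 6
```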
p. 3 – reliability and error


Coefficient of determination, or R-squared, is used to determine the amount of variance in one variable that can be accounted for by a second variable (the predictor)
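
A one-screen illustration with hypothetical numbers:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])   # predictor (hypothetical)
y = np.array([2, 1, 4, 3, 6, 5])   # criterion (hypothetical)

r = np.corrcoef(x, y)[0, 1]        # Pearson correlation
print(f"r = {r:.2f}, R^2 = {r**2:.2f}")
# r ≈ .83, so R^2 ≈ .69: about 69% of the variance in the criterion
# is accounted for by the predictor.
```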
Criterion contamination – when the rater knows information that makes a fair test of criterion validity impossible (e.g., knowing the race of the skulls being measured)

Must conduct blind ratings
p. 4 – reliability and error

Base rates represent an extremely important source of information and are often ignored (e.g., Rosenhan's famous study of pseudopatients who claimed to hear voices and gained admission to psychiatric hospitals)

Why is it easier to predict behaviors or
outcomes that occur at a base rate near
50%?
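
A numerical illustration (all figures hypothetical): with an extreme base rate, even an accurate test produces many false positives, and simply predicting the majority class is hard to beat:

```python
# Hypothetical screening test: 90% sensitivity, 90% specificity.
sens, spec = 0.90, 0.90

for base_rate in (0.50, 0.05):
    # Positive predictive value via Bayes' rule.
    ppv = (sens * base_rate) / (sens * base_rate + (1 - spec) * (1 - base_rate))
    # Accuracy from ignoring the test and always predicting the majority class.
    majority = max(base_rate, 1 - base_rate)
    print(f"base rate {base_rate:.0%}: PPV = {ppv:.0%}, "
          f"always-predict-majority accuracy = {majority:.0%}")
# At a 50% base rate the test's PPV is 90%; at a 5% base rate the PPV drops
# to ~32%, while blindly predicting "absent" is already 95% accurate.
```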
p. 5 – reliability and error


All measurement represents an estimate of
whatever is being assessed. Therefore
statistics are needed to help make such
estimates (inferences)
Statistical power is crucial to decision making



Alpha = Type I error or the probability of rejecting
the null when it is true
Beta = Type II error or the probability of failing to
reject the null when it is in fact false.
Parametric vs. nonparametric statistics (the latter make few or no distributional assumptions about the data)
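
A simulation sketch of alpha and beta for a two-group t-test (the effect size, n, and number of replications are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n, d, reps, alpha = 30, 0.5, 2000, 0.05
rejections_null, rejections_alt = 0, 0

for _ in range(reps):
    a = rng.normal(0, 1, n)
    # Null true: both groups drawn from the same population.
    if ttest_ind(a, rng.normal(0, 1, n)).pvalue < alpha:
        rejections_null += 1
    # Alternative true: second group shifted by d = 0.5 SD.
    if ttest_ind(a, rng.normal(d, 1, n)).pvalue < alpha:
        rejections_alt += 1

print(f"Type I error rate ~ {rejections_null / reps:.2f}")  # close to alpha (.05)
print(f"power ~ {rejections_alt / reps:.2f}; "
      f"beta ~ {1 - rejections_alt / reps:.2f}")
```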
Minimizing error

Standardization – refers to the consistency in
applying methods



Use of proper norms – when can the norm group
deviate from those to whom it is applied?


Implications for testing
Costs of violating standardization (weigh such a
decision very carefully, as there are major costs)
What constitutes an effective norm group?
Systematic and random error

Difficulties in detection and correction
An applied look at the SEM





In November of 2000, we tried to elect a
president.
What happens when the margin of victory is
smaller than the margin of error (SEM) in
counting (the latter being approx 1 in 7,000)?
Impossible to ever know who really won
Issue of what constitutes a “vote” (removal of
chad, depression of chad, intent to vote, etc.)
What margin of victory is a real victory? (A significant difference is determined by the standard error of measurement.)
Validity

Validity – is the test doing what you
think it’s doing?



Face validity is important for lay people
(for them to believe the test is valid).
Other advantages/disadvantages of face
validity?
Content, construct, predictive, convergent,
discriminant.
Internal/external validity (trade-off?)
Factor analysis: What is it? (validity?)
Assignment: Using point form, briefly describe the key events of the television show “The Apprentice”
- What themes emerge? How many different themes? Are the different themes related?
- Qualitative FA
- Factor analysis represents a method of organizing & reducing data into latent (not directly assessed) constructs.
- This is a mathematical rather than a conceptual (qualitative) organization of the data
- Can be exploratory (no a priori theory) or confirmatory (compares data to a theory or previous data)
- FA in APA journals: 70s = 4%, 80s = 9%, 90s = 21%
Factor analysis: Why do it?




Data reduction
- conserve df
- minimize problems of multicollinearity
- models for computing composite scores & item parcels
Scale construction & revision (improve psychometrics)
- empirical validation (or revision) of theoretical models
- arrangement of items (pos. & neg. loadings)
- relative importance of different items; how central each is to the latent construct (how to do item selection?)
Factor(s) must be replicable, generalizable & interpretable
Mean comparisons are irrelevant if factor structures differ (e.g., the typical study comparing males & females, or patients to non-patients)
Factor analysis: How to do it?
Start with multiple items (min. 3 per construct) on a ratio or interval scale
EFA (Exploratory Factor Analysis)

Ns can range from a minimum of 5 subjects per item to the ideal 10:1 ratio, though this depends on loadings and communality (1 − uniqueness). Min. N = 100

EFA will determine the number of factors to extract
- Varies with the number of latent constructs and the number of items (under- vs. over-extraction)
- Simulated sets of random data will still result in the emergence of factor(s), so check the scree plot of factors and their eigenvalues to find the descending linear trend (see p. 291).
- Eigenvalue? The amount of variance explained by each vector. Standardize items to z-scores (M = 0, Var = 1); the sum of the variances = # of items.
- Item loadings = the item's correlation with the vector on which it loads.
- When do factors dip below the eigenvalues for a random data set with the same N? (p. 291; see the sketch below)
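
A sketch of the eigenvalue logic, including the random-data comparison (parallel analysis) that the last bullet alludes to; the data are simulated, and the single-factor structure is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 6

# Simulated data: 6 standardized items driven by one latent factor.
latent = rng.normal(size=(n, 1))
items = 0.6 * latent + 0.8 * rng.normal(size=(n, k))

# Eigenvalues of the item correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]

# Parallel analysis: eigenvalues from pure-noise data of the same shape.
random_eigs = np.linalg.eigvalsh(
    np.corrcoef(rng.normal(size=(n, k)), rowvar=False))[::-1]

# Retain only factors whose eigenvalues exceed those from random data.
print("data:  ", np.round(eigvals, 2))      # sum of eigenvalues = k items
print("random:", np.round(random_eigs, 2))
print("retain:", int((eigvals > random_eigs).sum()), "factor(s)")
```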
How many latent factors should emerge?



Scree plots, 50% variance rule & your theory (do not
use the 1.0 eigenvalue default)
Low item loadings (.35 or lower) typically represent error variance and will not replicate in an independent sample, so avoid factors made up of such item loadings
As items are added, more factors emerge, but this is
not just an artifact of the number of items. It could
be that new factors are emerging…
- e.g., I often feel tired, I am rarely sad, I cry often, I never smile, I rarely sleep, Often I am not hungry
- 2 possible factors emerge, assessing the latent constructs of depression and timeframe
How to do it? Oblique vs. Orthogonal rotations

You must determine the relation between all of the
factors (assuming there is more than 1 factor)
- This should be based on a theoretical rationale
- Orthogonal (statistically independent/unrelated)
- Oblique (statistically dependent); factors allowed to intercorrelate
Orthogonal rotations – advantage is that it is more easily
interpretable, though it may not fit well with the data (if the
latent constructs are not independent)
Oblique rotations – advantage is that it can account for
more of the data (especially if the latent constructs are not
independent), though it is more difficult to interpret



- Items with high loadings on F1 have near-0 loadings on F2 = simple structure
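
A sketch of extraction with an orthogonal (varimax) rotation, using sklearn as an assumed tool (oblique rotations such as oblimin require a different package, e.g., factor_analyzer):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 500

# Two uncorrelated latent factors, three hypothetical items each.
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([
    0.8 * f1, 0.7 * f1, 0.6 * f1,   # items loading on factor 1
    0.8 * f2, 0.7 * f2, 0.6 * f2,   # items loading on factor 2
]) + 0.5 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
# Simple structure: each item loads highly on one factor, near 0 on the other.
print(np.round(fa.components_.T, 2))  # rows = items, columns = factors
```

Because the simulated factors here really are independent, an orthogonal rotation fits; with correlated latent constructs, an oblique rotation would account for more of the data, as the slide notes.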
How to interpret an EFA?




Examine the number of factors and the
number of rotations needed for the factor
structure to converge
Low loadings (< .35) are likely to be error
variance
Factors with few items are likely to be
spurious factors
Will the factor structure replicate in a second
independent sample?



This is essential, especially when the initial EFA was truly
exploratory
Emergent factors will capitalize on chance associations in the
data set (i.e., the same type I errors observed whenever
conducting numerous analyses)
Now that you have a factor structure, what next?
What is CFA?




CFA is a powerful statistical technique that
allows one to define a model and then
determine how well the data set matches the
predicted model (using chi-square and several
fit indices) – can test entire model
simultaneously
The predicted model can be theoretically derived or empirically derived (see EFA findings), though if it is the latter, it MUST be tested on a different sample to allow for cross-validation
With large samples, randomly split the data (using random ID selection) into two equal sections
Minimum N for CFA = 200. Findings are more robust (stable) as N increases
Assessing the fit of the model in CFA


Compare predicted model to observed data using the
chi-square statistic (the smaller the better = no sig
difference between observed and expected)
Nested modeling – compare the fit of different models to
each other
Law of parsimony = all multifactor models must fit the data at a
level that is sig. better than a one factor model (calculate the
chi-square difference)




Indices of fit also used to evaluate the fit of all models –
based on model chi-square, null model chi-square, and
df.
Comparative fit index (CFI) = 1 − [(χ²m − dfm) / (χ²n − dfn)]
Bentler-Bonett index (BBI) = (χ²n − χ²m) / χ²n
Delta2 (for small Ns), TLI (all require fit > .90), RMSR (0 to .05).
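
The two formulas on this slide, written out as small functions (m = tested model, n = null model; the example chi-square values are hypothetical):

```python
def cfi(chi2_m, df_m, chi2_n, df_n):
    # Comparative fit index: 1 - [(chi2_m - df_m) / (chi2_n - df_n)]
    return 1 - (chi2_m - df_m) / (chi2_n - df_n)

def bbi(chi2_m, chi2_n):
    # Bentler-Bonett (normed fit) index: (chi2_n - chi2_m) / chi2_n
    return (chi2_n - chi2_m) / chi2_n

# Hypothetical results: model chi2 = 85 on df = 40; null chi2 = 900 on df = 45.
print(f"CFI = {cfi(85, 40, 900, 45):.3f}")  # ~0.947, above the .90 benchmark
print(f"BBI = {bbi(85, 900):.3f}")          # ~0.906
```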
Factor analysis and construct validity




No longer acceptable to publish a scale without considering
its factorial structure
If your scale is supposed to assess one construct, then this
issue can be empirically evaluated (EFAs & CFAs).
Ultimately, one can only test the number of factors and how
they relate to one another, not the actual content of the
factors (inferred from item content)
If the theory for the construct and the FA do not
correspond, then there are two alternatives:



1) The underlying theoretical constructs may not be correctly
specified (your theory is wrong)
2) The theory may be adequate, but the scale used to assess it is
not
Examples?
Organization of constructs

Factor structures – factor analysis (FA)
and confirmatory factor analysis (CFA)



Differences between these procedures
Meaning of eigenvalues (extraction)
Rotations (e.g., oblique vs. orthogonal)



How are the constructs inter-related?
Organizational and explanatory power
Theoretical and (not vs) empirical decisions
What is the construct of
intelligence?


How do you define it?
How have others defined it?


This definition will determine how the tests are
constructed, administered, and interpreted
What is intelligence and do modern IQ tests
measure it? – write paper on this


Various conflicting views on this (e.g., Gould suggests that we cannot measure it and that our current tests do not, whereas Boring suggests that intelligence IS what intelligence tests measure)
See handout of definitions
PSY 525
Intellectual Assessment
The tests we will focus on are those that are commonly used by psychologists, those that are psychometrically sound, and those that you need to know in order to do your job.
Assessing LD with the WAIS-III/IV




Individuals with LD in reading and math generally
exhibit IQ scores in the average range.
Index scores are, however, noteworthy:
VCI scores tend to be 7-13 points higher than WMI scores (e.g., VCI is 15 or more points higher than WMI for almost 42% of those with reading disabilities).
POI scores are approx. 7 points higher than PSI scores across LD individuals (e.g., POI is 15 or more points higher than PSI for almost 31% of those with LDs).
Intelligence testing in problem
populations


The Leiter was developed to evaluate cognitive functioning (i.e., intelligence) in individuals who are deaf or otherwise nonverbal.
It can also be used with clients from other cultures who verbalize minimally, or not at all, in both English and their native language
Neuropsychological Evaluation




Closed head injuries are the most common form of head trauma in the population (adolescents and adults)
Closed head injuries cause more widespread (diffuse) damage and usually result in a period of lost consciousness.
Amnesia usually results (anterograde and
retrograde)
Duration of anterograde amnesia is the best
predictor of degree of injury and probability
of recovery