Quality of a Measurement


Assessment and Testing
Dr. Antoinette Lee
Department of Psychiatry
The University of Hong Kong
Antoinette M. Lee
Assessment and Testing
Spring 2005
Lecture Outline
• Overview of Assessment and
Testing
• Ethical Issues
• Basic Principles in Testing
• Basic Psychometrics
Antoinette M. Lee
Assessment and Testing
Master of Behavioral Health
What Comes to Mind When
You Hear the Word
“Testing”?
• Exam?
• Endless Nights of Studying?
• Anxiety?
What Comes to Mind When You
Hear the Word “Testing”?
• What about the measurement of:
– Personality
– Psychological distress
– Stress
– Response to and adjustment to illness
– Lifestyle
– …………………………?
What is a Test?
• A test is a measuring device or technique used to
quantify behaviour (or characteristic)
• Helps in the knowledge and prediction of behaviour
(or characteristic)
• Behavior:
– Overt
• Tendency to engage in certain observable behaviour
– E.g. current alcohol consumption pattern
• How much a person has previously engaged in certain
observable behaviour
– E.g. coping strategies
– Covert
• Feelings, thoughts, attitudes
• One tool in one’s assessment effort
What is Assessment?
• Assessment is a broader concept
• Psychological testing is only one form of
assessment
• Assessing an individual helps give
meaning to the findings from the tests
conducted
Purpose of Assessment and
Testing
“Never treat a disease without first being
sure of its species”
Gilibert, 1975
What is Assessment?
• The systematic process of gathering
information for the sake of
understanding the individual, deciding if
intervention is necessary, and designing
appropriate intervention if intervention
is warranted
• A continuous process
Assessment in the Context of
Behavioral Health
[Figure: flow diagram linking Referral, Assessment, Intervention, Further / Continuous Assessment, and Referrals]
What is Assessment?
• In the context of behavioral health, assessment refers to an information gathering endeavour aiming at:
– Identifying diseases, disorders or problems that warrant attention
– Understanding the nature and severity of the problem
– Identifying possible causes of the problem
– Identifying consequences of the problem
– Understanding the context in which the problem is situated
– Providing information that guides treatment selection and planning
– Analyzing changes in behavior, emotions, or cognition as therapy or intervention progresses
– Evaluating the usefulness of a program
Assessment and Testing
• Assessment and testing as a process of
information gathering
– Targeted and focused
– Iterative
– Multimodal
Multimodal Nature of
Assessment
• Multiple modes or forms
– Physical measures
– Clinical Interview
– Observation and Behavioral Assessment
  • Behavioral observation
    – Clinical
    – Naturalistic
  • Self-monitoring
– Self-report measures
Multidimensional Nature of
Assessment
• Multiple Aspects & Levels
– Biopsychosocial Approach
• Individual
• Family
• Sociocultural Context
• Multiple settings
• Multiple informants
Concerns of Testing
• Because testing involves humans and
often animals, many ethical issues must
be considered before performing tests
Ethical Issues
• The American Psychological Association
(APA) emphasizes the significance of
ethical principles for psychologists to
follow during interactions with people
• The APA website provides literature on
such codes of conduct and ethical
principles for reference
Five General Ethical Principles
• The purpose of these five principles:
“…guide and inspire psychologists
toward the very highest ethical ideals of
the profession.”
~APA Ethics Code 2002
Five General Ethical Principles
1) Beneficence and Nonmaleficence
2) Fidelity and Responsibility
3) Integrity
4) Justice
5) Respect for People’s Rights and Dignity
Five General Ethical Principles
1) Beneficence and Nonmaleficence
– strive to protect the welfare and rights of the test-takers
– E.g. Make it a priority to minimize the harm (anxiety, stress, etc.) to the test-taker
2) Fidelity and Responsibility
– ensure that professional standards are upheld during the interaction with the patient
– Primary responsibility is to the test-taker
– E.g. Psychologists should not perform their work for personal advantages
Five General Ethical Principles
3) Integrity
– honesty with the test-taker
– Issue of deception esp. in research
4) Justice
– fairness in the assessment of the test-taker
– E.g. items should not be culturally biased;
biases from a psychologist may alter the
assessment of a subject
Five General Ethical Principles
5) Respect for People’s Rights and Dignity
– Crucial to protect the rights and confidentiality of
others
Other Issues Pertinent to
Assessment
• Professional competence
– In test development
– Administration of test and interpretation of test results
• Copyright issues
• Labelling and the issue of stigma
• Need to address individual’s motive underlying the
request to be tested
Risk to Participants
• Different types of risks:
– Physical injury – testing of a new drug to battle
effects of cancer
– Social injury – embarrassment from questions
about sexuality
– Mental or emotional stress – Strange Situation:
children being away from parents to test
attachment
• Minimize the possibility of risk and have back-up plans to deal with problems if they do arise
Informed Consent
• In all circumstances, informed consent
must be obtained from the participants
before assessment
• Verbal or written consents
• The consent acts as a contract, stating
the willingness of a participant to join
the study
Informed Consent
• The following information should be included in the informed consent:
– Purpose of assessment and procedures
– Voluntary participation
– Possible risks involved
– Foreseeable benefits
– Confidentiality
– Anonymity (esp. for research)
– Contact information for enquiry
Informed Consent
• Requirements of participants to give
informed consent:
– Capacity to give consent: those who may lack it include individuals with intellectual disabilities and children
– Full knowledge of nature of assessment:
benefits, risks, confidentiality, incentives such
as money, etc.
Informed Consent
• Requirements continued:
– Voluntary Participation: willing to join the
study without any outside influences, and
right to withdraw
• Issue of unequal power
– Informed: had the chance to ask questions
about the study
Confidentiality
• Releasing sensitive information without the participants’ knowledge and consent is a serious breach of ethics
• Consent must be obtained from participants after it has been clearly explained to them how their personal information may be disclosed
• Conditions under which confidentiality may be breached
Debriefing
• Debriefing should be done after
participant finishes with the assessment
procedures
• Debriefing should be included as an
integral part of the assessment process
• Explanation of what has been done
• Interpretation of results
Basic Principles in Assessment
and Testing
Basic Principles in Testing &
Assessment
• Reliability
• Validity
• Sensitivity & Specificity
Quality of a Measurement
Reliability
&
Validity
How is one different from another?
Reliability and Validity
• Reliability: the extent to which a test
measures what it was designed to measure
consistently
– Accuracy, dependability, consistency, repeatability,
stability of the device
• Validity: the extent to which a test
measures what it intends to measure
• Reliability is a prerequisite but not a
guarantee of validity
Validity & Reliability
• Imagine playing a game of darts.
– Hitting the center red spot.
~ Validity
– Hitting a spot repeatedly.
~ Reliability
Reliability
Hitting the same spot on the dartboard repeatedly…
• Reliability refers to the ability to measure a
characteristic in a systematic and repeatable
way.
• Degree to which test results can be replicated
(Does the test yield similar results consistently?)
Classic Test Score Theory
• Every person possesses a “true” score, which would be obtained as the average score
if the person took the test an infinite number of times
• The observed score on a test reflects the person’s “true”
score plus some error of measurement
• Error:
– The degree to which observed scores reflect things that
have nothing to do with true scores
– includes both systematic error and random error;
• Error in reliability theory refers to random error
OBSERVED = TRUE + ERROR
Distribution of observed scores from repeated testing
of the same person
Mean of the observations is an estimate of the true score
What Does the Dispersion
Around the Mean Tell Us?
Standard deviation of error = standard error of measurement: on
average, how much an observed score deviates from the true score
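As a rough illustration of the two points above, the sketch below treats a handful of made-up repeated scores for one person as the observed distribution. The mean estimates the true score and the standard deviation estimates the standard error of measurement. The scores and variable names are hypothetical.

```python
import statistics

# Hypothetical observed scores from repeated testing of the same person.
observed_scores = [48, 52, 50, 47, 53, 50]

# The mean of the observations estimates the person's true score.
estimated_true_score = statistics.mean(observed_scores)

# The spread of the observations around that mean estimates the
# standard error of measurement (the standard deviation of the error).
standard_error_of_measurement = statistics.stdev(observed_scores)

print(estimated_true_score)
print(round(standard_error_of_measurement, 2))
```

With only six observations both figures are themselves imprecise; the theory assumes a very large number of independent administrations.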
Domain Sampling Model
• Problem associated with using a limited
number of items to represent a larger
construct
– E.g. stress response
• Error introduced by using scores from a
sample of items to represent the whole
domain and to estimate the true score
– The larger the number of items sampled, the more reliable the test
Classical Test Score Theory
• Reliability is defined as the ratio of the true
score variance to the observed score
variance
$$r = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{observed}}}$$
→ Percentage of observed variation (variation in observed scores) attributable to variation in the true score
Domain Sampling Model
• However, we do not know what the true score is. It
has to be estimated from observed scores on tests
that are made up of sample of items from the domain
• There are sampling errors involved in the process of
sampling items from the domain to make up a test,
hence the estimate may not be accurate.
• Theoretically, if many tests are constructed by
sampling from the domain, we should get a normal
distribution of unbiased estimates of the true score
• We can then find the correlation between each of
these tests and each of the other tests, and take an
average of the correlations → unbiased estimate of
reliability
Classical Test Score Theory
• A person’s “true” score cannot be observed directly, but we know that the observed variance equals the true variance plus the error variance:
$$\sigma^2_{\text{observed}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}$$
Classical Test Score Theory
• Combining the two equations yields:
$$r = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}$$
$$1 - r = 1 - \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}} = \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}} = \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{observed}}}$$
→ Percentage of observed variation attributable to random error
Classical Test Score Theory
• If r = 0.7 and 1 – r = 0.3:
70% of the variation in scores among test-takers is attributable to real differences
among these people, and 30% is attributable
to random error (chance factors)
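The ratio can be sketched in a few lines, assuming the true and error variance components were somehow known (in practice they are estimated). The function name and the variance values are hypothetical, chosen only to reproduce r = 0.7.

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """Reliability as true score variance divided by observed variance,
    where observed variance = true variance + error variance."""
    observed_variance = true_variance + error_variance
    if observed_variance <= 0:
        raise ValueError("observed variance must be positive")
    return true_variance / observed_variance


# Hypothetical variance components chosen to reproduce r = 0.7.
r = reliability(true_variance=70.0, error_variance=30.0)

print(f"attributable to real differences: {r:.0%}")
print(f"attributable to random error: {1 - r:.0%}")
```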
Classical Test Score Theory
• Reliability coefficients can range from 0 to 1:
– Reliability coefficient of 1 indicates
complete reliability of a test.
– Reliability coefficient of 0 indicates
complete unreliability of a test.
[Figure: two pie charts, Test A and Test B, splitting score variation into true score variation and error variation: 90% true / 10% error for one test and 65% true / 35% error for the other]
Ways of Assessing Reliability:
1) Test-Retest Reliability
2) Parallel-Forms Reliability
3) Internal Consistency (measured by split-half
method, Kuder-Richardson method, and
coefficient alpha)
4) Inter-rater Reliability
Test-Retest Reliability
• Measure of temporal stability
• Estimated by test-retest coefficient
– Correlation between the scores obtained by a group of individuals at two
different administrations
• Same test administered on two occasions that are
separated by a specified time period
• Determine errors of measurement due to personal or
environmental conditions during the administration of
the test
• Coefficient is affected by the length of the interval between the two
administrations of the test
• Only used when theory assumes that the
characteristic assessed is stable over time
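A test-retest coefficient can be sketched as a Pearson correlation between the two sets of scores. This is a minimal illustration with made-up scores for five people; `pearson_r`, `time_1` and `time_2` are hypothetical names, and the sketch assumes neither list is constant (otherwise the denominator is zero).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people at two administrations.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 16, 17, 21]

test_retest_coefficient = pearson_r(time_1, time_2)
print(round(test_retest_coefficient, 3))
```

The same correlation could serve for a parallel-forms or inter-rater coefficient by swapping in scores from the second form or the second rater.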
Parallel-Forms Reliability
• If interval between the two test administrations is
short, test takers may recall the questions and
responses of the first test
• In assessing parallel-forms reliability, the items in the second administration are replaced with similar items,
creating a parallel form of the original test
• The parallel-forms coefficient is the correlation
between scores on the two forms
• Determine errors of measurement due to different
test items
• Difficult to do in reality
Internal Consistency
• Degree of interrelatedness of the test
items or extent to which the items
measure the same thing
• “Are the different test items measuring
the same thing?”
Internal Consistency Coefficients:
Split-Half Reliability
• Test is split into two halves
– Random split, odd-even, first and
second half
• Scores on one half correlated with scores
on the other half
• Adjustment needed as reliability is
underestimated due to shortened test
length
– Spearman-Brown formula
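The adjustment can be sketched as follows, assuming the two halves are of equal length, in which case the Spearman-Brown formula reduces to 2r / (1 + r). The function name and the half-test correlation of 0.6 are illustrative, not taken from the lecture.

```python
def spearman_brown(half_test_r: float) -> float:
    """Step up the correlation between two half-tests to an estimate
    of the reliability of the full-length test."""
    return 2 * half_test_r / (1 + half_test_r)


# Hypothetical correlation between scores on the two halves.
half_r = 0.6
full_test_estimate = spearman_brown(half_r)

print(round(full_test_estimate, 2))
```

The stepped-up estimate is higher than the half-test correlation, reflecting the slide's point that a shortened test underestimates reliability.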
Internal Consistency Coefficients:
Coefficient Alpha
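• Also known as Cronbach’s alpha
• Defined as:
$$\alpha = \frac{N}{N-1}\left(1 - \frac{\sum S_i^2}{S_t^2}\right)$$
where N = number of items
$S_i^2$ = variance of scores on item i
$S_t^2$ = variance of total test scores
• Usually between 0 and 1 (negative values are possible when items are negatively correlated)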
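A minimal sketch of the computation, using N/(N-1) times one minus the ratio of summed item variances to total-score variance. The item scores are made up, `cronbach_alpha` is a hypothetical helper name, and sample variances are used throughout as an assumption.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of items, where each item is a list
    of scores from the same respondents in the same order."""
    n_items = len(item_scores)
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of scores on each individual item.
    item_variances = [statistics.variance(item) for item in item_scores]
    # Total test score for each respondent, then its variance.
    totals = [sum(person) for person in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of five respondents on three items.
items = [
    [2, 3, 4, 5, 4],
    [3, 3, 5, 4, 4],
    [2, 4, 4, 5, 5],
]

alpha = cronbach_alpha(items)
print(round(alpha, 2))
```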
Internal Consistency Coefficients:
Coefficient Alpha
• Coefficient alpha values of 0.85 and above are generally considered satisfactory
• Useful for test items that are not dichotomous (e.g. Strongly Agree-Agree-Neutral-Disagree-Strongly Disagree)
• For dichotomous items, use Kuder-Richardson 20 (KR20)
Inter-Rater Reliability
• Correlation of scores judged by different
raters
• Coefficients of 0.85 and above are considered
satisfactory
Validity
Hitting the center spot…
• Validity refers to the ability of a test to measure
the particular attribute of interest, such as
depression or sensation-seeking, that it was
intended to measure.
(Does the test measure what it is supposed
to measure?)
Validity
“A test is valid to the extent that
inferences made from it are
appropriate, meaningful, and useful”
~ Standards for Educational and
Psychological Testing (AERA, APA, & NCME,
1985, 1999)
Validity
Types of validity:
1) Face validity
2) Content validity
3) Criterion-related validity
4) Construct validity (includes convergent
and discriminant validity, factorial
validity)
Face Validity
• Validity of a test according to face value
• i.e., does the test seem to measure what it is
supposed to be measuring?
• Simplest and crudest form of validity
• However, it is not considered an official or
true form of validity
• Some tests “look” like they are measuring a
construct but are actually not doing a good job
– E.g. many tests in popular magazines
Content Validity
• The extent to which the test adequately
samples the domain of interest
• Obtained through careful examination of the
whole domain of interest
• Validity judged by experts in the field
• E.g. does the self-report measure include all
the important symptoms or manifestations
of depression?
• What about the Social Readjustment Rating Scale (SRRS)?
Content Validity
– Construct underrepresentation: inadequate representation of the concept being tested
• E.g. a test of verbal abilities includes only a vocabulary section and nothing on conversation
– Construct-irrelevant variance: factors irrelevant to the concept being tested affect test performance
• E.g. test performance is affected by the motivation of the test taker
Criterion-Related Validity
• The test is the measure of a certain
construct while the criterion is the
standard to which the test is compared
• Criterion-related validity is the extent to
which scores on the test correlate with the
criterion
• Validity coefficient: the correlation between test scores and the criterion
Criterion-Related Validity
• Two types of criterion-related validity:
– Concurrent validity: the degree to which the test results correlate with the criterion (collected at the same time)
• E.g. whether the Beck Depression Inventory can distinguish between depressed patients and non-depressed individuals; whether depressed patients score higher than non-depressed individuals on a test of distorted cognition
– Predictive validity: the degree to which the test can predict the criterion (collected at a later time)
• E.g. scores on distorted cognition related to development of depressive episodes in the future
Construct Validity
• The extent to which the construct (or characteristic) of
interest is measured by the test or measure
• A complex and lengthy process of evidence collection;
involves the collection of evidence from multiple
sources
• Construct validity is determined by evaluating the
extent to which the scores on the measure relate to
scores from other measures in an expected way
• Consider a self-report measure of anxiety-proneness:
– How can we demonstrate its construct validity?
Convergent & Discriminant
Validity
• Convergent Validity: High correlation between
scores on the test and scores from other methods or
tests measuring the same construct or related
constructs
• Discriminant Validity: Low correlation between
scores on the test and scores on other tests
measuring a different construct
– Unrelated construct
– Similar but distinct construct
Convergent Validity
Convergent validity can be obtained in two ways:
1. High correlation with other methods or tests measuring the same construct
• Whether the different measures of the same construct converge
2. Demonstration of specific relationships with other constructs to which we expect the test scores to be related
• Concept-mapping
[Figure: Concept Map for Depression]
Convergent Validity
• A test measuring depression is compared to other
established tests measuring depression (e.g. the Beck
Depression Inventory) or to clinician ratings
– A high correlation (> 0.5) between the two tests
indicates convergent validity.
• What about our test on anxiety-proneness?
– Clinician rating
– Behavioral observation
– Change in scores after treatment
Convergent Validity
• Finding convergent evidence for the measurement of the
construct “health”
• Construct validation versus criterion validation
• Convergent evidence:
– Relationship between test scores and subjects’ self-rated
health status
– Relationship between test scores and symptoms and chronic
medical conditions
– Relationship between test scores and clinical ratings
– Relationship with age
– Relationship with SES
– Relationship with frequency of doctor visits
– Different test scores between physically ill and non-physically ill individuals
– Relationship with physiological measures reflecting disease states (e.g. lung function)
– Pre-post treatment changes in test scores
Discriminant Validity
• In contrast, a test measuring depression is compared to other tests measuring unrelated constructs (e.g. intelligence as measured by the Stanford-Binet Intelligence Scale) or a similar but distinct construct (can you think of an example of this?)
• A low correlation between the two tests indicates discriminant validity and provides evidence of the distinctiveness of our test.
Factorial Validity
• Another form of construct validity; also a
form of content validity
• Use of factor analysis
• Whether the items of the test can be
represented by factors (or dimensions) that
make theoretical sense
• How can factorial validity of a measure
assessing depression be demonstrated?
Construct Validity
• Construct validation is a cumulative process
• A process of evidence gathering
• Not restricted to convergent and discriminant
validity
• More broadly, construct validity is related to
theories of the particular construct
• Back to our example on measuring anxiety
proneness, construct validation involves an
examination of its relationship with other
constructs that are theoretically relevant
Exercise
• How would you go about validating the
following measures?
1. A measure of depression for Chinese
2. A measure of perceived barriers to behavior
change
3. A measure of insomnia
4. A measure of androgen deficiency in the
aging male (ADAM)?
Multitrait-Multimethod Approach
• Developed by Campbell & Fiske (1959)
• Demonstrates convergent and discriminant validity of a
set of measures
• Information about reliability, convergent and
discriminant validity
• Allows for comparison of the assessment of a number
of traits (constructs) using different methods
Multitrait-Multimethod Approach
• For example, a sample of subjects is assessed
for depression, anxiety, and stress using
methods of checklists, observation, and
interviews
• Correlations for each of the traits using each of the methods are computed and placed in a matrix, so that correlations among traits measured with one method can be compared to correlations among traits measured with another method
Multitrait-Multimethod Approach
• Correlations between the same traits using
different methods should be high to demonstrate
convergent validity
• Correlations between different traits using the
same method should be low to demonstrate
discriminant validity
Some Important Properties of
Diagnostic Tests
Sensitivity, Specificity, &
Predictive Value
Results of Tests
                        Illness Present                        Illness Absent
Positive Test Result    True Positive (TP)                     False Positive (FP) (Type I Error)
Negative Test Result    False Negative (FN) (Type II Error)    True Negative (TN)
Sensitivity & Specificity
• Sensitivity: proportion of patients who have a disease and are correctly identified by the test as having it
• Specificity: proportion of patients who do not have a disease and are correctly identified by the test as not having it
Sensitivity
• Formula for Sensitivity:
Sensitivity = True Positive / (True Positive + False Negative)
~ Proportion of patients who are sick
and are correctly identified
Specificity
• Formula for Specificity:
Specificity = True Negative / (True Negative + False Positive)
~ Proportion of patients who are not
sick and are correctly identified as not sick
Predictive Values
• Proportion of the time that a test result reflects what is true of the patient, i.e. whether the patient really has the illness or not
• Two types of Predictive Values:
– Positive Predictive Value
– Negative Predictive Value
Positive Predictive Value
• Predictive value of a positive (abnormal) test result
• Proportion of the time a positive test result is correct (i.e. the person has the illness)
• Formula:
Positive Predictive Value = True Positive / (True Positive + False Positive)
Negative Predictive Value
• Predictive value of negative test result
• Proportion of the times a negative test
is actually the true outcome (i.e. person
does not have illness)
• Formula:
Negative Predictive Value = True Negative / (True Negative + False Negative)
Summary
• Sensitivity (true positive rate) = TP/(TP + FN) x 100%
• Specificity (true negative rate) = TN/(TN + FP) x 100%
• Predictive value (+) = TP/(TP + FP) x 100%
• Predictive value (-) = TN/(TN + FN) x 100%
• Efficiency (% of true results, whether + or -) = (TP + TN)/grand total x 100%
Example: A Test of Suicide Risk
First test:
Test result    Attempted suicide    Did not attempt suicide
Positive       TP = 26              FP = 25
Negative       FN = 4               TN = 45
Sensitivity = 26/(26+4) x 100% = 87%
Specificity = 45/(45+25) x 100% = 64% (too many FP)
Predictive value (+) = 26/(26+25) x 100% = 51%
Predictive value (-) = 45/(45+4) x 100% = 92%
Efficiency = (26+45)/100 x 100% = 71%
Second test:
Test result    Attempted suicide    Did not attempt suicide
Positive       TP = 18              FP = 10
Negative       FN = 12              TN = 60
Sensitivity = 18/(18+12) x 100% = 60% (too many FN)
Specificity = 60/(60+10) x 100% = 86%
Predictive value (+) = 18/(18+10) x 100% = 64%
Predictive value (-) = 60/(60+12) x 100% = 83%
Efficiency = (18+60)/100 x 100% = 78%
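The five indices can be recomputed from the four cell counts of each test in this example. The sketch below is one way to do so; `diagnostic_summary` is a hypothetical helper name, and the counts are the TP, FP, FN and TN values given for the two suicide-risk tests.

```python
def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return the five indices of a diagnostic test as proportions."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "predictive value (+)": tp / (tp + fp),
        "predictive value (-)": tn / (tn + fn),
        "efficiency": (tp + tn) / total,         # share of correct results
    }


first_test = diagnostic_summary(tp=26, fp=25, fn=4, tn=45)
second_test = diagnostic_summary(tp=18, fp=10, fn=12, tn=60)

for name, result in (("first test", first_test), ("second test", second_test)):
    print(name)
    for index, value in result.items():
        print(f"  {index}: {value:.0%}")
```

Both tests were applied to 100 people, 30 of whom attempted suicide, so the differences in the indices come only from how each test sorts the same people.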
Dilemma: Sensitivity &
Specificity
• At times, sensitivity should take priority over specificity
• E.g. in the detection of suicide risk, a test needs high sensitivity so that people at risk are not missed and left in harm’s way
• Therefore, a test for suicidal tendencies must have a high negative predictive value (a negative result should reliably rule out risk)
Dilemma: Sensitivity &
Specificity
• At other times, a test should emphasize specificity rather than sensitivity
• E.g. a patient’s blood test comes back positive for lupus; the patient undergoes different treatments only to discover later that there was an error in the diagnosis: he did not have lupus
• In this case, a test should have a high positive predictive value (a positive result should reliably indicate disease) so that the patient does not have to endure unnecessary treatments and emotional distress
• Tests of psychopathology
– Implication of false positive
• Labelling
• Stigma
• Self-fulfilling prophecy
Dilemma: Sensitivity &
Specificity
• It is very important to consider the variable
under scrutiny and decide whether sensitivity
or specificity is more important
• Deciding on an appropriate level (and
tradeoff) of sensitivity and specificity:
– A matter of psychometric, clinical, moral, ethical,
social, and legal considerations
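The tradeoff can be made concrete by moving the cutoff score of a screening test. In this sketch the scores are invented, higher scores are assumed to indicate higher risk, and a score at or above the cutoff counts as a positive result. All names and numbers are hypothetical.

```python
# Hypothetical screening scores; higher scores suggest higher risk.
ill_scores = [4, 6, 7, 8, 9]       # people who truly have the condition
healthy_scores = [1, 2, 3, 5, 6]   # people who truly do not


def sensitivity_specificity(cutoff):
    """Treat a score >= cutoff as a positive test result and return
    (sensitivity, specificity) for the hypothetical samples above."""
    tp = sum(score >= cutoff for score in ill_scores)
    fn = len(ill_scores) - tp
    tn = sum(score < cutoff for score in healthy_scores)
    fp = len(healthy_scores) - tn
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (3, 5, 7):
    sens, spec = sensitivity_specificity(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

A low cutoff catches every ill person at the cost of many false positives, and a high cutoff does the reverse, so the choice of cutoff reflects which error is judged more costly.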
Assuming that we have a reliable and
valid measure
and
we have collected test responses from a
sample of individuals, how do we proceed
with making sense of the test results?
Basic Psychometrics
• Measures of central tendency and variability
• Testing of difference between (among)
groups
• Measures of association (relationship among
variables)
Measures of Central Tendency
and Variability
• Central Tendency: the score around which
the data tend to concentrate
• Variability: the spread of the
data
Central Tendency
• Three measures which define
central tendency:
1) Mean
2) Median
3) Mode
Mean
• “the average”
• The most commonly used measure of
central tendency
• The mean is equal to the sum of the
scores divided by the number of
scores that contributed to that sum
Mean
$$\text{Mean} = \frac{\sum x_i}{N}$$
where $\sum x_i$ = sum of individual scores $x_i$
and N = total number of individual
scores
Median
• The middle point of the distribution of
scores
• When there is an even number of
scores, the median is the mean of the two middle scores when
the set is listed in order
• When there is an odd number of scores,
the median is the middle score, which divides the top and
bottom halves equally
Mode
• The score that appears with the most
frequency, i.e, the score obtained by the
greatest number of people
• The mode is the highest point in a
frequency distribution
• If there are two modes found in a set of
scores, then the data is bimodal
Mean, Median, Mode
• For this set of numbers:
2, 3, 3, 5, 5, 6, 6, 6, 8, 9
1) Mean: Σ xi/N
(2+3+3+5+5+6+6+6+8+9)/10 = 5.3
Mean, Median, Mode
2) Median: Since there are 10 numbers, we
take the average of the middle two
numbers (the 5th and 6th number)
(5+6)/2 = 5.5
3) Mode: The number that appears with
the highest frequency is 6
Mean, Median, Mode
• In this example, the mean, median, and
mode are 5.3, 5.5, and 6 respectively.
• The measures of central tendency are
consistent in this example.
• However, when there are outliers
(extreme values) in a set of data, the
mean may shift from the median and the
mode
Mean, Median, Mode
• For this set of numbers,
2, 3, 3, 5, 5, 6, 6, 6, 8, 100
we have an outlier of 100.
– Median: 5.5
– Mode: 6
– Mean:(2+3+3+5+5+6+6+6+8+100)/ 10
= 14.4
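The two worked examples above can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the function name `central_tendency` and the variable names are illustrative assumptions, not part of the lecture.

```python
from statistics import mean, median, multimode

# The two data sets from the slides: the second replaces 9 with the outlier 100.
scores = [2, 3, 3, 5, 5, 6, 6, 6, 8, 9]
scores_with_outlier = [2, 3, 3, 5, 5, 6, 6, 6, 8, 100]


def central_tendency(data):
    """Return (mean, median, list of modes) for a list of scores."""
    return mean(data), median(data), multimode(data)


print(central_tendency(scores))
print(central_tendency(scores_with_outlier))
```

Only the mean changes between the two calls (5.3 versus 14.4); the median and mode stay at 5.5 and 6, which is why the median is preferred when the distribution is skewed.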
Mean, Median, Mode
• Because the mean can be heavily
influenced by outliers:
– Consider the median as the measure
of central tendency when the
distribution of data is skewed
– It is also important to note the
pattern of distribution in the data
Variability
• The extent of spread within the data set
Variability
• Two sets of data may have the same
mean, but different variability:
μ = Mean, σ =
Standard Deviation
Standard Deviation
• Measure of variability
• Displays the degree of dispersion in the
data
• Variance is defined as
$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$
where $X$ = individual scores, $M$ = mean of scores, $N$ = total number of scores
Standard Deviation
• Standard deviation is the square root of
variance
• A more commonly used measure of
variability
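As a rough illustration, the variance formula with the N - 1 denominator and its square root can be written out directly. The sketch below reuses the ten scores from the central tendency example; the function names are assumptions made for illustration.

```python
from math import sqrt


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


def sample_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(scores))


scores = [2, 3, 3, 5, 5, 6, 6, 6, 8, 9]
print(sample_variance(scores), sample_sd(scores))
```

For these scores the squared deviations sum to 44.1, so the variance is 44.1 / 9 = 4.9 and the standard deviation is about 2.21.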
Descriptive Statistics
• Analyze → Descriptive Statistics → Descriptives →
drag the variable(s) (e.g. age) to the right, then click
“OK”.

Descriptive Statistics
                     N     Minimum  Maximum  Mean    Std. Deviation
Age                  365   18       75       44.65   14.463
Valid N (listwise)   365
Normal Distribution
Normal Distribution
• Mathematically defined as:
– Symmetrical
– Bell shaped
– Mean (μ) represents the highest point of the
curve
• This ideal shape is approached as sample
size increases
• Many phenomena are approximately normally distributed,
e.g. height, birth weight, brain weight
• Sampling distribution of the mean
Hypothesis Testing
• Population
– The totality of all cases
– Population values: parameter, e.g. μ, σ
• Sample
– Cases actually included in a particular study
– Random sample: sample in which the characteristics and
relationships of interest are independent of the probabilities
of being included in the sample
– Sample values: statistic, e.g. sample mean (M), S²
Hypothesis-Testing
• Null hypothesis
– The assumption that there is no significant difference
between two (or more) random samples of a
population or no association between 2 variables
– When the null hypothesis is rejected, observed
differences between groups are deemed to be
improbable by chance alone.
• P value
– Probability of obtaining a result at least as extreme as the one observed by chance alone, i.e. when the null hypothesis is true
– The lower the cut-off value (significance level, e.g. .05 or .01), the more stringent is the requirement
– The significance level is the accepted probability of a Type I error
Type I and Type II Error
• In general, Type I and Type II Error refer to the
following:
– Type I Error: rejecting the null hypothesis when the null
hypothesis is true (e.g. false claim of a true difference when
the observed difference is due only to chance)
– Type II Error: retaining the null hypothesis when the null
hypothesis is not true
• Power: the probability of rejecting the null hypothesis
when, in the real world, it is false; the
probability of identifying a true difference
Frequency count
• Analyze → Descriptive Statistics → Frequencies →
drag the variable (e.g. group) to the right, then click
“OK”.

Statistics
Group
N  Valid     367
   Missing   0

Group
                      Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Intervention   180        49.0     49.0           49.0
       Control        187        51.0     51.0           100.0
       Total          367        100.0    100.0
Independent-samples t-test
• To test if two independent groups are different in one
or more continuous variables.
• Analyze → Compare means → Independent-samples t-test → choose
a grouping variable (e.g. group), then “define groups” (e.g., Group 1 =
1, Group 2 = 2) → choose test variable(s) (e.g., age, perceived
improvement, compliance)

Independent Samples Test

Levene's Test for Equality of Variances
                         F       Sig.
Age                      .091    .763
Perceived improvement    4.279   .039
Compliance               1.150   .284
If Sig. < .05, variances of 2 groups not equal → statistical adjustment (look at the second row of statistics for t-test)

t-test for Equality of Means
                                                                      Sig.        Mean        Std. Error  95% Confidence Interval of the Difference
                                                     t       df       (2-tailed)  Difference  Difference  Lower    Upper
Age                    Equal variances assumed       1.981   363      .048        3.53        1.784       .025     7.041
                       Equal variances not assumed   1.981   362.738  .048        3.53        1.783       .026     7.040
Perceived improvement  Equal variances assumed       1.357   342      .176        .41         .305        -.186    1.015
                       Equal variances not assumed   1.360   340.814  .175        .41         .305        -.185    1.013
Compliance             Equal variances assumed       -.280   339      .780        -.12        .438        -.984    .739
                       Equal variances not assumed   -.280   336.858  .780        -.12        .438        -.985    .740
t = t value; df = degree of freedom
Sig. (2-tailed) = p-value, group difference is significant if < .05
Sign of the Mean Difference indicates direction of difference
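To make the "equal variances assumed" row less of a black box, here is a minimal Python sketch of the pooled-variance t statistic (mean difference divided by its standard error). The function name and the two small groups are hypothetical; the p-value, which needs the t distribution, is left to the statistics package.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent groups.

    Returns (mean difference, t, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    mean_diff = mean(group1) - mean(group2)
    # Weighted average of the two sample variances (equal variances assumed).
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff, mean_diff / se_diff, n1 + n2 - 2


# Hypothetical scores, not data from the lecture.
intervention = [1, 2, 3, 4, 5]
control = [2, 3, 4, 5, 6]
print(independent_t(intervention, control))
```

The same arithmetic links the columns of the SPSS table: for Age, 3.53 / 1.784 is approximately the reported t of 1.981 (the small gap comes from rounding of the printed mean difference).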
One-way ANOVA
• To test if more than two independent groups are
different in one or more continuous variables (Note that
independent-samples t-test is a special case of one-way ANOVA which tests only
2 groups).
• Analyze → Compare means → One-way ANOVA → choose “factor”
(e.g. education), then choose “dependent list” (e.g., Compliance) →
click “Post hoc” and check “Tukey”,
then click “continue” → click “OK”

ANOVA
Compliance
                  Sum of Squares  df    Mean Square  F      Sig.
Between Groups    162.078         5     32.416       2.021  .075
Within Groups     5260.691        328   16.039
Total             5422.769        333
Between-groups df = number of groups – 1 (i.e. there’re 6 education groups)
If p < .05, significant between-groups difference
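The Mean Square and F columns of the ANOVA table follow from the sums of squares and degrees of freedom. The sketch below, with an assumed helper name `f_ratio`, reproduces them from the values reported in the table.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and df taken from the Compliance ANOVA table.
ms_b, ms_w, f = f_ratio(162.078, 5, 5260.691, 328)
print(round(ms_b, 3), round(ms_w, 3), round(f, 3))
```

Rounded to three decimals this gives 32.416, 16.039 and 2.021, matching the SPSS output.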
If there’s between-groups difference, look at post hoc results to see which groups are different

Multiple Comparisons
Dependent Variable: Compliance
Tukey HSD
                                           Mean Difference                      95% Confidence Interval
(I) Education        (J) Education        (I-J)            Std. Error  Sig.    Lower Bound  Upper Bound
no formal education  Primary school        1.40            1.088       .792    -1.72        4.52
                     F.1 - F.3             -.87            1.111       .970    -4.06        2.32
                     F.4 - F.5             -.45            .979        .997    -3.26        2.35
                     F.6 - F.7             -1.14           1.123       .912    -4.36        2.08
                     Tertiary              -.28            .984        1.000   -3.10        2.54
Primary school       no formal education   -1.40           1.088       .792    -4.52        1.72
                     F.1 - F.3             -2.27           .903        .123    -4.86        .32
                     F.4 - F.5             -1.85           .733        .119    -3.96        .25
                     F.6 - F.7             -2.54           .917        .064    -5.17        .08
                     Tertiary              -1.68           .740        .209    -3.80        .44
F.1 - F.3            no formal education   .87             1.111       .970    -2.32        4.06
                     Primary school        2.27            .903        .123    -.32         4.86
                     F.4 - F.5             .42             .768        .994    -1.78        2.62
                     F.6 - F.7             -.27            .944        1.000   -2.98        2.43
                     Tertiary              .59             .774        .974    -1.63        2.81
F.4 - F.5            no formal education   .45             .979        .997    -2.35        3.26
                     Primary school        1.85            .733        .119    -.25         3.96
                     F.1 - F.3             -.42            .768        .994    -2.62        1.78
                     F.6 - F.7             -.69            .784        .951    -2.93        1.56
                     Tertiary              .17             .567        1.000   -1.45        1.80
F.6 - F.7            no formal education   1.14            1.123       .912    -2.08        4.36
                     Primary school        2.54            .917        .064    -.08         5.17
                     F.1 - F.3             .27             .944        1.000   -2.43        2.98
                     F.4 - F.5             .69             .784        .951    -1.56        2.93
                     Tertiary              .86             .790        .884    -1.40        3.13
Tertiary             no formal education   .28             .984        1.000   -2.54        3.10
                     Primary school        1.68            .740        .209    -.44         3.80
                     F.1 - F.3             -.59            .774        .974    -2.81        1.63
                     F.4 - F.5             -.17            .567        1.000   -1.80        1.45
                     F.6 - F.7             -.86            .790        .884    -3.13        1.40
Chi-square test
• To test if two groups are different in distribution of
one or more categorical variables
• Analyze → Descriptive Statistics → Crosstabs → put a grouping
variable (e.g. group) under “Row(s)” → choose categorical variable(s)
(e.g., sex) under “Column(s)” → click “Statistics” and check “chi-square”
→ click “OK”

Crosstab
Count
                       Sex
                       male   female   Total
Group   Intervention   51     129      180
        Control        60     127      187
Total                  111    256      367

Chi-Square Tests
                                           Asymp. Sig.  Exact Sig.  Exact Sig.
                              Value    df  (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square            .612(b)  1   .434
Continuity Correction(a)      .447     1   .504
Likelihood Ratio              .613     1   .434
Fisher's Exact Test                                     .495        .252
Linear-by-Linear Association  .610     1   .435
N of Valid Cases              367
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 54.44.
Value = χ² statistic; Asymp. Sig. (2-sided) = 2-tailed p-value
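The Pearson chi-square value in the output can be recomputed from the crosstab counts: each expected count is (row total × column total) / N, and the statistic sums (observed − expected)² / expected over the cells. The following sketch assumes a helper named `pearson_chi_square`; it does not compute the p-value.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table of counts given as a list of rows.

    Returns (chi-square, degrees of freedom, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected_counts = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            expected_counts.append(expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min(expected_counts)


# Rows: Intervention, Control; columns: male, female (from the crosstab).
crosstab = [[51, 129], [60, 127]]
chi2, df, min_expected = pearson_chi_square(crosstab)
print(round(chi2, 3), df, round(min_expected, 2))
```

This reproduces the Pearson Chi-Square value of .612 with 1 degree of freedom, and the minimum expected count of 54.44 quoted in footnote b.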
Paired-samples t-test
• To test if the same sample is different on a measure
at 2 timepoints
• Analyze → Compare means → paired-samples t-test → drag one or
more pairs of variables (e.g. Pre GHQ, Follow-up GHQ) to “paired
variable(s)” → click “OK”

Paired Samples Test
                                       Paired Differences
                                                                        95% Confidence Interval
                                                                        of the Difference
                                       Mean   Std. Deviation  Std. Error Mean  Lower  Upper   t       df   Sig. (2-tailed)
Pair 1  Pre GHQ total - FU GHQ total   1.79   2.993           .176             1.44   2.13    10.163  289  .000
+ve difference means Pre GHQ > FU GHQ
t = t value
If p-value < .05, difference significant
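A paired-samples t-test is a one-sample test on the within-person differences: t is the mean difference divided by its standard error (SD of the differences / √n). The sketch below uses hypothetical pre and follow-up scores and an assumed function name; it stops at the t statistic and degrees of freedom.

```python
from math import sqrt


def paired_t(before, after):
    """Return (mean difference, t, df) for paired scores (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    se_mean = sd_diff / sqrt(n)
    return mean_diff, mean_diff / se_mean, n - 1


# Hypothetical scores for four people, not data from the lecture.
pre_scores = [5, 6, 7, 8]
follow_up_scores = [4, 4, 6, 6]
print(paired_t(pre_scores, follow_up_scores))
```

In the SPSS output the same relationship holds: 2.993 / √290 gives the reported standard error of .176, and the mean difference divided by that standard error gives a t of roughly 10.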
Repeated-measures ANOVA
• To test if the same sample is different on a measure
at 2 or more timepoints
• Analyze → General Linear Model → Repeated measures → enter
“Within-subjects factor name” (e.g. time), then enter “number of levels”
(e.g. 3) → click “Add” then “Define” → choose within-subjects variables
(e.g. Pre HADS, Post HADS, FU HADS) → (Optional) choose between-subjects factors (e.g. group)
→ click “OK”

Mauchly's Test of Sphericity (b)
Measure: MEASURE_1
                                                               Epsilon (a)
Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.   Greenhouse-Geisser  Huynh-Feldt  Lower-bound
TIME                    .975         2.906               2   .234   .976                1.000        .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an
identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of
Within-Subjects Effects table.
b. Design: Intercept+GROUP
   Within Subjects Design: TIME
If Sig. < .05, sphericity of the measure at different timepoints not equal → statistical adjustment (usually use
Greenhouse-Geisser, see next page)

Tests of Within-Subjects Effects
Measure: MEASURE_1
                                   Type III Sum
Source                             of Squares    df       Mean Square  F      Sig.
TIME          Sphericity Assumed   1.550         2        .775         .834   .435
              Greenhouse-Geisser   1.550         1.952    .794         .834   .433
              Huynh-Feldt          1.550         2.000    .775         .834   .435
              Lower-bound          1.550         1.000    1.550        .834   .363
TIME * GROUP  Sphericity Assumed   7.906         2        3.953        4.256  .015
              Greenhouse-Geisser   7.906         1.952    4.050        4.256  .016
              Huynh-Feldt          7.906         2.000    3.953        4.256  .015
              Lower-bound          7.906         1.000    7.906        4.256  .041
Error(TIME)   Sphericity Assumed   219.211       236      .929
              Greenhouse-Geisser   219.211       230.349  .952
              Huynh-Feldt          219.211       236.000  .929
              Lower-bound          219.211       118.000  1.858
Look at the “Sphericity Assumed” row if equal sphericity assumed
Look at the “Greenhouse-Geisser” row if equal sphericity violated
Time effect not significant, but time x group interaction significant

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
            Type III Sum
Source      of Squares    df    Mean Square  F        Sig.
Intercept   3050.208      1     3050.208     457.278  .000
GROUP       2.801         1     2.801        .420     .518
Error       787.102       118   6.670
Group effect not significant

[Figure: Estimated Marginal Means of MEASURE_1, plotted against TIME (1, 2, 3) with separate lines for the Intervention and Control groups]
Correlation
• Measure of association between two
variables
• Whether variation in Variable X is
associated with variation in Variable Y
• Correlation coefficient
– Range from -1 to 1
– Direction and strength of association
Correlation
[Figure: three scatterplots with Depression on the horizontal axis]
– No correlation: Depression vs Height
– Positive correlation: Depression vs Anxiety
– Negative correlation: Depression vs Self-esteem
Correlation
• Issue of causation
– Correlation only implies association; it does
not necessarily imply causation
• If A and B are significantly correlated, there are
three possibilities:
– A causes B
– B causes A
– A third variable causes both A and B
Correlation
• Analyze → Correlate → Bivariate → choose variables to be analyzed
(e.g. age, perceived improvement, and compliance) → click “OK”

Correlations
                                                     Perceived
                                             Age     improvement  Compliance
Age                    Pearson Correlation   1       -.009        -.008
                       Sig. (2-tailed)       .       .870         .879
                       N                     365     342          339
Perceived improvement  Pearson Correlation   -.009   1            .308**
                       Sig. (2-tailed)       .870    .            .000
                       N                     342     344          341
Compliance             Pearson Correlation   -.008   .308**       1
                       Sig. (2-tailed)       .879    .000         .
                       N                     339     341          341
**. Correlation is significant at the 0.01 level (2-tailed).
Age is not correlated with perceived improvement and compliance
Perceived improvement is significantly correlated with compliance, r = .308, p < .001
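For readers who want to see what the Pearson correlation coefficient measures, here is a minimal sketch that computes r as the covariation of two variables divided by the product of their spreads. The function name and the three small score lists are hypothetical and chosen so that the results are a perfect positive and a perfect negative correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(ss_x * ss_y)


# Hypothetical scores, not data from the lecture.
depression = [1, 2, 3, 4, 5]
anxiety = [2, 4, 6, 8, 10]        # rises with depression
self_esteem = [10, 8, 6, 4, 2]    # falls as depression rises
print(pearson_r(depression, anxiety), pearson_r(depression, self_esteem))
```

Real data rarely fall exactly on a line, so observed coefficients such as the r = .308 above lie somewhere between the two extremes of -1 and 1.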
Regression
• To test if a certain number of variables can predict a
dependent variable
• Analyze → Regression → Linear → choose “Dependent” (e.g.
perceived improvement) → choose “Independent(s)” (e.g. group, age,
sex, education, compliance, Pre GHQ, Pre CAS positive, Pre SF physical,
Pre SF mental, Pre user satisfaction) → choose “Method” (e.g. stepwise)
→ click “Statistics” and check “R squared change”, then “Continue” →
click “OK”
Variables Entered/Removed (a)
Model  Variables Entered            Variables Removed  Method
1      Compliance                   .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      Pre User Satisfaction score  .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3      Pre CAS positive             .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: Perceived improvement
Variables Entered = variables entered into the prediction model
Model Summary
                                                        Change Statistics
                          Adjusted R  Std. Error of     R Square
Model  R         R Square  Square      the Estimate      Change    F Change  df1  df2  Sig. F Change
1      .368(a)   .135      .132        2.644             .135      35.705    1    228  .000
2      .396(b)   .157      .149        2.617             .021      5.729     1    227  .018
3      .416(c)   .173      .162        2.597             .016      4.414     1    226  .037
a. Predictors: (Constant), Compliance
b. Predictors: (Constant), Compliance, Pre User Satisfaction score
c. Predictors: (Constant), Compliance, Pre User Satisfaction score, Pre CAS positive
R Square = total variance explained by the model
With multiple variables in the model, look at adjusted R2 instead of R2
Model 3 with Compliance, Pre User Satisfaction, and Pre CAS positive as
predictors totally explains 16.2% of variance of the dependent variable Perceived
Improvement
Sig. F Change: prediction model is significant if p < .05
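Adjusted R² penalizes R² for the number of predictors using the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below applies it to Model 3; the sample size n = 230 is an inference from the reported df2 (226 = n − 3 − 1), not a figure stated on the slide, and the function name is an assumption.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Model 3: R Square = .173 with 3 predictors; n = 230 inferred from df2 = 226.
print(round(adjusted_r_squared(0.173, 230, 3), 3))
```

The result rounds to .162, the Adjusted R Square reported for Model 3.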
Coefficients (a)
                                    Unstandardized         Standardized
                                    Coefficients           Coefficients
Model                               B        Std. Error    Beta          t       Sig.
1      (Constant)                   3.404    .355                        9.584   .000
       Compliance                   .266     .045          .368          5.975   .000
2      (Constant)                   .504     1.261                       .400    .690
       Compliance                   .245     .045          .339          5.451   .000
       Pre User Satisfaction score  .401     .168          .149          2.393   .018
3      (Constant)                   -.979    1.437                       -.681   .497
       Compliance                   .248     .045          .342          5.546   .000
       Pre User Satisfaction score  .355     .168          .131          2.112   .036
       Pre CAS positive             .050     .024          .128          2.101   .037
a. Dependent Variable: Perceived improvement
B = coefficients of variables in prediction equation
Prediction equation (using the unstandardized B coefficients of Model 3):
Perceived improvement = 0.248*Compliance + 0.355*Pre User Satisfaction + 0.050*Pre CAS positive – 0.979
Excluded Variables (d)
                                                              Partial      Collinearity Statistics
Model  Variable                     Beta In   t       Sig.    Correlation  Tolerance
1      Group                        -.077(a)  -1.260  .209    -.083        1.000
       Age                          -.018(a)  -.290   .772    -.019        1.000
       Sex                          .076(a)   1.225   .222    .081         .990
       Education                    .088(a)   1.427   .155    .094         .998
       Pre GHQ total                -.074(a)  -1.207  .229    -.080        1.000
       Pre CAS positive             .145(a)   2.383   .018    .156         1.000
       Pre SF Physical              .035(a)   .569    .570    .038         .999
       Pre SF Mental                .084(a)   1.359   .175    .090         .987
       Pre User Satisfaction score  .149(a)   2.393   .018    .157         .962
2      Group                        -.081(b)  -1.327  .186    -.088        .999
       Age                          -.039(b)  -.639   .524    -.042        .980
       Sex                          .085(b)   1.383   .168    .092         .987
       Education                    .110(b)   1.788   .075    .118         .980
       Pre GHQ total                -.072(b)  -1.180  .239    -.078        .999
       Pre CAS positive             .128(b)   2.101   .037    .138         .983
       Pre SF Physical              .031(b)   .515    .607    .034         .999
       Pre SF Mental                .069(b)   1.116   .266    .074         .976
3      Group                        -.076(c)  -1.255  .211    -.083        .998
       Age                          -.030(c)  -.497   .620    -.033        .975
       Sex                          .075(c)   1.228   .221    .082         .980
       Education                    .090(c)   1.463   .145    .097         .952
       Pre GHQ total                -.016(c)  -.233   .816    -.016        .781
       Pre SF Physical              -.006(c)  -.098   .922    -.007        .913
       Pre SF Mental                -.012(c)  -.155   .877    -.010        .631
a. Predictors in the Model: (Constant), Compliance
b. Predictors in the Model: (Constant), Compliance, Pre User Satisfaction score
c. Predictors in the Model: (Constant), Compliance, Pre User Satisfaction score, Pre CAS positive
d. Dependent Variable: Perceived improvement
Variables removed, i.e. not significant predictors of the dependent variable
Logistic regression
• To test if a certain number of variables can predict a
dichotomous dependent variable (e.g. have mental
disorder or not)
• Analyze → Regression → Binary logistic → choose “Dependent” [e.g.
mental disorder (Yes: GHQ ≥ 5; No: GHQ < 5)] → choose “Covariate(s)”
(e.g., age, sex, marital, education, perceived improvement, compliance,
Pre CAS positive, Pre CAS negative, Pre SF physical, Pre SF mental, Pre
user satisfaction) → choose “Method” (e.g. Forward: Conditional) →
click “Categorical” and choose the covariates that are categorical (e.g.,
sex, marital, education), then click “continue” → click “OK”
Dummy variables created for categorical variables by the software

Categorical Variables Codings
                                                  Parameter coding
                                      Frequency   (1)     (2)     (3)     (4)     (5)
Education       no formal education   3           1.000   .000    .000    .000    .000
                Primary school        19          .000    1.000   .000    .000    .000
                F.1 - F.3             24          .000    .000    1.000   .000    .000
                F.4 - F.5             76          .000    .000    .000    1.000   .000
                F.6 - F.7             23          .000    .000    .000    .000    1.000
                Tertiary              75          .000    .000    .000    .000    .000
Marital status  single                72          1.000   .000    .000
                cohabit/married       135         .000    1.000   .000
                divorced/separated    8           .000    .000    1.000
                widowed               5           .000    .000    .000
Sex             male                  61          1.000
                female                159         .000
• Block 1: Method = Forward Stepwise (Conditional)

Omnibus Tests of Model Coefficients
                 Chi-square  df  Sig.
Step 1  Step     14.506      1   .000
        Block    14.506      1   .000
        Model    14.506      1   .000
Step 2  Step     5.429       1   .020
        Block    19.935      2   .000
        Model    19.935      2   .000
Test of goodness of fit of the model

Model Summary
       -2 Log       Cox & Snell  Nagelkerke
Step   likelihood   R Square     R Square
1      95.015(a)    .064         .163
2      89.586(b)    .087         .221
a. Estimation terminated at iteration number 6 because
parameter estimates changed by less than .001.
b. Estimation terminated at iteration number 7 because
parameter estimates changed by less than .001.
-2 Log likelihood: large value means worse prediction of the
dependent variable. Therefore, model 2 is
better than model 1.

Classification Table (a)
Comparison of observed and predicted values
                                           Predicted
                                           Mental disorder GHQ >=5   Percentage
        Observed                           no        yes             Correct
Step 1  Mental disorder GHQ >=5   no       205       0               100.0
                                  yes      15        0               .0
        Overall Percentage                                           93.2
Step 2  Mental disorder GHQ >=5   no       205       0               100.0
                                  yes      15        0               .0
        Overall Percentage                                           93.2
a. The cut value is .500
Variables in the Equation
                      B       S.E.    Wald     df   Sig.   Exp(B)
Step 1(a)  IMPROVE    -.431   .134    10.428   1    .001   .650
           Constant   -.913   .474    3.700    1    .054   .401
Step 2(b)  IMPROVE    -.413   .136    9.156    1    .002   .662
           P_SFPHY    -.067   .029    5.185    1    .023   .935
           Constant   1.743   1.208   2.081    1    .149   5.716
a. Variable(s) entered on step 1: IMPROVE.
b. Variable(s) entered on step 2: P_SFPHY.
B = estimated parameters of the logistic regression equation
If p < .05, parameter significant
Model if Term Removed (a)
                    Model Log    Change in -2          Sig. of the
Variable            Likelihood   Log Likelihood   df   Change
Step 1  IMPROVE     -56.445      17.876           1    .000
Step 2  IMPROVE     -52.334      15.083           1    .000
        P_SFPHY     -47.649      5.711            1    .017
a. Based on conditional parameter estimates
p < .05 means addition of the variable into the
model improves the predictive power of the model
Prediction equation (Step 2, including the constant):
$$\text{Logit}(Y) = 1.743 - 0.413 \cdot \text{IMPROVE} - 0.067 \cdot \text{P\_SFPHY}$$
$$P(\text{Mental disorder}) = \frac{e^{\text{Logit}(Y)}}{1 + e^{\text{Logit}(Y)}}$$
where IMPROVE = Perceived improvement
P_SFPHY = Pre SF physical health
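As a sketch of how the fitted logistic model turns predictor scores into a probability, the code below plugs the Step 2 values of B from the Variables in the Equation table (including the constant of 1.743) into the logistic function. The function names and the example scores are hypothetical.

```python
from math import exp

# Step 2 coefficients (B) from the "Variables in the Equation" table.
B_CONSTANT = 1.743
B_IMPROVE = -0.413   # Perceived improvement
B_P_SFPHY = -0.067   # Pre SF physical health


def predicted_probability(improve, p_sfphy):
    """Predicted probability of mental disorder (GHQ >= 5)."""
    logit = B_CONSTANT + B_IMPROVE * improve + B_P_SFPHY * p_sfphy
    return exp(logit) / (1 + exp(logit))


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per one-unit increase."""
    return exp(b)


# Hypothetical scores for one person, not taken from the lecture data.
print(predicted_probability(improve=5, p_sfphy=40))
print(round(odds_ratio(B_IMPROVE), 3), round(odds_ratio(B_P_SFPHY), 3))
```

The odds ratios reproduce the Exp(B) column (.662 and .935): both are below 1, so higher perceived improvement and better baseline physical health go with lower odds of scoring GHQ ≥ 5.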
Norms
• What does it mean to have a score of
70 on a test with a range from 0 to
100?
Norms
• Hard to say as it depends on the
distribution of scores
• A matter of relative performance
Test Standardization &
Norms
• Test standardization: the process of
administering the test to a large representative
sample and obtaining norms for the test
• Norms: performances by defined groups on
particular tests
• Ways to express norms: Z-scores, means, and
percentiles
• Provides information about performance
relative to what has been observed in a
standardization sample
Z-Score
• A person’s score may be converted to a
standard score, known as the z-score,
for comparison to data of the general
population
Z-Score
• The z-score represents the number of standard deviations
a score lies from the mean of a normal distribution
• Theoretically, z-scores range from negative
infinity to positive infinity
• However, over 99% of the data are represented
by z-scores of range –3.00 to 3.00
Z-Score
• Allows for a meaningful comparison of
different data
• A z-score of 1 always means the score is 1 standard
deviation (s.d.) above the mean, while a z-score of –1 means it is 1 s.d. below the mean
• The greater the absolute value of the z-score (+ or -), the further the score
is away from the mean
Z-Score
• Calculating z-score:
$$z = \frac{X - M}{S}$$
where $X$ = raw score
$M$ = mean
$S$ = standard deviation
Z-Score
• Given a set of test scores with a mean of 100 and a s.d.
of 10, we can convert raw scores to z-scores by:
Test Score = 100
z= (100-100)/ 10 = 0
Test Score = 115
z= (115-100)/ 10 =1.5
Test Score = 80
z= (80-100)/ 10 = -2
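The three conversions above can be reproduced with a one-line function. This is a minimal sketch; the function name `z_score` is an assumption made for illustration.

```python
def z_score(raw_score, mean, sd):
    """Convert a raw score to a z-score: (X - M) / S."""
    return (raw_score - mean) / sd


# Test with mean 100 and s.d. 10, as in the worked example.
for raw in (100, 115, 80):
    print(raw, z_score(raw, mean=100, sd=10))
```

The output matches the slide: z-scores of 0, 1.5 and -2 for raw scores of 100, 115 and 80.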
Z-Score
• Z-scores of –2 to 2 cover around 95% of the
data in a normal distribution
• Therefore, z-scores can also give information about the
proportion of subjects included within a certain range
of z-scores
• Tables of the standard normal distribution list the proportion of the
area under the curve that corresponds to a
specific z-score
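Instead of a printed table, the proportions under the normal curve can be looked up with Python's standard-library `statistics.NormalDist`. The helper name `proportion_between` is an assumption; the sketch presumes the scores are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_between(z_low, z_high):
    """Proportion of a normal distribution lying between two z-scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


print(round(proportion_between(-2, 2), 4))
print(round(proportion_between(-3, 3), 4))
```

This gives roughly 0.95 for z between -2 and 2 and more than 0.99 for z between -3 and 3, in line with the figures quoted on the slides.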
Percentile Ranks
• A percentile rank answers the question
“What percent of the scores fall below a
particular score (Xi)?”
– If the percentile rank for a particular score
is 20, it means that 20% of the scores
obtained in the sample fall below this
particular score
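A percentile rank under this "percent of scores below" definition can be sketched in a few lines. The function name is an assumption, and the scores reused here are the ten values from the central tendency example; other definitions (for instance, counting half of the tied scores) exist and give slightly different values.

```python
def percentile_rank(scores, particular_score):
    """Percent of scores in the sample that fall below the given score."""
    below = sum(1 for s in scores if s < particular_score)
    return 100 * below / len(scores)


scores = [2, 3, 3, 5, 5, 6, 6, 6, 8, 9]
print(percentile_rank(scores, 6))
print(percentile_rank(scores, 8))
```

Five of the ten scores fall below 6 and eight fall below 8, so the percentile ranks are 50 and 80 respectively.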
Percentile
• Percentiles are the specific scores or
points within a distribution
• Indicates the particular score, below
which a defined percentage of scores
fall
• 50th percentile = median
Age-Related Norms
• Different normative groups for different
age groups
• E.g. Growth charts
– Norms of height and weight for children of
different ages
Age-related Norms
• Concept of tracking
– The tendency to stay at about the same
level relative to those within the same age
group
– Staying at the same percentile rank over
time
– Height and weight are good examples of
physical characteristics that track
– Helps determine whether a person is going
through an unusual growth pattern