Transcript Reliability

Discussion Overview:
Measurement

I) Reliability of Measures

II) Construct Validity

III) Measurement Scales
I) Reliability of Measures

Reliability
– The consistency or stability of a measure

Assessing a restaurant’s food

Three important variables
– How many testers? (Observers): Interrater reliability
– How many different entrees? (Observations): Internal consistency
– How many times? (Occasions): Test-retest reliability
Interrater Reliability

The degree to which independent raters agree on an observation

Have two (or more) judges rate the same people

Trained and independent raters, using a coding scheme
Interrater Reliability

Example: the two observers' ratings disagree (low interrater reliability)

Behavior                   Observer 1   Observer 2
Complain about injection       -2            3
First negative comment          0            1
Second negative comment        -2            2
Rip up questionnaire           -2            3
Interrater Reliability

Example: the two observers' ratings largely agree (high interrater reliability)

Behavior                   Observer 1   Observer 2
Complain about injection        2            2
First negative comment          0            0
Second negative comment        -2           -2
Rip up questionnaire            2            3
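A minimal sketch (Python, assuming NumPy is available) of one common way to quantify interrater agreement for ratings like these: correlate the two observers' scores. The numbers below are taken from the two tables above.

```python
import numpy as np

# Ratings of the same four behaviors by two independent observers.
# First pair: the "low agreement" table; second pair: the "high agreement" table.
low_obs1,  low_obs2  = [-2, 0, -2, -2], [3, 1, 2, 3]
high_obs1, high_obs2 = [2, 0, -2, 2],   [2, 0, -2, 3]

def interrater_r(obs1, obs2):
    """Pearson correlation between two observers' ratings."""
    return np.corrcoef(obs1, obs2)[0, 1]

print(interrater_r(low_obs1, low_obs2))    # roughly -0.87: the raters disagree
print(interrater_r(high_obs1, high_obs2))  # roughly  0.98: the raters agree closely
```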
Internal Consistency

Internal consistency – the degree to which all specific items of a measure behave the same way

Measure the same people with multiple items
– Different questions in a survey
– Different behaviors in observation
Extraversion

Rate each statement from 1 (Not at all true) to 5 (Very true):

1. I am outgoing. ____
2. I am friendly. ____
3. I am talkative. ____
4. I am gregarious. ____
Internal Consistency

Split-half reliability – correlation of scores on one half of the test with scores on the other half

Cronbach's alpha – an index based on the average of all possible correlations between items (and on the number of items)
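A minimal Python sketch of both indices, assuming item scores are arranged as a people-by-items array; the odd/even split used here is just one of many possible ways to halve a test.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_people x n_items) array of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half_r(scores):
    """Correlation between summed odd-numbered and even-numbered items."""
    scores = np.asarray(scores, dtype=float)
    odd  = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    return np.corrcoef(odd, even)[0, 1]
```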
‘One of these things just doesn’t belong’

One of these things is not like the others,
One of these things just doesn't belong
                    Student 1   Student 2   Student 3
Ques 1 (Chpt 12)       10           2           9
Ques 2 (Chpt 12)        9           3           8
Ques 3 (Chpt 3)         2           6           1
Ques 4 (Chpt 12)       10           2           9
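To make the point concrete, here is a short Python sketch (assuming NumPy) that correlates each question with the sum of the other three, using the scores in the table above. Question 3, drawn from a different chapter, is the item that "doesn't belong."

```python
import numpy as np

# Rows = Students 1-3, columns = Ques 1-4 (values from the table above).
scores = np.array([
    [10, 9, 2, 10],   # Student 1
    [ 2, 3, 6,  2],   # Student 2
    [ 9, 8, 1,  9],   # Student 3
])

# Correlate each question with the sum of the remaining questions.
for item in range(scores.shape[1]):
    rest = np.delete(scores, item, axis=1).sum(axis=1)
    r = np.corrcoef(scores[:, item], rest)[0, 1]
    print(f"Ques {item + 1}: item-rest correlation = {r:.2f}")

# Ques 1, 2, and 4 correlate strongly and positively with the rest;
# Ques 3 (Chpt 3) correlates negatively, so it hurts internal consistency.
```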
Test-Retest Reliability

The degree to which a measure correlates positively with itself over time
– Consistency of the measure over time

Measure the same people at two (or more) points in time

Desirable for stable traits, but not for transient states
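A brief Python sketch of the test-retest idea, using made-up scores for five people measured on two occasions (all numbers are hypothetical).

```python
import numpy as np

# Hypothetical trait scores for five people, measured two weeks apart.
time1 = [12, 18, 9, 15, 20]
time2 = [13, 17, 10, 14, 19]

# Test-retest reliability: the measure's correlation with itself over time.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")   # close to 1, as expected for a stable trait
```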
The “More is Better Rule”

Reliability is likely to increase as we increase the number of…
– Observers (or raters)
– Observations (or items)
– Occasions

Measurement error will average out
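One standard way to quantify the "more is better" rule is the Spearman-Brown prophecy formula, which predicts reliability when a measure is lengthened by a factor k (more items, raters, or occasions). A minimal Python sketch; the starting reliability of .50 is just an assumed example value.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when the number of items/raters/occasions is multiplied by k."""
    return (k * reliability) / (1 + (k - 1) * reliability)

# Starting from a reliability of .50 (assumed for illustration):
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.50, k), 2))
# 1 0.5, 2 0.67, 4 0.8, 8 0.89 -- measurement error averages out as length grows
```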
II) Construct Validity

How well an operational definition represents the construct of interest

The degree to which the construct can be inferred from the operational definition of that construct
Indicators of Construct Validity
Face validity
Criterion validity
– Predictive validity
– Concurrent validity
– Convergent validity
– Discriminant validity
Face Validity

Face validity – Does the measure appear to measure the construct of interest?
– Does the measure “on the face of it” look like what it’s supposed to measure?

Not necessary or sufficient for a good measure
Predictive Validity

Predictive validity – Is the measure associated with variables it should theoretically predict?
– LSAT – law school performance
– Self-esteem – depression
– Shyness – social anxiety
Concurrent Validity

Concurrent validity – Does the measure differ between groups it ought to differ between?
– Also called “known groups validity”

E.g., clinically depressed versus nondepressed groups
Convergent Validity

Convergent validity – Is the measure associated with other established measures of the same construct?
– Self-report – observations
– Physiological measure – self-report
– Self-report 1 – self-report 2

Discriminant Validity

Discriminant validity – Is the measure NOT associated with measures of other constructs?
– Self-esteem scores not associated with locus of control scores
– Problem-solving knowledge not associated with factual knowledge
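A hedged Python sketch of how convergent and discriminant validity are often checked in practice: correlate a new measure with an established measure of the same construct and with a measure of a different construct. All scores below are made-up illustration data.

```python
import numpy as np

# Made-up scores for eight participants.
new_self_esteem  = [10, 14, 9, 16, 12, 18, 11, 15]   # new self-report measure
old_self_esteem  = [11, 13, 8, 17, 12, 19, 10, 14]   # established self-esteem measure
locus_of_control = [12, 9, 11, 13, 8, 10, 14, 12]    # measure of a different construct

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

print("convergent r:",   round(r(new_self_esteem, old_self_esteem), 2))   # high here (about .97)
print("discriminant r:", round(r(new_self_esteem, locus_of_control), 2))  # near zero here (about -.11)
```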

Measurement Reliability & Validity

Reliability: Is the measure consistent?
Validity: Does the measure adequately reflect the construct of interest?

– Reliable and Valid
– Reliable, not Valid
– Not Reliable, not Valid
Relationship between Reliability and Validity

Can be reliable but not valid
To be valid, it must be reliable
– But reliability is not the sole condition for validity

Both reliability and validity are necessary for accurate measurement in a research study.
Measurement Scales
Nominal scales
Ordinal scales
Interval scales
Ratio scales
Nominal Scales
AKA Categorical scales

No numerical/quantitative properties; categories or groups simply differ from one another

Examples:
– Men or women
– Right- or left-handed
– Catholic, Protestant, Jewish, Hindu, Buddhist…
– Numbers on basketball jerseys
– Zip codes
Ordinal Scales
Allow us to rank order the levels of the variables being studied

Examples
– Social class: lower class, working class, middle class, and upper class
– College football standings
– Letterman’s Top Ten
Top Ten Bush Goals For His Second Term
10. Fewer idiotic remarks; more hilarious pratfalls.
9. Add mother Barbara to Mount Rushmore.
8. Combine Nebraska and Kansas into new state: Nebransas.
7. Spice up boring state dinners with tasty fish sticks!
6. Improve communication skills from poor to fair.
5. Catch up on his "Smokey And The Bandit" collection.
4. Get Ray Stevens to write some funny lyrics for "Hail To The Chief."
3. Ride every roller coaster in the country.
2. Install remote-activated button in Oval Office so he can blow stuff up right from his desk!
1. Begin vote-rigging process for Jeb's White House run in 2008.
Interval Scales
The difference between the numbers on the scale is meaningful

Scores separated by equal intervals

Examples
– Temperature (Fahrenheit or Celsius)
– Scores on a personality measure
Ratio Scales

Scores separated by equal intervals, and there is an absolute zero

Examples
– Length
– Weight
– Time
– Number of responses
Scales of Measurement
Level      Qualitative   Has inherent order   Equal       Has zero
           info          ('more to less')     intervals   point
Nominal        X
Ordinal        X               X
Interval       X               X                  X
Ratio          X               X                  X            X
Concept Check

Which scale of measurement best describes the following:
– Telephone numbers
– Distances from Budapest to cities in the US
– Scores on an extraversion personality assessment
– Ranking of basketball teams in the Big Ten