Chapter 5
Measurement and Sampling
Psychological Concepts
• Measuring complex concepts
– Psychological concepts are often abstract
– We create operational definitions to measure
these complex, abstract concepts
– Our operational definitions have to make sense
for the research questions we want to answer
• Operational definitions
– Our measurements represent the concepts that
we cannot observe directly
Defining and Measuring Variables
• Operational definition—A working definition
of a concept that is based on how we measure
it
• Variable—An element that, when measured,
can take on different values (e.g., intelligence
test scores)
• Hypothetical construct—A concept that helps
us understand behavior but that is not directly
observable
Defining and Measuring Variables
• Example: One measure of stress?
• Score on the Social Readjustment Rating Scale
– 43 items on the scale
– Different number of stress units for different life
events; greater numbers of points reflect greater
stress (and likelihood of illness)
• Death of a spouse: 119 points
• Jail term: 79 points
• Change in schools: 35 points
• Christmas: 30 points
• Add points on the 43 items to get stress level
• The score on the scale represents the underlying
hypothetical construct of stress
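Below is a minimal sketch of how this operational definition can be turned into a score. Only the four example items from the slide are included (the real scale has 43), and the respondent's events are invented for illustration:

```python
# Sketch: scoring stress with the Social Readjustment Rating Scale.
# Only the four example items from the slide are listed; the full
# scale has 43 life events, each with its own point value.
SRRS_POINTS = {
    "Death of a spouse": 119,
    "Jail term": 79,
    "Change in schools": 35,
    "Christmas": 30,
}

def stress_score(events_experienced):
    """Operational definition of stress: sum the points for every
    life event the respondent reports having experienced."""
    return sum(SRRS_POINTS[event] for event in events_experienced)

# Example: a respondent who changed schools and celebrated Christmas.
print(stress_score(["Change in schools", "Christmas"]))  # 35 + 30 = 65
```

The total score is the operational definition; the hypothetical construct (stress) is what the total is taken to represent.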
Defining and Measuring Variables
• The importance of culture and context in defining
variables
– The Social Readjustment Rating Scale may not be
good for some populations, like young college
students
– A different scale includes 51 items, such as roommate
problems, maintaining a steady dating relationship, and
attending a football game
• It is critical to understand the people you are
measuring
Multiple Possible Operational
Definitions
• You can measure constructs in a variety of
ways, depending on your research.
• Considering stress:
– Physiological measurements (cortisol level in the
bloodstream)
– Questionnaire scores
• You choose your operational definition
depending on the nature of your research
question and on practical issues
Probability Sampling
• Probability sampling—Set of sampling
methods in which every person in the
population has a specified probability of being
selected
• Generalization—Applying results of research
to an entire population
– Probability sampling permits researchers to
generalize from their sample to the population
because the means of selection does not bias
toward including or excluding people, so the
sample is likely to represent the entire population
Probability Sampling
Types of Probability Samples
Simple Random Sampling—Each person in the population has
the same chance of being included in the research sample
Systematic Sampling—Selection process involving an unbiased
approach that is not truly random (e.g., selecting every 10th
person on a list)
Stratified Random Sampling—Selecting samples in which
groups of interest (e.g., males and females) are identified and
selected so the groups are represented in a desired proportion
in the sample.
Cluster Sampling—Sampling in which a number of groups (i.e.,
clusters) are identified, and a certain number of clusters are
randomly selected for participation in the research
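The four methods can be sketched with Python's standard library. The toy population, group labels, and sample sizes below are invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# A toy population of 100 people, alternating between two groups.
population = [{"id": i, "sex": "M" if i % 2 == 0 else "F"} for i in range(100)]

# Simple random sampling: every person has the same chance of selection.
simple = random.sample(population, k=10)

# Systematic sampling: unbiased but not truly random -- take every 10th
# person after a random starting point.
start = random.randrange(10)
systematic = population[start::10]

# Stratified random sampling: sample each group of interest separately
# so the groups appear in the desired proportion (here 50/50).
males = [p for p in population if p["sex"] == "M"]
females = [p for p in population if p["sex"] == "F"]
stratified = random.sample(males, k=5) + random.sample(females, k=5)

# Cluster sampling: identify groups (clusters) of people, then randomly
# select whole clusters and include everyone in the chosen clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = random.sample(clusters, k=2)
cluster_sample = [p for cluster in chosen_clusters for p in cluster]
```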
Nonprobability Sampling
• Nonprobability sampling—Sampling that relies
on groups of people who are convenient or
available to participate
• Nonsampling error—Problem with
nonprobability sampling in which some
members of the population are systematically
excluded from participation
• Problem with nonprobability sampling—It is
not clear to whom the results will generalize
because the sample is idiosyncratic
Nonprobability Sampling
Types of Nonprobability Sampling
Convenience Sampling—Nonrandom sampling involving whoever happens
to be available to participate in the research (also called haphazard or
accidental sampling)
–Most psychological research involves convenience sampling
–This type of sampling is easy and practical
–For some types of research, convenience samples are likely to mirror
the population, but the researcher has to use judgment in deciding this
Purposive (Judgmental) Sampling—A sampling method in which
participants are chosen because they possess some desirable trait (e.g., high
creativity)
Chain-referral Sampling—A sampling method in which the researcher
identifies a potential participant who, in turn, identifies another
participant, who then identifies somebody else, and so on.
Making Useful Measurements
• Reliability—A characteristic of data related to
the consistency of the measurement
• Validity—A characteristic of data related to
how useful a measurement is for the intended
purpose
• Measurement error—An error in
measurement due to poor measuring
instruments or human error, which can lead to
poor conclusions
Making Useful Measurements
• The relation between reliability and validity
– Reliability simply means consistency but does not
indicate how useful the measurements are
– The validity of measurements is limited by
how reliable they are (e.g., low reliability
guarantees low validity)
– If measurements are reliable, they might be valid
– If measurements are not reliable, they cannot be
valid
– If measurements are valid, they must be reliable
Making Useful Measurements
Types of Reliability
Test-retest reliability—Consistency of scores on
measurements taken at two different times
Split-half reliability—Consistency of scores on a
measurement device (e.g., a test) when the items are
split into two halves (e.g., odd- vs. even-numbered items)
and scores on the two halves are compared
Interrater reliability—Consistency of
measurements by different observers (also called
interobserver reliability)
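Each reliability type is commonly quantified as a correlation between two sets of scores. A sketch using NumPy follows; all of the scores are invented, and corrcoef is simply one convenient way to compute a Pearson r:

```python
import numpy as np

# Invented scores for five people.
time1 = np.array([10, 14, 9, 17, 12])   # test scores at time 1
time2 = np.array([11, 15, 8, 16, 13])   # same test at time 2

odd_items = np.array([5, 7, 4, 9, 6])   # total on odd-numbered items
even_items = np.array([5, 7, 5, 8, 6])  # total on even-numbered items

rater_a = np.array([3, 4, 2, 5, 4])     # observer A's ratings
rater_b = np.array([3, 4, 3, 5, 4])     # observer B's ratings

# Test-retest reliability: correlate the same measure at two times.
test_retest = np.corrcoef(time1, time2)[0, 1]

# Split-half reliability: correlate the two halves of the test.
split_half = np.corrcoef(odd_items, even_items)[0, 1]

# Interrater reliability: correlate the two observers' ratings.
interrater = np.corrcoef(rater_a, rater_b)[0, 1]

print(round(test_retest, 2), round(split_half, 2), round(interrater, 2))
```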
Considering Validity in Research
Types of Validity
Construct validity—The degree to which a measurement gives a good
indication of the construct a researcher is trying to measure
Convergent validity—The extent to which two measurements that are
supposed to measure the same construct are correlated
Divergent validity—The extent to which two measurements that are
supposed to be unrelated are actually uncorrelated
Internal validity—The degree to which a research design leads to
conclusions in which a researcher has confidence (associated with
random assignment of participants in an experiment)
External validity—The degree to which a researcher can generalize the
results of a study to a larger population (associated with random
selection of participants from a population)
Statistical conclusion validity—The degree to which statistical analyses
lead to good conclusions
Construct Validity
• Construct validity: Is our measurement
appropriate for what we are trying to
measure?
• The Beck Depression Inventory has acceptable
construct validity for people from many
cultures
– Mexican
– Portuguese
– Arabic
– American
Construct Validity
• The Beck Depression Inventory does not have
good construct validity for
– Alzheimer’s patients
– Seriously depressed patients
– Some people with chronic disease
Construct Validity
• The Beck Depression Inventory (like all
measurements) may have good construct
validity in some situations but not in all.
Convergent Validity
• Measurements that correlate when they
should correlate
– Sometimes they should correlate positively
– Sometimes they should correlate negatively
Divergent Validity
• Measurements do not correlate (either
positively or negatively) when there is no
reason to expect that they should.
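A sketch of how convergent and divergent validity can be checked with correlations; the three measures and all of the scores are hypothetical:

```python
import numpy as np

# Hypothetical scores for eight participants on three measures.
stress_questionnaire = np.array([12, 30, 22, 8, 27, 15, 33, 19])
cortisol_level = np.array([10, 25, 20, 9, 24, 13, 29, 17])  # same construct
shoe_size = np.array([9, 7, 10, 8, 11, 9, 8, 10])           # unrelated construct

# Convergent validity: two measures of the same construct should correlate.
convergent_r = np.corrcoef(stress_questionnaire, cortisol_level)[0, 1]

# Divergent validity: a measure should not correlate with a measure of an
# unrelated construct.
divergent_r = np.corrcoef(stress_questionnaire, shoe_size)[0, 1]

print(f"convergent r = {convergent_r:.2f}, divergent r = {divergent_r:.2f}")
```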
Internal Validity
Internal Validity: Can you identify the most likely
cause and rule out alternative explanations in
understanding behavior?
– Random assignment of participants to
experimental groups increases the likelihood that
internal validity will be high
– Random assignment is useful in permitting
researchers to draw cause-and-effect conclusions
How to Randomly Assign
Excerpt from a random number table:
91477 09496 03549 19981 51444 66281 08461 36070 28751 64061
29697 48263 90503 55031 89292 05254 61412 12377 01486 22061
90242 22662 41995 34220 10273 35219 53378 52392 54443 10746
59885 34601 06394 48623 90035 96901 13522 67053 10873 84070
07389 56490 61978 53407 04758 38055 80778 49965 02586 71531
1. Go through a random number table and write down the numbers
from 1 to N (your sample size) in the order in which they occur in the
table.
2. Pair each person with the random numbers as they occur.
3. Put each person paired with an odd number into Group 1 and each
person paired with an even number into Group 2.
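A sketch of the same procedure in Python, with random.shuffle standing in for reading a printed random number table; the participant labels are invented:

```python
import random

random.seed(7)  # reproducible illustration

participants = [f"P{i}" for i in range(1, 21)]  # N = 20 participants
N = len(participants)

# Step 1: the numbers 1..N in the order they turn up in a random number
# table -- simulated here by shuffling the list 1..N.
numbers = list(range(1, N + 1))
random.shuffle(numbers)

# Steps 2-3: pair each person with a number as it occurs; odd-paired
# people go to Group 1, even-paired people go to Group 2.
group1 = [p for p, n in zip(participants, numbers) if n % 2 == 1]
group2 = [p for p, n in zip(participants, numbers) if n % 2 == 0]

print("Group 1:", group1)
print("Group 2:", group2)
```

Because the numbers 1 through N contain equal counts of odd and even values, this rule also yields two groups of equal size when N is even.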
External Validity
• Are your measurements relevant for
– Other settings?
– Other people?
– Other times?
Statistical Conclusion Validity
Are your statistics appropriate to answer your
research question?
• Scales of measurement
– Some researchers place importance on scales of
measurement in determining statistical tests
– Some researchers think that scales of
measurement are generally unimportant
• There are controversies regarding the value of
null hypothesis statistical testing
The SAT: Questions of Reliability and
Validity
The SAT is fairly reliable, but is it valid?
• That is, people taking the SAT on different
occasions generally score about the same each
time
• Does your SAT score predict your grades in
college?
• In general, there is a reasonably high (but not
perfect) correlation between SAT scores and
college grades (r > .50), suggesting that it shows a
degree of construct validity
Controversy: The Head Start Program
How effective is the Head Start Program?
• How do you operationally define effective?
• Gains in IQ scores by children in Head Start
are not long lasting, so by this measure, Head
Start is not effective
• Head Start children show higher grades and
graduation rates than children who did not
participate in Head Start, so by this measure,
Head Start is effective
Controversy: The Head Start Program
• Most, but not all, studies show long-term
gains due to Head Start
• Complex questions like this are very difficult to
answer
• To reach sound conclusions, we have to
evaluate how adequate (i.e., valid) the
measurements are, weigh all the evidence,
and think critically about the issue
Scales of Measurement
Scales of Measurement
Nominal scales—Measurements that involve putting
observations into categories, without numerical values
(e.g., left-handed, right-handed, and ambidextrous)
Ordinal scales—Measurements that differentiate only by
identifying whether one measurement is larger or smaller
than another, resulting in ranked data
Interval scales—Measurements in which differences
between scores are meaningful and equal intervals reflect
equal amounts, but there is no true zero point
Ratio scales—Measurements that have equal intervals and a
true zero point, which also makes proportional
statements meaningful (e.g., A is twice as big as B)
Scales of Measurement
• Some researchers believe that most statistical
approaches in psychology require interval or
ratio data
• There is sometimes controversy about what
scale applies to a given measurement.
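One common (though debated) way to frame the issue is that each scale supports only certain summary statistics. A sketch under that textbook mapping, with invented data:

```python
import statistics

# Invented data at each scale of measurement.
handedness = ["left", "right", "right", "ambidextrous", "right"]  # nominal
race_finish = [1, 2, 3, 4, 5]                                     # ordinal (ranks)
iq_scores = [80, 90, 100, 130, 140]                               # often treated as interval
reaction_ms = [250, 300, 275, 500, 320]                           # ratio (true zero)

# Nominal data support counting and the mode, but not ordering or averaging.
print(statistics.mode(handedness))

# Ordinal data additionally support rank-based statistics such as the median.
print(statistics.median(race_finish))

# Interval data support means and differences (90 - 80 == 140 - 130).
print(statistics.mean(iq_scores))

# Ratio data additionally support proportional statements, e.g., a 500 ms
# response took twice as long as a 250 ms response.
print(max(reaction_ms) / min(reaction_ms))
```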
Scales of Measurement
• Is IQ score nominal, ordinal, interval, or ratio?
– IQ is not simply a set of categories, so it is not nominal
– IQ scores let you say, “Person A has a higher score than Person
B” so IQ scores are at least ordinal.
– The difference between scores of 80 and 90 and the difference
between scores of 130 and 140 represent equal numerical differences,
so IQ appears to be at least interval. But psychologically, is the
difference in functioning between people with scores of 80 and 90
equal to the difference between people with scores of 130 and 140?
If not, is the scale really interval with respect to
psychological differences?
– If somebody has a score 10% higher than another person, it
does not mean that the person is 10% smarter, so the scale may
not be ratio.
Is IQ score nominal, ordinal, interval,
or ratio?
• Some people have argued that IQ scores are
really ordinal.
• Others argue that IQ scores are interval.
• In spite of the theoretical controversy,
researchers treat measures like IQ scores as
interval for purposes of data analysis.