Week 5 Measurement, Causation and Data Analysis
Week 5
Measurement, Sampling and
Data Analysis
Measurement – Chapter 4
• If it exists, it is measurable!
• Measurement is used to gain
mathematical insight into our data
• Measurement is a comparison – We
compare our data to a standard such as
the norm, average or expected outcome
• Measurement is a standard used for
evaluation
• Measurement is an essential component
of quantitative research
• Through measurement we can inspect,
analyze and interpret our information
The Language of Variables
• A variable is any observation that can take
different values
• Gender, Age, Religion, Ethnicity are
variables
• Attributes are specific values on a variable
• Attributes of Gender = 1. Male 2. Female
These are discrete values
• Age (may be continuous, 0-100)
• An indicator is the response to a single
question – The main concept in the question is
the variable being measured
• Concept – A mental image that summarizes a
set of similar observations, feelings or ideas
We may not all agree on the same definition of a
concept – Concepts are abstractions
• Conceptualization – specifying the dimensions
of and defining the meaning of the concept
• We often use concepts in our theories,
e.g. crime, abuse, deterrence, eating
disorder
What do we mean by crime? How is it
measured?
How can we measure our concepts
for research?
• Devise operations that actually measure
the concepts we intend to measure
• Operationalization of concepts – connect
concepts to observations by identifying
specific observations that we will use to
indicate that concept in empirical reality.
The process of choosing the variable to
represent the concept
• We use variables which are derived from
concepts in our hypotheses
• The variables move the concepts into the
realm of testability – e.g. the indicator of
crime is spousal abuse
• Through the indicators of a variable we are
able to ascertain the characteristics,
behaviors, attitudes of our subjects
How it Works
• Define the concept
We have a theory that Punishment deters
criminal behavior
• Choose an indicator of the concept to
represent it - The concept, punishment, is
operationalized by using the variable,
arrest, to represent it. The indicators of the
variable, arrest, are 1. Arrest on first
offense 2. Don’t arrest on first offense
• The concept crime is represented in our
research as physical spousal abuse.
• The resulting hypothesis is:
If subjects are arrested on the first reported
offense of physical spousal abuse, then they will
be less likely to offend again (recidivism)
The indicators of the variable, recidivism, are 1. Re-offends 2. Doesn’t re-offend
Our concepts are now defined in measurable,
testable terms
• Through conceptualization and
operationalization, measurement
becomes the process of linking abstract
concepts to empirical indicants
• Measurement validity – The operations we
devise to measure our data must assure
that we measure the variables we
intended to measure
• For the concept social class, we can use the
variables: income, education and occupation.
• Now we get a much clearer picture of what
indicators are necessary to measure the abstract
concept, social class
• Income can be measured by actual income
• Education can be measured by years of
education
• Occupation can be measured by levels from not
at all professional to being very professional
• Measuring social variables is often done through
questions posed to people
• A single question may not be adequate for measuring a
concept. Multiple questions may be necessary
• The concepts age, gender, ethnicity, religion, income,
education, occupation are called demographic
variables. These can be measured with one question,
e.g. What is your age?
• But more than one question is necessary to measure
Social Class, ADD, Prejudice, Nurture
• May need to construct an index of questions
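One common way to build an index from several questions is to average each respondent's item scores. The sketch below assumes three hypothetical 1-5 agreement questions that all tap the same concept; the respondents, scores and the `index_score` helper are made up for illustration.

```python
# Hypothetical 1-5 agreement scores on three questions about one concept.
responses = {
    "respondent_1": [4, 5, 4],
    "respondent_2": [2, 1, 2],
    "respondent_3": [3, 3, 4],
}


def index_score(item_scores):
    """Average the item scores into a single index value."""
    return sum(item_scores) / len(item_scores)


# One index value per respondent, rounded for display.
index = {person: round(index_score(items), 2) for person, items in responses.items()}
print(index)
```

Averaging (rather than summing) keeps the index on the same 1-5 scale as the individual questions.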
Levels of Measurement
• Levels of measurement have important
implications for the type of statistics that
can be used in analyzing the data for a
variable
• 4 Levels of measurement – Nominal,
Ordinal, Interval and Ratio
• These levels are determined by the
indicators (response/answer categories)
for a variable
• Nominal – qualitative, has no mathematical
interpretation, even if numbers are attached to
the value label
These are called categorical variables
For example, we may ask: What is your gender?
The answer categories are 1. Male 2.
Female. However, the numbers 1 & 2 do not
indicate anything mathematical about the
differences in the answers. Female is not
"more" gender or a higher gender than Male.
Quantitative Levels of
Measurement
• Ordinal – the numbers assigned to the
response categories indicate order. 1 is
lower in order than 2 and 2 is lower in
order than 3.
• 1. Very Unimportant is lower in order than
2. Unimportant and 2. is lower in order
than 3. Important
• Interval – The numbers indicating values in the
response categories have mathematical
meaning.
• They represent fixed measurement units, but
have no absolute or fixed zero point.
• This is important mathematically because having
a fixed zero point allows us to use the highest
level of statistics
• Often researchers try to use ordinal level
variables as interval (e.g. Likert Scales)
• In interval level variables the numbers can be
added and subtracted but ratios are not
meaningful
• Fahrenheit Temperature is an interval level
variable. 60 degrees is 30 degrees hotter than
30 degrees. But 60 degrees cannot be said to
be twice as hot as 30 degrees because the zero
point of the Fahrenheit scale is arbitrary, not an
absolute zero.
• There are very few true interval-level measures
in social science. This is why researchers use
ordinal level data as interval level data and
score it in ways that allow them to do so
• Ratio – The numbers attached to these
response categories represent fixed
measuring units and an absolute zero
point. Age is a ratio level variable. Test
Scores can be ratio, e.g. 0-100
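The difference between interval and ratio levels can be checked with the Fahrenheit example: if a ratio were meaningful, it would survive a change of units. This sketch converts 30 and 60 degrees Fahrenheit to Celsius; the function and variable names are illustrative only.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion between the two interval scales."""
    return (f - 32) * 5 / 9


low_f, high_f = 30, 60
low_c = fahrenheit_to_celsius(low_f)    # about -1.1
high_c = fahrenheit_to_celsius(high_f)  # about 15.6

# Differences keep their meaning: 30 F degrees is always 30 * 5/9 C degrees.
diff_f = high_f - low_f
diff_c = high_c - low_c

# Ratios do not: "twice as hot" in Fahrenheit is not "twice as hot" in Celsius.
ratio_f = high_f / low_f
ratio_c = high_c / low_c

print(diff_f, round(diff_c, 2))
print(ratio_f, round(ratio_c, 2))
```

For a ratio level variable such as age, the zero point is absolute, so a 40-year-old is twice as old as a 20-year-old whether age is counted in years or in months.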
Sampling
How to choose survey subjects?
• Sample – A subset of the population
selected for study, e.g. 100 students
selected from Webster
• Population – larger group from which
sample comes (will infer back to this
group), e.g. all Webster students
Why Sample
• If can’t access entire population (too costly, too
huge)
Sampling Goal
• Representativeness – smaller group (sample) is
representative of larger group (population)
• Larger the sample, more confidence in it being
representative
• More homogeneous the population, more
confidence of sample representativeness
• If sample is representative, findings can be
generalized to population. You can infer that
your sample will respond in same way as whole
population
• But, generalizing from sample to population
involves risk
• Ecological Fallacy – can’t draw conclusions
about individuals from group level sample of
data
• Reductionist Fallacy – can’t draw conclusions
about groups from individual level sample of
data
Generalizability
• Not easy to achieve in experiment
• Can’t really apply findings to larger
population
Experiments occur in artificial setting
Subjects recruited or selected, not chosen
through random sampling
Types of Sampling Procedures
• Probability – Random Sampling selects
subjects out of a large population on
the basis of chance
(a technique used most effectively with
survey research)
Probability Sampling
• Participants drawn by chance (random)
• Every subject has equal chance of being chosen
(known probability, 1:10; 1:100)
How to do Simple Random Sample
1. Arbitrarily select a number from a random
number table
2. Match it to number in numbered subject list for
starting point
3. Continue selecting numbers and subjects
Until desired number of subjects is obtained
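With software, the random number table can be replaced by a random number generator. A minimal sketch, assuming a hypothetical numbered list of 1,000 students and a desired sample of 100; the seed is fixed only so the example is repeatable.

```python
import random

# Hypothetical numbered subject list (the sampling frame).
population = [f"student_{i}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed for a repeatable illustration

# Draw 100 distinct subjects; each subject has the same chance of selection.
sample = rng.sample(population, k=100)

print(sample[:5])
```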
Systematic Random Sample
• Arrange population elements sequentially
• Determine size of sample wanted
• Divide # of subjects in population by
sample # to get the sampling interval (n)
• Randomly select a starting point in list
• Select every nth subject
• If need 5 subjects and have 45 in
population, select every 9th person
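The steps above can be sketched in a few lines, using the slide's numbers (45 in the population, 5 needed). Choosing the random start within the first interval is an assumption of this sketch (one common variant), and `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(population, sample_size, rng=random):
    """Pick a random start, then take every nth element."""
    interval = len(population) // sample_size  # 45 // 5 = 9
    start = rng.randrange(interval)            # random start within the first interval
    return [population[start + i * interval] for i in range(sample_size)]


population = list(range(1, 46))  # subjects numbered 1 to 45
sample = systematic_sample(population, 5, random.Random(0))

print(sample)
```

Whatever the starting point, the selected subjects are always 9 positions apart.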
Stratified Random Sampling
• Characteristics of population are known to
the researcher before taking the sample
• Sample is selected with mirror proportions
on characteristics such as: ethnicity, age,
gender, religion, education level, income
level etc.
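"Mirror proportions" can be illustrated with proportional allocation: each stratum contributes to the sample in the same share it holds in the population. The strata below (class year, with made-up sizes) and the `stratified_sample` helper are hypothetical.

```python
import random


def stratified_sample(strata, sample_size, rng):
    """Randomly sample each stratum in proportion to its population share."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the overall total drift by a case or two in general.
        n = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n)
    return sample


# Hypothetical population of 1,000 students with known class-year sizes.
strata = {
    "freshman": [f"fr_{i}" for i in range(400)],
    "sophomore": [f"so_{i}" for i in range(300)],
    "junior": [f"ju_{i}" for i in range(200)],
    "senior": [f"se_{i}" for i in range(100)],
}

sample = stratified_sample(strata, 100, random.Random(1))
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)
```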
Cluster Sampling
• Unit chosen is not an individual, but is a
cluster of individuals naturally grouped
together such as Churches, Schools,
Blocks, Counties, Businesses etc.
• They are alike with respect to
characteristics relevant to the study
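In cluster sampling the random draw happens at the group level. A minimal sketch, assuming four hypothetical schools: two schools are chosen by chance and everyone in the chosen schools enters the sample.

```python
import random

# Hypothetical clusters: schools and the people naturally grouped in them.
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}

rng = random.Random(7)  # fixed seed for a repeatable illustration

# The unit chosen is the cluster, not the individual.
chosen = rng.sample(sorted(clusters), k=2)

# Everyone in a chosen cluster is included.
sample = [person for name in chosen for person in clusters[name]]

print(chosen, sample)
```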
Non-Probability Sampling
• Participants are not chosen by chance
They are chosen for reasons of economy and
convenience
• Example: Study on student attitudes. Stop
students at the gym only and ask them to
take the survey. They are not necessarily
representative of the total student
population
Types of Non-Probability Sampling
• Accidental – just encounter a # of people and
ask them to be in your study
It is extremely weak, but popular method
Psychological research is often accidental
• Convenience – Similar to accidental. Researchers
seek out individuals who are available
Likely to be biased
Not representative of any population
Should be avoided
• Snowball - used for hard to reach but
interconnected populations
• One person identifies and recommends
other people, and those people
recommend other people, and on and
on...
• Typical subjects – drug dealers,
prostitutes, practicing criminals, gang
leaders, AA members
Data Analysis – Chapter 12
Why are Statistics Important
• Statistics give numeric meaning to our data
• Helpful tool for understanding social world and
are used to:
1.describe social phenomena
2. identify relationships among them
3. explore reasons for relationships
4. test hypotheses
5. interpret cause and effect
Drawbacks
• Can use statistics to distort reality
• Lying with statistics is unethical
• Easy to be careless when using statistics
• Must use appropriate level of
measurement for variables in our data
Preparing for Statistics
• After data is collected it must be cleaned,
checked and coded before statistics are
run
• There is software available to do this
Displaying Statistics
• Graphics: Bar Charts, histograms, pie
charts, frequency tables and curve graphs
describe the shape of the data visually
Statistics for One Variable
• Univariate - describes statistical
characteristics of one variable: frequency
distributions, summary statistics,
measures of central tendency (mean,
median, mode), skewness, measures of
dispersion (range, variance, standard
deviation), reliability tests
• Display the distribution of cases across the
categories of one variable
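A frequency distribution, one of the univariate displays listed above, can be produced by counting cases per category and converting the counts to percentages. The responses below are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a single nominal-level question.
religion = [
    "Protestant", "Catholic", "Other", "Catholic", "Protestant",
    "Protestant", "Jewish", "Other", "Catholic", "Protestant",
]

counts = Counter(religion)
n = len(religion)

# For each value: (number of cases, percentage of cases).
table = {value: (count, 100 * count / n) for value, count in counts.most_common()}

for value, (count, pct) in table.items():
    print(f"{value:<12}{count:>3}{pct:>7.1f}%")
```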
Univariate Stats
• Frequency distribution (1xtables) –
displays the number and percentage of
cases corresponding to each of a
variable’s values or group of values
• Measures of Central Tendency –
1. Mean (arithmetic average of the values
in a distribution) – sum the values of the
cases and divide by the number of cases
2. Median (the point that divides the
distribution in half) – the middle value
3. Mode (most frequent value in a
distribution)
• The Mean is the most frequently used
because it is the foundation for more
advanced statistics
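The three measures can be compared on a small data set with Python's standard `statistics` module. The ages are hypothetical and include one unusually high value to show how the mean is pulled by it while the median and mode are not.

```python
import statistics

# Hypothetical ages of seven students; 43 is an unusually high value.
ages = [19, 20, 20, 21, 22, 23, 43]

mean_age = statistics.mean(ages)      # sum of values / number of cases
median_age = statistics.median(ages)  # middle value of the sorted cases
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```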
• Skewness – a lack of symmetry
in the data (symmetric would be a Bell
curve)
• If data clustered to left of center with a long
tail to the right – Positive skew
• If data clustered to right of center with a long
tail to the left – Negative or inverse skew
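The direction of skew can be checked numerically. This sketch uses the moment-based skewness coefficient (average cubed deviation divided by the cubed standard deviation), which is one common definition and an assumption here; the two data sets are hypothetical mirror images of each other.

```python
import statistics


def skewness(values):
    """Moment-based skewness: positive for a long right tail, negative for a long left tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 9]    # most cases low, a few very high
left_tailed = [10 - x for x in right_tailed]  # mirror image

print(round(skewness(right_tailed), 2), round(skewness(left_tailed), 2))

# The long tail pulls the mean away from the median in the direction of the skew.
print(statistics.mean(right_tailed), statistics.median(right_tailed))
```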
• Measures of Variation or Dispersion –
Are the data spread out or clustered?
1. Range – highest value minus the lowest
value plus one
2. Variance – the average squared deviation
of each case from the mean (takes into
account the amount by which each case
differs from the mean)
3. Standard Deviation – Preferred
measure of variability because of its
mathematical properties (sq. root of the
variance)
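A short worked example of the three measures, using hypothetical scores. The range follows the slide's "plus one" definition, and the variance divides by the number of cases (the "average squared deviation"); many packages instead report the sample variance, which divides by the number of cases minus one.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

# Range as defined above: highest value minus lowest value plus one.
value_range = max(scores) - min(scores) + 1

# Average squared deviation from the mean (population variance).
variance = statistics.pvariance(scores)

# Standard deviation is the square root of the variance.
std_dev = statistics.pstdev(scores)

print(value_range, variance, std_dev)
```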
Bivariate/multivariate Analysis
• Describes the association between two or more
variables
• Some types: Cross-tabulation, Regression,
Correlation
• Measures of Association – descriptive statistics
that summarize the strength of an association
(variation in one variable is related to variation in
another).
For example, Chi Sq. and Gamma are used to
summarize the relationship between two or more
variables in Cross-tabulation
CROSS-TABULATION
• The tables display the
distribution of one
variable for each
category of another
variable (see text pgs.
392-398)
• Sex of voter
determines party.
• If Man then
Republican

           Rep   Dem
MEN         80    20
WOMEN       30    70
What to Look for
The IV is Gender
Do percentage
distributions vary at
all between
categories of the
independent variable?
(existence)
How much? (strength)
(This example is
nominal level data)

           Rep   Dem
MEN         80    20
WOMEN       30    70
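The two questions (existence and strength) can be answered by percentaging within each category of the independent variable and comparing across categories. The sketch below uses the counts from the slide's table; `row_percentages` is a hypothetical helper.

```python
# Counts from the slide: rows are the IV (gender), columns the DV (party).
crosstab = {
    "Men": {"Rep": 80, "Dem": 20},
    "Women": {"Rep": 30, "Dem": 70},
}


def row_percentages(table):
    """Convert each row's counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * count / total for col, count in cells.items()}
    return result


pct = row_percentages(crosstab)

# Existence: do the percentages differ between men and women at all?
# Strength: how large is the difference, in percentage points?
gap = pct["Men"]["Rep"] - pct["Women"]["Rep"]

print(pct)
print(gap)
```

Here 80% of men versus 30% of women are Republican, a difference of 50 percentage points.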
Interval Level Data
• Hypothesis - As education
level (IV) increases, income
level (DV) increases
• Total N=300
• 100 with BA’s
• 100 with MA’s
• 100 with PhD’s
• Do values of the DV increase
with increase in IV? (Direction)
• Are changes in DV fairly
regular – increasing fairly
regularly? (pattern)
          BA    MA   PhD
LT $50    60    20     5
$50-$100  30    50    30
GT $100   10    30    65
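Direction and pattern can be checked by following one income category across education levels. This sketch assumes the slide's table reads as 100 people per degree, split 60/30/10 (BA), 20/50/30 (MA) and 5/30/65 (PhD) across the three income categories; that reading is an assumption, chosen because each column then sums to 100.

```python
# Assumed reading of the slide's table: counts by degree and income category.
table = {
    "BA": {"LT $50": 60, "$50-$100": 30, "GT $100": 10},
    "MA": {"LT $50": 20, "$50-$100": 50, "GT $100": 30},
    "PhD": {"LT $50": 5, "$50-$100": 30, "GT $100": 65},
}
levels = ["BA", "MA", "PhD"]  # IV categories in increasing order


def share(level, category):
    """Percentage of one education level falling in one income category."""
    return 100 * table[level][category] / sum(table[level].values())


top_share = [share(level, "GT $100") for level in levels]
bottom_share = [share(level, "LT $50") for level in levels]

# Direction: the top income share should rise and the bottom share fall.
rising = all(a < b for a, b in zip(top_share, top_share[1:]))
falling = all(a > b for a, b in zip(bottom_share, bottom_share[1:]))

print(top_share, bottom_share, rising, falling)
```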
Inferential Statistics
• They estimate the degree of confidence
that can be placed in generalizations from
a sample to the whole population from
which the sample was selected
• Chi-Square – used in bivariate analysis to
estimate probability that an association
between DV & IV is not due to chance
alone.
• A probability level below .05 (p<.05) from Chi Sq.
means the probability that an association this strong
would appear by chance alone is less than 5 out of 100 (5%)
• The lower the probability score, the greater the
statistical significance.
• A relationship between variables is said to be
statistically significant when the analyst feels
reasonably confident (often 95%) that an
association was not due to chance.
• Inferential statistics with Crosstabulation
can tell us if there is an association more
than would be expected by chance (coin
toss = 50/50)
• But! Does not tell us how strong that
relationship is (See pgs. 405-407)
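As a worked example, chi-square can be computed by hand for the gender by party table shown earlier (80/20 for men, 30/70 for women). The sketch compares each observed count with the count expected if there were no association, then compares the result with 3.841, the standard .05 critical value for one degree of freedom. The `chi_square` helper is illustrative, not a library function.

```python
# Observed counts: rows = Men, Women; columns = Rep, Dem.
observed = [[80, 20], [30, 70]]


def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square(observed)

CRITICAL_05_DF1 = 3.841  # chi-square critical value for p = .05 with df = 1
significant = stat > CRITICAL_05_DF1

print(round(stat, 3), df, significant)
```

The statistic is far above the critical value, so the association is statistically significant, but the statistic itself still does not say how strong the relationship is.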
Elaboration Analysis
• Controlling for the effect of a third variable
• Sometimes a 3rd variable could be
affecting the association or strength of the
association without us realizing it.
• Example in Text – The strength of the
relationship between Arrest and Abuse is
actually dependent on how much the
perpetrator is vested in society. i.e.
employed or not and married or not.
• In fact, if the apparent relationship disappears
when an extraneous (3rd) variable is controlled, it
is probably a spurious relationship. The IV we
think is affecting the DV isn’t – It’s an extraneous
variable we haven’t considered.
• We hypothesize that Income level (IV) affects
how we vote (DV)
• In reality, income is a reflection of education (IV)
and it’s education that really affects how we vote
(DV).
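The income, education and voting example can be made concrete with invented numbers. In the hypothetical counts below, income appears to be related to voting overall, but the relationship vanishes once education is held constant, which is the signature of a spurious relationship. All figures and names are made up for illustration.

```python
# Hypothetical counts: (education, income) -> (votes for party A, total respondents).
counts = {
    ("low_edu", "low_income"): (32, 80),
    ("low_edu", "high_income"): (8, 20),
    ("high_edu", "low_income"): (14, 20),
    ("high_edu", "high_income"): (56, 80),
}


def pct_a(cells):
    """Percentage voting for party A across a list of (votes, total) cells."""
    votes = sum(v for v, _ in cells)
    total = sum(t for _, t in cells)
    return 100 * votes / total


def income_gap(rows):
    """High-income minus low-income percentage voting for party A."""
    high = pct_a([c for (_, inc), c in rows.items() if inc == "high_income"])
    low = pct_a([c for (_, inc), c in rows.items() if inc == "low_income"])
    return high - low


# Without a control variable, income seems to matter.
overall_gap = income_gap(counts)

# Controlling for education: the same comparison within each education level.
gap_by_education = {
    level: income_gap({k: v for k, v in counts.items() if k[0] == level})
    for level in ("low_edu", "high_edu")
}

print(overall_gap, gap_by_education)
```

Overall, high-income respondents are 18 points more likely to vote for party A, yet within each education level the gap is zero.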
Regression Analysis
• Regression analysis and Correlation analysis have
advantages over simple crosstabs – they give the
strength of association between two or more
variables
• Often collapse values of variables into
categories for crosstabs
• Better to leave values as continuous for upper
level summary stats
• Example – Age 10-20 21-30 31-40 (grouped or
categorical age)
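Collapsing a continuous variable into categories is simple to do, and the sketch below shows what is lost. It uses the slide's three age groups with hypothetical ages; `age_group` is an illustrative helper.

```python
def age_group(age):
    """Collapse a continuous age into the grouped categories from the slide."""
    if 10 <= age <= 20:
        return "10-20"
    if 21 <= age <= 30:
        return "21-30"
    if 31 <= age <= 40:
        return "31-40"
    return "other"


ages = [12, 19, 21, 25, 30, 31, 38, 40]  # hypothetical continuous ages
grouped = [age_group(age) for age in ages]

print(list(zip(ages, grouped)))
```

After grouping, a 21-year-old and a 30-year-old are indistinguishable, which is why the continuous values are preferred for regression and correlation.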
Ethics in Data Analysis
1. When just letting the computer search around in the
data for relationships without a testable
hypothesis, relationships may appear just on the
basis of chance but mean nothing.
• A reasonable balance needed between doing
deductive data analysis
(theory>hypothesis>significant association)
• And inductive data analysis (exploration of
patterns in a dataset)
• If findings are Serendipitous (based on inductive
analysis) must be reported as such
2. Report findings honestly (do not lie with
statistics even though it is possible to do
so)
3. Do not mislead people by choosing
summary statistics that accentuate a
particular feature of a distribution. Use
statistical techniques appropriately
Tools for Data Analysis and
Statistics
Computer software ranges from easy, but not very comprehensive
to difficult, very robust and very expensive
• Excel, Access, Lotus – limited, elementary statistics,
moderately expensive
• Easy – NCSS= user friendly, cheap (<$100), only numeric data
entry, output=so-so
• SPSS – Very comprehensive, user friendly, excellent graphics,
small learning curve
Not very expensive for students ($200-$500)
• SAS – most robust, sort of user friendly, big learning curve,
$$$$$$
• CRISP, STATBasic, SYSstat – expensive, not user friendly, More
for programmers than average user