Chapter 10
The Analysis of Frequencies
The Analysis of Frequencies
• The expression “cross partition” refers to an
abstract process of set theory.
• When the cross partition idea is applied to
the analysis of frequencies to study
relations between variables, we call the
cross partitions crosstabs or sometimes
crossbreaks. This kind of analysis is called
contingency analysis, or
contingency table analysis.
• Table 10.1, 10.2
Data and Variable Terminology
• A distinction was made between active and
attribute variables, the former meaning an
experimental or manipulated variable and
the latter a measured variable.
• Remember that an attribute variable is not
always a categorical or qualitative
variable.
• An attribute is any property of any object,
whether the object is measured in an
all-or-none way or with a set of continuous
measures.
Crosstabs: Definitions and Purpose
• A crosstab is a numerical tabular
presentation of data, usually in frequency
or percentage form, in which variables are
cross partitioned.
• Crosstabs enable the researcher to
determine the nature of the relations
between variables, but also have other
side purposes: They can be used to
organize data in convenient form for
statistical analysis.
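As a minimal sketch of what cross partitioning amounts to in practice, the Python below counts cases into the cells of a two-variable table. The records, the variable names, and the helper `crosstab` are hypothetical illustrations, not data or code from the chapter.

```python
from collections import Counter

def crosstab(records, row_var, col_var):
    """Count how many records fall in each (row, column) cell."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({r[row_var] for r in records})
    cols = sorted({r[col_var] for r in records})
    # List every cell, including empty ones, so the partition is exhaustive.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Hypothetical cases: each case belongs to exactly one cell.
records = [
    {"group": "A", "outcome": "yes"},
    {"group": "A", "outcome": "no"},
    {"group": "A", "outcome": "yes"},
    {"group": "B", "outcome": "no"},
    {"group": "B", "outcome": "no"},
]

table = crosstab(records, row_var="outcome", col_var="group")
for row, cells in table.items():
    print(row, cells)
```

Keeping the empty cell (group B, outcome "yes") in the output is a deliberate choice: a zero frequency is still part of the table.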
Simple Crosstabs and Rules for
Crosstab Construction
• The rules are (1) categories are set up
according to the research hypotheses; (2)
categories are independent and mutually
exclusive; (3) categories are exhaustive; (4)
each category is derived from one and only
one classification principle; (5) all categories
are on one level of discourse.
• In general, we will report the levels of the
independent variable as columns and the
outcome responses of the dependent
variable as rows in the contingency table.
Calculation of Percentages
• Percentages are calculated from the
independent variable to the dependent
variable.
• Figure 10.3. Table 10.3 (p.225)
• Use percentages but not frequencies to
highlight the relation of variables in
crosstabs. Why? A common base.
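The "common base" point can be sketched in a few lines of Python. The frequencies below are hypothetical (they are not the values of Table 10.3), and `column_percentages` is an illustrative helper; the layout assumes the independent variable is in the columns.

```python
def column_percentages(table):
    """Convert a frequency table (dict of rows) to percentages within each column."""
    cols = list(next(iter(table.values())).keys())
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {
        row: {c: 100.0 * cells[c] / col_totals[c] for c in cols}
        for row, cells in table.items()
    }

# Hypothetical frequencies: columns = independent variable, rows = dependent variable.
freq = {
    "eligible":   {"female": 30, "male": 40},
    "ineligible": {"female": 20, "male": 110},
}

pct = column_percentages(freq)
for row, cells in pct.items():
    print(row, {c: round(v, 1) for c, v in cells.items()})
```

In these made-up counts the raw frequency of eligible cases is lower for females (30) than for males (40), yet the percentage is higher, because the two columns have different totals. Percentages put both columns on the same base of 100.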
Calculation of Percentages
• In Table 10.3, the Payette-Clarizio
problem is directed at the
misclassification of children as eligible or
ineligible for learning disability treatment.
• A hypothesis implied by the problem is:
Decision makers are biased in their
decisions about female children. This is a
statement of the “If p, then q” kind: If
female, then most likely eligible
for learning disability consideration.
Calculation of Percentages
• Why not calculate the percentages the
other way: from the dependent variable to
the independent variable?
• Why not calculate the percentages over
the whole table?
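One way to explore these two questions is to percentage the same table on each possible base and compare what each version says. This is a sketch with hypothetical frequencies; the helper `percentages` and its `base` argument are illustrative names.

```python
def percentages(table, base):
    """Percentage a frequency table on a 'column', 'row', or 'total' base."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())

    def denominator(r, c):
        if base == "column":
            return col_totals[c]
        if base == "row":
            return row_totals[r]
        if base == "total":
            return grand_total
        raise ValueError("base must be 'column', 'row', or 'total'")

    return {
        r: {c: 100.0 * n / denominator(r, c) for c, n in cells.items()}
        for r, cells in table.items()
    }

# Hypothetical frequencies: columns = independent variable, rows = dependent variable.
freq = {
    "eligible":   {"female": 30, "male": 40},
    "ineligible": {"female": 20, "male": 110},
}

by_column = percentages(freq, "column")  # of the females, what share is eligible?
by_row = percentages(freq, "row")        # of the eligible, what share is female?
by_total = percentages(freq, "total")    # share of all cases in each cell
```

Only the column version conditions on the independent variable, which is the direction of the "If p, then q" statement. The row version answers the reverse question, and the total version mixes in the unequal sizes of the groups.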
Statistical Significance and the Chi-square Test
• Look at Table 10.3. Do the frequencies really express
a relation between gender and learning
disability eligibility? Or could they have
happened by chance?
• Are they one pattern among many
patterns of frequencies that one would get
picking numbers from a table of random
numbers, such selection being limited only
by the given marginal frequencies?
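The comparison with chance can be sketched as the usual Pearson chi-square computation: the expected frequency of each cell is derived from the marginal totals alone, and the statistic sums the squared departures from those expectations. The observed counts are hypothetical, and `chi_square` is an illustrative helper rather than a library function.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected by chance, given only the marginal frequencies.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table with 100 cases; every expected frequency is 25.
observed = [[30, 20],
            [20, 30]]
chi2, df = chi_square(observed)
print(chi2, df)
```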
Statistical Significance and the Chi-square Test
• We may say here that “degrees of
freedom” defines the latitude of variation
contained in a statistical problem. In the
problem above, there is one degree of
freedom because the total number of
cases is fixed, 100, and because as soon
as one of the frequencies is given, the
other is immediately determined.
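A small sketch of the same idea: with the total fixed, one frequency determines the other, and in a two-way table with fixed marginals the free cells number (rows - 1) x (columns - 1). The value 60 is a hypothetical frequency, and `degrees_of_freedom` is an illustrative helper.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a frequency table with fixed totals."""
    if n_rows == 1 or n_cols == 1:
        # One-dimensional table: all but one category can vary freely.
        return max(n_rows, n_cols) - 1
    return (n_rows - 1) * (n_cols - 1)

total = 100            # fixed number of cases, as in the slide
first = 60             # hypothetical frequency for the first category
second = total - first # the second frequency is no longer free to vary

print(second, degrees_of_freedom(1, 2), degrees_of_freedom(2, 2))
```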
Levels of Statistical Significance
• The 0.05 level means that an obtained
result that is significant at the 0.05 level
could occur by chance no more than five
times in 100 trials.
• The 0.05 level was originally chosen—and
has persisted with researchers—because
it is considered a reasonably good gamble.
It is neither too high nor too low for most
social scientific research.
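For a chi-square value with one degree of freedom, the probability of a result at least that large under chance can be computed with the standard library alone, because such a value is the square of a standard normal deviate. This is a sketch limited to the one-degree-of-freedom case; the function names are illustrative.

```python
import math

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with one degree of freedom."""
    # P(Z**2 > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def is_significant(chi2, alpha=0.05):
    """True when the chance probability falls below the chosen level."""
    return p_value_df1(chi2) < alpha

for value in (3.0, 3.841, 4.0):
    print(value, round(p_value_df1(value), 4), is_significant(value))
```

The familiar critical value of 3.841 corresponds to a probability of about 0.05, which is where the usual cutoff for one degree of freedom comes from.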
Levels of Statistical Significance
• There is a newer trend of thinking that advocates
reporting the significance levels of all results.
• Another school of thought advocates working with
what are called “confidence intervals.”
• Rozeboom (1960) advocates the use of confidence
intervals and the reporting of precise probability
values of experimental outcomes. However, Brady
(1988) states that such precision is generally
meaningless in the social and behavioral sciences
because of the inaccuracy of measurements.
Levels of Statistical Significance
• A statistically significant result does not
imply personal or practical significance.
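• The strength of a relation can be gauged with
Cramer’s V, a measure of association
based on the chi-square value. The
formula is:
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
where N is the total number of cases and k is
the smaller of the number of rows and columns.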
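As a worked sketch, Cramer's V is the square root of chi-square divided by N(k - 1), where N is the number of cases and k is the smaller of the number of rows and columns. The chi-square value and table size below are hypothetical, and `cramers_v` is an illustrative helper.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from a chi-square value, the sample size, and the table shape."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical result: chi-square of 4.0 from a 2 x 2 table with 100 cases.
v = cramers_v(chi2=4.0, n=100, n_rows=2, n_cols=2)
print(round(v, 2))
```

A chi-square of 4.0 with one degree of freedom is significant at the 0.05 level, yet the V it yields here is modest, which illustrates why statistical significance and strength of relation are reported separately.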
Types of Crosstabs and Tables
• There are three types of tables: one-dimensional, two-dimensional, and k-dimensional.
• Theoretically, there is no limit to the
number of variables that can be
considered at one time. The only
limitations are practical ones: insufficient
sample size and difficulty of
comprehension of the relations contained
in a multidimensional table.
One-dimensional Tables
• There are two kinds of one-dimensional
tables. One is a “true” one-dimensional
table; it is of little interest to us because it
does not express a relation. Only one
variable is used in the table.
• Social scientists sometimes choose to
report their data in tables that look one-dimensional but are really two-dimensional,
such as Table 10.7.
Two-dimensional Tables
• Two-dimensional tables or crosstabs have
two variables, each with two or more
subclasses.
• Table 10.8, 10.9, 10.10, 10.11.
Three- and k-Dimensional Tables
• The analysis of three or more variables
simultaneously has two main purposes.
The first is to study the relations among three
or more variables. The second is
to control one variable while studying the
relation between the other two variables.
Specification
• Specification is a process of describing the
conditions under which a relation does or
does not exist, or exists to a greater or a
lesser extent.
• In the above analysis (Table 10.13, 10.14),
the data were specified: it was shown, by
introducing the social-class variable, that the
relation between level of aspiration and
success in college was stronger in one
group (middle class) than in another group
(working class).
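Specification can be sketched as computing the same two-variable relation separately within each level of a third, control variable. The counts below are hypothetical (they are not the values of Tables 10.13 and 10.14), and the helper names are illustrative.

```python
def make_cases(social_class, aspiration, n_success, n_fail):
    """Expand cell counts into individual hypothetical cases."""
    outcomes = [1] * n_success + [0] * n_fail
    return [
        {"social_class": social_class, "aspiration": aspiration, "success": s}
        for s in outcomes
    ]

def percent_success(cases, aspiration):
    """Percentage of successful cases at one level of aspiration."""
    group = [c for c in cases if c["aspiration"] == aspiration]
    return 100.0 * sum(c["success"] for c in group) / len(group)

def relation_by_level(cases, control):
    """Percentage difference (high minus low aspiration) within each control level."""
    result = {}
    for level in sorted({c[control] for c in cases}):
        stratum = [c for c in cases if c[control] == level]
        result[level] = percent_success(stratum, "high") - percent_success(stratum, "low")
    return result

cases = (
    make_cases("middle", "high", 8, 2) + make_cases("middle", "low", 3, 7)
    + make_cases("working", "high", 6, 4) + make_cases("working", "low", 5, 5)
)
diffs = relation_by_level(cases, control="social_class")
print(diffs)
```

In these made-up data the percentage difference is 50 points in the middle-class group and 10 points in the working-class group, so the relation has been specified: it holds more strongly under one condition than the other.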
Specification
• This is similar to the phenomenon of
interaction discussed in chapter 9. Strictly
speaking, “interaction” is a term used in
experimental research and analysis of
variance. The position taken in this book is
that interaction is a general phenomenon
of great importance occurring in both
experimental and nonexperimental
research.
Crosstabs, Relations, and Ordered
Pairs
• A relation is a set of ordered pairs.
• Table 10.15, 10.16, Figure 10.4.
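The link between ordered pairs and a crosstab can be sketched directly: each case contributes one ordered pair, the distinct pairs form the relation, and the number of times each pair occurs is a cell frequency. The pairs below are hypothetical.

```python
from collections import Counter

# Each hypothetical case is an ordered pair: (value on variable 1, value on variable 2).
pairs = [
    ("high", "pass"), ("high", "pass"), ("high", "fail"),
    ("low", "fail"), ("low", "fail"), ("low", "pass"),
]

relation = set(pairs)         # the relation: the set of ordered pairs that occur
cell_counts = Counter(pairs)  # how often each pair occurs = crosstab cell frequencies

for pair in sorted(relation):
    print(pair, cell_counts[pair])
```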
The Odds Ratio
• Odds are computed as the ratio of the
probability that the event will occur to the
probability that it will not occur.
• Odds ratio: the ratio of the odds for one
group to the odds for another group.
• The chi-square statistic is still the
preferred method; however, it is unable to
give the type of information that odds
ratios can give.
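A minimal sketch of both quantities: odds from a probability, and an odds ratio from a 2 x 2 table of frequencies. The table values are hypothetical, the layout (rows as groups, columns as event and no event) is an assumption of this example, and the function names are illustrative.

```python
def odds(p):
    """Odds of an event: probability it occurs over probability it does not."""
    return p / (1.0 - p)

def odds_ratio(table):
    """Odds ratio for a 2 x 2 table: rows are groups, columns are (event, no event)."""
    (a, b), (c, d) = table
    return (a / b) / (c / d)

print(odds(0.75))  # an event with probability 0.75 has odds of 3 to 1

# Hypothetical frequencies for two groups.
table = [[30, 20],
         [20, 30]]
print(odds_ratio(table))
```

Unlike chi-square, the odds ratio states how much more likely the event is in one group than in the other: a value of 1 means the odds are equal, and values further from 1 mean a larger difference.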
Multivariate Analysis of Frequency
Data
• Many analyses of frequency data, however,
involve three or more variables. These are so-called “multi-way contingency tables with
frequency data,” which can be handled by
log-linear models.
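For orientation, the general form of a log-linear model for a three-way table can be written as below. This is the standard saturated model; the notation is an assumption of this sketch, not taken from the chapter: m_{ijk} is the expected frequency in cell (i, j, k) of variables A, B, and C, and each lambda term is an effect on the log of that frequency.

```latex
\log m_{ijk} = \lambda
  + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k}
  + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk}
  + \lambda^{ABC}_{ijk}
```

Dropping terms from the right-hand side gives simpler models. For example, omitting the three-way term states that the relation between any two of the variables is the same at every level of the third.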
Computer Addendum
• Figure 10.10~10.13