Data Distributions:


I. ANOVA revisited & reviewed
A. What is ANOVA?
B. When is it used (appropriately)?
• Comparison test: numeric dependent variable & categoric independent variable
• Assumptions?
C. How do we do it?
• Do Omnibus F test: Table of Descriptives + ANOVA
• Do post hoc comparisons
• Checks on assumptions? Levene’s test of equal variances vs. “rule of thumb” (see the Python sketch after part D)
D. How do we interpret it?
• Omnibus F test
• Post Hoc Comparisons
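A minimal Python sketch of this workflow (Levene’s check, omnibus F test, Tukey HSD post hoc comparisons); the three groups and their scores are hypothetical, and SciPy/statsmodels stand in here for whatever package the course uses:

```python
# One-way ANOVA workflow: assumption check, omnibus F test, post hoc comparisons.
# Group scores below are hypothetical.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

group_a = np.array([23, 25, 28, 30, 27])
group_b = np.array([31, 33, 29, 35, 32])
group_c = np.array([22, 20, 25, 24, 21])

# Check the equal-variances assumption with Levene's test.
lev_w, lev_p = stats.levene(group_a, group_b, group_c)
print(f"Levene's test: W = {lev_w:.2f}, p = {lev_p:.3f}")

# Omnibus F test.
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)
print(f"Omnibus ANOVA: F = {f_stat:.2f}, p = {f_p:.3f}")

# Post hoc pairwise comparisons (Tukey HSD), examined if the omnibus F is significant.
scores = np.concatenate([group_a, group_b, group_c])
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
print(pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05))
```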
II. ANOVA alternatives:
A. What if we violate assumptions for ANOVA?
B. Alternatives to conventional ANOVA:
1. Modifications/Transformations of data
2. Nonparametric tests – when assumptions are seriously violated (e.g., ordinal dependent variable)
  ➢ Rely on ordinal dependent variables & do not assume a normal (or parametric) distribution of data values
  ➢ Use only ordinal (rank order) information
  ➢ There are nonparametric analogs for all the t-tests and F-tests used with numeric variables (e.g., the Median test described in the text; see the Kruskal-Wallis sketch below)
    o They are generally less “powerful”
    o They are seldom used
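One widely used nonparametric analog of the one-way ANOVA F test is the Kruskal-Wallis H test (the outline names only the Median test, so this particular choice is an addition); a short sketch with hypothetical ordinal ratings:

```python
# Kruskal-Wallis H test: rank-based analog of the one-way ANOVA F test.
# Uses only ordinal (rank order) information from the dependent variable.
from scipy import stats

ratings_a = [3, 4, 4, 5, 3]   # hypothetical 1-5 ratings, group A
ratings_b = [2, 2, 3, 1, 2]   # group B
ratings_c = [4, 5, 5, 4, 3]   # group C

h_stat, p_value = stats.kruskal(ratings_a, ratings_b, ratings_c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.3f}")
```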
III. Contingency Tables for Analysis
A. If both variables = categoric (nominal or
ordinal-with-few-levels), then:
1. Use Cross-tabulations (instead of means
comparisons) (aka “Contingency tables”)
2. Use Chi-Square statistical test (instead of z, t, or F tests), denoted by the Greek symbol χ²
B. Cross-tabs revisited (from 2nd week of class)
a) What does a cross-tabulation show?
b) When to use a cross-tabulation?
c) How to set up a cross-tab?
d) How to test for a “significant difference” or “significant contingency”?
a) What does a cross-tabulation show?
  ➢ Conditional distribution of one variable across categories or levels of another
  ➢ The unconditional (non-contingent) distribution reported in the “margins” of the cross-tabulation
    - These are called the “marginal frequencies”
    - They represent the distribution of each variable while ignoring the other variable
Cross-Tabulation: Murder Weapon by Sex of Offender

                        Sex of Offender
Weapon Used        Male           Female        Totals
Gun                100 (67%)       20 (40%)     120 (60%)
Knife               39 (26%)       21 (42%)      60 (30%)
Other Object        11 (7%)         9 (18%)      20 (10%)
Totals             150             50           200
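A pandas sketch that rebuilds the table above from its cell counts, adds the marginal frequencies, and percentages it in the direction of the independent variable (Sex of Offender); the DataFrame layout here is an assumption, not part of the slides:

```python
# Rebuild the murder-weapon cross-tab, add marginal frequencies,
# and compute column percentages (percentaging in the direction of the IV).
import pandas as pd

counts = pd.DataFrame(
    {"Male": [100, 39, 11], "Female": [20, 21, 9]},
    index=["Gun", "Knife", "Other Object"],
)

with_margins = counts.copy()
with_margins["Totals"] = counts.sum(axis=1)        # row marginals
with_margins.loc["Totals"] = with_margins.sum()    # column marginals

column_pcts = counts.div(counts.sum(axis=0), axis=1) * 100
print(with_margins)
print(column_pcts.round(0))   # matches the table's 67/40, 26/42, 7/18 pattern
```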
b) When to use a cross-tabulation?
• When both variables are categorical (or
discrete)
• And the number of categories or levels in each is fairly small (< 5, or at most < 10)
c) How to set up a cross-tab?
• Rows versus Columns – which variable
goes where?
• Percentaging the table – which way?
  ➢ Percentage in the direction of the Independent Variable
C. Chi-Square Test
1. Statistical Test for Cross-Tabulations → Chi-Square Test of Independence
a) What does it mean or represent?
• Chi-square statistic with known probability
distribution (& a single df parameter)
• The sum of squared deviations-of-observed-from-expected fits a Chi-square distribution.
• Compute deviations of observed values from values predicted under Null H.
• See if the observed pattern is likely to occur by random sampling error (if Null H. is true)

\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e}
where k = number of cells in the table = ( #rows) * (#columns)
and degrees of freedom = (#rows -1) * (#columns - 1)
C. Chi-Square Test
1. Chi-Square test of Independence (cont.)
b) How to do it? (by hand)
(1) Compute expected frequencies (under
independence)
(2) Compute squared deviations of observed
frequencies from expected frequencies
(skip this step if using computing formula)
(3) Compute Chi-square statistic & probability
level (for the degrees of freedom of the
table)
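A NumPy/SciPy sketch of those three steps, applied to the murder-weapon table above (the library calls here are a stand-in for the hand calculation the outline describes):

```python
# Chi-square test of independence "by hand":
# (1) expected frequencies under independence, (2) squared deviations,
# (3) chi-square statistic, degrees of freedom, and p-value.
import numpy as np
from scipy import stats

observed = np.array([[100, 20],   # Gun
                     [39, 21],    # Knife
                     [11, 9]])    # Other Object

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

expected = row_totals @ col_totals / n                       # step (1)
chi_square = ((observed - expected) ** 2 / expected).sum()   # steps (2)-(3)
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi_square, df)

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```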
Simpler Computing Formula for Chi-Square:

Original Formula:
\chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e}

Computing Formula:
\chi^2 = \left[ \sum_{i=1}^{k} \frac{f_o^2}{f_e} \right] - n

where k = number of cells in the table = (#rows) * (#columns)
and degrees of freedom = (#rows - 1) * (#columns - 1)
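A quick numeric check, on the same murder-weapon counts, that the computing formula gives the identical statistic:

```python
# Verify: sum((fo - fe)^2 / fe) equals sum(fo^2 / fe) - n.
import numpy as np

observed = np.array([[100, 20], [39, 21], [11, 9]], dtype=float)
n = observed.sum()
expected = observed.sum(axis=1, keepdims=True) @ observed.sum(axis=0, keepdims=True) / n

original = ((observed - expected) ** 2 / expected).sum()
computing = (observed ** 2 / expected).sum() - n
print(original, computing)   # identical values
```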
c) How to compute it in SPSS?
• Select the Cross-tabs procedure from the Analyze drop-down menu
• Select the variables for the Rows and Columns of the table
• Use the Cells options to select how to percentage the table
• Use the Statistics options to select the Chi-Square statistic
d) How to read the output?
• Use the Pearson Chi-Square for most cross-tabs (larger than 2x2)
• Use the Continuity (Yates) Correction for small 2x2 tables (??)
• Use Fisher’s Exact Test for 2x2 tables (recommended)
• If statistically significant, visually compare the percentages in the cells (across values of the Independent variable)
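For readers working outside SPSS, a SciPy sketch that produces the same quantities: the Pearson chi-square for the lecture’s 3x2 table, plus a Yates-corrected chi-square and Fisher’s Exact test for a hypothetical 2x2 table:

```python
# Pearson chi-square, Yates continuity correction, and Fisher's exact test.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# 3x2 lecture table: Pearson chi-square (no continuity correction is applied when df > 1).
table_3x2 = np.array([[100, 20], [39, 21], [11, 9]])
chi2, p, dof, expected = chi2_contingency(table_3x2)
print(f"Pearson chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("Any expected cell counts below 5?", (expected < 5).any())

# Hypothetical 2x2 table: Yates-corrected chi-square and Fisher's exact test.
table_2x2 = np.array([[12, 5], [8, 15]])
chi2_yates, p_yates, _, _ = chi2_contingency(table_2x2, correction=True)
_, p_fisher = fisher_exact(table_2x2)
print(f"Yates-corrected chi-square p = {p_yates:.4f}")
print(f"Fisher's exact test p = {p_fisher:.4f}")
```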
e) How to report the results?
(1) Report the cross-tabulation (frequencies & percents)
(2) Report the relevant Chi-square value and associated p-value
f) How to interpret it?
• As a test of “no association” or “no relation” between variables
• Strongly influenced by:
  (1) Sample size
  (2) Degrees of freedom (i.e., the size of the table)
g) Assumptions?
• Observations are independent
• Data are randomly sampled
• Moderately large number of cases
h) Limitations?
• Not as valid for small tables with small numbers of cases
  ➢ Use Yates’ Correction for hand-calculated 2x2 tables (??)
  ➢ Use Fisher’s Exact test in SPSS for 2x2 tables (better)
• Less valid when expected cell sizes are < 5
2. Chi-Square “Goodness-of-fit” test = an often-used alternative procedure (with a reversed logic)
a) Used when the Null-hypothesis is actually the
hypothesis of interest (not a “statistical straw man”)
• Test to see if assuming the null is reasonable
• Goal is to affirm or accept the null hypothesis with a degree of statistical certainty
b) Widely used but statistically “inchoate”
• Chances of a decision error when accepting the null-H are unknown and incalculable as it is generally used
• Uses ordinary null-hypothesis testing procedures but “turns the decision inside out” (→ makes an unfounded decision)
• Decision is highly sensitive to sample size variations
• It is statistically sound only when Type 2 errors and “statistical power” are explicitly included in the decision
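To make the mechanics concrete, a short sketch of a chi-square goodness-of-fit test with hypothetical counts; per the cautions above, a non-significant result does not by itself justify accepting the null unless power and Type 2 error are addressed:

```python
# Chi-square goodness-of-fit test: do observed counts fit hypothesized frequencies?
from scipy.stats import chisquare

observed_counts = [48, 55, 47]     # hypothetical observed frequencies
expected_counts = [50, 50, 50]     # frequencies expected under the null hypothesis

stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
print(f"goodness-of-fit chi-square = {stat:.2f}, p = {p_value:.3f}")
# A large p-value does not "prove" the null; statistical power must be considered.
```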