SANDScat-1x - The Julia Group
Categorical Data Analysis: When
life fits in little boxes
AnnMaria DeMars, PhD.
What are we going to do today?
• Basic statistics
• Logistic regression
An actual example
Are you going to die soon?
Our data
Kaiser Permanente Study of the Oldest Old,
1971-1979 and 1980-1988: [California]
DEPENDENT VARIABLE:
Dthflag = 1 if Died during study period
0 if alive at end of study period
Our data
PREDICTOR VARIABLES:
nursehome = 0 if lived at home continuously
1 if admitted to a nursing home at any time
We all knew FREQ DID THIS
PROC FREQ DATA = dsname ;
TABLES varname1 * varname2 / chisq ;
…AND THAT WITH THIS YOU GET:
– Chi-square value (several)
– Phi coefficient
– Fisher Exact test (where applicable)
Let's Start Simple: PROC FREQ
PROC FREQ DATA = in.old ;
TABLES dthflag ;
Not Too Interesting…
55.4% of our sample died.
…So Let's Dig Deeper:
STACKED BAR CHART
Stacked Bar Chart with SAS Enterprise
STACKED BAR Figure 1
STACKED BAR Figure 2
The Syntax
PROC GCHART DATA=mydata.oldpeople;
VBAR gender / SUBGROUP=dthflag
TYPE= PCT INSIDE=PCT ;
Let's Keep Digging!
Association Measures
Enterprise Guide Method
The Syntax
PROC FREQ DATA = mydata.oldpeople ;
TABLES dthflag*nursehome /
NOROW NOPERCENT NOCUM
CHISQ MEASURES ;
Nursing home placement by death
Conditional
probabilities
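The conditional probabilities in this table can be checked by hand. The sketch below is a cross-check in Python, not the SAS output itself. It uses the cell counts quoted on the odds-ratio slide later in the deck and assumes the alive/at-home cell is 2,486, since that is the value that reproduces both the 13.51 odds and the 55.4% death rate. The dictionary layout and function name are illustrative only.

```python
# Cell counts for dthflag (rows) by nursehome (columns).
# ASSUMPTION: the alive/at-home count is 2,486, the value consistent with
# the 13.51 odds and the 55.4% overall death rate quoted in the slides.
counts = {
    ("alive", "home"): 2486,
    ("alive", "nursing_home"): 184,
    ("died", "home"): 2239,
    ("died", "nursing_home"): 1077,
}


def p_died_given(place):
    """Column percentage: share of the people in `place` who died."""
    died = counts[("died", place)]
    total = died + counts[("alive", place)]
    return died / total


p_home = p_died_given("home")
p_nursing = p_died_given("nursing_home")
overall = (counts[("died", "home")] + counts[("died", "nursing_home")]) / sum(counts.values())

print(f"P(died | lived at home)  = {p_home:.3f}")
print(f"P(died | nursing home)   = {p_nursing:.3f}")
print(f"P(died) overall          = {overall:.3f}")
```

With NOROW and NOPERCENT in effect, the column percentages are the ones PROC FREQ still prints, which is why the sketch conditions on nursing home status.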
Being able to find SPSS in the start menu
does not qualify you to perform a
multinomial logistic regression
Chi-square results
The options & what they tell you
Chi-square results
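Pearson chi-square: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, where $f_o$ is the observed frequency and $f_e$ the expected frequency in each cell.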
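As a rough illustration of the Pearson formula, the sketch below sums (observed minus expected) squared over expected across the four cells of the death by nursing home table. It is a Python cross-check, not the author's SAS output. The counts come from the odds-ratio slide, assuming the alive/at-home cell is 2,486 (the value consistent with the 13.51 odds and the 55.4% death rate); the function name is hypothetical.

```python
import math

# Rows: alive, died. Columns: lived at home, nursing home.
# ASSUMPTION: 2,486 for the alive/at-home cell (see lead-in).
observed = [[2486, 184],
            [2239, 1077]]


def pearson_chi_square(table):
    """Return (chi-square, n) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected count under independence of rows and columns.
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    return chi2, n


chi2, n = pearson_chi_square(observed)
# Magnitude of the phi coefficient for a 2 x 2 table (sign not computed here).
phi = math.sqrt(chi2 / n)

print(f"Pearson chi-square = {chi2:.1f} on 1 df, n = {n}")
print(f"|phi| = {phi:.3f}")
```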
What is Fisher’s exact test &
when do I get one?
“Well, you see, what you really need
to do to make this a valid statistical test is
to kill off a few more patients”
Fisher’s Exact Test: probability of obtaining a
table at least as unusual as the one that you
have obtained, under the null
hypothesis of no relationship.
With 2 x 2 Tables it’s automatic
Recap: Fisher’s Exact Test
• Small sample size
OR
• Need exact probability
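To make "exact probability" concrete, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2 x 2 table. It holds the margins fixed, enumerates every possible table, and adds up the probabilities of tables no more likely than the observed one. The small table in the usage lines is hypothetical and is not from the Kaiser data; the function name is also an assumption for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def weight(x):
        # Number of ways to get x in the top-left cell with margins fixed.
        return comb(r1, x) * comb(r2, c1 - x)

    lowest = max(0, c1 - r2)
    highest = min(r1, c1)
    observed = weight(a)
    # Sum the weights of all tables at most as probable as the observed one.
    tail = sum(weight(x) for x in range(lowest, highest + 1)
               if weight(x) <= observed)
    return tail / comb(n, c1)


# Hypothetical small-sample table: 3 and 1 in the first row, 1 and 3 in the second.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

Integer weights are compared instead of floating-point probabilities so that ties between equally likely tables are handled exactly.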
A bunch of things you may not
know Proc Freq Does
Computing odds ratios
Divide the frequency in row 1, column 1 by the frequency in row 1, column 2
2,486/184 = 13.51: the odds that a person who lived was not in a nursing
home versus was in one.
Divide the frequency in row 2, column 1 by the frequency in row 2, column 2
2,239/1,077 = 2.08: the same odds for a person who died.
Divide the first result by the second
13.51/2.08 = 6.49
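The three steps above can be sketched in a few lines of Python as a cross-check on the arithmetic. The sketch assumes the alive/at-home count is 2,486, the value that reproduces the 13.51 shown on the slide; variable names are illustrative.

```python
# Rows: alive, died. Columns: lived at home, nursing home.
# ASSUMPTION: 2,486 for the alive/at-home cell, which reproduces 13.51.
alive_home, alive_nursing = 2486, 184
died_home, died_nursing = 2239, 1077

# Step 1: odds of living at home versus a nursing home, among survivors.
odds_alive = alive_home / alive_nursing
# Step 2: the same odds among those who died.
odds_died = died_home / died_nursing
# Step 3: the odds ratio is the first result divided by the second.
odds_ratio = odds_alive / odds_died

print(f"odds (alive) = {odds_alive:.2f}")
print(f"odds (died)  = {odds_died:.2f}")
print(f"odds ratio   = {odds_ratio:.2f}")
```

Working from unrounded odds gives a ratio that rounds to 6.50; the 6.49 on the slide comes from dividing the already rounded 13.51 by 2.08.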
Measures
Mantel-Haenszel chi-square
Tests ordinal relationship
Essentially the same as Pearson if only two categories (it differs by a factor of (n - 1)/n)
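For a 2 x 2 table the Mantel-Haenszel chi-square is (n - 1) times the squared correlation between the row and column scores, and that squared correlation equals phi squared, so it lands very close to the Pearson value of n times phi squared. The Python sketch below checks this on the death by nursing home counts, assuming the alive/at-home cell is 2,486 (the value consistent with the 13.51 odds quoted in the deck). It is an illustration, not SAS output.

```python
# Rows: alive, died. Columns: lived at home, nursing home.
# ASSUMPTION: 2,486 for the alive/at-home cell (see lead-in).
a, b = 2486, 184
c, d = 2239, 1077

n = a + b + c + d
# Squared phi coefficient for a 2 x 2 table.
phi_squared = (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

pearson = n * phi_squared
mantel_haenszel = (n - 1) * phi_squared

print(f"Pearson chi-square         = {pearson:.2f}")
print(f"Mantel-Haenszel chi-square = {mantel_haenszel:.2f}")
```

With nearly 6,000 cases the two values differ only in the first decimal place, which is why they lead to the same conclusion here.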
Ordinal relationship ?
Don’t just compare values
ER visits versus nursing home
More measures
Take-away
1. Different types of chi-square values, different
types of correlations and other tests like odds
ratios do exist.
2. These statistics are very easy to obtain using
SAS.
3. While most times, all of these measures will
point you in the direction of the same
general conclusion, there are times when one
is preferable to the others.
Take-away 2
• Non-standard hypotheses call for non-standard statistics
What about this ?
PROC FREQ DATA = dsname ;
TABLES varname /
BINOMIAL (EXACT EQUIV P = .333)
ALPHA = .05 ;
What’s it Do?
• The binomial (equiv p = .333) will produce a
test that the population proportion is .333 for
the first category, that is, “No” for death. A Z value
will be produced, along with probabilities for
one-tailed and two-tailed tests.
• The exact keyword will produce confidence
intervals and, since I have specified alpha =
.05, these will be the 95% confidence
intervals.
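The Z value described above can be approximated by hand. The Python sketch below tests whether the proportion in the first category ("No" for death) equals .333, using the standard error computed under the null hypothesis. The counts (2,670 alive out of 5,986) are derived from the cell frequencies on the odds-ratio slide, assuming the alive/at-home cell is 2,486; that derivation is an assumption, and the sketch does not reproduce the exact confidence limits or the equivalence test that SAS reports.

```python
import math


def binomial_z_test(successes, n, p0):
    """Normal-approximation test of H0: population proportion = p0."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, one_tail, 2 * one_tail


# ASSUMPTION: 2,670 survivors out of 5,986 people, derived from the
# 2 x 2 cell counts (2,486 + 184 alive; 2,239 + 1,077 died).
p_hat, z, one_tail, two_tail = binomial_z_test(2670, 5986, 0.333)

print(f"sample proportion = {p_hat:.3f}")
print(f"z = {z:.2f}")
print(f"one-tailed p = {one_tail:.3g}, two-tailed p = {two_tail:.3g}")
```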
Again, Not New
Hmmm…. This is interesting
Null rejected !