Chi Sq Independence

Chi-Square Analysis
Test of Independence
We will now apply the principles of Chi-Square analysis to
determine if two variables are independent of one another.
We will use as an example a study at the University of Texas
Southwestern Medical Center. They examined the incidence of
hepatitis C and the occurrence of tattoos on the patients. Patients
were selected from those seeking medical attention for unrelated
disorders.
In the US each year about 10,000 people die from hepatitis C, a
viral infection of the liver, but it can be years after infection
before the patient develops symptoms.
We will see how χ² analysis can help us to evaluate this situation.
To learn more check this URL:
http://www.sciencedaily.com/releases/2001/04/010405081407.htm
The data is presented below. Patients were given a blood test
for hepatitis C and those with tattoos were asked whether they
got the tattoo in a tattoo parlor.
                           Hepatitis C   No Hepatitis C   Total
Tattoo, from parlor             17             35           52
Tattoo, not from parlor          8             53           61
No tattoo                       22            491          513
Total                           47            579          626
Recall from our work much earlier in the year, that when data are
presented in tables like this, we can easily compare the
proportions of individuals in each category. Here, for instance,
we might think that if the chance of having hepatitis C is
independent of tattoo status, then a person’s risk (probability) of
having hepatitis C is the same regardless of whether they have a
tattoo. The same probability should apply to each category.
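The comparison of proportions described above can be sketched in Python. This is only an illustration using the counts from the table; the dictionary layout and the function name `hep_c_proportions` are assumptions made for this example and are not part of the original study.

```python
# Counts from the table: (hepatitis C, no hepatitis C) for each tattoo category.
counts = {
    "Tattoo, from parlor": (17, 35),
    "Tattoo, not from parlor": (8, 53),
    "No tattoo": (22, 491),
}

def hep_c_proportions(table):
    """Return the proportion of patients with hepatitis C in each row."""
    return {row: hep / (hep + no_hep) for row, (hep, no_hep) in table.items()}

proportions = hep_c_proportions(counts)

# Overall proportion with hepatitis C, ignoring tattoo status.
total_hep = sum(hep for hep, _ in counts.values())
total_all = sum(hep + no_hep for hep, no_hep in counts.values())
overall = total_hep / total_all

for row, p in proportions.items():
    print(f"{row:25s} {p:.3f}")
print(f"{'Overall':25s} {overall:.3f}")
```

If the two variables were independent, each row's proportion should be close to the overall proportion.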
We will perform a Chi-square test for independence.
As with other statistical inference, we begin with a null
hypothesis. In the tests for independence, this hypothesis will
always be a statement that our two variables are independent.
Our alternate hypothesis will always be a statement that the two
variables are not independent. We must clearly state what the
variables are.
Step 1:
H0: The tattoo status and hepatitis C status are
independent.
Ha: The tattoo status and hepatitis C status are not independent.
Step 2:
Assumptions: Our data are counts.
With a test for independence we need a representative sample if
we are to apply our findings to a larger population. While these
patients were not an SRS they were selected to avoid bias and
should represent the general population.
We have the same criteria for the expected counts as we had
for the goodness-of-fit test:
1. All expected counts must be one or more.
2. No more than 20% of the expected counts may be less than 5.
The calculation of expected counts gets quickly complicated
when there are several categories. We will use the graphing
calculator to help us! We will use Matrix A to hold our data.
On the TI-83 graphing calculator press <MATRIX> and on the
TI-83 Plus press <2nd> <MATRIX>. Now the instructions are
the same: Select <EDIT> <1:[A]>.
Your display may look different depending on whether you have old matrices stored, as I do.
Change the dimensions of Matrix A to 3 × 2. Ignore the values in the matrix; they are old data that will be replaced when new data are entered.
Now enter the data.
Our easiest method of finding all of the expected values is to run the χ² test on the calculator and use the values it calculates and stores in Matrix B.
With our data in [A] we now press <STAT> <TESTS> <C: χ²-Test> <Calculate> <ENTER>.
We’ll save this
information for later.
Now we view [B].
Press <MATRIX> <EDIT>
<2:[B]>, and view the
matrix of expected counts.
As we check the expected counts, we see that 2 out of 6 (33%) are less than 5, which is more than the 20% we are allowed. This is a serious assumption violation. Don't throw in the towel, though, at least not yet.
Notice that the expected counts
are not whole numbers. That is
typical, and don’t be tempted to
round them to whole numbers.
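For readers without the calculator at hand, the expected counts and the two conditions can be checked with a short Python sketch. It assumes the standard formula for a test of independence, expected count = (row total × column total) / grand total; the function names are made up for this illustration.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def check_conditions(expected):
    """Return (all counts are 1 or more, no more than 20% of counts are below 5)."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < 5)
    return all(e >= 1 for e in cells), small / len(cells) <= 0.20

# Rows: tattoo from parlor, tattoo not from parlor, no tattoo.
# Columns: hepatitis C, no hepatitis C.
observed_3x2 = [[17, 35], [8, 53], [22, 491]]
expected_3x2 = expected_counts(observed_3x2)

for row in expected_3x2:
    print([round(e, 3) for e in row])
print(check_conditions(expected_3x2))
```

The second value returned by `check_conditions` is `False` for this table, because two of the six expected counts fall below 5.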
If we look at our original categories, we may find a way.
                           Hepatitis C   No Hepatitis C   Total
Tattoo, from parlor             17             35           52
Tattoo, not from parlor          8             53           61
No tattoo                       22            491          513
Total                           47            579          626
Our totals are not very large for either category of tattoos. If we
combine the two, we can increase our expected counts in the
combined category. In doing so, we lose some ability to identify
the source of the hepatitis C, should there be a connection between
tattoos and the hepatitis.
             Hepatitis C   No Hepatitis C   Total
Tattoo            25             88          113
No tattoo         22            491          513
Total             47            579          626
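Combining the two tattoo categories is just a matter of adding their counts cell by cell. A minimal Python sketch, with a hypothetical helper named `combine_rows`:

```python
# Rows: tattoo from parlor, tattoo not from parlor, no tattoo.
# Columns: hepatitis C, no hepatitis C.
observed_3x2 = [[17, 35], [8, 53], [22, 491]]

def combine_rows(table, i, j):
    """Merge rows i and j by adding their counts; the other rows keep their order."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    rest = [row for k, row in enumerate(table) if k not in (i, j)]
    return [merged] + rest

observed_2x2 = combine_rows(observed_3x2, 0, 1)
print(observed_2x2)
```

The column totals (47 and 579) are unchanged by the merge, since no patients are added or removed.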
Now we need to adjust our [A] and find a new [B] to check the
expected counts.
With new data in [A], run the χ² test again, and examine [B].
This time all expected
counts are greater than
5, so we meet the
assumption and can
continue.
Step 3:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

$$\text{obs} = \begin{bmatrix} 25 & 88 \\ 22 & 491 \end{bmatrix} \qquad \text{exp} = \begin{bmatrix} 8.484 & 104.515 \\ 38.515 & 474.484 \end{bmatrix}$$

$$\chi^2 = 42.418$$
Degrees of freedom are (the number of rows minus 1) times (the number of columns minus 1).

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
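The test statistic and degrees of freedom can be reproduced from the combined table with a few lines of Python. This is a sketch rather than a replacement for the calculator's χ²-Test; the function name is made up for this illustration, and expected counts are computed as (row total × column total) / grand total.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e  # this cell's contribution

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: tattoo, no tattoo. Columns: hepatitis C, no hepatitis C.
chi2, df = chi_square_independence([[25, 88], [22, 491]])
print(round(chi2, 3), df)
```

The statistic should agree with the calculator's value up to rounding in the last displayed digit.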
Step 4:
Notice that the χ² distribution is shaped very differently with only 1 degree of freedom. (It is similarly shaped with 2 degrees of freedom and then changes completely with 3 df.)
Step 5:
$$P(\chi^2 > 42.4188) = 7.37 \times 10^{-11}$$
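With 1 degree of freedom, this tail probability can be checked using only Python's standard library. The sketch relies on the fact that a χ² variable with 1 df is the square of a standard normal variable, so P(χ² > x) = erfc(√(x/2)). This shortcut is valid only for 1 df, and the function name is made up for this illustration.

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x), using chi-square(1) = Z**2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_upper_tail_1df(42.4188)
print(f"{p_value:.2e}")
```

For tables with more than 1 degree of freedom, a statistics library or the calculator's χ²cdf function would be needed instead.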
Step 6:
Reject H0. A test statistic this extreme will rarely
occur by chance alone.
Step 7:
We have strong evidence that tattoos and the
occurrence of hepatitis C are not independent.
Further, if we examine the actual χ² contributions from each cell, we may be able to see the reason for our significant result.
There is no easy way with the graphing calculator to generate a long list of χ² contributions when we have large tables of data. With a small table, such as ours, the task is not difficult.
Enter the observed and expected counts in L1 and L2, respectively.
Then in L3 calculate (L1-L2)²/L2, as we did in the goodness-of-fit
test.
We see that the largest contributor to the χ² test statistic comes from those with tattoos and hepatitis C. We expected about 8 and found 25. Our next largest contributor is those with hepatitis C and no tattoos. We expected about 38, but found only 22.
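The cell-by-cell contributions can also be checked away from the calculator. The Python sketch below uses the observed counts and the expected counts reported in Step 3; the cell labels and variable names are assumptions made for this illustration.

```python
labels = [
    "tattoo, hepatitis C",
    "tattoo, no hepatitis C",
    "no tattoo, hepatitis C",
    "no tattoo, no hepatitis C",
]
observed = [25, 88, 22, 491]
expected = [8.484, 104.515, 38.515, 474.484]  # expected counts from Step 3

# Each cell contributes (o - e)^2 / e to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Print the cells from largest to smallest contribution.
for label, c in sorted(zip(labels, contributions), key=lambda pair: -pair[1]):
    print(f"{label:27s} {c:7.3f}")
print(f"{'total':27s} {sum(contributions):7.3f}")
```

The total of the four contributions matches the test statistic, up to rounding in the expected counts.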
This study gives strong evidence that tattoos and hepatitis C are
in some way related.
Does this mean that getting tattoos causes hepatitis C?
Not necessarily.
What this study shows is a relationship between hepatitis C
and tattoos. The scientists conducting this study eliminated
many possible lurking variables and concluded that getting
tattoos posed an even greater risk of hepatitis C than IV
drug use.
Some further information about the epidemic of hepatitis C that
has been tied to tattoo parlors is that the virus is spread by (1)
needles not sterilized between tattoo customers, (2) containers of
ink that became contaminated and were then used on more customers,
and (3) the practice of some tattoo artists of using the needles to
prick the backs of their own hands to check for sharpness.
THE END