CHI-SQUARE (X²) DISTRIBUTION 11

Chi-Square Test
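• PROPERTIES:
1. It is one of the most widely used distributions in statistical applications.
2. It can be derived from the normal distribution: the sum of the squares of k independent standard normal variables follows a chi-square distribution with k degrees of freedom.
3. It takes values from zero to +infinity (it is never negative).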
4. X² relates to the frequencies of occurrence of individuals (or events) in the categories of one or more variables.
5. The X² test is used to test the agreement between the observed frequencies of certain characteristics and the frequencies expected under a certain hypothesis.
CHI-SQUARE (X²) TESTS
• X² test of goodness of fit
• X² test of homogeneity
• X² test of independence
CHI-SQUARE (X²) test of Independence
• It is used to test the null hypothesis that two criteria of classification, when applied to the same set of entities, are independent (NO ASSOCIATION).
• Generally, a single sample of size n is drawn from a population, and the entities are cross-classified on the basis of the two variables of interest (X and Y). The cells are formed by the intersections of the r rows and the c columns, and each cell holds the observed frequency of one combination of categories.
The table is called the ‘contingency table’.
• The calculation of expected frequencies is based on probability theory: if the two variables are independent, the probability of an entity falling in a cell is the product of the probabilities of its row and its column.
• The hypotheses and conclusions are stated in terms of the independence or lack of independence of the two variables.
• $X^2 = \sum \frac{(O-E)^2}{E}$, where O is the observed and E the expected frequency of a cell
• df = (r-1)(c-1)
For a 2x2 table (cells a, b, c, d of the fourfold table), another formula to calculate X² is:
$$X^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$
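The two formulas above can be checked against each other. The sketch below is a minimal Python illustration, assuming a table is stored as a list of rows of observed counts; `expected_frequencies`, `chi_square` and `chi_square_2x2` are hypothetical helper names, not part of the lecture. For any 2x2 table the cell-by-cell sum and the shortcut give the same value (no continuity correction is applied).

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """General formula: sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_square_2x2(a, b, c, d):
    """Shortcut for a fourfold table: n(ad - bc)^2 / [(a+c)(b+d)(a+b)(c+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


# Hypothetical counts, used only to show that the two formulas agree.
a, b, c, d = 10, 20, 30, 40
print(round(chi_square([[a, b], [c, d]]), 4))  # 0.7937
print(round(chi_square_2x2(a, b, c, d), 4))    # 0.7937
```

If SciPy is used instead, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom, so `correction=False` is needed to reproduce the uncorrected value above.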
Steps in constructing the X² test
1. State the hypotheses
Ho: The 2 criteria are independent (no association)
HA: The 2 criteria are not independent (there is an association)
2. Construct the contingency table
3. Calculate the expected frequency for each cell
Multiply the row total and the column total of that cell, then divide the product by the sample size:
$$E = \frac{\text{row total} \times \text{column total}}{n}$$
Check: ∑E = ∑O for each row and each column.
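A minimal sketch of step 3, assuming the observed counts are stored as a list of rows; `expected_frequencies` and `margins_match` are illustrative names. It uses the diet-by-sex counts from the exercise in this lecture and checks that the expected frequencies reproduce the observed row and column totals.

```python
import math


def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def margins_match(observed, expected):
    """Check that sum(E) equals sum(O) for every row and every column."""
    rows_ok = all(
        math.isclose(sum(o_row), sum(e_row))
        for o_row, e_row in zip(observed, expected)
    )
    cols_ok = all(
        math.isclose(sum(o_col), sum(e_col))
        for o_col, e_col in zip(zip(*observed), zip(*expected))
    )
    return rows_ok and cols_ok


# Rows: on diet / not on diet; columns: male / female (exercise data).
observed = [[14, 25], [159, 152]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
# [19.3, 19.7]
# [153.7, 157.3]
print(margins_match(observed, expected))  # True
```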
4. Calculate the X² value (calculated X², X²c)
$$X^2 = \sum \frac{(O-E)^2}{E}$$
For each cell, calculate (O-E)²/E.
The values for all the cells of the contingency table are added together to give X²c.
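A minimal sketch of step 4, assuming observed and expected frequencies are stored as lists of rows with the same shape; `cell_contributions` and `chi_square_calculated` are hypothetical names. The numbers reuse the diet exercise from this lecture, with expected frequencies rounded to one decimal place as in the hand calculation.

```python
def cell_contributions(observed, expected):
    """(O - E)^2 / E for every cell, keeping the table layout."""
    return [
        [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]


def chi_square_calculated(observed, expected):
    """X²c: the sum of the contributions of all cells."""
    return sum(sum(row) for row in cell_contributions(observed, expected))


# Diet exercise: observed counts and expected frequencies rounded
# to one decimal place, as in the hand calculation.
observed = [[14, 25], [159, 152]]
expected = [[19.3, 19.7], [153.7, 157.3]]
print([[round(x, 3) for x in row] for row in cell_contributions(observed, expected)])
# [[1.455, 1.426], [0.183, 0.179]]
print(round(chi_square_calculated(observed, expected), 3))  # 3.243
```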
5. Find the critical value (tabulated X², X²tab)
This depends on the alpha level of significance and the degrees of freedom. The value is read from the X² table.
df = (r-1)(c-1)
r: number of rows
c: number of columns
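The tabulated values used in this lecture can be reproduced numerically. The sketch below is a stdlib-only stand-in for the printed X² table, assuming the hypothetical helpers `degrees_of_freedom`, `chi2_cdf` and `chi2_critical`: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and finds the critical value by bisection. With SciPy available, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same numbers.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_cdf(x, df):
    """P(X² <= x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate x values in X² tables.
    """
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= z / (s + k)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_critical(alpha, df):
    """Tabulated X²: the value exceeded with probability alpha (bisection)."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # widen the bracket
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, degrees_of_freedom(2, 2)), 3))  # 3.841
print(round(chi2_critical(0.05, degrees_of_freedom(2, 4)), 3))  # 7.815
print(round(chi2_critical(0.01, degrees_of_freedom(2, 3)), 2))  # 9.21
```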
6. Conclusion
If X²c is less than X²tab, we do not reject (accept) Ho.
If X²c is greater than X²tab, we reject Ho.
Observed frequencies in a fourfold table
                Y1      Y2      Row total
X1              a       b       a+b
X2              c       d       c+d
Column total    a+c     b+d     n = a+b+c+d
For an r x c table, the X² test is not applicable if:
1. The expected frequency of any cell is < 1, or
2. More than 20% of the cells have an expected frequency < 5.
For a 2 x 2 table, the X² test is not applicable if:
The expected frequency of any cell is < 5.
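The applicability rules above can be expressed as a small check. This is a sketch under the assumption that the table is a list of rows of observed counts; `chi_square_applicable` is a hypothetical name, and it implements exactly the rules on this slide (the 2 x 2 rule for fourfold tables, the two r x c rules otherwise).

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_applicable(table):
    """Apply the slide's rules; return (applicable, reason)."""
    cells = [e for row in expected_frequencies(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        # 2 x 2 table: every expected frequency must be at least 5
        if any(e < 5 for e in cells):
            return False, "2x2 table with an expected frequency < 5"
        return True, "all expected frequencies are >= 5"
    # r x c table: no expected frequency may be below 1 ...
    if any(e < 1 for e in cells):
        return False, "an expected frequency is < 1"
    # ... and at most 20% of the cells may have an expected frequency below 5
    if sum(e < 5 for e in cells) > 0.2 * len(cells):
        return False, "more than 20% of the expected frequencies are < 5"
    return True, "rules satisfied"


# Diet exercise from this lecture
print(chi_square_applicable([[14, 25], [159, 152]]))
# (True, 'all expected frequencies are >= 5')
```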
EXERCISE
• A group of 350 adults who participated in a health survey were asked whether or not they were on a diet. The responses by sex are as follows:
                Male    Female   Total
On diet         14      25       39
Not on diet     159     152      311
Total           173     177      350
• At alpha = 0.05, do these data suggest an association between sex and being on a diet?
ANSWER
1. Ho: Being on a diet and sex are independent (no association)
HA: Being on a diet and sex are not independent (there is an association)
2. Calculation of expected frequencies
Cell a: $E = \frac{173 \times 39}{350} = 19.3$
Cell b: $E = \frac{177 \times 39}{350} = 19.7$
Cell c: $E = \frac{173 \times 311}{350} = 153.7$
Cell d: $E = \frac{177 \times 311}{350} = 157.3$
Observed and (expected) frequencies
                Male           Female         Total
On diet         14 (19.3)      25 (19.7)      39
Not on diet     159 (153.7)    152 (157.3)    311
Total           173            177            350
3. Calculate X²:
$$X^2 = \sum \frac{(O-E)^2}{E} = \frac{(14-19.3)^2}{19.3} + \frac{(25-19.7)^2}{19.7} + \frac{(159-153.7)^2}{153.7} + \frac{(152-157.3)^2}{157.3}$$
$$= 1.455 + 1.426 + 0.183 + 0.179$$
X²c = 3.243
4. Find X²tab
df = (r-1)(c-1) = (2-1)(2-1) = 1
X²(0.95, df=1) = 3.841
5. Conclusion
Since X²c (3.243) < X²tab (3.841), we do not reject (accept) Ho: there is no significant association between sex and being on a diet.
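Another solution
• Since this is a 2x2 table, we can use the shortcut formula:
$$X^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{350\,[(14 \times 152) - (25 \times 159)]^2}{39 \times 311 \times 173 \times 177} = 3.21$$
The small difference from the first solution comes from rounding the expected frequencies to one decimal place; the conclusion is the same.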
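A short Python sketch of both solutions, assuming the illustrative helper names below (they are not from the lecture). It shows that the cell-by-cell sum with unrounded expected frequencies matches the 2x2 shortcut (about 3.215), while rounding the expected frequencies to one decimal reproduces the hand-calculated 3.243. The tabulated value 3.841 is taken from the slide rather than computed.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def chi_square_2x2(a, b, c, d):
    """Shortcut for a fourfold table: n(ad - bc)^2 / [(a+c)(b+d)(a+b)(c+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


observed = [[14, 25], [159, 152]]  # on diet / not on diet x male / female
exact_e = expected_frequencies(observed)
rounded_e = [[round(e, 1) for e in row] for row in exact_e]

print(round(chi_square(observed, rounded_e), 3))   # 3.243 (hand calculation)
print(round(chi_square(observed, exact_e), 3))     # 3.215
print(round(chi_square_2x2(14, 25, 159, 152), 3))  # 3.215

X2_TAB = 3.841  # X²(0.95, df=1), read from the X² table as on the slide
x2_c = chi_square_2x2(14, 25, 159, 152)
print("reject Ho" if x2_c > X2_TAB else "do not reject Ho")  # do not reject Ho
```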
(Example 1)
Five hundred elementary school children were cross-classified by socioeconomic group and the presence or absence of a certain speech defect. The results were as follows:
                Socioeconomic group (expected frequencies in parentheses)
Speech defect   Upper       Upper middle   Lower middle   Lower         Total
Present         8 (9.1)     24 (26.4)      32 (30.9)      27 (24.6)     91
Absent          42 (40.9)   121 (118.6)    138 (139.1)    108 (110.4)   409
Total           50          145            170            135           500
• Are these data compatible with the hypothesis
that the speech defect is unrelated to
socioeconomic status?
• 1) Ho: Speech defect and SE group are independent (no association)
• HA: Speech defect and SE group are not independent (an association exists)
• 2) Calculate the expected frequencies (shown in parentheses in the table)
• 3) Calculate the X² value (calculated value)
$$X^2 = \sum \frac{(O-E)^2}{E}$$
$$X^2 = \frac{(8-9.1)^2}{9.1} + \frac{(24-26.4)^2}{26.4} + \frac{(32-30.9)^2}{30.9} + \frac{(27-24.6)^2}{24.6} + \frac{(42-40.9)^2}{40.9} + \frac{(121-118.6)^2}{118.6} + \frac{(138-139.1)^2}{139.1} + \frac{(108-110.4)^2}{110.4}$$
X²c ≈ 0.76
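• 4) Tabulated X²: df = (2-1)(4-1) = 3 → X²(0.95, df=3) = 7.815
• 5) Conclusion: since X²c < X²tab, we do not reject (accept) Ho; the data are compatible with the hypothesis that the speech defect is unrelated to socioeconomic status.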
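A hedged Python sketch that recomputes this example from the raw counts, assuming the list-of-rows layout of the table; the helper names are illustrative. With unrounded expected frequencies the statistic comes out at about 0.77 with 3 degrees of freedom, well below the tabulated 7.815 taken from the slide.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: speech defect present / absent.
# Columns: upper, upper middle, lower middle, lower socioeconomic group.
observed = [
    [8, 24, 32, 27],
    [42, 121, 138, 108],
]
x2_c = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
X2_TAB = 7.815  # X²(0.95, df=3), from the slide

print(round(x2_c, 2), df)  # 0.77 3
print("reject Ho" if x2_c > X2_TAB else "do not reject Ho")  # do not reject Ho
```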
(Example 2)
• Five hundred employees of a factory that manufactures a product suspected of being associated with respiratory disorders were cross-classified by level of exposure to the product and whether or not they exhibited symptoms of respiratory disorders. The results are shown in the following table:
                Level of exposure (expected frequencies in parentheses)
Symptom         High           Limited       No exposure   Total
Present         185 (143.4)    33 (49.8)     17 (41.8)     235
Absent          120 (161.6)    73 (56.2)     72 (47.2)     265
Total           305            106           89            500
• Do these data provide sufficient evidence, at the 0.01 level of significance, to indicate a relationship between level of exposure and the presence of respiratory disorders?
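• 1) Ho: The presence of respiratory symptoms and the level of exposure are independent.
• HA: The two criteria are not independent.
• 2) Calculate the expected frequencies (shown in parentheses in the table)
• 3) Calculate X²
$$X^2 = \frac{(185-143.4)^2}{143.4} + \frac{(33-49.8)^2}{49.8} + \frac{(17-41.8)^2}{41.8} + \frac{(120-161.6)^2}{161.6} + \frac{(73-56.2)^2}{56.2} + \frac{(72-47.2)^2}{47.2} \approx 61.2$$
• 4) df = (3-1)(2-1) = 2 → Tab X²(0.99, df=2) = 9.21
• 5) Since X²c > X²tab, we reject Ho: the data indicate a relationship between level of exposure and the presence of respiratory symptoms.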
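The same kind of hypothetical helpers can recompute this example from the raw counts (a sketch; names are illustrative). Using unrounded expected frequencies, X² is about 61.4 with 2 degrees of freedom, which exceeds the tabulated 9.21 at the 0.01 level.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: symptoms present / absent; columns: high, limited, no exposure.
observed = [
    [185, 33, 17],
    [120, 73, 72],
]
x2_c = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
X2_TAB = 9.21  # X²(0.99, df=2), from the slide

print(round(x2_c, 1), df)  # 61.4 2
print("reject Ho" if x2_c > X2_TAB else "do not reject Ho")  # reject Ho
```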
(Example 3)
• In a clinical trial involving a potential hypotensive drug, patients are assigned at random to receive either the active drug or a placebo. The trial is double blind; that is, neither the patient nor the examining physician knows which of the 2 treatments the patient is receiving. Each patient's response to treatment is categorized as favorable or unfavorable on the basis of the degree and duration of the response in BP. There are 50 patients assigned to each group.
                Treatment
Outcome         Drug    Placebo   Total
Favorable       34      9         43
Unfavorable     16      41        57
Total           50      50        100
$$X^2 = \frac{n(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{100\,[(34 \times 41) - (9 \times 16)]^2}{(50)(50)(43)(57)} = 25.5$$
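A brief sketch of the shortcut calculation for this trial; `chi_square_2x2` is a hypothetical helper name. The slide does not state a significance level, so the comparison below assumes alpha = 0.05 (df = 1, tabulated value 3.841) purely for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for a fourfold table: n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: favorable / unfavorable; columns: drug / placebo.
a, b, c, d = 34, 9, 16, 41
x2_c = chi_square_2x2(a, b, c, d)
print(round(x2_c, 1))  # 25.5

# Assumption: alpha = 0.05 (not stated on the slide), df = 1.
X2_TAB = 3.841
print("reject Ho" if x2_c > X2_TAB else "do not reject Ho")  # reject Ho
```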
(Example 4)
• A study found that mongolism in babies is associated with hepatitis A infection of the mother during pregnancy. Suppose a study of 2000 randomly selected mothers-to-be yielded the following table after the births of their babies:
                Baby
Hepatitis A     Mongoloid   Non-mongoloid   Total
+               26          34              60
-               4           1936            1940
Total           30          1970            2000
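As a hedged check, applying the 2 x 2 applicability rule stated earlier in this lecture, the sketch below computes the expected frequencies for this table; `expected_frequencies` is an illustrative helper name. One cell has an expected frequency below 5 (in fact below 1), so by that rule the X² test would not be applicable to this table as it stands.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: mother hepatitis A positive / negative;
# columns: mongoloid / non-mongoloid baby.
observed = [[26, 34], [4, 1936]]
expected = expected_frequencies(observed)
print([[round(e, 1) for e in row] for row in expected])
# [[0.9, 59.1], [29.1, 1910.9]]

# 2 x 2 rule from this lecture: any expected frequency < 5 rules out the test.
too_small = [e for row in expected for e in row if e < 5]
print("X² test applicable" if not too_small else "X² test not applicable")
# X² test not applicable
```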