STT 430/530, Nonparametric Statistics

• Assume, as before, that we have k samples, one for each of k treatments. Let $R_{ij}$ denote the rank of the sample value $X_{ij}$ among all the observations, let $n_i$ be the size of sample i and $\bar{R}_i$ the mean of its ranks, and, as before, let N be the total number of sample values (see Table 3.2.1 on page 86 for the complete notation). Then the Kruskal–Wallis statistic, the rank-based analogue of the ANOVA F-statistic, is given by the formula
$$KW = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$
• Note that the sum in the KW formula is exactly the treatment sum of squares computed on the ranks (recall that (N+1)/2 is the mean of all N ranks). The constant coefficient 12/(N(N+1)) in front of the sum is a scaling factor that makes KW approximately chi-square distributed with k-1 degrees of freedom when the null hypothesis of no treatment differences holds. Thus p-values may be obtained from chi-square tables when the sample sizes are large enough; from Table A6 in the Appendix for small sample sizes with k = 3 or 4; or from a permutation test based on the KW statistic.
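Here is a minimal sketch of the formula above in Python. The choice of language is an assumption, since the course itself uses R and SAS. The helper `midranks` and the function `kruskal_wallis` are hypothetical names. The code pools the observations from all k samples and ranks them, then computes KW as the scaled treatment sum of squares of the ranks.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N of the pooled values; tied values share their average (mid-)rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos  # extend `end` over the run of values tied with order[pos]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in order[pos:end + 1]:
            ranks[j] = (pos + end + 2) / 2  # average of ranks pos+1 .. end+1
        pos = end + 1
    return ranks


def kruskal_wallis(samples: Sequence[Sequence[float]]) -> float:
    """KW = 12 / (N(N+1)) * sum_i n_i * (Rbar_i - (N+1)/2)^2, the no-ties formula."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = midranks(pooled)
    grand_mean = (n_total + 1) / 2  # mean of all N ranks
    ss_treatment, start = 0.0, 0
    for sample in samples:
        group = ranks[start:start + len(sample)]  # ranks belonging to sample i
        start += len(sample)
        ss_treatment += len(sample) * (sum(group) / len(group) - grand_mean) ** 2
    return 12 / (n_total * (n_total + 1)) * ss_treatment


print(kruskal_wallis([[1, 2], [3, 4], [5, 6]]))  # 32/7, about 4.5714
```

Take the three samples {1, 2}, {3, 4} and {5, 6}. Their mean ranks are 1.5, 3.5 and 5.5, and the treatment sum of squares is 2(4) + 0 + 2(4) = 16. So KW = 12/42 * 16 = 32/7.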
• The permutation test based on the KW statistic is carried out like the other permutation tests we have done, except that KW must be recomputed after each “shuffle” (random reassignment of the observations to the treatments). Try it now on Example 3.2.1 on page 87; see the R#7 handout.
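Below is a sketch of the shuffle-and-recompute procedure in Python. It is not the R#7 handout's code, which is not reproduced here. The function names and the defaults `n_shuffles=10_000` and `seed=1` are hypothetical. The ranks are computed once and randomly reassigned to the treatments on each shuffle. The p-value is estimated as the fraction of shuffles whose KW is at least the observed KW.

```python
import random
from typing import List, Sequence, Tuple


def kw_from_ranks(ranks: Sequence[float], sizes: Sequence[int]) -> float:
    """KW for pooled ranks whose first sizes[0] entries are treatment 1, and so on."""
    n_total = len(ranks)
    grand_mean = (n_total + 1) / 2
    ss_treatment, start = 0.0, 0
    for n_i in sizes:
        group = ranks[start:start + n_i]
        start += n_i
        ss_treatment += n_i * (sum(group) / n_i - grand_mean) ** 2
    return 12 / (n_total * (n_total + 1)) * ss_treatment


def kw_permutation_test(samples: Sequence[Sequence[float]],
                        n_shuffles: int = 10_000,
                        seed: int = 1) -> Tuple[float, float]:
    """Return (observed KW, estimated permutation p-value P(KW >= observed))."""
    pooled = [x for sample in samples for x in sample]
    # Plain ranks 1..N; this sketch assumes no ties (otherwise use mid-ranks).
    order = sorted(range(len(pooled)), key=lambda j: pooled[j])
    ranks: List[float] = [0.0] * len(pooled)
    for rank, j in enumerate(order, start=1):
        ranks[j] = float(rank)
    sizes = [len(sample) for sample in samples]
    observed = kw_from_ranks(ranks, sizes)

    rng = random.Random(seed)
    shuffled = list(ranks)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random reassignment of the ranks to the treatments
        if kw_from_ranks(shuffled, sizes) >= observed - 1e-9:  # tolerance for float ties
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles


kw_obs, p_value = kw_permutation_test([[1, 2], [3, 4], [5, 6]])
print(f"KW = {kw_obs:.4f}, permutation p-value ~ {p_value:.3f}")
```

Take the samples {1, 2}, {3, 4} and {5, 6}. There are 6!/(2! 2! 2!) = 90 distinct splits. Only the 6 splits that keep {1, 2}, {3, 4} and {5, 6} together reach the maximum KW = 32/7. The exact p-value is therefore 6/90 ≈ 0.067, and the estimate should be close to that.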
• In SAS, use PROC NPAR1WAY WILCOXON; but be careful about adding the EXACT WILCOXON; statement in the k-sample case, because computing the exact probabilities can take several minutes. Try this on the data from Table 3.2.2 on page 87.
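The cost of EXACT comes from the size of the permutation distribution. An exact p-value is based on the permutation distribution of KW over all N!/(n_1! n_2! ... n_k!) distinct assignments of the observations to the treatments. The short Python count below illustrates this using the group sizes 7, 8 and 6 from the salt-score data in the SAS program at the end of these notes. The function name is hypothetical. SAS need not enumerate every assignment, so the count indicates the scale of the problem rather than the actual run time.

```python
from math import factorial
from typing import Sequence


def n_distinct_assignments(sizes: Sequence[int]) -> int:
    """N! / (n_1! n_2! ... n_k!): ways to split N observations into groups of these sizes."""
    count = factorial(sum(sizes))
    for n_i in sizes:
        count //= factorial(n_i)  # each division is exact
    return count


print(n_distinct_assignments([7, 8, 6]))  # 349188840
```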
• When the data contain ties, use mid-ranks to compute the ranks and make one of two equivalent adjustments (see pp. 88 and 89):
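$$KW_{ties} = \frac{1}{S_R^2} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$
where $S_R^2 = \frac{1}{N-1} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( R_{ij} - \frac{N+1}{2} \right)^2$ is the variance of all N mid-ranks; or
$$KW_{ties} = \frac{KW}{1 - \sum_{i=1}^{g} (t_i^3 - t_i) / (N^3 - N)}$$
where KW is the "no ties" KW computed from the mid-ranks, g is the number of groups of tied values, and $t_i$ is the number of observations in the i-th tied group.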
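The Python sketch below checks that the two adjustments agree. It applies them to the salt-score data from the table3_2_3 SAS program in these notes, and its function names are hypothetical. According to the comment in that program, PROC NPAR1WAY reports the tie-corrected KW of p. 88, so its Kruskal–Wallis chi-square can be compared with these numbers.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Salt scores by food group, copied from the table3_2_3 data step in the SAS program.
salt = {
    "pr1": [4, 5, 3, 4, 5, 5, 2],
    "pr2": [3, 4, 5, 2, 3, 1, 1, 2],
    "pr3": [2, 1, 1, 2, 1, 3],
}


def midranks(values: Sequence[float]) -> List[float]:
    """Mid-rank of each value: first sorted position of its tie group plus (t - 1)/2."""
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def kw_with_ties(samples: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (no-ties KW, KW_ties via S_R^2, KW_ties via the tie-correction factor)."""
    pooled = [x for sample in samples for x in sample]
    ranks = midranks(pooled)
    n = len(pooled)
    grand_mean = (n + 1) / 2  # mid-ranks still average (N+1)/2
    ss_treatment, start = 0.0, 0
    for sample in samples:
        group = ranks[start:start + len(sample)]
        start += len(sample)
        ss_treatment += len(sample) * (sum(group) / len(group) - grand_mean) ** 2

    # Adjustment 1: divide by S_R^2, the variance (divisor N - 1) of all N mid-ranks.
    s2_r = sum((r - grand_mean) ** 2 for r in ranks) / (n - 1)
    kw_ties_1 = ss_treatment / s2_r

    # Adjustment 2: divide the no-ties KW by 1 - sum(t^3 - t) / (N^3 - N).
    kw = 12 / (n * (n + 1)) * ss_treatment
    tie_sizes = Counter(pooled).values()  # groups with t = 1 contribute 0
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    kw_ties_2 = kw / correction
    return kw, kw_ties_1, kw_ties_2


kw, kw_ties_1, kw_ties_2 = kw_with_ties(list(salt.values()))
print(f"KW = {kw:.4f}, KW_ties = {kw_ties_1:.4f} (S_R^2) = {kw_ties_2:.4f} (correction)")
```

Both adjustments give the same value for a simple reason. Each tie group of size t lowers the sum of squared deviations of the ranks from (N+1)/2 by (t^3 - t)/12. Hence S_R^2 = N(N+1)/12 * [1 - Σ(t_i^3 - t_i)/(N^3 - N)].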
• It is also possible to construct a “KW-like” statistic from general scores (not just ranks or mid-ranks), such as van der Waerden scores. See the statistic GS on page 91 and go over it carefully.
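These notes do not reproduce GS, so the Python sketch below assumes a specific form for it; please check it against page 91. The assumed form is the same as the first tie adjustment, with scores in place of mid-ranks: the sum of n_i times the squared deviation of each sample's mean score from the overall mean, divided by the variance of all N scores. The function names are hypothetical. The van der Waerden score used here is the standard normal quantile at R/(N+1). Tied values get the score of their mid-rank, which is only one of several possible conventions. Plugging in the ranks themselves as scores should reproduce KW, which gives a quick consistency check.

```python
from collections import Counter
from statistics import NormalDist
from typing import Callable, List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Mid-rank of each value: first sorted position of its tie group plus (t - 1)/2."""
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def van_der_waerden(rank: float, n: int) -> float:
    """van der Waerden score: the standard normal quantile at R / (N + 1)."""
    return NormalDist().inv_cdf(rank / (n + 1))


def general_scores_stat(samples: Sequence[Sequence[float]],
                        score: Callable[[float, int], float]) -> float:
    """ASSUMED form of GS: sum_i n_i (abar_i - abar)^2 / S_a^2 with scores a = score(R, N)."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    scores = [score(r, n) for r in midranks(pooled)]
    grand_mean = sum(scores) / n
    s2_a = sum((a - grand_mean) ** 2 for a in scores) / (n - 1)  # variance of all N scores
    ss_treatment, start = 0.0, 0
    for sample in samples:
        group = scores[start:start + len(sample)]
        start += len(sample)
        ss_treatment += len(sample) * (sum(group) / len(group) - grand_mean) ** 2
    return ss_treatment / s2_a


data = [[1, 2], [3, 4], [5, 6]]
print(general_scores_stat(data, lambda r, n: r))  # rank scores reproduce KW = 32/7
print(general_scores_stat(data, van_der_waerden))
```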
• HW: Finish reading Section 3.2, and make sure you can calculate the KW statistic in R and SAS and understand the output.
• Midterm HW: Apply the Kruskal-Wallis
permutation test to the data in problem #2 on
page 105.
/*use the following to calculate the tied KW statistics*/
/*note that proc npar1way does the tied KW on p.88*/
dm log 'clear'; dm output 'clear';
options ls=80;
data table3_2_3; input food_group $ salt_score @@;
datalines;
pr1 4 pr1 5 pr1 3 pr1 4 pr1 5 pr1 5 pr1 2
pr2 3 pr2 4 pr2 5 pr2 2 pr2 3 pr2 1 pr2 1 pr2 2
pr3 2 pr3 1 pr3 1 pr3 2 pr3 1 pr3 3
;
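proc print data=table3_2_3; run;
proc sort data=table3_2_3; by food_group; run;
*mid-ranks of salt_score: TIES=MEAN (the default) averages tied ranks;
proc rank data=table3_2_3 out=ranked ties=mean;
  var salt_score;
  ranks salt_rank;
run;
*mean of the ranks for each food group;
proc means data=ranked; by food_group; var salt_rank; run;
*mean and s.d. of all the ranks combined (S_R for the tied KW);
proc means data=ranked; var salt_rank; run;
proc npar1way wilcoxon data=table3_2_3;
  class food_group; var salt_score;
run;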
quit;