STT 430/530, Nonparametric Statistics

• Spearman’s correlation coefficient, rs, can be
computed as Pearson’s r on the ranks; i.e., rank
the X’s (among the X’s) and the Y’s (among the
Y’s) and then compute the correlation of the
ranks…
• See Table 5.2.1 and let’s do it in R (use cor with
method="s" on the data, or method="p" on the ranks;
both give rs).
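For anyone who wants to see the "Pearson on the ranks" recipe spelled out by hand, here is a minimal sketch in Python (a stand-in for the R session, not the class code). The helper names and the data are made up for illustration, the data are NOT Table 5.2.1, and the ranking helper assumes there are no ties.

```python
def simple_ranks(values):
    """Return ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def pearson(u, v):
    """Pearson's r for two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


def spearman(x, y):
    """Spearman's rs: Pearson's r computed on the ranks."""
    return pearson(simple_ranks(x), simple_ranks(y))


# Made-up data for illustration only (NOT Table 5.2.1)
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.9, 1.5, 6.3, 3.1]
rs = spearman(x, y)
print(rs)
```

Because only the ranks enter the calculation, any increasing transformation of the X's or of the Y's (logs, cubes, ...) leaves rs unchanged.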
• We may test the null hypothesis of no
association between X and Y by doing a
permutation test on the ranks – all possible
assignments of the ranks of the Y’s to the ranks
of the X’s – if our correspondence yields an
unusually high (or low) value of rs, then we
should reject the hypothesis of no association
between X and Y.
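A rough sketch of the permutation test just described, in Python, assuming no ties and a sample small enough that all n! assignments can be listed. The function names and the data are hypothetical; with larger n one would sample random permutations instead of enumerating them.

```python
from itertools import permutations


def simple_ranks(values):
    """Return ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def corr(u, v):
    """Pearson's r for two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


def spearman_perm_test(x, y):
    """Return (observed rs, upper-tail p, lower-tail p) over all n! assignments."""
    rx, ry = simple_ranks(x), simple_ranks(y)
    observed = corr(rx, ry)
    # rs for every assignment of the Y ranks to the X ranks
    perm_stats = [corr(rx, p) for p in permutations(ry)]
    eps = 1e-12  # guard against floating-point noise in the comparisons
    upper = sum(s >= observed - eps for s in perm_stats) / len(perm_stats)
    lower = sum(s <= observed + eps for s in perm_stats) / len(perm_stats)
    return observed, upper, lower


# Made-up data for illustration only
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.9, 1.5, 6.3, 3.1]
rs_obs, p_upper, p_lower = spearman_perm_test(x, y)
print(rs_obs, p_upper, p_lower)
```

A small upper-tail p-value points to positive association, a small lower-tail p-value to negative association.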
• We may also test the above hypothesis with the
same normal approximation used for Pearson’s
r: Z = rs·sqrt(n-1); i.e., under the null hypothesis
rs is approx. N(0, 1/sqrt(n-1)), where the second
argument is the standard deviation.
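As a quick illustration of the normal approximation, here is a minimal Python sketch. The function name is made up, and the values rs = 0.5 with n = 26 are hypothetical numbers chosen only so that sqrt(n-1) is a whole number.

```python
from math import sqrt
from statistics import NormalDist


def spearman_z_test(rs, n):
    """Return (Z, upper-tail p, two-sided p) using Z = rs * sqrt(n - 1)."""
    z = rs * sqrt(n - 1)
    std_normal = NormalDist()  # N(0, 1)
    p_upper = 1 - std_normal.cdf(z)
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_upper, p_two_sided


# Hypothetical values: rs = 0.5 from n = 26 pairs
z, p_upper, p_two_sided = spearman_z_test(0.5, 26)
print(z, p_upper, p_two_sided)
```

This is a large-sample result; for small n the permutation test on the ranks is the safer choice.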
• What about ties? There are two methods
mentioned on p. 155ff:
– compute adjusted ranks (midranks) and apply the
same formulas we’ve just mentioned
– use the tie-adjusted formulae given on page 156
(see the next slide...)
– the author (and I too!) recommend the former,
i.e., midranks.
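To make the midrank option concrete, here is a minimal Python sketch: tied values each get the average of the ranks they jointly occupy, and rs is then Pearson's r on those midranks. The helper names and the data are made up for illustration.

```python
def midranks(values):
    """Rank the values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def pearson(u, v):
    """Pearson's r for two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


# Made-up data with one tie among the X's
x = [10, 20, 20, 30, 40]
y = [1.1, 3.0, 2.4, 5.2, 4.7]
rx, ry = midranks(x), midranks(y)
rs_ties = pearson(rx, ry)
print(rx, ry, rs_ties)
```

The two tied X's would have occupied ranks 2 and 3, so each gets 2.5.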
• The following formula for Spearman’s rank
correlation (without ties) appears in the
literature and we’ll mention it here. It is the
one that can be modified for ties – see page
156 where it is defined...
$$ r_s = 1 - \frac{6D}{n(n^2 - 1)}, \quad \text{where} \quad D = \sum_{i=1}^{n} \bigl( R(X_i) - R(Y_i) \bigr)^2 $$
Verify that it gives the same results. See problem
#13 on pages 192-193 for an outline of the
theoretical proof of the equivalence of this
formula to the definition of rs.
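One way to do the numerical check is sketched below in Python: compute rs from the sum of squared rank differences and again as Pearson's r on the ranks, then compare. The function names and data are made up, and the comparison assumes no ties (with ties the two versions need not agree).

```python
def simple_ranks(values):
    """Return ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_from_d(x, y):
    """rs = 1 - 6D / (n(n^2 - 1)), D = sum of squared rank differences."""
    rx, ry = simple_ranks(x), simple_ranks(y)
    n = len(x)
    d = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d / (n * (n ** 2 - 1))


def spearman_from_ranks(x, y):
    """rs as Pearson's r computed on the ranks."""
    rx, ry = simple_ranks(x), simple_ranks(y)
    n = len(x)
    mean_r = (n + 1) / 2  # mean of the ranks 1..n
    s_xy = sum((a - mean_r) * (b - mean_r) for a, b in zip(rx, ry))
    s_xx = sum((a - mean_r) ** 2 for a in rx)
    s_yy = sum((b - mean_r) ** 2 for b in ry)
    return s_xy / (s_xx * s_yy) ** 0.5


# Made-up data for illustration only (no ties)
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.9, 1.5, 6.3, 3.1]
rs_d = spearman_from_d(x, y)
rs_r = spearman_from_ranks(x, y)
print(rs_d, rs_r)
```

Here D = 4 and n = 5, so both versions give 1 - 24/120 = 0.8.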
• Another measure of association is Kendall’s tau,
τ, which looks at the distribution of concordant
and discordant pairs of the (X,Y)s:
• (Xi,Yi) and (Xj,Yj) are concordant if Xi < Xj implies
Yi < Yj and discordant if Xi < Xj implies Yi > Yj (or
equivalently, concordant if (Xi - Xj)(Yi - Yj) > 0;
discordant if (Xi - Xj)(Yi - Yj) < 0). X and Y are
positively associated if pairs are more likely to be
concordant than discordant and negatively
associated if pairs are more likely to be
discordant than concordant.
$$ \tau = 2 \, P\bigl[ (X_i - X_j)(Y_i - Y_j) > 0 \bigr] - 1 $$
• Note that tau is just the probability of a concordant
pair, rescaled to lie between -1 and +1; if there is
no association, then the probability of a concordant
pair is the same as the probability of a discordant
pair, .5, so τ = 0.
• We estimate tau by counting the fraction of
concordant pairs in the data, doubling it and
subtracting 1:
$$ r_\tau = 2 \, \frac{\sum_{i=1}^{n-1} V_i}{\binom{n}{2}} - 1 $$
• Here,
$$ V_i = \sum_{j=i+1}^{n} U_{ij}, \quad \text{where} \quad U_{ij} = \begin{cases} 1, & \text{if } (X_i - X_j)(Y_i - Y_j) > 0 \\ 0, & \text{if } (X_i - X_j)(Y_i - Y_j) < 0 \end{cases} $$
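The counting behind the estimate of tau can be sketched in a few lines of Python: for each i, count the later pairs j > i that are concordant with it, add the counts up, divide by the number of pairs, double and subtract 1. The function name and the data are made up for illustration, and the sketch assumes no ties.

```python
from math import comb


def kendall_tau(x, y):
    """Sample tau: 2 * (fraction of concordant pairs) - 1 (assumes no ties)."""
    n = len(x)
    total_concordant = 0
    for i in range(n - 1):
        # number of j > i for which pair (i, j) is concordant
        v_i = sum(
            1 for j in range(i + 1, n)
            if (x[i] - x[j]) * (y[i] - y[j]) > 0
        )
        total_concordant += v_i
    return 2 * total_concordant / comb(n, 2) - 1


# Made-up data for illustration only (no ties)
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.9, 1.5, 6.3, 3.1]
r_tau = kendall_tau(x, y)
print(r_tau)
```

For these data 8 of the 10 pairs are concordant, so the estimate is 2(8/10) - 1 = 0.6.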
• Ranks may also be used to compute tau, since
pairs of ranks are concordant or discordant
according to whether the original pairs are
concordant or discordant.
• R computes Kendall’s tau in cor.test (with
method="kendall") and SAS computes it in PROC CORR
(with the KENDALL option).
• Exact p-values for testing the hypothesis of no
association between X and Y may be obtained by a
permutation test; approximate p-values may be
obtained from the large sample properties of
Kendall’s tau statistic:
$$ r_\tau \text{ is approximately } N\bigl(0, SD(r_\tau)\bigr), \quad \text{where} \quad VAR(r_\tau) = \frac{4n + 10}{9(n^2 - n)} $$
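To compare the two kinds of p-value side by side, here is a minimal Python sketch: an exact upper-tail p-value from all n! pairings of the Y's with the X's, and the normal approximation using the variance above. The function names and data are made up, no ties are assumed, and the full enumeration is only practical for small n.

```python
from itertools import permutations
from math import comb, sqrt
from statistics import NormalDist


def kendall_tau(x, y):
    """Sample tau: 2 * (fraction of concordant pairs) - 1 (assumes no ties)."""
    n = len(x)
    concordant = sum(
        1
        for i in range(n - 1)
        for j in range(i + 1, n)
        if (x[i] - x[j]) * (y[i] - y[j]) > 0
    )
    return 2 * concordant / comb(n, 2) - 1


def tau_normal_p(tau, n):
    """Upper-tail p-value from the large-sample normal approximation."""
    var = (4 * n + 10) / (9 * (n ** 2 - n))
    z = tau / sqrt(var)
    return 1 - NormalDist().cdf(z)


def tau_exact_p(x, y):
    """Upper-tail p-value over all n! pairings of the Y's with the X's."""
    observed = kendall_tau(x, y)
    stats = [kendall_tau(x, p) for p in permutations(y)]
    # small tolerance guards against floating-point noise
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)


# Made-up data for illustration only (no ties)
x = [1.2, 3.4, 2.2, 5.1, 4.0]
y = [2.0, 3.9, 1.5, 6.3, 3.1]
r_tau = kendall_tau(x, y)
p_exact = tau_exact_p(x, y)
p_normal = tau_normal_p(r_tau, len(x))
print(r_tau, p_exact, p_normal)
```

With only n = 5 pairs the two p-values differ noticeably, which is a reminder that the normal approximation is a large-sample tool.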
• HW: Read Chapter 5 through page 163. We will
complete this topic (association between two
continuous variables) on Thursday, so have your
questions ready by then. Do problems #3-5 on
pages 189-190; we’ll discuss them next class.