Transcript Lecture 4

Sampling from an MVN Distribution
BMTRY 726
1/17/2014
Sample Mean Vector
• We can estimate a sample mean for X1, X2, …, Xn
$$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right) = \begin{pmatrix} \frac{1}{n}\sum_{j=1}^{n} X_{1j} \\ \frac{1}{n}\sum_{j=1}^{n} X_{2j} \\ \vdots \\ \frac{1}{n}\sum_{j=1}^{n} X_{pj} \end{pmatrix} = \begin{pmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \vdots \\ \bar{X}_p \end{pmatrix} = \frac{1}{n}\sum_{j=1}^{n} X_j$$
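A minimal sketch in R of this computation, using a hypothetical n × p data matrix `X` (rows are the observations X_j, columns are the p traits):

```r
# Sketch: sample mean vector for hypothetical data (n = 50 observations, p = 3 traits)
set.seed(726)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)

xbar <- colMeans(X)                   # built-in column means
xbar2 <- (1 / nrow(X)) * colSums(X)   # the same thing, written as (1/n) * sum of the X_j
all.equal(xbar, xbar2)                # TRUE
```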
Sample Mean Vector
• Now we can estimate the mean of our sample
• But what about the properties of $\bar{X}$?
– It is an unbiased estimate of the mean
– It is a sufficient statistic
– Also, the sampling distribution is:
$$\bar{X} \sim N_p\!\left(\mu, \tfrac{1}{n}\Sigma\right)$$
Sample Covariance
• And the sample covariance for X1, X2, …, Xn
$$S = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)\left(x_j - \bar{x}\right)' = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$
• Sample variance
$$s_{ii} = s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)^2$$
• Sample covariance
$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)\left(x_{kj} - \bar{x}_k\right)$$
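A minimal sketch in R with the same kind of hypothetical data, computing S from the definition and checking it against the built-in `cov()`:

```r
# Sketch: sample covariance matrix computed from the definition and with cov()
set.seed(726)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)   # hypothetical n = 50, p = 3
n <- nrow(X)

Xc <- sweep(X, 2, colMeans(X))        # center each column: x_j - xbar
S_manual <- crossprod(Xc) / (n - 1)   # (1/(n-1)) * sum_j (x_j - xbar)(x_j - xbar)'
all.equal(S_manual, cov(X))           # TRUE
```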
Sample Mean Vector
• So we can also estimate the variance of our sample
• And like $\bar{X}$, S also has some nice properties
– It is an unbiased estimate of the variance
– It is also a sufficient statistic
– It is also independent of $\bar{X}$
• But what about the sampling distribution of S?
Wishart Distribution
• Given $Z_1, Z_2, \ldots, Z_n \sim NID_p(0, \Sigma)$, the distribution of $\sum_{j=1}^{n} Z_j Z_j'$ is called a Wishart distribution with n degrees of freedom.
• $A = (n-1)S = \sum_{j=1}^{n}\left(x_j - \bar{x}\right)\left(x_j - \bar{x}\right)'$ has a Wishart distribution with n − 1 degrees of freedom.
• The density function is
$$W_{n-1}\!\left(A \mid \Sigma\right) = \frac{\left|A\right|^{(n-p-2)/2}\, e^{-\frac{1}{2}\operatorname{tr}\left(A\Sigma^{-1}\right)}}{2^{p(n-1)/2}\, \pi^{p(p-1)/4}\, \left|\Sigma\right|^{(n-1)/2} \prod_{i=1}^{p} \Gamma\!\left(\frac{n-i}{2}\right)}$$
where A and $\Sigma$ are positive definite.
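Base R's `rWishart()` draws from this distribution. As an illustrative sketch (hypothetical parameter values), we can compare direct Wishart draws with Monte Carlo copies of A = (n − 1)S; both sets of draws should have mean (n − 1)Σ:

```r
# Sketch: A = (n-1)S from MVN samples vs. direct Wishart draws with n-1 df
set.seed(726)
p <- 3; n <- 20
Sigma <- diag(p)

A_mc <- replicate(5000, {
  Z <- matrix(rnorm(n * p), n, p)   # N_p(0, Sigma) samples with Sigma = I
  (n - 1) * cov(Z)                  # A = (n-1)S
})
A_w <- rWishart(5000, df = n - 1, Sigma = Sigma)

apply(A_mc, 1:2, mean)   # both close to (n-1) * Sigma = 19 * I
apply(A_w, 1:2, mean)
```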
Wishart cont’d
• The Wishart distribution is the multivariate analog of the
central chi-squared distribution.
– If $A_1 \sim W_q\!\left(A_1 \mid \Sigma\right)$ and $A_2 \sim W_r\!\left(A_2 \mid \Sigma\right)$ are independent, then
$$A_1 + A_2 \sim W_{q+r}\!\left(A_1 + A_2 \mid \Sigma\right)$$
– If $A \sim W_n\!\left(A \mid \Sigma\right)$, then $CAC'$ is distributed $W_n\!\left(CAC' \mid C\Sigma C'\right)$
– The distribution of the (i, i) element of A is
$$a_{ii} = \sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)^2 \sim \sigma_{ii}\,\chi^2_{n-1}$$
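A quick simulation sketch of the last property (σ_ii = 4 is a hypothetical value), checking that a_ii / σ_ii behaves like a chi-squared variable with n − 1 degrees of freedom:

```r
# Sketch: a_ii = (n-1) * s_ii should be sigma_ii times a chi-squared(n-1) variable
set.seed(726)
n <- 15; sigma_ii <- 4
a_ii <- replicate(10000, (n - 1) * var(rnorm(n, sd = sqrt(sigma_ii))))

mean(a_ii / sigma_ii)   # close to the chi-squared mean, n - 1 = 14
var(a_ii / sigma_ii)    # close to the chi-squared variance, 2(n - 1) = 28
```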
Large Sample Behavior
• Let X1, X2, …, Xn be a random sample from a population with mean and variance (not necessarily normally distributed)
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}$$
• Then $\bar{X}$ and S are consistent estimators for $\mu$ and $\Sigma$. This means
$$\bar{X} \xrightarrow{P} \mu \quad \text{and} \quad S \xrightarrow{P} \Sigma \quad \text{as } n \to \infty$$
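A small sketch of consistency in R, deliberately using a non-normal (exponential) population; the two-trait setup and sample sizes are only illustrative:

```r
# Sketch: xbar -> mu and S -> Sigma as n grows, for p = 2 independent Exp(1) traits
# (Exp(1) has mean 1 and variance 1, so mu = (1,1)' and Sigma = I)
set.seed(726)
for (n in c(10, 100, 10000)) {
  x <- matrix(rexp(n * 2), n, 2)
  cat("n =", n,
      " max|xbar - mu| =", round(max(abs(colMeans(x) - 1)), 4),
      " max|S - Sigma| =", round(max(abs(cov(x) - diag(2))), 4), "\n")
}
```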
Large Sample Behavior
• If we have a random sample X1, X2, …, Xn from a population with mean and variance, we can apply the multivariate central limit theorem as well
• The multivariate CLT says
$$\sqrt{n}\left(\bar{X} - \mu\right) \xrightarrow{d} N_p\!\left(0, \Sigma\right)$$
and, for large n, approximately
$$n\left(\bar{X} - \mu\right)' \Sigma^{-1} \left(\bar{X} - \mu\right) \sim \chi^2_p$$
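A simulation sketch of the second statement, again with hypothetical non-normal data; since Σ = I here, the quadratic form reduces to n times the squared length of $\bar{X} - \mu$:

```r
# Sketch: n (xbar - mu)' Sigma^{-1} (xbar - mu) is approximately chi-squared(p)
set.seed(726)
n <- 200; p <- 2
stat <- replicate(5000, {
  x <- matrix(rexp(n * p), n, p)   # Exp(1) traits: mu = (1,1)', Sigma = I
  d <- colMeans(x) - 1
  n * sum(d^2)                     # Sigma^{-1} = I, so the form is n * ||d||^2
})

mean(stat)                                      # approximately p = 2
c(quantile(stat, 0.95), qchisq(0.95, df = p))   # upper tails should agree
```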
Checking Normality Assumptions
• Check univariate normality for each component of X
– Normal probability plots (i.e. Q-Q plots)
– Tests:
• Shapiro-Wilk
• Correlation
• EDF
• Check bivariate (and higher)
– Bivariate scatter plots
– Chi-square probability plots
Univariate Methods
• If X1, X2,…, Xn are a random sample from a p-dimensional
normal population, then the data for the ith trait are a random
sample from a univariate normal distribution (from result 4.2)
• Q-Q plot
(1) Order the data:
$$x_{i(1)} \le x_{i(2)} \le \ldots \le x_{i(n)}$$
(2) Compute the quantiles $q_{(1)} \le q_{(2)} \le \ldots \le q_{(n)}$ according to
$$\frac{j - \frac{1}{2}}{n} = \int_{-\infty}^{q_{(j)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \quad \text{which leads to} \quad q_{(j)} = \Phi^{-1}\!\left(\frac{j - \frac{1}{2}}{n}\right), \quad j = 1, 2, \ldots, n$$
(3) Plot the pairs of observations
$$\left(x_{i(1)}, q_{(1)}\right), \left(x_{i(2)}, q_{(2)}\right), \ldots, \left(x_{i(n)}, q_{(n)}\right)$$
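A hand-rolled version of steps (1)–(3) in R on hypothetical data for a single trait; for n > 10, base R's `qqnorm()` uses the same (j − 1/2)/n plotting rule:

```r
# Sketch: normal Q-Q plot built from steps (1)-(3)
set.seed(726)
x <- rnorm(50, mean = 10, sd = 2)   # hypothetical data for one trait
n <- length(x)

x_ord <- sort(x)                    # (1) order the data
q <- qnorm((1:n - 0.5) / n)         # (2) q_(j) = Phi^{-1}((j - 1/2)/n)
plot(q, x_ord,
     xlab = "Standard normal quantiles",
     ylab = "Ordered observations") # (3) plot the pairs
qqnorm(x)                           # base R equivalent
```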
Correlation Tests
• Shapiro-Wilk test
• An alternative is a modified version of the Shapiro-Wilk test that uses the correlation coefficient from the Q-Q plot:
$$r_Q = \frac{\sum_{j=1}^{n}\left(x_{i(j)} - \bar{x}_i\right)\left(q_{(j)} - \bar{q}\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{i(j)} - \bar{x}_i\right)^2}\, \sqrt{\sum_{j=1}^{n}\left(q_{(j)} - \bar{q}\right)^2}}$$
• Reject normality if $r_Q$ is too small (critical values are given in Table 4.2)
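Since $r_Q$ is just the Pearson correlation between the ordered data and the normal quantiles, a sketch needs only one extra line beyond the Q-Q computation; `shapiro.test()` is base R's Shapiro-Wilk test:

```r
# Sketch: Q-Q correlation coefficient r_Q and the Shapiro-Wilk test
set.seed(726)
x <- rnorm(50)
n <- length(x)

q <- qnorm((1:n - 0.5) / n)
rQ <- cor(sort(x), q)   # cor() computes exactly the ratio defining r_Q
rQ                      # reject normality if this is below the Table 4.2 critical value

shapiro.test(x)         # Shapiro-Wilk test of the same hypothesis
```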
Empirical Distribution Tests
• Anderson-Darling and Kolmogorov-Smirnov statistics measure how much the empirical distribution function (EDF)
$$F_n\left(x_i\right) = \frac{\text{number of observations} \le x_i}{n}$$
differs from the hypothesized distribution $F(x, \theta)$, using $\hat{\theta}$ to estimate $\theta$
• For a univariate normal distribution
$$\theta = \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}, \qquad \hat{\theta} = \begin{pmatrix} \bar{x} \\ s^2 \end{pmatrix}, \qquad \text{and} \qquad F\!\left(x, \hat{\theta}\right) = \Phi\!\left(\frac{x - \bar{x}}{s}\right)$$
• Large values of either statistic indicate the observed data were not sampled from the hypothesized distribution
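A sketch in base R; note that plugging estimated parameters into `ks.test()` makes its usual critical values only approximate (the Lilliefors correction, and the Anderson-Darling test in, e.g., the `nortest` package, address this):

```r
# Sketch: the EDF and a Kolmogorov-Smirnov test with estimated parameters
set.seed(726)
x <- rnorm(100, mean = 5, sd = 2)   # hypothetical data

Fn <- ecdf(x)   # F_n: proportion of observations <= a given value
Fn(5)

# K-S comparing F_n against Phi((x - xbar)/s); the p-value is approximate
# because the parameters were estimated from the same data
ks.test(x, "pnorm", mean(x), sd(x))
```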
Multivariate Methods
• You can generate bivariate plots of all pairs of traits and look
for unusual observations
• A chi-square plot checks for normality in p > 2 dimensions
(1) For each observation compute
$$d_j^2 = \left(x_j - \bar{x}\right)' S^{-1} \left(x_j - \bar{x}\right), \quad j = 1, 2, \ldots, n$$
(2) Order these values from smallest to largest:
$$d^2_{(1)} \le d^2_{(2)} \le \ldots \le d^2_{(n)}$$
(3) Calculate quantiles $q_{(1)} \le q_{(2)} \le \ldots \le q_{(n)}$ for the chi-squared distribution with p d.f. according to
$$\frac{j - \frac{1}{2}}{n} = P\!\left(\chi^2_p \le q_{(j)}\right)$$
Multivariate Methods
(4) Plot the pairs
$$\left(d^2_{(1)}, q_{(1)}\right), \left(d^2_{(2)}, q_{(2)}\right), \ldots, \left(d^2_{(n)}, q_{(n)}\right)$$
Do the points deviate too much from a straight line?
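A sketch of the full chi-square plot in R on hypothetical MVN data, with base R's `mahalanobis()` doing the work of step (1):

```r
# Sketch: chi-square probability plot, steps (1)-(4)
set.seed(726)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)   # hypothetical p-dimensional data

d2 <- mahalanobis(X, colMeans(X), cov(X))   # (1) d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar)
d2_ord <- sort(d2)                          # (2) order the squared distances
q <- qchisq((1:n - 0.5) / n, df = p)        # (3) chi-squared(p) quantiles

plot(q, d2_ord,
     xlab = "Chi-squared quantiles (p df)",
     ylab = "Ordered squared distances")    # (4) plot the pairs
abline(0, 1)   # points near this line are consistent with multivariate normality
```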
Things to Do with non-MVN Data
• Apply normal-based procedures anyway
– Hope for the best….
– Resampling procedures
• Try to identify a more appropriate multivariate distribution
• Nonparametric methods
• Transformations
• Check for outliers
Transformations
• The idea of transformations is to re-express the data to make them look more normal
• Choosing a suitable transformation can be guided by
– Theoretical considerations
• Count data can often be made to look more normal by
using a square root transformation
– The data themselves
• If the choice is not particularly clear consider power
transformations
Power Transformations
• Commonly used, but note they are defined only for positive variables
• Defined by a parameter $\lambda$ as follows:
$$y_j = \begin{cases} x_j^{\lambda} & \text{if } \lambda \ne 0 \\ \ln\left(x_j\right) & \text{if } \lambda = 0 \end{cases}$$
• So what do we use?
– For right-skewed data consider $\lambda < 1$ (fractions, 0, negative numbers…)
– For left-skewed data consider $\lambda > 1$
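A small illustrative sketch on hypothetical count data (shifted by 1 so every value is positive, as the transformation requires):

```r
# Sketch: power transformations on right-skewed count data
set.seed(726)
x <- rpois(200, lambda = 3) + 1   # hypothetical counts, shifted to keep x > 0

y_sqrt <- sqrt(x)   # lambda = 1/2, the common choice for counts
y_log  <- log(x)    # the lambda = 0 case

par(mfrow = c(1, 3))
hist(x, main = "raw"); hist(y_sqrt, main = "sqrt"); hist(y_log, main = "log")
```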
Power Transformations
• Box-Cox transformations are a popular modification of power transformations, where
$$y_j = \begin{cases} \dfrac{x_j^{\lambda} - 1}{\lambda} & \text{if } \lambda \ne 0 \\[4pt] \ln\left(x_j\right) & \text{if } \lambda = 0 \end{cases}$$
• Box-Cox transformations determine the best $\lambda$ by maximizing:
$$\ell(\lambda) = -\frac{n}{2} \ln\!\left[\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2\right] + \left(\lambda - 1\right)\sum_{j=1}^{n} \ln x_j$$
where the $y_j$ are the transformed values for the given $\lambda$ and $\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$.
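A sketch that maximizes $\ell(\lambda)$ by a direct grid search on hypothetical data; `MASS::boxcox()` performs the equivalent computation for a fitted linear model:

```r
# Sketch: choosing the Box-Cox lambda by maximizing l(lambda)
set.seed(726)
x <- rexp(100) + 0.1   # hypothetical positive, right-skewed data

bc_loglik <- function(lambda, x) {
  n <- length(x)
  y <- if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
  -(n / 2) * log(mean((y - mean(y))^2)) + (lambda - 1) * sum(log(x))
}

lambdas <- seq(-2, 2, by = 0.01)
ll <- sapply(lambdas, bc_loglik, x = x)
lambdas[which.max(ll)]   # the maximizing lambda

# library(MASS); boxcox(lm(x ~ 1))   # the same idea via the MASS package
```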
Transformations
• Note, in the multivariate setting, this would be
considered for every trait
• However… normality of each individual trait does not
guarantee joint normality
• We could iteratively try to search for the best
transformations for joint and marginal normality
– May not really improve our results substantially
– And often univariate transformations are good enough in
practice
• Be very cautious about rejecting normality
Next Time
• Examples of normality checks in SAS and R
• Begin our discussion of statistical inference for
MV vectors