Glycemia and Wt Mngt. Olz


Suppose we conduct a t-test of the difference between
two means and obtain a p-value < .05. Does this
mean:
a) There is less than a 5% chance that the results are due to
chance.
b) If there really is no difference between the population means,
there is less than a 5% chance of obtaining a difference this
large or larger.
c) There is a 95% chance that if the study is repeated, the result
will be replicated.
d) There is a 95% chance that there is a real difference between
the two population means.
Adapted from: Wulff HR, Andersen B, Brandenhoff P, Guttler F (1987):
What do doctors know about statistics? Statistics in Medicine 6:3-10
What is a p-value?
The probability of obtaining a test
statistic (data) that departs as much
as or more than the observed test
statistic (data) if the null hypothesis
were true.
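A minimal sketch of this definition, assuming a two-group comparison in which the null hypothesis is that the group labels are exchangeable. The function name and the data values are hypothetical and are not taken from the slides. The p-value is estimated as the share of label shufflings whose difference in means is at least as large as the observed one.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_perm):
        # Under the null, any assignment of labels to values is equally likely.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Count shuffles that depart as much as or more than the observed data.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical measurements, for illustration only.
a = [5.1, 4.9, 6.2, 5.8, 6.0]
b = [4.2, 4.0, 4.8, 4.5, 4.1]
p = permutation_p_value(a, b)
print(p)
```

The result is a statement about the data given the null model, which is reading (b) in the question above. It is not the probability that the null hypothesis is true.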
Which Null Hypotheses are
Meaningful and Testable?
Those that precisely
specify a probability
model for the data.
A Perspective
We study:
• Samples
• Data
We wish to obtain knowledge about:
• Populations
• Nature
Gene Family-Based Hypothesis
Testing
Sketch of Typical (outmoded and inappropriate) Approach:
1. For Genes 1 to K, define a vector, R, of length K that
contains the values of a categorical variable denoting
group membership.
2. For Genes 1 to K, define a vector, C, of length K that
contains the values of a binary variable denoting
whether or not the gene was ‘significant’ or ‘interesting’
by some standard.
3. Conduct some frequentist significance test for an
association between R and C.
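The three steps above can be sketched as follows. This illustrates the approach the slide calls outmoded and is not a recommendation. It assumes R is coded as 1 for genes in the family of interest and 0 otherwise, and it uses a one-sided Fisher exact test as the frequentist test. Both choices are assumptions made for this example, as are the function name and the toy data.

```python
from math import comb

def fisher_enrichment_p(R, C):
    """One-sided Fisher exact p-value for over-representation of
    'significant' genes (C == 1) among family members (R == 1)."""
    K = len(R)
    n_family = sum(R)
    n_sig = sum(C)
    overlap = sum(1 for r, c in zip(R, C) if r and c)
    upper = min(n_family, n_sig)
    # Hypergeometric upper tail: P(overlap or more family genes are significant).
    tail = sum(
        comb(n_family, k) * comb(K - n_family, n_sig - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(K, n_sig)

# Toy data: 20 genes, the first 5 belong to the family.
R = [1] * 5 + [0] * 15
# 4 family genes and 2 non-family genes are flagged as 'significant'.
C = [1, 1, 1, 1, 0] + [1, 1] + [0] * 13
p_enrich = fisher_enrichment_p(R, C)
print(p_enrich)
```

The hypergeometric calculation treats the K genes as independent draws, which is the assumption taken up next.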
Assume Independence

“Fortune cookie bet made
Powerball lottery players rich”
(from N. Y. Times, 2005)
110 players in the March 30th drawing
got 5 of 6 numbers right.
The odds of getting 5 of 6 numbers are about
1 in 3,000,000.
Only 4 or 5 second-place winners
were expected.
The players used fortune cookies to
obtain their numbers. All of the cookies came
from the same factory.
The numbers were selected by workers
writing numbers on paper and
putting them in a bowl for selection.
The same number combinations went
out in thousands of cookies a day.
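A rough sketch of how surprising 110 winners would be if tickets really were independent. It assumes the number of second-place winners follows a Poisson distribution with mean 4.5, the midpoint of the "4 or 5" expected. Both the Poisson model and that mean are assumptions made for illustration, and the function name is hypothetical.

```python
from math import exp, lgamma, log

def poisson_upper_tail(k_min, mean, n_terms=500):
    """P(X >= k_min) for X ~ Poisson(mean), summed directly over the upper tail."""
    total = 0.0
    for k in range(k_min, k_min + n_terms):
        # Work on the log scale to avoid overflow in the factorial.
        total += exp(-mean + k * log(mean) - lgamma(k + 1))
    return total

expected_winners = 4.5   # assumption: midpoint of the "4 or 5" on the slide
observed_winners = 110
p_tail = poisson_upper_tail(observed_winners, expected_winners)
print(p_tail)
```

The tail probability is vanishingly small, so the independence model is rejected. The explanation is that the tickets were not independent, not that something miraculous happened.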





The story raises the important point of the
independence assumption in
microarray analyses.
The majority of microarray statistical
tests assume independence
among genes.
However, we know that genes do
not function independently of each
other; they work in networks.
What are the implications of this
assumption for our final results?
The impact on the final results can be
important when investigating the role of
thousands of genes within a
biological system.
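One way to see the effect of the independence assumption is a small simulation under a global null. The sketch below assumes each gene's test statistic is standard normal and that genes share a single latent factor with pairwise correlation rho. The gene count, the value of rho, and the 1.96 cutoff are illustrative choices and are not taken from the slides.

```python
import random
import statistics

def null_significant_counts(n_genes, rho, n_sim, seed):
    """Simulate how many genes are called 'significant' when no gene is truly changed."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # factor common to all genes in one experiment
        count = 0
        for _ in range(n_genes):
            # Each z is standard normal; any two genes have correlation rho.
            z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            if abs(z) > 1.96:
                count += 1
        counts.append(count)
    return counts

indep = null_significant_counts(200, 0.0, 1000, seed=1)
corr = null_significant_counts(200, 0.5, 1000, seed=1)
mean_indep = statistics.mean(indep)
mean_corr = statistics.mean(corr)
var_indep = statistics.variance(indep)
var_corr = statistics.variance(corr)
print(mean_indep, var_indep)
print(mean_corr, var_corr)
```

The average number of 'significant' genes is about the same in both settings, but the spread is far larger when genes are correlated. A test that assumes independence will therefore treat large counts as rarer than they really are.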
The Independence Issue: A Real
Example
[Figure: Simulated P-value for 42 out of 42]
Gene Family-Based Hypothesis
Testing
Which Null Hypothesis is Being Tested?
1. None of the genes in family c are differentially expressed (associated,
methylated, etc.).
2. The proportion of genes in family c that are differentially expressed is
equal to the proportion of genes in the remainder of the genome that are
differentially expressed (beware of ‘anti-Bayesian’ element).
3. The proportion of genes in family c that are differentially expressed to an
extent greater than a given threshold is equal to the proportion of genes in the remainder
of the genome that are differentially expressed.
Note: These can all be subsumed under the general:
H0:
 C ,   C ,
Union-Intersection vs Intersection-Union Tests

Union-Intersection
• The compound hypothesis
is rejected if any one of the
individual hypotheses is
rejected
• Multiplicity adjustment
procedure is required to
control type I error rate
• The rejection region for
this test is the union of
rejection regions
corresponding to the
individual tests
When P << N, methods are well
established (e.g., multiple regression).
When P >> N, optimal methods are not yet
clear.

Intersection-Union
• The compound hypothesis
is rejected only if all of the
individual hypotheses are
rejected
• Overall type I error rate of
α is maintained without
multiplicity adjustment
• The rejection region for
this test is the intersection
of the rejection regions
corresponding to the
individual tests
Methods not yet well established.
Bayesian methods involving posterior
probabilities in place of p-values may be
especially useful.
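The two decision rules can be sketched in a few lines. The sketch assumes the individual tests have already produced p-values, and it uses a Bonferroni correction as the multiplicity adjustment for the union-intersection rule. Bonferroni is only one possible adjustment, chosen here for simplicity. The function names and p-values are hypothetical.

```python
def union_intersection_reject(p_values, alpha=0.05):
    """Reject the compound null if ANY individual test rejects.
    Uses a Bonferroni-adjusted threshold to control the type I error rate."""
    return min(p_values) <= alpha / len(p_values)

def intersection_union_reject(p_values, alpha=0.05):
    """Reject the compound null only if ALL individual tests reject.
    Each test is run at level alpha with no multiplicity adjustment."""
    return max(p_values) <= alpha

one_strong = [0.001, 0.03, 0.20]   # one very small p-value, one large
all_modest = [0.03, 0.04]          # every p-value below 0.05, none very small

print(union_intersection_reject(one_strong), intersection_union_reject(one_strong))
print(union_intersection_reject(all_modest), intersection_union_reject(all_modest))
```

The first set is rejected by the union-intersection rule only, and the second by the intersection-union rule only, so the two rules answer different questions.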
What assumptions are being
made?
• Normality?
• Exchangeability?
• Independence?
• Other?
• Non-Parametric: Non-Panacea (Cohen, J.)
• Asymptotic ≠ Exact
Major Issues to Ask About in Selecting a
Method for Gene Family or Pathway Testing
► What is the null?
► Does the method assume that all
components (e.g., SNPs or gene
expression levels) are independent?
► Is the method ‘anti-Bayesian’?
► Does the method use the continuity of
information (not simply significant or not)?