Poster - UBC Department of Computer Science
Timothy H. W. Chan, Calum MacAulay, Wan Lam, Stephen Lam, Kim Lonergan, Steven Jones, Marco Marra, Raymond T. Ng
Department of Computer Science, University of British Columbia
The British Columbia Cancer Research Centre
Permutation Test
We previously analyzed publicly available breast and brain SAGE libraries using the permutation test (Ng et al., Frontiers of Cardiovascular Science 2003) with some success: 60% of the top-ranked genes for the breast SAGE data were verified to be related to the neoplastic process.
The BC Cancer Research Centre has produced various lung cancer SAGE libraries, including 5 CIS (carcinoma in situ), 6 invasive, and 17 normal libraries.
It would be interesting to use the permutation test to contrast and compare the various stages of lung cancer, and to search for small transcriptional changes (pathway regulators, checkpoints, switches).
At 99% confidence, 1,981 of the 32,871 tags considered failed the permutation test for Normal vs Invasive lung cancer.
At 99% confidence, 1,887 of the 40,476 tags considered failed the permutation test for Normal vs CIS lung cancer.
Pool together cancer and normal libraries
Null Hypothesis: $H_0: \mu_c - \mu_n = 0$
Alternative Hypothesis: $H_a: \mu_c - \mu_n \neq 0$
To use the permutation test on SAGE libraries from normal tissue and from different stages of lung cancer (CIS and invasive) to discover candidate cancer-related genes.
To contrast and compare these two stages of lung cancer.
To demonstrate the advantages and power of the permutation test over the t-test.
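Simulated Normal Pool (same size as normal samples)
Simulated: $I = |s_c - s_n|$
Mean: $\bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i$, with $X_i = \mathrm{freq}(i)$
Sum of Squares: $SS = \sum_{i=1}^{M} (X_i - \bar{X})^2$
Standard Deviation: $\mathrm{Stdv} = \sqrt{SS / M}$
Permutation (Z) Score: $PS = \frac{O - \bar{I}}{\mathrm{Stdv}_I}$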
Some tags map to more than one gene. To deal with this, the expression level of the tag is assigned to each gene that the tag maps to. For instance, if tag A maps to genes 1, 2, and 3, all three genes are assigned the tag count of tag A.
Simulated Cancer Pool (same size as cancer samples)
Observed: $O = |r_c - r_n|$
Plot → Score those at ≥ 99% confidence → Output
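The rule for tags that map to several genes can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that a library is a dict of tag counts and that the tag-to-gene map is a dict of gene lists. The function name `expand_tag_counts` is hypothetical. The sketch keeps one record per (tag, gene) pair rather than summing, so a gene hit by several tags stays visible as a duplicate.

```python
def expand_tag_counts(tag_counts, tag_to_genes):
    """Assign each tag's count to every gene the tag maps to (hypothetical helper).

    tag_counts: dict tag -> count in one library (assumed layout).
    tag_to_genes: dict tag -> list of gene names (assumed layout).
    Returns (tag, gene, count) records; tags with no mapping are dropped,
    and counts are copied (not split) across all genes of a tag.
    """
    records = []
    for tag, count in tag_counts.items():
        for gene in tag_to_genes.get(tag, []):
            records.append((tag, gene, count))
    return records


rows = expand_tag_counts({"A": 7}, {"A": ["gene1", "gene2", "gene3"]})
print(rows)  # [('A', 'gene1', 7), ('A', 'gene2', 7), ('A', 'gene3', 7)]
```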
The null hypothesis states that there is no difference between the means of the normal and cancer samples. If this were the case, it would make no difference if we “mixed up the labels” of the libraries.
The alternative hypothesis states that it does make a difference, and that the means of the normal and cancer samples are different.
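The procedure above can be sketched in Python for a single tag. This is a minimal sketch under stated assumptions, not the authors' implementation. M is taken to be the number of random relabelings. The poster does not say whether all combinations are enumerated or a random subset is drawn, so this sketch samples at random. "Confidence" is assumed to mean the fraction of simulated differences I that fall below the observed difference O. The name `permutation_score` and its parameters `m` and `seed` are hypothetical.

```python
import math
import random
from statistics import fmean


def permutation_score(cancer, normal, m=1000, seed=0):
    """Permutation (Z) score PS = (O - I_bar) / Stdv_I for one tag (sketch).

    cancer, normal: normalized counts of the tag in each library.
    m: number of random relabelings (assumed to play the role of M).
    Returns (ps, confidence); confidence is the fraction of simulated
    differences strictly below the observed one (an assumed definition).
    """
    rng = random.Random(seed)
    # Observed difference O = |r_c - r_n| between the real group means.
    observed = abs(fmean(cancer) - fmean(normal))
    # Pool together cancer and normal libraries.
    pooled = list(cancer) + list(normal)
    n_c = len(cancer)
    simulated = []
    for _ in range(m):
        rng.shuffle(pooled)
        # Simulated cancer pool has the same size as the cancer samples;
        # the remaining libraries form the simulated normal pool.
        s_c, s_n = pooled[:n_c], pooled[n_c:]
        simulated.append(abs(fmean(s_c) - fmean(s_n)))  # I = |s_c - s_n|
    i_bar = fmean(simulated)                       # mean of the simulated I
    ss = sum((i - i_bar) ** 2 for i in simulated)  # sum of squares
    stdv = math.sqrt(ss / m)                       # Stdv_I
    ps = (observed - i_bar) / stdv if stdv > 0 else 0.0
    confidence = sum(i < observed for i in simulated) / m
    return ps, confidence


ps, conf = permutation_score([40, 42, 38, 41, 39], [10, 12, 11, 9, 10, 13])
keep = conf >= 0.99  # hypothetical "score those >= 99% confidence" filter
```

A large PS arises either when O is far above the typical simulated difference or when the simulated differences vary little (small Stdv_I).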
To reduce comparison errors, tag frequencies are normalized by scaling each library to a total of 300,000 tags.
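The normalization step can be sketched like this. It assumes that "scaling each library up to 300,000" means multiplying every tag count by the same factor, so that the library's total becomes 300,000 tags. The helper name `normalize_library` is hypothetical.

```python
def normalize_library(tag_counts, target=300_000):
    """Scale a library's tag counts so that they sum to `target` tags (sketch).

    tag_counts: dict tag -> raw count (assumed layout).
    """
    total = sum(tag_counts.values())
    factor = target / total
    return {tag: count * factor for tag, count in tag_counts.items()}


lib = {"TAG1": 30, "TAG2": 70}
print(normalize_library(lib))  # {'TAG1': 90000.0, 'TAG2': 210000.0}
```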
Power of the Permutation Test
With the permutation test, the number of samples required for the test to give acceptable results is relatively low compared with other statistical tests (e.g., the t-test or the chi-square test).
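One way to see why few samples can suffice: under the null hypothesis, every split of the pooled libraries into groups of the original sizes is equally likely. The number of such splits, C(n_c + n_n, n_c), grows combinatorially with the number of libraries. The sketch below computes this count with Python's `math.comb` for the library counts given above (5 CIS, 6 invasive, 17 normal). The helper name `num_relabelings` is hypothetical.

```python
from math import comb


def num_relabelings(n_cancer, n_normal):
    """Distinct ways to split the pooled libraries into a simulated cancer pool
    of size n_cancer and a simulated normal pool of size n_normal."""
    return comb(n_cancer + n_normal, n_cancer)


for name, n_c, n_n in [("INV vs Normal", 6, 17), ("CIS vs Normal", 5, 17), ("CIS vs INV", 5, 6)]:
    print(name, num_relabelings(n_c, n_n))
# INV vs Normal 100947
# CIS vs Normal 26334
# CIS vs INV 462
```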
Scoring and Ranking Genes
The top-ranked genes are investigated for a relation to cancer using the literature currently available on PubMed.
Criteria
INV vs
Normal
CIS vs
Normal
A
0
B
Top 20 Tags That Map to Genes: Permutation Test Results
Criteria
INV vs
Normal
CIS vs
Normal
0
A
1
3*
0
1
4
5*
C
D
E
Total Unique Significant Genes
0
1
0
1
0
0
1
2
5
6
8
3
5
15
12
Total Hypotheticals
11
8
8
B
C
D
E
Total Unique Significant Genes
Total Hypotheticals
5
1
The quality of these genes depends mostly on criteria A and B, followed closely by criteria C and D, since genes meeting C or D are important in the neoplastic process.
Hypotheticals (genes with no known function) did not meet any of the criteria.
* Indicates a duplicate (more than one tag matches the same gene).
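The "Total Unique Significant Genes" count and the duplicate marker (*) can be sketched as follows. This sketch assumes the input is a list of tags ranked by permutation score together with a dict that maps each tag to a list of genes. It keeps the first n tags that map to at least one gene, counts distinct genes, and flags genes hit by more than one tag. The function name `summarize_top_tags` is hypothetical, and this is not necessarily how the tables were produced.

```python
from collections import Counter


def summarize_top_tags(ranked_tags, tag_to_genes, n=20):
    """Summarize the top-n ranked tags that map to at least one gene (sketch).

    Returns (number of unique genes, sorted list of genes hit by more than one tag).
    """
    mapped = [t for t in ranked_tags if tag_to_genes.get(t)]
    top = mapped[:n]
    # Count each gene once per tag that maps to it.
    hits = Counter(g for t in top for g in set(tag_to_genes[t]))
    duplicates = sorted(g for g, c in hits.items() if c > 1)
    return len(hits), duplicates


print(summarize_top_tags(["t1", "t2", "t3", "t4"], {"t1": ["G1"], "t2": ["G2"], "t4": ["G1"]}, n=3))
# (2, ['G1'])
```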
Top N Ranked Tags    Intersections
50                   2
100                  16
200                  51
300                  88
400                  136
500                  184
1000                 450
The low intersections suggest that CIS and Invasive stages
of cancer are different.
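The intersection counts can be computed as the size of the overlap between the top N entries of two ranked tag lists. This sketch assumes the lists are the Normal vs Invasive and the Normal vs CIS rankings, each already sorted by permutation score. The function name `top_n_intersection` is hypothetical. For example, 51 shared tags out of the top 200 corresponds to about 25%.

```python
def top_n_intersection(ranked_a, ranked_b, n):
    """Number of tags that appear in the top n of both ranked lists (sketch)."""
    return len(set(ranked_a[:n]) & set(ranked_b[:n]))


inv_vs_normal = ["t1", "t2", "t3", "t4"]  # hypothetical ranked tag IDs
cis_vs_normal = ["t3", "t9", "t1", "t8"]
print(top_n_intersection(inv_vs_normal, cis_vs_normal, 3))  # 2
```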
Higher permutation scores correspond either to a greater difference between the two samples or to a more consistent difference between them.
For each tissue, rank the significant genes by sorting their permutation scores in descending order.
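The ranking step amounts to a descending sort on permutation score, restricted to the significant genes. This is a minimal sketch that assumes scores are held in a dict and the significant genes in a set. The function name `rank_by_permutation_score` is hypothetical. Ties keep their original order because Python's `sorted` is stable.

```python
def rank_by_permutation_score(scores, significant):
    """Return significant genes sorted by permutation score, highest first (sketch).

    scores: dict gene -> permutation score (assumed layout).
    significant: set of genes that passed the 99% confidence cut.
    """
    return sorted((g for g in scores if g in significant), key=lambda g: scores[g], reverse=True)


print(rank_by_permutation_score({"g1": 2.5, "g2": 7.1, "g3": 4.0}, {"g1", "g2", "g3"}))
# ['g2', 'g3', 'g1']
```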
Literature Verification
Figure: # of Samples vs # of Combinations (log scale); x-axis: Number of Samples, y-axis: Number of Combinations
Top 20 Tags That Map to Genes: T-test Results
Data Pre-Processing
119 of the 20,077 tags considered failed the permutation test for CIS vs Invasive lung cancer.
The permutation test is effective at picking out genes that are related to the neoplastic process, and it is much better at picking out these genes than the t-test.
The permutation test between Invasive and CIS shows that 119 tags are differentially expressed, which suggests that the two stages of cancer have different genes turned on or off. In addition, the intersections between the top-ranked genes for Normal vs Invasive and Normal vs CIS are quite low (only 25% of the top 200 tags intersect), which also suggests differences between the two stages.
Verification Criteria:
Criterion    Related to:
A            Up/down-regulated in lung cancer
B            Up/down-regulated in a different type of cancer
C            Major component of the cell cycle (neoplastic process) or angiogenesis
D            Oncogene/tumor suppressor/mutator
E            Not previously associated with cancer
Continue to use the permutation test to analyze other SAGE libraries.
The permutation test can also detect small transcriptional changes, as long as the gene has a consistent tag count across all the libraries. These low-count significant genes (with high permutation scores) require further analysis, as they could be vital pathway regulators, checkpoints, or switches that may have led to the onset of lung cancer.
Validate the genes further through experiments.
Use the validated genes for early cancer detection, or derive new treatments from the data.