ASHG2010 - Statistical Genetics, Kyoto University

Contingency table test for multiple-categories
with arbitrary weights on the cells
Ryo Yamada(1), Takahisa Kawaguchi(2)
Statistical Genetics(1) and Human Disease Genomics(2), Center for Genomic Medicine,
Department of Medicine, Kyoto University, Kyoto, Japan
Contact address: [email protected]
Motivation
Abstract
How to test a complex categorical table
Example: Complex Categorical Phenotype
The test of independence between two factors is one of the most basic tests in
many fields. In genetics, it is in regular use for genotype-phenotype
correlation. The simplest case of genotype-phenotype correlation handles two
phenotype categories (cases vs. controls) and three diplotypes of a diallelic
genetic variant, i.e., 2x3 table tests. Even for this simple table, multiple tests
are in use: Pearson's test with two degrees of freedom (df) and genetic
mode-specific tests with 1 df (dominant test, recessive test, additive test).
Sometimes the MAX3 test or the MAX test, which are composed of the three
mode-specific tests, is also used. All of these tests analyze the same table, yet
they give different results because they weigh the cells of the table differently.
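To make the point about cell weights concrete, the sketch below computes Pearson's 2-df statistic, the three mode-specific 1-df statistics and the MAX3 statistic for one 2x3 table. The counts are hypothetical, the genotype order (MM, Mm, mm) with M as the risk allele is an assumption, and the 1-df statistics use the usual Cochran-Armitage form, which may be normalised differently from the poster's own formulas.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic; df = (rows - 1) * (cols - 1)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def trend_chi2(cases, controls, scores):
    """1-df Cochran-Armitage type statistic for one set of genotype scores."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    col_tot = [a + b for a, b in zip(cases, controls)]
    t = sum(s * (a * r2 - b * r1) for s, a, b in zip(scores, cases, controls))
    s1 = sum(s * c for s, c in zip(scores, col_tot))
    s2 = sum(s * s * c for s, c in zip(scores, col_tot))
    variance = r1 * r2 / n * (n * s2 - s1 ** 2)
    return t ** 2 / variance


# Hypothetical counts, genotype order (MM, Mm, mm), M assumed to be the risk allele
cases = [30, 50, 20]
controls = [20, 50, 30]

# Each genetic mode is a different weighting of the same three genotype columns
modes = {
    "dominant": (1.0, 1.0, 0.0),
    "additive": (1.0, 0.5, 0.0),
    "recessive": (1.0, 0.0, 0.0),
}

pearson_2df = pearson_chi2([cases, controls])
mode_stats = {name: trend_chi2(cases, controls, s) for name, s in modes.items()}
max3 = max(mode_stats.values())

print("Pearson (2 df):", round(pearson_2df, 3))
for name, stat in mode_stats.items():
    print(name, "(1 df):", round(stat, 3))
print("MAX3:", round(max3, 3))
```

Because MAX3 is the maximum of three correlated statistics, it should not be referred to a 1-df chi-square distribution directly.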
In recent genetic epidemiology studies, the phenotypes can be multi-categorical,
with disease stages or grades and combinations of multiple disease criteria,
and the number of genotype categories can be more than three. Therefore it is
beneficial to handle NxM tables in the same way as the genetic mode-specific
tests handle 2x3 tables.
When we generalized the idea for 2x3 tables to NxM tables, we evaluated the
structure of the tests geometrically. In the case of 2x3 tables, the statistic of
Pearson's 2-df test draws elliptic contour lines in 2-dimensional space, whereas
the statistics of the 1-df tests draw straight contour lines. We generalized the
two dimensions to df dimensions for NxM tables, where df = (N-1)x(M-1), and
gave a method to calculate statistics for NxM tables with arbitrary weights on
the cells.
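One way to write a 1-df statistic with arbitrary cell weights is sketched below. This is a standard score-type construction, not necessarily the geometric method of the poster: the weighted sum of deviations from the expected counts is squared and divided by its variance under independence. The function names, the example tables and the example weights are all hypothetical.

```python
def weighted_chi2(table, weights):
    """1-df statistic for an N x M table with one weight per cell.

    T = sum_ij w_ij * (o_ij - e_ij), squared and divided by its large-sample
    variance under independence (margins estimated from the table).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    n_rows, n_cols = len(rows), len(cols)

    def row_cov(i, k):
        # Element (i, k) of diag(rows) - rows rows^T / n
        return (rows[i] if i == k else 0.0) - rows[i] * rows[k] / n

    def col_cov(j, m):
        # Element (j, m) of diag(cols) - cols cols^T / n
        return (cols[j] if j == m else 0.0) - cols[j] * cols[m] / n

    t = sum(weights[i][j] * (table[i][j] - rows[i] * cols[j] / n)
            for i in range(n_rows) for j in range(n_cols))
    variance = sum(weights[i][j] * row_cov(i, k) * col_cov(j, m) * weights[k][m]
                   for i in range(n_rows) for k in range(n_rows)
                   for j in range(n_cols) for m in range(n_cols)) / n
    return t * t / variance


def degrees_of_freedom(table):
    """df of Pearson's test for an N x M table: (N-1) x (M-1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# 2x3 check: additive-mode weights on the case row (hypothetical counts)
table_2x3 = [[30, 50, 20], [20, 50, 30]]
additive = [[1.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
stat_additive = weighted_chi2(table_2x3, additive)

# Hypothetical 3x4 table with an arbitrary (hypothetical) weight matrix
table_3x4 = [[20, 30, 25, 25], [30, 30, 20, 20], [40, 25, 20, 15]]
weights_3x4 = [[2.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
stat_3x4 = weighted_chi2(table_3x4, weights_3x4)

print(stat_additive, degrees_of_freedom(table_2x3))
print(stat_3x4, degrees_of_freedom(table_3x4))
```

With this normalisation, adding a constant to every weight in a row or in a column leaves the statistic unchanged, so only (N-1)x(M-1) directions of the weight matrix matter.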
[Diagram: disease criteria R1 to R4, marked (+) against the phenotype categories C1 to C5]

Genotype     C1     C2     C3     C4     C5   total
MM          200    180     90     90    270     830
Mm         1260    840    420    420   1260    4200
mm         1470    980    490    490   1470    4900
total      2930   2000   1000   1000   3000    9930
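As a worked check on the example counts, the sketch below runs Pearson's test on the 3x5 genotype-by-phenotype table, so that df = (3-1)x(5-1) = 8, and reports which cell contributes most to the statistic. The helper `chi2_sf_even_df` is a hypothetical name; it uses the closed-form upper tail of the chi-square distribution, which holds for even df only.

```python
import math

genotypes = ["MM", "Mm", "mm"]
categories = ["C1", "C2", "C3", "C4", "C5"]
table = [
    [200, 180, 90, 90, 270],
    [1260, 840, 420, 420, 1260],
    [1470, 980, 490, 490, 1470],
]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
n = sum(row_tot)

# Per-cell contribution (o - e)^2 / e to Pearson's statistic
contrib = []
for i, row in enumerate(table):
    contrib.append([])
    for j, obs in enumerate(row):
        expected = row_tot[i] * col_tot[j] / n
        contrib[i].append((obs - expected) ** 2 / expected)

chi2 = sum(sum(row) for row in contrib)
df = (len(table) - 1) * (len(table[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper tail probability of the chi-square distribution (even df only)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
largest = max((contrib[i][j], genotypes[i], categories[j])
              for i in range(len(genotypes)) for j in range(len(categories)))

print("chi2 =", round(chi2, 3), "df =", df, "p =", round(p_value, 4))
print("largest contribution:", largest[1], largest[2])
```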
Tests are described with ellipsoids and spheres
Tables with the same χ² value in Pearson's test with (N-1)(M-1) degrees of
freedom form an ellipsoid, and tables with the same χ² value in a 1-df test form
parallel lines as contours. These can be converted to spheres and tangent lines,
respectively, so the values of the statistics are given geometrically.
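The relation between the ellipsoid, the sphere and the tangent contours can be written out as follows. This is a sketch under the usual large-sample normalisation; the symbols o_ij, e_ij, w_ij, z, a and P are introduced here for illustration, and the poster's own parameterisation may differ.

```latex
% o_{ij}: observed count; e_{ij} = r_i c_j / n: expected count under independence
% Scaling each axis by 1/\sqrt{e_{ij}} turns the Pearson ellipsoid into a sphere:
z_{ij} = \frac{o_{ij} - e_{ij}}{\sqrt{e_{ij}}}, \qquad
\chi^2_{\mathrm{Pearson}} = \sum_{i,j} z_{ij}^2 = \lVert z \rVert^2

% 1-df test with cell weights w_{ij}; a_{ij} = w_{ij}\sqrt{e_{ij}};
% P: orthogonal projection onto the (N-1)(M-1)-dimensional space that contains z
\chi^2_{w} = \frac{(a \cdot z)^2}{a^{\top} P a}
           = \frac{(Pa \cdot z)^2}{\lVert Pa \rVert^2}
           \le \lVert z \rVert^2 = \chi^2_{\mathrm{Pearson}}
```

In the z coordinates, tables with a fixed 1-df statistic lie on two parallel hyperplanes perpendicular to Pa, each at distance sqrt(χ²_w) from the origin and therefore tangent to the sphere of that radius. Equality with Pearson's statistic holds when z is parallel to Pa.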
Weight matrix of NxM table
(N-1)(M-1) component test matrices of MAX test for NxM tables
2x3 table: MAX3 test and MAX test and their extension to higher dimensions
Three tests (dominant, additive and recessive modes) draw three sets of parallel
contour lines. The MAX3 test takes the maximum statistic among the three mode
tests, so its contour lines consist of three sets of parallel lines.
The MAX test turns the contour lines of the MAX3 test into smooth arcs between
the dominant and recessive modes.
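The difference between the two tests can be sketched numerically. The code below assumes that the MAX test maximises the 1-df trend statistic over a heterozygote score h running from the recessive mode (h = 0) to the dominant mode (h = 1), and it approximates that maximum on a grid. The counts, the function names and the grid size are hypothetical, and the sketch gives only the statistic, not its P-value.

```python
def trend_chi2(cases, controls, scores):
    """1-df Cochran-Armitage type statistic for one set of genotype scores."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    col_tot = [a + b for a, b in zip(cases, controls)]
    t = sum(s * (a * r2 - b * r1) for s, a, b in zip(scores, cases, controls))
    s1 = sum(s * c for s, c in zip(scores, col_tot))
    s2 = sum(s * s * c for s, c in zip(scores, col_tot))
    return t ** 2 / (r1 * r2 / n * (n * s2 - s1 ** 2))


def max3_statistic(cases, controls):
    """Maximum over the three discrete modes: recessive, additive, dominant."""
    return max(trend_chi2(cases, controls, (1.0, h, 0.0)) for h in (0.0, 0.5, 1.0))


def max_statistic(cases, controls, steps=100):
    """Grid approximation of the maximum over all scores (1, h, 0), 0 <= h <= 1."""
    best_h, best_stat = 0.0, -1.0
    for k in range(steps + 1):
        h = k / steps
        stat = trend_chi2(cases, controls, (1.0, h, 0.0))
        if stat > best_stat:
            best_h, best_stat = h, stat
    return best_h, best_stat


# Hypothetical counts, genotype order (MM, Mm, mm), M assumed to be the risk allele
cases = [40, 45, 15]
controls = [20, 50, 30]

max3 = max3_statistic(cases, controls)
best_h, max_stat = max_statistic(cases, controls)
print("MAX3 statistic:", round(max3, 3))
print("MAX statistic:", round(max_stat, 3), "at h =", best_h)
```

Since the grid contains h = 0, 0.5 and 1, the MAX statistic computed this way is never smaller than the MAX3 statistic.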
Distribution of statistics compared with χ² of one degree and (N-1)(M-1)
degrees of freedom
Extension to 2x4 table
The tips of the green triangles are the surface normals for the discrete model
The green triangles on the surface are the area of the continuous model
Black dots: observed tables
Red arcs: the shortest paths from the observed tables to the model
The arcs concentrate at the tips in the “discrete MAX test”
The arcs reach the edges or the tips of the model area
References
Yamada, R. & Okada, Y. An optimal dose-effect mode trend test for SNP
genotype tables. Genet Epidemiol 33, 114-127 (2009).
Hirosawa, K., Kawaguchi, T., Matsuda, F. & Yamada, R. Estimation of P-value of
MAX test with double triangle diagram for 2x3 SNP case-control tables. Genet
Epidemiol (in press).