Notes from the GAW14
“Genetic Analysis Workshop 14”
September 7-10, 2004
Noordwijkerhout, NL
Kelly Burkett
September 20th, 2004
Background (1)

The focus of statistical geneticists and genetic epidemiologists is
gene mapping (finding genes/polymorphisms that predispose to
inherited diseases) and issues related to this aim.
Ex: population substructure, data missing due to technology, effects of
genotyping error, etc.
Some analyses use techniques applicable only to genetic data (Ex:
linkage analyses); others don’t (Ex: association analyses through
case-control samples).
Biological processes and population genetics are often exploited
in genetic analyses. On the other hand, if traditional methods are
used, these mechanisms must be accounted for.
I primarily work on association analyses of population-based
samples.
Background (2)
Linkage analyses
- Rely on family data
- Study the transmission of blocks of DNA within families (nuclear
  families and extended pedigrees)
- Across multiple families, if particular regions in the genome are more
  likely to be present in affecteds than in unaffecteds, these regions
  are “linked” to disease
- Must account for the non-independence of members in the family
Association Analyses
- Rely on either family data (trios) or population-based data
- Look for particular changes in the DNA which are more likely in those
  who are affected than in those who are unaffected. If found, these
  polymorphisms are said to be associated with the disease
- Although standard techniques can be used, must account for
  “population stratification”, “genetic heterogeneity”, and
  non-independence of genetic info on the same chromosome
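As a rough illustration of the association idea above, the sketch below compares allele counts between affected and unaffected individuals with a 1-df chi-square test. The allele counts and the function name `allelic_chi_square` are made up for illustration and are not from the GAW14 data. A real analysis would also have to deal with the stratification, heterogeneity and correlated-marker issues listed above.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_counts / control_counts: (count of allele A, count of allele a).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical allele counts: 200 cases and 200 controls (400 alleles each)
chi2, p = allelic_chi_square(case_counts=(240, 160), control_counts=(200, 200))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The test treats every allele as an independent observation, which is exactly the assumption that breaks down with related individuals or a stratified population.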
Background (3)

DNA markers
Changes distributed in the genome that are easy to genotype
and are not themselves necessarily disease-predisposing
Microsatellites: repeats of particular DNA sequences. Ex: (CA)15.
The number of repeats is the change/allele that is
measured/genotyped.
SNPs: “single nucleotide polymorphisms”. Changes in the DNA
sequence which involve the substitution of one DNA base for
another. The base/allele a person carries is what is genotyped.
DNA markers act essentially as genomic rulers for genome
scans: “Which marker is the disease polymorphism most
likely to be closest to?”
SNPs in genes or regions that regulate gene expression can
actually be related to disease.
Genetic Analysis Workshops
“The Genetic Analysis Workshops (GAWs) are a collaborative effort among genetic
epidemiologists to evaluate and compare statistical genetic methods. For each GAW,
topics are chosen that are relevant to current analytical problems in genetic
epidemiology, and sets of real or computer-simulated data are distributed to
investigators worldwide. Results of analyses are discussed and compared at meetings
held in even-numbered years.”

The Genetic Analysis Workshops were initially motivated by the development and
publication of several new algorithms for statistical genetic analysis, and reports that
using different methods of analysis often produced conflicting results.
The Workshops provide an opportunity for participants
- to test novel methods on the same well-characterized data sets,
- to compare results and interpretations, and
- to discuss current problems in genetic analysis
http://www.gaworkshop.org/
GAW format
- More than a year before GAW, suggestions for topics and data sets are
  requested from those on the GAW mailing list.
- Data sets are assembled. Six or seven months before each GAW, a memo is
  sent to individuals on the GAW mailing list announcing the availability
  of the GAW data, with a short description of the data sets and a form
  for requesting data (March 29th).
- Request data and analyse (we started at the end of April, after
  receiving the data).
- Submit written contributions approximately 6-8 weeks before the Workshop
  (July 29th). Only those who contribute can attend the workshop. The GAW
  Advisory Committee reviews contributions.
- Attend GAW. Contributions are divided into topic groups. Groups meet at
  GAW to put together presentations summarizing all contributions on the
  topic. The presentations for each group are made at the workshop.
- The proceedings of each GAW are published. Proceedings from GAW14 will
  be published in part by Genetic Epidemiology, and in part by BioMed
  Central. GAW13 publications can be found at BMC Genetics (Analysis of
  Longitudinal Family Data for Complex Diseases and Related Risk Factors).
GAW14 Data
Real Data:
- From the Collaborative Study on the Genetics of Alcoholism (COGA)
- 143 pedigrees with 1614 members in total
- Provided with family relationships, discrete and quantitative
  phenotypes, some covariates
- Genetic Data: a microsatellite genome screen (~400), two SNP genome
  screens (sizes 11,555 and 4,763)
- Also provided with a 10% replicate sample for those interested in QC
Simulated Data:
- Simulated a behavioural disorder with multiple phenotype definitions
- 100 replicates of 100 families each are provided; 4 “populations” also
  simulated
- Control samples were generated for each of the replicates in each
  population
- Genetic Data: 416-marker microsatellite scan, 917-marker SNP scan
- Could “purchase” more SNPs in particular regions, to a maximum of 20
  purchases
- Can request the answers for power/type I error studies
This year's suggested topics
- SNP markers versus microsatellite markers
  - SNPs can take on only one of two forms. Microsatellite markers can
    take on many. Therefore, it is thought that the information content
    of SNPs might not be enough to perform genome scans.
  - A subset of this would be: how many SNP markers = one microsatellite
    marker?
- How does outcome/phenotype definition affect the results of gene
  mapping studies?
- BUT: any analysis that involves the data that all participants are
  given is acceptable!
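One common way to put a number on the "information content" of a marker is its expected heterozygosity, H = 1 - sum(p_i^2), where p_i are the allele frequencies. The sketch below uses made-up allele frequencies to show why a two-allele SNP tops out at 0.5 while a many-allele microsatellite can do much better. It is only an illustration of the argument, not the measure used by any particular GAW14 contribution.

```python
def expected_heterozygosity(allele_freqs):
    """H = 1 - sum(p_i^2): chance that two randomly drawn alleles differ."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Best case for a SNP: two alleles at equal frequency
snp = expected_heterozygosity([0.5, 0.5])
# Hypothetical microsatellite with 10 equally frequent alleles
microsat = expected_heterozygosity([0.1] * 10)

print(f"SNP: {snp:.2f}, microsatellite: {microsat:.2f}")
# prints "SNP: 0.50, microsatellite: 0.90"
```

On this measure a single SNP can never match a good microsatellite, which is why the question becomes how many SNPs are needed to make up the difference.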
Summary of topics:
1. Linkage mapping methods (real and simulated)
2. Quantitative Trait mapping
3. Heterogeneity
4. Parent-of-origin, “imprinting” etc
5. Multivariate analyses
6. Analyses of Alcoholism, Smoking and Related Traits
7. Data Mining
8. Genotyping Errors/Pedigree Errors and Missing Data
9. Haplotypes and TagSNPs
10. Detection and Implications of LD
11. Association Mapping
12. Case-Control Analyses
13. Integrating SNPs and microsatellites
14. SNPs vs microsatellites on linkage analyses (real and simulated)
15. Fine Mapping
“A comparison of three methods for selecting
tagging single nucleotide polymorphisms”
Matt Pratola, Kelly Burkett, Mercedeh Ghadessi, Brad
McNeney, Jinko Graham and Denise Daley

Background:
- Chromosomes are inherited as blocks from each parent
- Variants at markers on a chromosome are not independent. Due to
  recombination, though, the farther apart two markers are, the more
  independent their values
- Markers in a gene or region, though, will have redundant information
- A “tagSNP” is a marker which summarizes the information from multiple
  markers
- To save money, only want to genotype tagSNPs
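To make the tagSNP idea concrete, here is a minimal sketch of one generic way to pick tags: measure redundancy between markers with the squared correlation (r^2) of their genotypes, then greedily keep the marker that stands in for the most not-yet-covered markers. This is not one of the three methods compared in the paper. The 0.8 threshold, the function names and the toy genotypes are all assumptions for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (coded 0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def greedy_tag_snps(genotypes, threshold=0.8):
    """genotypes: dict of SNP name -> list of 0/1/2 genotypes (same individuals)."""
    names = list(genotypes)
    # covers[s] = SNPs that s can stand in for (always includes s itself)
    covers = {
        s: {t for t in names
            if t == s or r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in names
    }
    untagged, tags = set(names), []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs (ties: first listed)
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Toy genotypes for six individuals (made up)
toy = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1
    "snp3": [2, 1, 0, 2, 1, 0],  # snp1 with the alleles coded the other way
    "snp4": [1, 0, 1, 1, 2, 1],  # uncorrelated with the others
}
print(greedy_tag_snps(toy))
# prints ['snp1', 'snp4']
```

In this toy example four markers collapse to two tags, so only half the genotyping would be needed.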
“A comparison…” (2)
- Each of the association studies that I work with is studying ~40 genes
- To genotype all variants in a gene would be cost-prohibitive. However,
  not genotyping all of them will result in a loss of power
- Interested in the performance of different algorithms for choosing
  tagSNPs with respect to the power to detect a true disease association
- Used one population of the simulated data
- Created a case/control study and used the simulated data's definition
  of “affected” for Kofendred syndrome
- For each replicate, we used a sub-sample to choose tagSNPs. Then only
  used information from the tagSNPs for the association study. Measured
  the proportion of replicates with a Bonferroni-corrected p-value
  below 0.05
- Didn’t have time to complete all 100 replicates. Will do for the final
  publication
- Conclusions: the simulated data weren’t realistic enough. Quite
  disappointing!
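The power measure described above can be sketched as follows. The sketch assumes that a replicate counts as a "hit" when at least one tagSNP test beats the Bonferroni threshold (0.05 divided by the number of tagSNPs tested); the talk does not spell out that detail. The p-values are made up and the function name `bonferroni_power` is hypothetical.

```python
def bonferroni_power(pvalues_by_replicate, alpha=0.05):
    """Fraction of replicates in which at least one test passes the
    Bonferroni-corrected threshold alpha / (number of tests)."""
    hits = 0
    for pvalues in pvalues_by_replicate:
        threshold = alpha / len(pvalues)
        if min(pvalues) < threshold:
            hits += 1
    return hits / len(pvalues_by_replicate)

# Made-up p-values: 4 replicates, 5 tagSNPs each (threshold = 0.05 / 5 = 0.01)
example = [
    [0.002, 0.30, 0.75, 0.41, 0.12],  # hit
    [0.040, 0.22, 0.61, 0.90, 0.33],  # 0.04 misses the corrected threshold
    [0.510, 0.009, 0.27, 0.68, 0.15], # hit
    [0.200, 0.35, 0.48, 0.77, 0.06],  # no hit
]
print(bonferroni_power(example))
# prints 0.5
```

Fewer tagSNPs means a less severe correction, but also a greater chance that the disease polymorphism is poorly tagged, which is the trade-off the comparison was after.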
Notes on my experience
- A lot of work!! Basically three months to come up with a topic,
  complete an analysis (deal with data issues..), write a 5-page paper
  and submit it
- Apparently the data will be made available on-line at GAWweb.
- Many benefits
  - Topic we studied is directly related to what I have been working on,
    but all the data simulation was done by someone else
  - Workshop itself was extremely useful
    - To meet those in the field (put faces to names!)
    - Results from the workshop are highly referenced. The workshop
      summarised the various contributions, which I can reference at work.
    - To hear that some of the issues that I have started to struggle
      with are also struggled with by those who have way more experience
      than me.
    - I get a publication out of it!
Descriptive Stats
- Number of attendees: ~230
  - From Canada? ~23
- Number of papers submitted: ~184
- Number of pages of notes that I took: 11
- Time it will take me to follow up on these notes: ????