Claims… - Western Washington University

Personal Genomics & Watson’s Genome
Scott Bray
Jaimie Barkley
Rachel Blumhagen
Kristy Theodorson
• 1st human genome sequenced using NEXTGEN
technology
• Identified novel genes, SNPs, CNVs and indel
polymorphisms
• Results consistent with traditional methods used to
sequence Venter’s genome
• Pilot project for personalized genome sequencing
NEXTGEN Pros
• Less time
– Two months
• Less expensive
– Approximately 1/100 of the cost of traditional capillary electrophoresis
• More efficient
– DNA is amplified in a cell-free system, which avoids loss of genomic sequence
Quicker, smaller, cheaper

Genome sequenced (publication year)      HGP (2003)      Venter (2007)    Watson (2008)
Time taken (start to finish)             13 years        4 years          4.5 months
Number of scientists listed as authors   >2,800          31               27
Cost of sequencing                       $2.7 billion    $100 million     < $1.5 million
Coverage                                 8-10x           7.5x             7.4x
Number of institutes involved            16              5                2
Number of countries involved             6               3                1

M. Wadman Nature 452, 788 (2008).
How’d they do it?
• Genomic DNA extraction from white blood cells
• Nebulization
• 454 pyrosequencing
• 234 runs @ 105 Mb per run
• Assemble?
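The run count and per-run yield above imply a total raw yield, and dividing that by the genome size gives an approximate fold coverage. The sketch below works through that arithmetic. The 3,100 Mb genome size is an assumed round figure, not a number from the slides, so the result is only a rough check against the 7.4X coverage reported for the aligned reads.

```python
RUNS = 234                 # from the slide
MB_PER_RUN = 105           # from the slide
ASSUMED_GENOME_MB = 3100   # assumption: approximate haploid human genome size


def raw_yield_mb(runs: int, mb_per_run: float) -> float:
    """Total sequence produced across all runs, in megabases."""
    return runs * mb_per_run


def fold_coverage(total_mb: float, genome_mb: float) -> float:
    """Average number of times each base is covered."""
    return total_mb / genome_mb


total = raw_yield_mb(RUNS, MB_PER_RUN)
print(f"raw yield: {total} Mb")
print(f"approximate raw coverage: {fold_coverage(total, ASSUMED_GENOME_MB):.1f}x")
```

Raw coverage computed this way should come out somewhat above the aligned figure, since not every sequenced base maps to the reference.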
No Assembly required!! (ok, a little)
• “Reference” sequence (Build 36) to align reads
– official reference genome assembly
– includes both WGS and BAC sequence data assemblies
– additional genomic sequences incorporated
• Reads were aligned to a reference sequence with 7.4X coverage
• Unmapped reads (1.5 million) were WGS assembled
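To illustrate what aligning reads to a reference means, here is a toy sketch that places short reads by exact string matching and sorts them into uniquely placed, multiply placed, and unplaced groups. Exact matching is a deliberate simplification and not the aligner used in the study; the reference string and reads are made-up examples.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every start position where read occurs exactly in reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions


def classify_reads(reference: str, reads: list[str]) -> dict:
    """Group reads by how many exact placements they have on the reference."""
    result = {"unique": [], "multiple": [], "unmapped": []}
    for read in reads:
        hits = find_all(reference, read)
        if len(hits) == 1:
            result["unique"].append((read, hits[0]))
        elif hits:
            result["multiple"].append(read)
        else:
            result["unmapped"].append(read)
    return result


toy_reference = "ACGTTGCAACGTAGGC"     # hypothetical reference
toy_reads = ["TTGCA", "ACGT", "GGGG"]  # hypothetical reads
print(classify_reads(toy_reference, toy_reads))
```

Reads with more than one placement are the toy analogue of reads falling in repeats, which is why repeats are hard to resolve with short reads.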
7.4X Coverage
X chromosome
Why would they have lower coverage on the X chromosome?
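One possible answer to the question above, offered as an assumption rather than the presenters' stated answer: a male subject carries one X chromosome but two copies of each autosome, so each X base should be sampled about half as often. A minimal sketch of that expectation, treating the 7.4X average as the two-copy baseline:

```python
def expected_depth(two_copy_depth: float, copies: int) -> float:
    """Expected read depth for a region present in `copies` copies,
    given the depth observed for regions present in two copies."""
    return two_copy_depth * copies / 2


baseline = 7.4  # average coverage from the slides, used here as the two-copy baseline
print("autosome:", expected_depth(baseline, copies=2))
print("X in a male:", expected_depth(baseline, copies=1))
```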
Single Nucleotide Polymorphisms
• 14 million initially found
• Filter → 3.3 million
• Filter and align
– 2.7 million matched “known” from dbSNP
– 0.61 million deemed “novel”
• 10,425 did not match dbSNP (unlikely to be third allele or error in dbSNP → 0.38% false discovery rate)
• For “known” SNPs: 50% homozygous, and 50% heterozygous, but
“novel” SNPs were mostly heterozygous - Why?
Does this result support the hypothesis that the SNPs are “novel”?
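The “known” versus “novel” split can be pictured as a lookup against a database of previously reported variants. The sketch below is illustrative only: the dictionary stands in for dbSNP, the positions and alleles are invented, and the label `mismatch` is a hypothetical name for calls at a known position whose allele is not among those on record.

```python
def classify_snp(position, allele, known_variants):
    """Label a SNP call as 'known', 'mismatch', or 'novel'.

    known_variants maps a (chromosome, coordinate) tuple to the set of
    alleles already on record at that site.
    """
    recorded = known_variants.get(position)
    if recorded is None:
        return "novel"      # no variant on record at this site
    if allele in recorded:
        return "known"      # site and allele both on record
    return "mismatch"       # site on record, allele is not


# Hypothetical stand-in for a variant database
toy_db = {("chr1", 1000): {"A", "G"}, ("chr2", 500): {"C", "T"}}
toy_calls = [(("chr1", 1000), "G"), (("chr2", 500), "A"), (("chr3", 42), "T")]

for position, allele in toy_calls:
    print(position, allele, classify_snp(position, allele, toy_db))
```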
Traditional Sequencing
Venter’s Genome
• 7.5-fold coverage, using WGSA method with
Sanger sequencing
Similar “novel” SNP results
between NEXTGEN and
traditional sequencing
Verification of SNP Identification
• “known” SNPs identified were compared with the
experimental genotyping of the subject’s DNA using
a microarray
– microarray of reference sequence hybridized to Watson’s DNA
• 494,713 markers successfully genotyped
• Watson’s DNA sequence had high agreement with the
homozygous reference and homozygous variant, but relatively
low agreement to heterozygous – Why?
Accuracy of SNP Identification
13-fold coverage required
to detect 99% of all
heterozygous SNPs
Coverage is key
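A rough model shows why heterozygous sites need more coverage than homozygous ones. The sketch assumes the number of reads at a site follows a Poisson distribution with mean equal to the coverage, each read samples either allele with probability 1/2, and a heterozygote is called only when each allele is seen at least `min_reads_per_allele` times. That threshold is a hypothetical parameter, and this is not the calling procedure from the study, so it is not expected to reproduce the 13-fold figure exactly.

```python
import math


def poisson_at_least(k_min: int, lam: float) -> float:
    """P(X >= k_min) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k_min))
    return 1.0 - below


def het_detection_prob(coverage: float, min_reads_per_allele: int = 2) -> float:
    """Probability that both alleles of a heterozygous site are each seen
    at least min_reads_per_allele times.

    Splitting Poisson(coverage) reads evenly between two alleles gives two
    independent Poisson(coverage / 2) counts, so the probabilities multiply.
    """
    per_allele = poisson_at_least(min_reads_per_allele, coverage / 2.0)
    return per_allele**2


for depth in (4, 7.4, 13, 20):
    print(f"{depth}x: {het_detection_prob(depth):.3f}")
```

Under these assumptions the detection probability rises steeply between the 7.4X achieved and the 13X quoted on the slide.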
Insertions-Deletions (Indels)
• Identified 222,718
• Size range of 2-38,896 bp
Deletion frequency decreases as
deletion size increases
Why do they not have data on length of insertions?
Do the indels cause a frame shift?
• 345 indels found in coding regions
• Primers were designed for 111 of them,
followed by Sanger sequencing
– 78 indels validated → 66 of them had lengths
that were multiples of 3 (no frame shift)
– 65 were found as heterozygotes
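The frame-shift question comes down to whether the indel length is a multiple of three, since codons are three bases long. A minimal sketch of that check, applied to example lengths rather than the study's data:

```python
def causes_frameshift(indel_length_bp: int) -> bool:
    """An indel in coding sequence shifts the reading frame unless its
    length is a multiple of 3 (one codon = 3 bases)."""
    if indel_length_bp <= 0:
        raise ValueError("indel length must be positive")
    return indel_length_bp % 3 != 0


for length in (3, 4, 6, 38896):
    print(length, "bp ->", "frame shift" if causes_frameshift(length) else "in frame")
```

By this rule, the 12 validated coding indels that were not multiples of 3 (78 minus 66) would shift the frame.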
Interesting Find…
• They found a homozygous 4-base deletion in
exon 11 of Watson’s SGEF gene
• SGEF is highly conserved in vertebrates
– Guanine nucleotide exchange factor thought to
regulate membrane dynamics in promotion of
vesicle formation
What does this suggest???
CGH microarray
Copy Number Variations
• CNVs: local gains or losses of regions in the
genome because of duplication or deletion
– associated with genetic disease
– detectable by variation in the average DNA
sequence coverage of the region
• Comparative genomic hybridization (CGH)
used
– Examine relative fluorescence intensity in wells
– Microarray revealed 23 CNV regions
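The bullet on detecting CNVs from sequence coverage can be sketched as a log2 ratio of each window's read depth to a genome-wide baseline. This illustrates the coverage-based idea only, not the CGH analysis that produced the 23 regions. The window depths, the pseudocount, and the 0.5 threshold are all assumed values chosen for the example.

```python
import math
import statistics


def log2_depth_ratios(window_depths, pseudocount=0.5):
    """log2 of each window's depth relative to the median depth.
    The pseudocount keeps zero-depth windows from producing log2(0)."""
    baseline = statistics.median(window_depths)
    return [
        math.log2((depth + pseudocount) / (baseline + pseudocount))
        for depth in window_depths
    ]


def flag_windows(window_depths, threshold=0.5):
    """Label each window as 'gain', 'loss', or 'neutral'."""
    labels = []
    for ratio in log2_depth_ratios(window_depths):
        if ratio >= threshold:
            labels.append("gain")
        elif ratio <= -threshold:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


toy_depths = [7, 8, 7, 15, 14, 7, 3, 8]  # hypothetical per-window read depths
print(flag_windows(toy_depths))
```

The median is used as the baseline so that a few duplicated or deleted windows do not pull the reference level toward themselves.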
CNVs
• CNVs are polymorphic:
– segregate as alleles with varying frequency
– depends on the reference genome
• None of the CNV regions were identified to be involved with
any known phenotype.
– However 34 genes are predicted to be affected. These genes include:
two olfactory receptor groups, several with possible roles in prostate,
breast, and colon cancer, a gene from the HLA-D locus, and two
proteins involved in RNA editing.
Experimental Conclusions
• 3.3 million SNPs identified
– 8,996 were non-synonymous ‘known’ SNPs
– 1,573 were ‘novel’
• Of the non-synonymous known SNPs, 342
alleles matched mutations found in the
Human Gene Mutation Database (HGMD)
→ 32 disease-causing
Experimental Conclusions
– 10 out of 12 alleles are highly penetrant, Mendelian
recessive disease-causing alleles
• 7 out of 10 were heterozygous, the other three only
exhibited one allele
• Subject does not have the diseases.
1.5 million unaligned reads
– 65% matched known repeats
– 110K contigs, 29 Mb of sequence
– 33 cDNAs with no map location
– Protein prediction: 60 significant matches to 49 proteins
Criticism
• “It’s a new standard of sequencing
technology,” says Venter. “But I don’t think it’s
a new standard of genome coverage and
independent assembly.”
• Good if reference seq. is available, if not?
• Dealing with repeats with small reads (no
mate pairs, can coverage compensate?)
• Still haven't learned to read “the book of life”
Personal Genomics
“My Genome, My Self”- Steven Pinker
Jan. 11, 2009
• Personal Genome Project
(PGP-10)
• Publicly available for
association studies
• Personal genomics is
important for finding
associations between
human genetic variation,
physiology and disease risk
Pros
– Personalized medicine, customized to patient’s biochemistry
– Better genetic testing for screening and prevention in at-risk
patients
– Creation of dataset that can be referenced for association
studies
– Useful for evolution studies
Cons
– “Genes of Doom”
– Insurance and employment discrimination → GINA
– Direct-to-consumer testing (bypasses health professionals to test
for breast cancer alleles or even mutations linked to cystic
fibrosis)
– Genetic determinism
“Genetic Determinism”
Examples of Single Gene Disorders
• Autosomal recessive:
– Cystic fibrosis (CF)
– Phenylketonuria (PKU)
– Sickle cell anemia
– ADA deficiency, a rare immunodeficiency disorder ("bubble boy" disease)
• Autosomal dominant:
– Familial hypercholesterolemia
– Huntington's disease
• X-linked recessive:
– Duchenne muscular dystrophy
– Hemophilia A
• X-linked dominant:
– few, very rare, disorders are classified as X-linked dominant
– hypophosphatemic rickets (vitamin D-resistant rickets)
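The four inheritance patterns listed above differ in how many mutant copies are needed for the disorder to appear. The sketch below encodes the textbook rule for each pattern, assuming full penetrance and ignoring modifiers, which is a simplification. The mode names and the function are illustrative labels, not terms from the slides.

```python
def is_affected(mode: str, mutant_copies: int, sex: str = "female") -> bool:
    """Textbook expectation for a single-gene disorder under full penetrance.

    mode: 'autosomal_recessive', 'autosomal_dominant',
          'x_linked_recessive', or 'x_linked_dominant'
    mutant_copies: number of mutant alleles carried (0, 1, or 2)
    sex: 'male' or 'female' (only matters for X-linked modes)
    """
    if mode == "autosomal_recessive":
        return mutant_copies == 2
    if mode in ("autosomal_dominant", "x_linked_dominant"):
        return mutant_copies >= 1
    if mode == "x_linked_recessive":
        # Males have a single X, so one mutant copy is enough.
        needed = 1 if sex == "male" else 2
        return mutant_copies >= needed
    raise ValueError(f"unknown inheritance mode: {mode}")


# A heterozygous carrier of a recessive allele is not expected to be affected.
print(is_affected("autosomal_recessive", 1))
print(is_affected("autosomal_dominant", 1))
print(is_affected("x_linked_recessive", 1, sex="male"))
```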
All else is in the numbers…
or better yet the genes
• “Geno’s Paradox”: single genes are not very
informative
• Traits are typically the result of many genes, each
having a small effect
– correlating genes with some traits is (currently) too
complex
– a test for a gene can identify ONE contributor to a
trait, but the observance of a trait
Pinker’s Results
• FALSE Results
• http://fire.biol.wwu.edu/young/470/stuff/steven_pinker_2.html
• Contradictory and confusing
“If you want to know whether you’re at risk for
high cholesterol, have your cholesterol
measured; if you want to know whether you
are good at math, take a math test.”
–Steven Pinker
Common Types of Genetic Testing
• Newborn Screening: to identify disorders that can be
treated in early stages of development
– PKU treated by a change in the child’s diet (low phenylalanine)
• Diagnostic: to confirm or rule out a specific genetic or
chromosomal condition typically after symptoms are present
• Carrier: to identify if individual carries a copy of a mutated
gene, typically done for prospective parents
• Predictive: presymptomatic, to assess probability of having
a genetic disorder that may appear later in life
Nature versus Nurture… versus Chance?
• Environment and life experience
• Stochastic events (chance)
– e.g. identical twins
• same genetic makeup, same environment
• Behavioral Genetics:
…WHO AM I?
– Personality traits
– Behavioral traits
– Decision-making traits
Genetic Information
Nondiscrimination Act (2008)
• Prohibits insurers from refusing coverage of a healthy
individual or charging that person higher premiums
based on their genetic predisposition to developing a
disease
• Prohibits employers from using genetic information
to discriminate against individuals in hiring, firing, job
placement, etc.
• “[GINA] is necessary to ensure that biomedical research
continues to advance… such legislation is necessary so that
patients are comfortable availing themselves to genetic
diagnostic tests.” - NHGRI
Ethics
• One copy of APOE E4 variant triples the risk of
developing Alzheimer’s
• Should your genome be public or private?
• Should genetic counseling be required?
• Third party complications
– Ex: Pinker found he has a gene for familial
dysautonomia, knew to get nieces and nephews
tested
Conclusions
– Sequencing and interpretation of personal genomes
will become more accurate with increase in
individuals sequenced
– Pro-active approach to ethical issues
– NEXTGEN Sequencing: $100,000 genome
• http://www.knome.com/home/
– NEXT-NEXTGEN Sequencing: $1,000 genome
• 2004, NHGRI awarded $38 million in grants
References
• Ellerbroek et al. 2004. SGEF, a RhoG guanine nucleotide exchange factor that
stimulates macropinocytosis. Mol Biol Cell 15(7): 3309-3319.
• Pinker, S. 2009. My Genome, My Self. NYT, Jan 2009, pp 23-31.
• Levy, S. et al. 2007. The diploid genome sequence of an individual human. PLoS
Biol. 5: e254.
• Wheeler et al. 2008. The complete genome of an individual by massively
parallel DNA sequencing. Nature 452: 872-877.
• Olson, M. 2008. Dr. Watson’s base pairs. Nature 452: 819-820.
• Wadman, M. 2008. James Watson’s genome sequenced at high speed.
Nature 452: 788.
Questions ???