blumberg-lab.bio.uci.edu

By Patrick Brennock and Kylee Katsumata
What are Copy Number Variations (CNVs)?
● DNA segments present in a different number of copies than
normal, due to deletions, insertions, or duplications.
● Arise from errors in homologous recombination (HR), segmental
duplication, DNA damage, and duplication of microsatellites.
● When in genes, can
result in change in
gene dosage.
● Not all genomes are
the same!
● Ex: AMY1, Salivary
Amylase
Why do we care about CNVs?
● Potentially disease-causing
o Ex. Prader-Willi/Angelman
o Cancer
● Need to better understand the human
genome (How much variation exists? What
is the “true” genome?)
Where are CNVs?
● Previous studies found many CNVs, but an
exhaustive list did not exist.
● Objectives:
o To create a map of CNVs and CNVRs,
o to determine their general distribution throughout the genome,
o to determine which types are most prevalent,
o and to determine the inheritance and population
differentiation patterns.
Defining CNVs and CNVRs
Figure 3 | Defining CNVRs, CNVs and CNV ends.
● One CNV = a region within an individual that has been
duplicated or deleted, giving a different copy number.
● One Copy Number Variable Region (CNVR) = all
overlapping CNVs at a particular location.
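The CNVR definition above amounts to merging overlapping intervals. A minimal sketch, assuming each CNV call is a (start, end) pair on a single chromosome; the function name and the coordinates are hypothetical and are not taken from the paper:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, end inclusive


def merge_into_cnvrs(cnvs: List[Interval]) -> List[Interval]:
    """Union overlapping CNV intervals into copy number variable regions."""
    cnvrs: List[Interval] = []
    for start, end in sorted(cnvs):
        if cnvrs and start <= cnvrs[-1][1]:
            # This CNV overlaps the region being built: extend that region.
            prev_start, prev_end = cnvrs[-1]
            cnvrs[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap: this CNV starts a new CNVR.
            cnvrs.append((start, end))
    return cnvrs


# Hypothetical CNV calls pooled across individuals (made-up coordinates).
calls = [(100, 250), (200, 400), (900, 1000), (350, 500)]
print(merge_into_cnvrs(calls))
```

Here the first, second, and fourth calls chain together into one CNVR, while the third stays on its own.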
Microarray Hybridization
● Reference genome (n=2): measured fluorescence = 2x
● Genome with duplication (n=3): measured fluorescence = 3x
● Genome with deletion (n=1): measured fluorescence = 1x
Comparing Fluorescence of test
genome to reference genome
● log2 (sample genome fluorescence / reference genome fluorescence)
● Ex: log2 (4x/2x) = 1, twice as much fluorescence in test
genome
● Ex: log2 (1x/2x) = -1, half as much fluorescence in test
genome
● Put simply, a positive log2 value suggests a genomic
duplication, and a negative log2 value suggests a
genomic deletion
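The log2 comparison above can be sketched in a few lines. The function names and the 0.3 calling cutoff are assumptions for illustration only; the slides do not state which thresholds either platform used.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2 of test fluorescence over reference fluorescence."""
    return math.log2(test_signal / reference_signal)


def call_from_ratio(ratio: float, threshold: float = 0.3) -> str:
    """Classify a log2 ratio; `threshold` is a hypothetical cutoff."""
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "no change"


# Fluorescence in arbitrary units, reference = 2x as in the slides.
for test_signal in (4.0, 3.0, 2.0, 1.0):
    ratio = log2_ratio(test_signal, 2.0)
    print(test_signal, round(ratio, 3), call_from_ratio(ratio))
```

Note that a single extra copy (3x versus 2x) gives log2(1.5), roughly 0.585, not 1, so real calls sit closer to zero than the 4x/2x example suggests.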
Figure 1 | Protocol outline for two CNV detection platforms.
● Measured HapMap cell line genomes (270 individuals, including parent-offspring
trios, from different populations)
Figure 4 | Genomic distribution of CNVRs.
● 1447 discrete CNVRs were found; 66% were validated or had been found in previous
studies. ~50 CNVs were detected per person, depending on the platform used
● CNVRs cover 12% of the genome, i.e. that fraction is susceptible to CNV
● Relatively evenly distributed throughout the genome
● Found that the 500K Affymetrix platform could detect smaller CNVs, whereas
the WGTP platform could detect CNVs better in duplicated genomic regions
Figure 2 | Heritability of five CNVs in four HapMap trios.
Figure 5 | Classes of CNVs.
● Assumed the rarer allele
was the mutant (CNV) allele
● Left = example CNV.
Right = total number of each
CNV class found by
each platform
● 500K EA platform had a higher resolution and thus was better able to map the exact
breakpoints of CNVRs and to tell which sequences they overlapped
● On the 500K EA platform, CNVs were significantly depleted in
protein-coding genes and ultra-conserved elements.
● Also found that deletions are especially biased away from genes
● Still found thousands of genes that flank or fall within CNVs
○ Based on gene ontology, the largest groups among these genes were cell
adhesion, neurophysiological processes, and sensory perception of
smell
Why Study CNVs?
Medical Relevance
● ~14.5% of genes in the OMIM morbid map had
CNVs; some landed in regions vital for Mendelian
and complex diseases.
● Examples:
o Genetic: Angelman syndrome
o Complex: Schizophrenia, psoriasis, cataracts
Why Study CNVs?
Medical Relevance
● Difficult to connect diseases to
their respective genotype in
complex CNVRs.
o Example: One ~1 Mb region at 1q21.1
can lead to:
▪ Congenital heart defects
▪ Lens abnormalities
▪ Mental retardation
● The difference? Copy number
variation.
How They Studied CNVs
Single Nucleotide Polymorphisms
How about comparing them to SNPs?
● At the time, known to be important to human diversity.
● First genome-wide association study (GWAS) in 2005
o Deletions and duplications lead to disease.
How They Studied CNVs
SNPs
● Essentially extended what GWAS did.
o Can CNVs be found with GWAS results?
o Can we find CNVs with SNP markers?
● Used SNPs from the HapMap Phase 1
database as markers to hopefully find CNVs.
o Measured linkage disequilibrium with r2.
▪ High linkage disequilibrium = found together a lot.
o If a CNV was in strong enough linkage disequilibrium with a nearby
SNP marker (r2 > 0.8), it was “tagged” by that marker.
● After looking at three populations from the
database (European, African, and East
Asian)...
How They Studied CNVs
SNPs
Figure 6.a | Patterns of
linkage disequilibrium
between CNVs and
SNPs.
● (r2)
○ Pairwise Linkage
Disequilibrium
○ “Tagged” (r2) > 0.8
You can’t find CNVs with GWAS methods
meant for SNPs.
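To make the r2 > 0.8 tagging rule concrete, here is a minimal sketch using the standard haplotype-based definition of r2 for two biallelic loci. It assumes phased haplotypes coded as 0/1 pairs (SNP allele, CNV allele). The slides do not say how r2 was actually computed, so the function names and toy data are hypothetical.

```python
from typing import List, Tuple


def r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """r2 between two biallelic loci from phased haplotypes (snp, cnv)."""
    n = len(haplotypes)
    p_snp = sum(s for s, _ in haplotypes) / n
    p_cnv = sum(c for _, c in haplotypes) / n
    p_both = sum(1 for s, c in haplotypes if s == 1 and c == 1) / n
    d = p_both - p_snp * p_cnv  # linkage disequilibrium coefficient D
    denom = p_snp * (1 - p_snp) * p_cnv * (1 - p_cnv)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d * d / denom


def is_tagged(r2: float, threshold: float = 0.8) -> bool:
    """Apply the r2 > 0.8 tagging rule quoted in the slides."""
    return r2 > threshold


# Toy data: the SNP allele always travels with the CNV allele...
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# ...versus a SNP that is independent of the CNV.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(linked), is_tagged(r_squared(linked)))
print(r_squared(unlinked), is_tagged(r_squared(unlinked)))
```

A CNV counts as tagged only when some SNP clears the threshold, which is why sparse SNP coverage around a CNV lowers the tagging rate.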
How They Studied CNVs
Lower LD for CNVs
● They considered transposons and frequent
mutations/reversions, but tests proved these
unlikely.
● Ultimately concluded that CNVs tend to lie in areas
with low SNP counts.
o CNVs frequent highly dynamic regions (tandem
repeats cross over unequally).
o The database only had SNPs in stable regions.
o Therefore, the low linkage disequilibrium reflects
a lack of SNP coverage in those regions of the genome.
Can SNPs predict the
number of copies in
CNVs?
● R2 values
○ 0 = No correlation
○ 1 = perfect
correlation; fully
predictive.
● According to WGTP,
the more accurate of
the two arrays, they
can’t.
Figure 6.b | Patterns of linkage disequilibrium between
CNVs and SNPs.
CNVs are useful!
Population genetics
● Population clusterings for
67 biallelic CNVs in 210
individuals (dots).
● If CNV genotypes could
not predict ancestry,
these dots would be scattered
all over the plot.
● Clusters are clear: individuals
with similar CNVs group at the corners,
so CNVs can predict ancestry.
Figure 7 | Population clustering from CNV genotypes.
CNVs are useful!
Population genetics
Lighter color =
WGTP results
Darker color =
500K EA results
Vst: 0-1; 1 being
population specific.
Figure 8 | Population differentiation for copy number variation.
CNVs can be compared in different populations and be
highly specific to one population or another.
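A minimal sketch of a Vst-style statistic, assuming the common definition Vst = (V_T - V_S) / V_T, where V_T is the variance of log2 ratios across all individuals and V_S is the size-weighted average of the within-population variances. The slides give only the 0-1 range, so this formula, the use of population variance, and the toy values below are assumptions.

```python
from statistics import pvariance
from typing import Dict, List


def vst(log2_by_population: Dict[str, List[float]]) -> float:
    """(V_T - V_S) / V_T for one CNV, from per-population log2 ratios."""
    all_values = [v for values in log2_by_population.values() for v in values]
    v_total = pvariance(all_values)
    if v_total == 0:
        return 0.0  # no variation at all, so nothing to differentiate
    # Within-population variance, weighted by population size.
    v_within = sum(
        len(values) * pvariance(values)
        for values in log2_by_population.values()
    ) / len(all_values)
    return (v_total - v_within) / v_total


# Toy log2 ratios (made up): one CNV confined to population B,
# and one CNV spread equally across both populations.
population_specific = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
shared = {"A": [0.0, 1.0], "B": [0.0, 1.0]}
print(vst(population_specific), vst(shared))
```

Values near 1 mean almost all of the variation lies between populations; values near 0 mean the populations look alike at that CNV.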
In the end...
● Bottom line? This paper is essentially a
starting point.
o CNVs are common
o CNVs are worth studying
o New techniques are needed to work out the role of CNVs
in disease.
● They believed there will be advances in…
o … discovering the role of CNVs in disease
o … new techniques for identifying disease-causing
CNVs
o … mapping CNVs, whether < 1kb or not, with fewer separate
technologies.
Problems
● One Genome to Rule Them All?
o Redon et al. relied heavily on a reference
genome. However, there is no one true genome
that could fairly represent human CNV.
● Not all CNVs were found!
o Limitations of the technology of the time.
▪ And the cost!
o The techniques used were not fully accurate.
▪ Algorithms were adjusted so that 5% of the
CNVs called were false positives.
▪ False negatives too.
o CNVs < 1kb?
Updates?
A lot has changed since 2006.
● There have been many, many CNV studies.
● Diseases
o Crohn’s disease, rheumatoid arthritis, and Type 1
and 2 diabetes.
● More techniques have been established to
find CNVs < 1kb.
Further Readings?
DISEASE, DISEASE EVERYWHERE?
● Craddock, N., et al. 2010. Genome-wide association study of CNVs in
16,000 cases of eight common diseases and 3,000 shared controls.
Nature. 464: 713-720.
CNVs and new techniques!
● Zhao, M., Wang, Q., Wang, Q., Jia, P., & Zhao, Z. 2013. Computational
tools for copy number variation (CNV) detection using next-generation
sequencing data: Features and perspectives. BMC Bioinformatics. 14(Suppl 11): S1.
History and some more articles; by one of the original
authors of the paper.
● Scherer, Stephen W. "Proof of Extensive Copy Number Variation in The
Human Genome."