PRIORITIZING REGIONS OF CANDIDATE GENES FOR EFFICIENT MUTATION SCREENING
Outline

Abstract
Background
Materials and Methods
Results
Discussion
Conclusion
Abstract

Complete sequence of the human genome has altered the search process for disease-causing mutations
  - Previously, mostly rare diseases were studied, and data took years to analyze
  - Now, the rate-limiting step is screening patients and interpreting results
Tests the hypothesis that disease-causing mutations are not uniformly distributed and can be predicted bioinformatically
Developed the prioritization of annotated regions (PAR) technique
Abstract

Tested by analyzing 710 genes with 4,498 previously identified mutations
Nearly 50% of disease-associated genes were found after analyzing only 9% of the complete coding sequence
PAR identified 90% of genes as containing at least one mutation using less than 40% of screening resources
Background

When screening for mutations, researchers usually focus on the coding sequence
  - Not enough to show a relationship between mutation and disease
  - Ex. Age-related macular degeneration
Today's techniques:
  - Single-strand conformational polymorphism analysis (SSCP)
  - Denaturing high-performance liquid chromatography
  - Automated DNA sequencing
Background

SSCP
  - Compares conformational differences in strands of DNA of the same length (1)
Denaturing high-performance liquid chromatography
  - Compares two or more chromosomes as a mixture of denatured and reannealed PCR amplicons, revealing the presence of a mutation by the differential retention of homo- and heteroduplex DNA on reversed-phase chromatography supports under partial denaturation (2)
Background

Through their own work, the authors found that disease-causing variations are not uniformly distributed throughout the sequence
  - Ex. Bardet-Biedl: restrict to patients with retinitis pigmentosa with ulnar polydactyly
  - Disease-causing mutations more likely lie in structural and functional regions
Materials and Methods

List of 710 genes obtained via OMIM
  - Cross-referenced with transcripts in Ensembl Release NCBI31
Gene structure and annotated protein domains obtained from Ensembl
Information on mutation locations obtained from OMIM
Secondary structure prediction performed by nnPredict
Materials and Methods

x = nucleotide position
Ws = PAR window size
Nx = number of distinct annotation elements
W(i) = PAR window function
Af(x,j) = annotation function for jth annotation at xth position
As(x,j) = annotation score for jth annotation at xth position
Ao(x,j) = annotation scalar offset
Am(j) = annotation multiplier for jth annotation feature
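
The symbols above do not by themselves fix how the terms are combined, so the following Python sketch is only one plausible reading, not the authors' formula. It assumes that the annotation score is As(x,j) = Am(j) * (Af(x,j) + Ao(x,j)) and that the PAR value of a window is the W(i)-weighted sum of As over all Nx annotations. Both forms, and the names `annotation_score` and `par_scores`, are assumptions made for illustration.

```python
def annotation_score(af, ao, am):
    """ASSUMED form: As(x, j) = Am(j) * (Af(x, j) + Ao(x, j))."""
    return am * (af + ao)


def par_scores(af, ao, am, window):
    """Return one PAR value per window start position (assumed windowed sum).

    af[j][x] -- annotation function for annotation j at position x
    ao[j][x] -- annotation scalar offset for annotation j at position x
    am[j]    -- annotation multiplier for annotation j
    window   -- weights W(i); its length plays the role of Ws
    """
    n_positions = len(af[0])
    n_annotations = len(am)  # plays the role of Nx
    ws = len(window)

    # Sum the per-annotation scores at every nucleotide position.
    per_position = [
        sum(annotation_score(af[j][x], ao[j][x], am[j])
            for j in range(n_annotations))
        for x in range(n_positions)
    ]

    # Slide the window along the sequence and take the weighted sum.
    return [
        sum(window[i] * per_position[start + i] for i in range(ws))
        for start in range(n_positions - ws + 1)
    ]


# Toy input: one annotation covering positions 1-2 of a 5-nt sequence.
toy = par_scores(af=[[0, 1, 1, 0, 0]], ao=[[0] * 5], am=[2.0], window=[1, 1])
print(toy)
```

With a flat window, the highest value falls on the window that lies entirely inside the annotated stretch, which is the behaviour a prioritization score of this kind is meant to have.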
Materials and Methods

Impractical to perform manually for every gene in candidate set
Graphic representation of gene structure of EFEMP1 gene and corresponding PAR values
Materials and Methods

Regions in each gene were identified that maximized the PAR function
  - Primer pair positions were selected, consistent with the default parameters of Primer3, until at least one mutation was flanked
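
As a minimal sketch of this selection step, assume a PAR value is already available for every window start position. The Python below greedily takes the highest-scoring non-overlapping windows and then counts how many of them must be screened before a known mutation is covered. The helper names `best_regions` and `regions_until_hit` are hypothetical, half-open `(start, end)` coordinates are an assumption, and Primer3 itself is not invoked here.

```python
def best_regions(scores, ws, k):
    """Greedily pick up to k non-overlapping windows of width ws,
    highest score first. Regions are half-open (start, end) pairs."""
    order = sorted(range(len(scores)), key=lambda s: -scores[s])
    chosen = []
    for start in order:
        if len(chosen) >= k:
            break
        region = (start, start + ws)
        # Keep the window only if it overlaps none of those already chosen.
        if all(region[1] <= a or region[0] >= b for a, b in chosen):
            chosen.append(region)
    return chosen


def regions_until_hit(regions, mutation_positions):
    """Number of regions screened, in order, until one contains a known
    mutation; None if no region contains one."""
    for count, (start, end) in enumerate(regions, start=1):
        if any(start <= m < end for m in mutation_positions):
            return count
    return None


ranked = best_regions([2, 4, 2, 0, 5, 1], ws=2, k=2)
print(ranked, regions_until_hit(ranked, mutation_positions=[2]))
```

Counting regions until the first hit gives a simple per-gene measure of screening effort that can be compared across selection strategies.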
Materials and Methods

Other methods used for comparison
  - Serial
      - Generates minimally overlapping primer pair positions for each exon with the same PCR product size requirements
      - Models the traditional screening approach
      - Examines the complete coding sequence
  - Random
      - Selects a region from any transcript without replacement
      - Continues to select with minimal overlap
Complete screening with laboratory information management system (LIMS)
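
The two comparison strategies can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: exons are half-open `(start, end)` pairs, `product_size` stands in for the PCR product size requirement, and the function names `serial_amplicons` and `random_order` are hypothetical.

```python
import random


def serial_amplicons(exons, product_size, overlap=0):
    """Tile every exon end to end with amplicons of at most product_size,
    overlapping neighbours by `overlap` nucleotides (assumed < product_size)."""
    step = product_size - overlap
    amplicons = []
    for start, end in exons:
        pos = start
        while True:
            stop = min(pos + product_size, end)
            amplicons.append((pos, stop))
            if stop >= end:  # exon fully covered
                break
            pos += step
    return amplicons


def random_order(amplicons, seed=None):
    """Screening order drawn at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(amplicons, len(amplicons))


tiles = serial_amplicons([(0, 250), (300, 350)], product_size=100)
print(tiles)
print(random_order(tiles, seed=0))
```

Serial screening visits the tiles in the order listed, while the random strategy visits the same tiles in shuffled order, so both eventually cover the same sequence when run to completion.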
Results - Efficiency

PAR
  - Found 90% of mutations with 60% coverage
Serial
  - Linear: 90% at 90%, 100% at 100%
Random
  - Fell short of identifying 100% of mutations
Results – Figure 2

PAR
  - 819 mutations identified in 350 distinct genes using a single best PAR-selected region per gene
  - Corresponds to 18% of mutations in approximately half the transcripts
  - Of 1,908,911 nucleotides, PAR selected only 168,980
  - At least one mutation was identified in 50% of genes with only 9% of total transcript sequence screened
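
The percentages on this slide follow directly from the counts quoted in the deck. The short Python check below recomputes them. The enrichment ratio at the end is an added illustration, not a figure reported in the source, and the variable names are arbitrary.

```python
total_nt = 1_908_911       # nucleotides across all transcripts
selected_nt = 168_980      # nucleotides in the PAR-selected regions
mutations_found = 819
mutations_total = 4_498
genes_hit = 350
genes_total = 710

frac_sequence = selected_nt / total_nt
frac_mutations = mutations_found / mutations_total
frac_genes = genes_hit / genes_total

# Share of known mutations captured relative to the share of sequence
# screened (illustrative ratio, not reported in the source).
enrichment = frac_mutations / frac_sequence

print(f"sequence screened: {frac_sequence:.1%}")
print(f"mutations found:   {frac_mutations:.1%}")
print(f"genes with a hit:  {frac_genes:.1%}")
print(f"enrichment:        {enrichment:.2f}x")
```

Under these counts, the selected regions hold roughly twice as many known mutations per nucleotide as the transcript sequence overall.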
Results – Figure 3

Serial
  - Linear relationship between screening resource utilization and number of genes identified
PAR
  - Identified 90% of genes with a 60% reduction in screening resources
  - Only one primer pair in each transcript was evaluated, and nearly 40% of transcripts were found to contain at least one mutation
Discussion

History of genetic screening
  - PCR
  - Lengthy clinical work
  - Therefore, always evaluated entire coding sequence in all patients
Explains current use of serial screening
Discussion

Changes
  - More common diseases being analyzed
  - More available patients
  - Availability of genomic sequence
      - Develop PCR-based assay in less than a day with algorithms
  - More involvement from other professions (engineers, statisticians)
  - Supply tools to keep track of experiments
  - Realization that many disease-causing mutations do not affect coding sequences
Discussion

Advantages of PAR
  - Effective use of gene annotation
  - Prioritizes gene segments for screening
      - Conservation of protein structure
  - Focus on gene segments vs. entire gene
      - Evident that the likelihood of finding a disease-causing variation in a gene falls with each exon screened with no positive result
      - Serial approach screens all exons no matter what
      - PAR screens a section with an average chance of finding a mutation
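
The claim that the likelihood falls with each negative exon can be illustrated with a small Bayesian update. The sketch below is a simplification rather than the authors' model: it assumes a gene either carries exactly one detectable mutation or none, that the mutation is equally likely to lie in any exon, and that screening never misses it. The function name `posterior_after_negatives` is hypothetical.

```python
def posterior_after_negatives(prior, n_exons, k_screened):
    """Probability that the gene still harbours a mutation after k_screened
    of its n_exons exons were screened with no positive result.

    ASSUMES: at most one mutation, uniformly placed, perfect detection.
    """
    unseen = 1 - k_screened / n_exons          # chance the mutation was not yet covered
    return prior * unseen / (1 - prior * k_screened / n_exons)


for k in (0, 2, 5, 8, 10):
    print(k, round(posterior_after_negatives(0.5, 10, k), 3))
```

Under these assumptions, a gene that starts with a 50% chance drops to one in three after half its exons come back negative, which is the intuition behind moving on to the best region of another candidate gene rather than finishing the current one serially.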
Conclusion

Consideration of parameters resulted in significantly higher discoveries per unit of effort
Algorithm can be easily modified and expanded
Most useful for a large number of candidate genes in a large number of patients
Select the best two or four regions in each candidate gene
  - Screen all as initial screening strategy
  - Additional screening based on findings from first round and PAR algorithm
Clearly, the PAR approach is preferable to serial screening
References

(1) "Single Strand Conformation Polymorphism." Wikipedia. 28 May 2008. 21 Sept. 2008 <http://en.wikipedia.org/wiki/single_strand_conformation_polymorphism>.
(2) "Single Strand Conformation Polymorphism." Wikipedia. 28 May 2008. 21 Sept. 2008 <http://en.wikipedia.org/wiki/single_strand_conformation_polymorphism>.
