Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes.
Dennis P. Wall, PhD
Center for Biomedical Informatics
http://wall.hms.harvard.edu
Outline
• Rationale & Biological Significance (30 mins)
• Present status (5 mins)
• Project Plan (25 mins)
Introduction
• Polygenic & Multigenic
• Many genes have been linked to autism
• Few genes have been replicated across studies
• Difficult for a single researcher to grasp
the complexity of the autism gene
landscape
Statistics
U.S. number of cases 1992-2006
http://www.fightingautism.org
Behavioral overlap with other disorders
[Figure: Autism shown overlapping with Angelman, Schizophrenia, Epilepsy, Fragile X, Rett Syndrome, Seizure Disorder, Mental Retardation, Tuberous Sclerosis, Others??]
Approach
• Build the network of all genes implicated in
Autism to date
• Conduct large comparative analysis of
Autism and other neurological disorders at
the level of genes, biological processes,
and networks
• Leverage existing research on Autism-related disorders to find new genetic
leads.
Building Gene Lists for All Neurological Disorders (433)
[Figure: gene-list pipeline. Disease source: NINDS (Ataxia, Asperger, Fragile X, Tourette’s, OCD…). Gene-disease sources (disease gene databases): OMIM, GeneCards. Resulting gene lists: Epilepsy, OCD, Autism, ADHD.]
Autism Cluster
[Figure: disorders-by-genes binary matrix. Disorders such as Tourette Syndrome, Attention Deficit Hyperactivity Disorder, Primary Lateral Sclerosis, Neurotoxicity, and Down Syndrome are each represented as a bit string over genes, e.g. 1100100101…]
[Figure: disorders grouped by gene profile. Labels in extracted order: Dementia, Alzheimers Disease, Alzheimer Disease, Brain Injury, Stroke, Multiple Sclerosis, Systemic Lupus Erythematosus, Cerebral Palsy, Erbs Palsy, Neuronal Migration Disorders, West Syndrome, De Morsiers Syndrome, Williams Syndrome, Hydrocephalus, Encephalopathy, Huntington Disease, Epilepsy, Schizophrenia, Asperger Syndrome, Angelman Syndrome, Autism, Rett Syndrome, Hypotonia, Infantile Hypotonia, Spasticity, Microcephaly, mental retardation, Fragile X, Ataxia, Hypoxia, Seizure Disorder, Tuberous Sclerosis, obsessive compulsive disorder, Major Depression, Migraine. The "Autism Cluster" label falls within the run from Asperger Syndrome to Tuberous Sclerosis.]
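The bit strings in the figure can be read as gene-membership vectors, one per disorder. A minimal sketch of that encoding follows, assuming each disorder's gene list is a set and that disorders are compared with Jaccard similarity; the slides do not name the metric, and the disorder and gene identifiers below are placeholders, not curated data.

```python
from itertools import combinations

# Toy gene lists keyed by disorder; names are placeholders, not curated data.
gene_lists = {
    "disorder_A": {"g1", "g2", "g3"},
    "disorder_B": {"g2", "g3", "g4"},
    "disorder_C": {"g5"},
}

def binary_matrix(lists):
    """Return (sorted gene universe, {disorder: 0/1 vector over that universe})."""
    genes = sorted(set().union(*lists.values()))
    rows = {d: [1 if g in s else 0 for g in genes] for d, s in lists.items()}
    return genes, rows

def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors (assumed metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

genes, rows = binary_matrix(gene_lists)
similarity = {
    (d1, d2): jaccard(rows[d1], rows[d2])
    for d1, d2 in combinations(sorted(rows), 2)
}
```

Pairs with high similarity would be candidates for the same cluster; any clustering method applied on top of these similarities is a further assumption.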
Network Construction
• Data derived from STRING
(http://string.embl.de/)
• Integration of p-p interaction (interactome),
co-expression (transcriptome), orthology
(orthologome), text (bibliome), and other
lines of evidence.
• Focus on creating a network of possible
interactions within a normal cell using
classification methods (random forests)
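[Figure: evidence integration for a protein pair A-B. Evidence types: sequence co-evolution, correlated expression, P-P interaction, and text (aka bibliome). Bibliome example: "FXYD1 is identified as a MeCP2 target gene whose de-repression may directly contribute to Rett syndrome neuronal pathogenesis". The evidence is encoded as a vector over D1 to D5, e.g. {1,0,2,1,0}, and passed to random forest decision trees that split on D1 to D5 and answer Yes or No.]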
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030043
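To make the voting idea concrete, here is a much-simplified stand-in for a random forest: bootstrapped one-feature decision stumps that vote on whether a pair interacts. It is not the classifier used for the networks in this talk, and the evidence vectors and labels are invented toy values.

```python
import random

def train_stump(sample, feature):
    """Pick the threshold t minimizing errors for the rule: predict 1 if x[feature] >= t."""
    values = sorted({x[feature] for x, _ in sample})
    candidates = values + [values[-1] + 1]  # last candidate means "always predict 0"

    def errors(t):
        return sum(1 for x, y in sample if (1 if x[feature] >= t else 0) != y)

    return min(candidates, key=errors)

def train_ensemble(data, n_stumps=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    stumps = []
    for _ in range(n_stumps):
        sample = [rng.choice(data) for _ in data]
        feature = rng.randrange(n_features)
        stumps.append((feature, train_stump(sample, feature)))
    return stumps

def vote(stumps, x):
    """Fraction of stumps voting 'Yes, these two proteins interact'."""
    return sum(1 for f, t in stumps if x[f] >= t) / len(stumps)

# Toy training pairs: (evidence vector over D1..D5, interacts?). Values are invented.
training = [
    ([1, 2, 1, 1, 2], 1), ([2, 1, 2, 1, 1], 1),
    ([1, 1, 2, 2, 1], 1), ([2, 2, 1, 2, 2], 1),
    ([0, 0, 0, 0, 0], 0), ([0, 0, 0, 0, 0], 0),
    ([0, 0, 0, 0, 0], 0), ([0, 0, 0, 0, 0], 0),
]
stumps = train_ensemble(training)
strong = vote(stumps, [2, 2, 2, 2, 2])  # strong evidence on every line
weak = vote(stumps, [0, 0, 0, 0, 0])    # no evidence at all
```

A real random forest grows full trees over random feature subsets; the stumps here only illustrate how independent weak learners are combined by vote.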
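Networks for all AC disorders
[Figure: one network per autism cluster (AC) disorder, each labeled with node (N) and edge (E) counts.]
Disorders shown: Fragile X, Microcephaly, Rett, Hypoxia, Angelman, Tuberous Sclerosis, Inf. Hypotonia, Asperger, Autism, Hypotonia, Mental Retardation, Ataxia, Spasticity, Seizure Disorder
Counts as extracted (order may not match the disorder list): 97N/100E, 135N/166E, 586N/4359E, 48N/74E, 51N/57E, 29N/16E, 110N/204E, 15N/9E, 145N/164E, 154N/208E, 62N/40E, 573N/1035E, 428N/1489E, 35N/13E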
autworks.hms.harvard.edu
Multi-disorder component of autism (MDAG)
• 66 out of 127 involved in
at least one member of the
autism cluster
• Highly connected
component of the autism
network
Biological Process | p value | MDAG genes
transmission of nerve impulse | 3.00E-11 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, TH, TPH1, TSC1
nervous system development | 3.29E-11 | ALDH5A1, APOE, ARX, BTD, CHRNA4, DAB1, DCX, FMR1, FOXP2, GABRA5, GATA3, GRIN2A, HOXA1, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, UBE3A, VLDLR
synaptic transmission | 7.68E-10 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SLC6A4, TH, TPH1
cell-cell signaling | 3.12E-09 | ABAT, ADM, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, SSTR5, TH, TPH1, TSC1
brain development | 2.64E-06 | ARX, DAB1, DCX, FOXP2, GABRA5, HOXA1, MET, NF1, RELN, TSC1, UBE3A
generation of neurons | 2.43E-05 | APOE, ARX, DAB1, DCX, MAP2, MECP2, MET, NDN, NF1, NTF5, PTEN, RELN, VLDLR
regulation of cell proliferation | 2.45E-04 | ADM, ARX, CHRNA7, DHCR7, FOXP2, GRPR, MECP2, MET, NDN, NF1, PAX3, PTEN, SSTR5, TSC1
cell migration | 3.93E-03 | ARX, DAB1, DCX, MET, NDN, NF1, PAX3, PTEN, RELN, VLDLR
homeostasis | 1.90E-02 | ADM, APOE, ARX, CHRNA4, CHRNA7, GRIN2A, MBD1, NDN, NF1, SCN1A, SLC40A1, SSTR5, TH
cell morphogenesis | 1.94E-02 | APOE, ARX, ATP10A, DCX, MAP2, MECP2, NDN, PTEN, RELN, TSC1
ion transport | 2.74E-02 | ARX, CACNA1D, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GRIN2A, MECP2, MET, SCN1A, SLC40A1, TSC1
cell differentiation | 4.35E-02 | ADM, APOE, ARX, DAB1, DCX, DHCR7, EXT2, FXR1, GATA3, GLO1, GRIN2A, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, VLDLR
Significantly enriched MDAG processes
• CNS Development: P = 3.29E-11
• Synaptic Transmission: P = 7.68E-10
• Cell Proliferation: P = 2.45E-04
• Ion Transport: P = 2.7E-02
• Fisher’s exact test
• Bonferroni adjustment
• 14648 biological processes from Gene Ontology tested
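The enrichment test named on this slide can be sketched as a one-sided Fisher's exact test followed by a Bonferroni adjustment over the 14648 processes. The function names and the example counts are illustrative assumptions; only the number of processes comes from the slide.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: genes in the list annotated to the process
    n: size of the gene list
    K: genes in the background annotated to the process
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)

N_PROCESSES = 14648  # GO biological processes tested, from the slide

# Hypothetical counts for a single process; not taken from the MDAG analysis.
p_raw = fisher_enrichment_p(k=5, n=10, K=20, N=20000)
p_adj = bonferroni(p_raw, N_PROCESSES)
```

Because the adjustment multiplies every raw p value by the number of processes tested, only very strong enrichments remain below 0.05.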
Process-Driven Predictions
[Figure: enriched biological processes link autism cluster disorders to putative new genes.]
• Biological Processes: CNS development, Synaptic Transmission, Ion Transport, Cell Proliferation, …
• Autism Cluster Disorders: Fragile X, Tuberous Sclerosis, Seizure Disorder, Mental Retardation
• Putative New Genes: 64 new genes, all of which occur in 2 or more of the Autism Cluster Disorders
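One way to read the process-driven search is as a filter: keep genes annotated to an enriched process that appear in at least two sibling disorders and are not already autism genes. The sketch below assumes that reading; the exact selection rule is not given on the slide, and all identifiers are placeholders.

```python
from collections import Counter

def process_driven_candidates(process_genes, disorder_genes, known_autism, min_disorders=2):
    """Genes annotated to an enriched process, found in >= min_disorders sibling
    disorders, and not already on the autism list (assumed selection rule)."""
    counts = Counter(g for genes in disorder_genes.values() for g in set(genes))
    return sorted(
        g for g, c in counts.items()
        if c >= min_disorders and g in process_genes and g not in known_autism
    )

# Placeholder identifiers only; none of these are real gene assignments.
process_genes = {"g1", "g2", "g3", "g4"}
disorder_genes = {
    "sibling_1": {"g1", "g2", "g9"},
    "sibling_2": {"g2", "g3", "g9"},
    "sibling_3": {"g2", "g4"},
}
known_autism = {"g4"}
candidates = process_driven_candidates(process_genes, disorder_genes, known_autism)
```

Here g9 occurs in two disorders but is dropped because it is not annotated to an enriched process.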
Experimental Validation
• GEO series GSE6575 (from UC Davis M.I.N.D. Institute)
• White blood cell Affymetrix U133plus2.0
• 17 samples of autistic children without
regression
• 18 children with regression
• 9 children with mental retardation or
developmental delay
• 12 typically developing children from the general
population
Blood for Brain
Autism without regression (17)
Autism with regression (18)
Data-driven approach to FDR
detection can be ineffective
• Standard data-driven application of false
discovery rate control yields few genes
below the FDR threshold of 0.05 (with these
data, only 2 genes survive).
• This is a frequent circumstance in
instances of weak signal and large
background noise (e.g. microarray
experiments)
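For reference, a minimal sketch of FDR adjustment, assuming the Benjamini-Hochberg procedure (the slides do not name the method used). With thousands of genes tested, each p value is scaled by the number of tests over its rank, which is why weak signals rarely fall below 0.05.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values (q values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative p values, not taken from the expression data.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
survivors = [i for i, value in enumerate(q) if value < 0.05]
```

Testing a short, prior-knowledge-driven candidate list instead of the whole array shrinks m, so the same raw p values receive a much smaller correction.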
Results of process-driven search
• 43 Process-derived gene predictions had
FDR-adjusted p values <0.05
• Highly significant rate of validation -- 65%
of predictions confirmed by expression
data
Network-Driven Predictions
Results of network-driven search
• 267 occurred in 1 autism cluster disorder
• 58 occurred in 2
• 17 in 3
• 3 in 4 sibling disorders
• A total of 345 new predictions
Results of network-driven search
• 301 had FDR-adjusted p values <0.05
• 90% (!) of predictions verified by
expression data
Prior knowledge focuses whole-genomic search
• 43 Process-derived gene predictions had FDR-adjusted p values <0.05 (65%)
• 301 Network-derived gene predictions had FDR-adjusted p values <0.05 (90%)
[Figure: chart with axis label "average distance"]
The rate of validation in both cases is significantly non-random
Top 20 genes occurring in 3 or
more Autism Sibling Disorders
For many of these candidates, their roles in neurological impairment have
been studied in autism cluster disorders, but not in autism.
Molecular Triangulation
[Figure: candidate genes linked to the autism cluster disorders in which they occur.]
• Genes: SLC16A2, OPHN1, AR, L1CAM, FXN, MYO5A, SLC6A8, FLNA, PAFAH1B1
• Disorders: Mental Retardation, Fragile X, Microcephaly, Hypotonia, Rett Syndrome, Ataxia, Spasticity, Hypoxia, Tuberous Sclerosis
• GO biological process enrichment: cytoskeleton organization, cell communication, cell organization/biogenesis, cell motility
Conclusions
• Previous research has implicated between
100 and 1500 genes as contributors to the
molecular physiology of Autism.
• Our knowledge-driven approach provides
a logical means to filter the genome-wide
search.
Conclusions
• Global “ask” swamped by noisy signal
• Informed, knowledge-driven “ask” results
in biologically significant gene predictions
• Comparative analysis of Autism with
related neurological disorders provides a
focused search for novel gene candidates
Autworks
• Autworks is a web-driven navigation
system that allows any researcher to view
and search through the network of genes
implicated in autism and related
neurological disorders
• Built to aid and abet the role of serendipity
and inspiration for researchers working on
autism and other complex neuro diseases.
• http://autworks.hms.harvard.edu
Autworks now
The Plan
• Bring our analytical strategies and Autworks
to the cloud
– Beef up underbelly using AWS storage and the
Amazon “Turkforce”
– Scale up comparative network analysis
– Enlarge validation database, verify/re-verify
computational predictions, robustify the
candidates
Aim 1: Build the neurological disease “gene core” of
the Autworks relational database
Database | Description | Stats
Database of Genomic Variants | A curated catalogue of structural variation in the human genome | ~31615 total entries (indels, inversions, and copy number variation)
dbSNP | NCBI’s central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms | ~6,136,008 SNPs for Human
Chromosomal Variation in Man* | Searchable reference of chromosomal variation | > 3000 links to publications describing 30 different types of chromosomal variation in human disease
Human Gene Mutation Database* | Established for the study of mutational mechanisms in human genes | 62901 mutations in public release
OMIM* | NCBI’s compendium of human genes and genetic phenotypes | 12,634 genes for ~2459 phenotypes/diseases
GeneCards* | Searchable database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information | all known and predicted human genes with summaries of known disease association
SNPedia* | SNPedia shares information about the effects of variations in DNA, citing peer-reviewed scientific publications | 4621 SNPs
* Can be queried with a disease or gene term
Aim 1: Steps
(1) Extract the entire set of neurological disorders listed by NINDS
(currently 433) to ensure that we can find any and all
commonalities to Autism.
(2) Mine all databases in the table above that can be searched using a
disease term as the query, specifically the Online Mendelian
Inheritance in Man (OMIM), GeneCards, Chromosomal Variation
in Man, the Human Gene Mutation Database (HGMD), and
SNPedia.
(3) Combine and import the features from each of the online resources
into a relational database that will become the backend of
Autworks, being careful to remove any redundancies.
(4) Cross-reference resources to comprehensively populate data
model.
Gene-disease data model
“Gene Core”
This data model will share much in common with the Variome project’s database.
Field | Description
Gene | official gene symbol from HUGO
Variant ID | unique identifier (e.g., RS#, SS#, etc.)
Variant Type | SNP, CNV, Indel, etc.
Genomic Location | chromosomal coordinates (hg build 36)
Source | Database(s) from where gene and/or gene variant was derived
OMIM score | Confidence score used by OMIM
Polyphen score | Score indicating severity of mutation
Disease | Autism and related neurological disorders
PubmedID | Article(s) describing the genetic variant
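The fields above map naturally onto a single relational table. The sketch below is one possible layout using SQLite, with a uniqueness constraint standing in for the "remove any redundancies" step; the table name, column names, column types, and the choice of key are assumptions, and the records are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene_core (
    gene TEXT NOT NULL,                  -- official HUGO gene symbol
    variant_id TEXT NOT NULL DEFAULT '', -- e.g. RS#, SS#; empty if gene-level only
    variant_type TEXT,                   -- SNP, CNV, Indel, ...
    genomic_location TEXT,               -- chromosomal coordinates (hg build 36)
    source TEXT NOT NULL,                -- database the record came from
    omim_score TEXT,
    polyphen_score REAL,
    disease TEXT NOT NULL,
    pubmed_id TEXT,
    UNIQUE (gene, variant_id, disease, source)  -- assumed redundancy key
)
"""

FIELDS = ("gene", "variant_id", "variant_type", "genomic_location",
          "source", "omim_score", "polyphen_score", "disease", "pubmed_id")

def build_gene_core(records):
    """Load dict records into an in-memory table, silently skipping exact repeats."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        tuple(r.get(f, "" if f == "variant_id" else None) for f in FIELDS)
        for r in records
    ]
    placeholders = ", ".join("?" * len(FIELDS))
    conn.executemany(
        f"INSERT OR IGNORE INTO gene_core ({', '.join(FIELDS)}) VALUES ({placeholders})",
        rows,
    )
    return conn

# Placeholder records; the second is a duplicate of the first.
records = [
    {"gene": "GENE1", "source": "source_a", "disease": "Autism"},
    {"gene": "GENE1", "source": "source_a", "disease": "Autism"},
    {"gene": "GENE1", "source": "source_b", "disease": "Autism"},
]
conn = build_gene_core(records)
n_rows = conn.execute("SELECT COUNT(*) FROM gene_core").fetchone()[0]
```

Keeping the source in the key preserves provenance: the same gene-disease pair reported by two databases yields two rows, while a repeat from the same database is dropped.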
Text-mining pipeline: Medline -> MeSH term filtered (MeSH Major Topics) -> Candidate gene filtered (GeneTagger)
PMID: 17304222
We identified an important component for controlled actin assembly, abelson interacting protein-1 (Abi-1), as a binding partner for the postsynaptic density (PSD) protein ProSAP2/Shank3. During early neuronal development, Abi-1 is localized in neurites and growth cones; at later stages, the protein is enriched in dendritic spines and PSDs…
PMID: 17173049
SHANK3 (also known as ProSAP2) regulates the structural organization of dendritic spines and is a binding partner of neuroligins; genes encoding neuroligins are mutated in autism and Asperger syndrome. Here, we report that a mutation of a single copy of SHANK3 on chromosome 22q13 can result in language and/or social communication disorders...
Can we Turkify this process???
• Annotator checks accuracy through BioNotate system
• Results: Gene-Gene and Gene-Disease corpora
• Gene-Gene: ABI1, Shank3
• Gene-Disease: Shank3, Autism
Aim 2: Build interaction & network cores for
Autworks
Database | Description
Protein-Protein Interaction | Derived directly from STRING [18]. STRING incorporates >80,000 p-p interactions from numerous sources including MINT [24], HPRD [25], BIND [26], DIP [27], BioGrid [28, 29], KEGG [30], and Reactome [31]. These databases contain records from two-hybrid assays, synthetic lethality assays, mass spectrometry, co-immunoprecipitation, and more.
Phylogenetic Profiles | We will take the union of evidence from STRING and evidence from RoundUP, which was built by the PI and has greater coverage than STRING’s orthology information (21,000+ unique phylogenetic profiles for more than 30 Eukaryotic organisms [2]). Phylogenetic profiles are commonly used to predict functional relationships between proteins [32, 33].
Gene Ontology (GO) | GO [34, 35] contains >923034 unique biological process, function, and cellular component terms. A shared process, function, and/or cellular location can be used to predict protein-protein interaction. This has been incorporated into STRING.
Co-Expression | We will combine data from STRING with our own in-house Co-Expression database, ChipperDB [23]. ChipperDB contains a sizable portion of NCBI’s Gene Expression Omnibus [36]. Co-expression is a proven method for predicting shared function and protein-protein interaction [37].
Bibliome | Statistically relevant co-occurrences of gene names, and semantically specified interactions found via Natural Language Processing [16].
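The bibliome line of evidence starts from gene names that appear together in the same abstract. A minimal counting sketch follows, using two excerpts from the abstracts quoted earlier in this talk; it only counts co-mentions, does not test statistical relevance, and does not resolve synonyms such as ProSAP2.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrences(abstracts, gene_names):
    """Count abstracts in which each pair of gene names appears together."""
    patterns = {
        g: re.compile(rf"\b{re.escape(g)}\b", re.IGNORECASE) for g in gene_names
    }
    counts = Counter()
    for text in abstracts:
        present = sorted(g for g, p in patterns.items() if p.search(text))
        counts.update(combinations(present, 2))
    return counts

abstracts = [
    "abelson interacting protein-1 (Abi-1), as a binding partner for the "
    "postsynaptic density (PSD) protein ProSAP2/Shank3",
    "SHANK3 (also known as ProSAP2) regulates the structural organization "
    "of dendritic spines",
]
pair_counts = cooccurrences(abstracts, ["Abi-1", "SHANK3"])
```

A production pipeline would add synonym mapping and a significance test against background mention rates before treating a pair as evidence.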
Network core
[Figure: the Interaction Core (GO, Co-Ex, P-P intx, Bibliome, Phylo-profiles) feeds a Classifier that produces the Network core, shown for Ataxia, Mental Retardation, and Autism.]
Can we “cloud” it up???
Aim 3: comparative network analysis on the cloud
- Find disease-filtered interacting partners
- Find shortest paths between candidates
- Find minimal subnetworks
- Verify and reconstruct networks appropriately
[Figure: Genetic Landscape of Autism / Autism Diseaseome, showing Autism, Schizophrenia, Rett Syndrome, Angelman Syndrome, and Mental Retardation]
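Two of the Aim 3 operations can be sketched on a plain adjacency map: filtering a gene's interacting partners by a disease gene list, and finding a shortest path between two candidates with breadth-first search. The function names and the toy network are illustrative assumptions; edges are treated as unweighted and undirected.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency map from (u, v) edge pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def disease_filtered_partners(adjacency, gene, disease_genes):
    """Direct interaction partners of `gene` that are on a disease gene list."""
    return sorted(n for n in adjacency.get(gene, ()) if n in disease_genes)

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                if nxt == goal:
                    path = [nxt]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None

# Toy network with placeholder node names.
adjacency = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("x", "y")])
partners = disease_filtered_partners(adjacency, "a", {"b", "d"})
path = shortest_path(adjacency, "a", "d")
```

Both operations are independent per gene pair, which is what makes them straightforward to distribute across cloud workers.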
Acknowledgments
• Zak Kohane
• Matt Huyck
• Tom Monaghan
• Todd DeLuca
• Nieves Mendizabel
• Paco Esteban
• Joaquin Goni
• Alal Eran
• Michal Galdzicki
• Lou Kunkel
• Alexa McCray
• Leon Peshkin