Genotype
Towards an understanding of
Genotype-Phenotype correlations
Paul Fisher et al.
Genotype
The entire genetic identity of an individual
that does not show any outward
characteristics, e.g. Genes, mutations
Genes
DNA
Mutations
ACTGCACTGACTGTACGTATATCT
ACTGCACTGTGTGTACGTATATCT
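The two sequences above differ at only a couple of positions. A minimal sketch of how such a mutation could be located by position-wise comparison, assuming the sequences are already aligned and of equal length (the function name `find_mismatches` is illustrative and not part of the talk):

```python
def find_mismatches(reference, variant):
    """Return (1-based position, reference base, variant base) for each difference."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


# The two sequences shown on the slide.
reference = "ACTGCACTGACTGTACGTATATCT"
variant = "ACTGCACTGTGTGTACGTATATCT"

mismatches = find_mismatches(reference, variant)
print(mismatches)  # [(10, 'A', 'T'), (11, 'C', 'G')]
```

Real sequence comparison usually needs an alignment step first, since insertions and deletions shift positions; this sketch only covers substitutions.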
Phenotype
The observable expression of genes,
producing notable characteristics in an
individual, e.g. hair or eye colour, body
mass, resistance to disease
vs.
Brown
White and Brown
Genotype to Phenotype
Genotype
Current Methods
Phenotype
200
?
What processes
to investigate?
Phenotype
Genotype
200
?
Metabolic pathways
Phenotypic response investigated
using microarray in form of
expressed genes or evidence
provided through QTL mapping
Genes captured in microarray
experiment and present in QTL
(Quantitative Trait Loci ) region
Microarray + QTL
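Combining the two evidence sources amounts to keeping the genes that appear in both. A small sketch of that step using set intersection; the gene identifiers are hypothetical placeholders, not data from the study:

```python
# Hypothetical gene identifiers, for illustration only.
qtl_genes = {"GeneA", "GeneB", "GeneC", "GeneD"}   # genes located in the QTL region
expressed_genes = {"GeneB", "GeneD", "GeneE"}      # genes differentially expressed on the microarray

# Candidate genes are those supported by both the QTL and the microarray evidence.
candidates = sorted(qtl_genes & expressed_genes)
print(candidates)  # ['GeneB', 'GeneD']
```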
The Pathway approach
Genotype
Phenotype
Pathway(s)
• Obtain a global view of what is happening in the phenotype
• Pathways allow for experimental verification in the lab
• Provides a driving force for functional discovery
Phenotype
Pathway A
CHR
literature
QTL
Pathway linked to
phenotype – high
priority
Gene A
Pathway B
Gene B
literature
Gene C
Pathway not linked
to phenotype –
medium priority
Pathway C
Genotype
literature
Pathway not linked
to QTL – low priority
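The prioritisation shown in the diagram can be read as a simple rule. A sketch under that reading, where `pathway_priority` and its two boolean inputs are illustrative names; treating any pathway with no QTL gene as low priority, regardless of literature support, is an assumption taken from the diagram labels:

```python
def pathway_priority(linked_to_qtl, linked_to_phenotype):
    """Rank a pathway as 'high', 'medium', or 'low' priority.

    linked_to_qtl: the pathway contains a gene from the QTL region.
    linked_to_phenotype: literature links the pathway to the phenotype.
    """
    if not linked_to_qtl:
        return "low"      # Pathway C in the diagram
    if linked_to_phenotype:
        return "high"     # Pathway A in the diagram
    return "medium"       # Pathway B in the diagram


for name, in_qtl, in_literature in [
    ("Pathway A", True, True),
    ("Pathway B", True, False),
    ("Pathway C", False, False),
]:
    print(name, pathway_priority(in_qtl, in_literature))
```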
Issues with current approaches
Huge amounts of data
QTL region on
chromosome
Microarray
200+ Genes
1000+ Genes
How do I look at ALL the
genes systematically?
Hypothesis-Driven Analyses
200 QTL genes
Case: African
Sleeping sickness
- parasitic infection
- Known immune
response
Pick the genes involved in
immunological process
40 QTL genes
Pick the genes that I am most
familiar with
2 QTL genes
Result: African
Sleeping sickness
-Immune response
-Cholesterol control
Biased view
-Cell death
Manual Methods of data analysis
Tedious and
repetitive
No explicit
methods
Human
error
Navigating through
hyperlinks
Implicit
methods
Issues with current approaches
• Scale of analysis task
• User bias and premature filtering
• Hypothesis-Driven approach to data analysis
• Constant flux of data - problems with re-analysis of data
• Implicit methodologies (hyper-linking through web pages)
• Error proliferation from any of the listed issues
So what do we want to do?
• Decrease scale of manual analysis task for user
• Limit user bias
• Remove premature filtering
• Data-driven approach to hypothesis generation
• Analyse the data whenever I want or after an update
• Create explicit methodologies that can be re-used
• Reduce the overall errors
Solution – Automate using workflows
PhD - Hypothesis
Utilising the capabilities of workflows and the pathway-driven
approach, we are able to provide an analysis that is more:
- systematic
- explicit
- scalable
- un-biased
The benefit will be that new biological results will be derived,
increasing community knowledge of genotype and phenotype
interactions.
Genomic
Resource
QTL mapping
study
Microarray gene
expression study
Identify genes in
QTL regions
Identify differentially
expressed genes
Annotate genes with
biological pathways
Pathway
Resource
Annotate genes with
biological pathways
Select common
biological pathways
Wet Lab
Hypothesis generation
and verification
Literature
Statistical
analysis
Replicated
original chain of
data analysis
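The two branches of the workflow each annotate their gene list with pathways, and the pathways common to both are then selected. A rough sketch of that logic; the annotation table and gene names are hypothetical stand-ins for lookups against a pathway resource, and the real workflows call web services rather than a local dictionary:

```python
# Hypothetical gene-to-pathway annotation, standing in for a pathway resource lookup.
gene_to_pathways = {
    "GeneA": {"Pathway A"},
    "GeneB": {"Pathway A", "Pathway B"},
    "GeneC": {"Pathway C"},
}


def pathways_for(genes, annotation):
    """Collect every pathway annotated to at least one of the given genes."""
    found = set()
    for gene in genes:
        found |= annotation.get(gene, set())
    return found


qtl_genes = {"GeneA", "GeneB"}          # genes identified in the QTL regions
expressed_genes = {"GeneB", "GeneC"}    # differentially expressed genes

qtl_pathways = pathways_for(qtl_genes, gene_to_pathways)
expression_pathways = pathways_for(expressed_genes, gene_to_pathways)

# Pathways supported by both branches are carried forward for hypothesis generation.
common_pathways = sorted(qtl_pathways & expression_pathways)
print(common_pathways)  # ['Pathway A', 'Pathway B']
```

Intersecting at the pathway level rather than the gene level keeps a pathway even when the QTL gene and the expressed gene in it are different genes.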
Steve Kemp
Andy Brass
+ many Others
Trypanosomiasis in Africa
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
Results
A strong candidate gene was found
– Daxx gene not found using manual investigation methods
– The gene was identified from analysis of biological pathway
information
– Possible candidate identified by Yan et al (2004): Daxx SNP info
– Re-sequencing of the Daxx gene identified mutations
– Mutation was published in the scientific literature
– Effect on the binding of Daxx protein to p53 protein
– p53 plays direct role in cell death and apoptosis, one of the
Trypanosomiasis phenotypes
Shameless Plug!
A Systematic Strategy for Large-Scale Analysis of
Genotype-Phenotype Correlations: Identification of
candidate genes involved in African Trypanosomiasis
Fisher et al., (2007) Nucleic Acids Research
PubMed ID: 17709344
• Explicitly discusses the methods we used for the Trypanosomiasis use case
• Discusses the results for Daxx and shows the mutation
• Sharing of workflows for re-use, re-purposing
Recycling, Reuse, Repurposing
Here’s the e-Science!
• Trypanosomiasis mouse workflow reused without change
in Trichuris muris infection in mice
• Identified biological pathways involved in sex dependence
• Previous manual two year study of candidate genes had
failed to do this.
• More to follow with additional data
• Additional workflows constructed for looking at cattle and
human
• Used mouse workflows as basis for development
• 1 web service changed in entire workflow (BioMart)
• Exactly the same methods
Recycling, Reuse, Repurposing
• Share
• Search
• Re-use
• Re-purpose
• Execute
• Communicate
• Record
http://www.myexperiment.org/
Prove your methods
can be replicated
…. and share to get
recognition for your work
What next?
• More use cases for QTL and microarray
– African Trypanosomiasis
– Trichuris muris
– Possibly Lung cancer ???
• Text Mining !!!
– Aid biologists in identifying novel links between
pathways
– Link pathways to phenotype through literature
Phenotype
Pathway A
CHR
literature
QTL
Pathway linked to
phenotype – high
priority
Gene A
Pathway B
Gene B
DONE MANUALLY
literature
Gene C
Pathway not linked
to phenotype –
medium priority
Pathway C
Genotype
literature
Pathway not linked
to QTL – low priority
It can’t be that hard, right?
• PubMed contains ~17,787,763 citations to date
• Manually searching is tedious and frustrating
• Can be hard finding the links
Computers can help with data gathering
and information extraction – that’s their
job !!!
What Does the Text Hold?
Protein Info
Related
Proteins
Protein-Protein
Interactions
Pathways
Biological
processes
What Next ?
Biological
processes
Generate a Profile for
Pathway / Phenotype
Apoptosis
Cell Death
Stress response
……..
Score and Rank Terms
Common terms
Apoptosis
Cell Death
JNK pathway
Phenotype Terms
13.27
Apoptosis
28.35
Score pathway
links based on
occurrence of
phenotype term
in pathway
abstracts
Apoptosis
Cholesterol
Diabetes
Apoptosis
JNK pathway
Another pathway
0.15
Apoptosis
Cholesterol
JNK pathway
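One simple way to score a pathway link by the occurrence of phenotype terms in pathway abstracts is to count matches. The sketch below uses raw counts, which is an assumption: the scoring function behind the figures on the slide is not described here, so these numbers are not comparable to it. The abstracts and function names are invented for illustration.

```python
import re


def term_count(term, abstracts):
    """Count case-insensitive whole-phrase occurrences of a term across abstracts."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in abstracts)


def score_pathway(phenotype_terms, abstracts):
    """Sum the occurrences of every phenotype term in a pathway's abstracts."""
    return sum(term_count(term, abstracts) for term in phenotype_terms)


# Invented example data: a phenotype term profile and abstracts per pathway.
phenotype_terms = ["apoptosis", "cell death"]
pathway_abstracts = {
    "JNK pathway": [
        "JNK signalling promotes apoptosis.",
        "Apoptosis and cell death follow JNK activation.",
    ],
    "Another pathway": ["Cholesterol transport in hepatocytes."],
}

# Rank pathways from the strongest to the weakest link to the phenotype.
ranking = sorted(
    ((score_pathway(phenotype_terms, texts), name) for name, texts in pathway_abstracts.items()),
    reverse=True,
)
print(ranking)  # [(3, 'JNK pathway'), (0, 'Another pathway')]
```

Raw counts favour pathways with many abstracts, so a practical version would likely normalise by corpus size or weight terms by how specific they are.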
The Workflows
Preliminary results – a preview
• Glycolysis, reactive oxygen species, alternatively activated
macrophages
Parasite
Sample of ranked workflow results
glycolysis                 156.87
ATP                        107.24
antimycin                  102.53
glycolytic enzymes          93.27
apoptosis                   89.17
reactive oxygen             85.02
oxidative stress            80.25
glycolytic intermediates    67.31
H2O2                        64.02
TH1
macrophage
Reactive oxygen
species (NO)
Glycolysis
TH2
Alternative
macrophage
N.B. It’s not as linear as this !!!
IFN-Gamma
Text Mining
• A means of assisting the researcher
– Time
– Effort
– Narrow searches
• Hypothesis generation and verification
– Suggested links
– Limited corpus, but it's specific
NOT A REPLACEMENT FOR
DOMAIN EXPERTISE
The Final Result
Genotype
Phenotype
Pathway(s)
Tools (workflows) to allow easier transition
between genotype and phenotype
Many thanks to:
including: Joanne Pennock, EPSRC,
OMII, myGrid, and lots more people