From Functional Genomics to Physiological Model: the

GO based data analysis
Iowa State Workshop
11 June 2009
All tools and materials from this workshop are
available online at the AgBase database
Educational Resources link.
 For continuing support and assistance please
contact:
[email protected]

This workshop is supported by USDA CSREES grant number MISV-329140.
AgBase protein annotation process
Protein identifiers or FASTA format
GORetriever
Proteins with no annotations
GOanna
Annotated proteins
GOSlimViewer
Hypothesis generating

Gene Ontology enrichment analysis
Identifies GO terms that are statistically over- or underrepresented
(Fisher’s exact test) in a set of genes
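
As a rough illustration of the statistic involved, the sketch below computes a one-sided Fisher's exact test for a single GO term, written as a hypergeometric tail. The function name, parameter names, and the counts in the example are illustrative assumptions and are not taken from any of the tools in this workshop.

```python
from math import comb


def go_term_p_value(k, n, K, N, tail="over"):
    """One-sided Fisher's exact test for one GO term (hypergeometric tail).

    k: genes in the study set annotated to the term
    n: size of the study set
    K: genes in the background annotated to the term
    N: size of the background
    tail: "over" gives P(X >= k), anything else gives P(X <= k)
    """
    total = comb(N, n)
    if tail == "over":
        counts = range(k, min(n, K) + 1)
    else:
        counts = range(0, k + 1)
    # comb() returns 0 when more items are requested than exist,
    # so impossible tables contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in counts) / total


# Hypothetical counts: 3 of 5 study genes carry the term,
# against 5 of 20 genes in the background.
p_over = go_term_p_value(3, 5, 5, 20, tail="over")    # roughly 0.073
p_under = go_term_p_value(3, 5, 5, 20, tail="under")
```

In practice the test is repeated for every GO term, so the resulting p-values normally need a multiple-testing correction, which this sketch leaves out.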

Annotation Clustering
Groups similar annotations, on the hypothesis that
similar annotations should share similar gene members
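
The idea can be sketched as follows, assuming Jaccard overlap of gene sets as the similarity measure and connected components as the grouping rule. Both choices are stand-ins for illustration, not the method of any particular tool, and the term and gene names are hypothetical.

```python
def jaccard(a, b):
    """Fraction of shared genes between two annotation gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_genes, threshold=0.5):
    """Group terms whose gene sets overlap at or above the threshold."""
    terms = sorted(term_genes)
    # Link every pair of terms that is similar enough.
    neighbours = {t: set() for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if jaccard(term_genes[a], term_genes[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    # Each connected component of linked terms becomes one cluster.
    clusters, seen = [], set()
    for t in terms:
        if t in seen:
            continue
        stack, component = [t], set()
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(neighbours[cur] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical annotations: term_A and term_B share most of their genes.
term_genes = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g1", "g2", "g3", "g4"},
    "term_C": {"g7", "g8"},
}
clusters = cluster_terms(term_genes)
```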
Some resources
DAVID: http://david.abcc.ncifcrf.gov/
GOStat: http://gostat.wehi.edu.au/
EasyGO: http://bioinformatics.cau.edu.cn/easygo/
AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment (does not use IEA)
Onto-Express & OE2GO: http://vortex.cs.wayne.edu/projects.htm
GOEAST: http://omicslab.genetics.ac.cn/GOEAST
http://www.geneontology.org/GO.tools.shtml
Comparison of enrichment analysis tools: Nucleic Acids Research, 2009,
Vol. 37, No. 1, 1–13 (Tool_Comparison_09.pdf)
DAVID and EasyGO analysis included: DAVID&EasyGo.ppt
Database for Annotation, Visualization and Integrated Discovery
http://vortex.cs.wayne.edu/ontoexpress
Onto-Express analysis instructions are available in onto-express.ppt
Species represented in Onto-Express
To upload your own annotations, use OE2GO
Comparison
Onto-Express, EasyGO, GOStat and DAVID
 Test set: 60 randomly selected chicken genes
 Baseline: AgBase GO annotations

Vandenberg et al (BMC Bioinformatics, in review)
Networks & Pathways
Multiple data analysis platforms
Proteomics
Transcriptomics
ESTs
LIST
Our original aim….
…understand biological phenomena….
Bits and pieces of information
 Do not have the full picture
 How do we get back to BIOLOGY in this
digital information landscape?

What do we know about biological
systems ….
biological systems are dynamic, not static
 how molecules interact is key to understanding
complex systems

Francis Crick, 1958
Types of interactions

protein (enzyme) – metabolite (ligand): metabolic pathways
protein – protein: cell signaling pathways, protein complexes
protein – gene: genetic networks
STRING Database
Sod1
Mus musculus
http://string.embl.de/
Database URL/FTP
DIP http://dip.doe-mbi.ucla.edu
BIND http://bind.ca
MPact/MIPS http://mips.gsf.de/services/ppi
STRING http://string.embl.de
MINT http://mint.bio.uniroma2.it/mint
IntAct http://www.ebi.ac.uk/intact
BioGRID http://www.thebiogrid.org
HPRD http://www.hprd.org
ProtCom http://www.ces.clemson.edu/compbio/ProtCom
3did, Interprets http://gatealoy.pcb.ub.es/3did/
Pibase, Modbase http://alto.compbio.ucsf.edu/pibase
CBM ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
SCOPPI http://www.scoppi.org/
iPfam http://www.sanger.ac.uk/Software/Pfam/iPfam
InterDom http://interdom.lit.org.sg
DIMA http://mips.gsf.de/genre/proj/dima/index.html
Prolinks http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/
Predictome http://predictome.bu.edu/
PLoS Computational Biology March 2007, Volume 3 e42
Pathways & Networks

A network is a collection of interactions

Pathways are a subset of networks
Network of interacting proteins that carry out biological
functions such as metabolism and signal transduction

All pathways are networks of interactions

NOT ALL NETWORKS ARE PATHWAYS
Biological Networks
Networks are often represented as graphs
 Nodes represent proteins or genes that code for
proteins
 Edges represent the functional links between
nodes (e.g. regulation)
 Small changes in a graph’s topology/architecture
can result in the emergence of novel properties
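
A minimal sketch of the graph representation described here, assuming an undirected network stored as an adjacency dictionary. The protein names and edges are hypothetical, and node degree is used only as one simple example of a topological property.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degrees(adjacency):
    """Number of interaction partners for each node."""
    return {node: len(partners) for node, partners in adjacency.items()}


# Hypothetical interactions: each pair is one edge between two proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
network = build_network(edges)
node_degrees = degrees(network)

# The most connected node ("hub") in this toy network.
hub = max(node_degrees, key=node_degrees.get)
```

Removing the single edge between P1 and P4 would split this toy network into two disconnected parts, which is one way a small topological change can alter the behaviour of the whole graph.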

Yeast Protein-Protein Interaction Map
H. Jeong et al., Nature 411, 2001
Some resources
KEGG http://www.genome.jp/kegg/pathway.html/
BioCyc http://www.biocyc.org/
Reactome http://www.reactome.org/
GenMAPP http://www.genmapp.org/
BioCarta http://www.biocarta.com/
Pathguide – the pathway resource list
http://www.pathguide.org/
Pathguide
Statistics
Gallus gallus is missing
Reactome
What is feasible with my specific
dataset?
Systems Biology Workflow
Nanduri & McCarthy CAB reviews, 2008
For a given species of interest,
what type of data is available?
Retrieval of interaction datasets


Evaluate PPI resources such as Predictome and
Prolinks to see whether they cover the species of interest
If they do not, find orthologous proteins in
related species that have known interactions
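
The ortholog-based fallback can be sketched as below, assuming an ortholog mapping is already available from some other source. All identifiers are hypothetical, and the transferred pairs are predictions that would still need the quality checks discussed next.

```python
def transfer_interactions(interactions, orthologs):
    """Map interactions from a related species onto the species of interest.

    interactions: iterable of (protein, protein) pairs in the related species
    orthologs: dict from related-species protein to its ortholog
               in the species of interest
    Returns a set of unordered predicted interaction pairs.
    """
    predicted = set()
    for a, b in interactions:
        # Keep a pair only when both partners have a known ortholog.
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted


# Hypothetical data: "m" proteins from a related species,
# "c" proteins from the species of interest.
known = [("mA", "mB"), ("mB", "mC"), ("mC", "mD")]
ortholog_map = {"mA": "cA", "mB": "cB", "mC": "cC"}
predicted = transfer_interactions(known, ortholog_map)
```

The pair involving mD is dropped because it has no ortholog in the mapping.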
I have interactions, what next?

Evaluate the quality of the interactions, i.e. the type of
method used for identification. What exactly are
these methods?
STRING Database
PPI Identification
Experimental
Yeast two hybrid (Y2H)
TAP assays
Gene Coexpression
Protein arrays
Computational
Phylogenetic profile
Gene Cluster
Sequence coevolution
Rosetta stone method
Text mining
PLoS Computational Biology March 2007, Volume 3 e42
PPI database comparisons
Proteins: Structure, Function and Bioinformatics 63:490-500 2006
I have interactions, what next?

Evaluate the quality of the interactions, i.e. the type of
method used for identification. What exactly are
these methods?

Visualize these interactions as a network and
analyze them.
What are the available tools?