for networks - Vanderbilt Kennedy Center


Network Biology Data
Biological, Conceptual and Computational Issues around Network, System, and Pathway Data
The Abstract and The Concrete
Topic Outline
• Lessons from the Genome Program, and abstract ideas for transforming data into information when looking at systems data.
• Two examples of concrete tools (ready for use):
  • WebGestalt (for large sets of genes)
  • Ingenuity (for networks)
• A concrete thing: Bioinformatics Resource Center (under development)
• Other tools under development
Human Genome Project (HGP): the genome-encoded “parts list” as data integrator.
Past Lessons and Future Directions
Data… Common Data Elements of genes and gene products, and of transcripts and proteins.
Enabling integration and comparison of data in NEW ways…
• Individualized genotype data within populations
• Genome data
• Phenotype and system data
GeneKeyDB and related work as an integrative foundation that can help merge with other data.
The HGP highlighted some ways to succeed or fail with large data sets.
• Are the lessons learned applicable to the systems biology of expression, proteomics, and genetic data sets? Yes.
• But are some new approaches needed to understand SYSTEM data? Yes.
Biggest lesson: a biodata item has two questions attached to it (Mayr): How? and Why? The HGP showed the importance of the why questions in thinking about and organizing data.
This holds for genome data and for other genotype, phenotype, and system data: each datum carries both questions.
HGP results and future issues for new data…
Genotype + Environment + DEVELOPMENT ==> Phenotype
1) Astounding results: the importance of network thinking in development and physiology for using data to explain phenotype (e.g. PAX6).
2) HGP data approaches have some relevance, but new bioinformatics tools are needed for network data and network thinking…
A way of thinking about data…
Bioinformatics: finding the (genotypic, environmental data) difference that makes the (phenotypic data) difference.
(Many differences that make an interesting difference lie NOT in protein coding but in complex networks.)
• Δ data in protein coding
• Δ data in regulatory networks
• Δ data in cellular signaling networks
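For network data, one simple way to look for “the difference that makes the difference” is to compare edge sets. The sketch below is a minimal illustration, assuming each network is already available as a set of directed (regulator, target) edges; the function name `network_delta`, the gene names, and the edges are all hypothetical.

```python
def network_delta(net_a, net_b):
    """Compare two directed networks given as sets of (source, target) edges.

    Returns the edges found only in net_a, only in net_b, and in both.
    """
    only_a = net_a - net_b
    only_b = net_b - net_a
    shared = net_a & net_b
    return only_a, only_b, shared


# Hypothetical regulatory edges observed under two genotypes.
genotype_1 = {("TF1", "GeneA"), ("TF1", "GeneB"), ("TF2", "GeneC")}
genotype_2 = {("TF1", "GeneA"), ("TF2", "GeneC"), ("TF2", "GeneB")}

lost, gained, kept = network_delta(genotype_1, genotype_2)
print(sorted(lost))    # edges seen only under genotype 1
print(sorted(gained))  # edges seen only under genotype 2
print(sorted(kept))    # edges shared by both
```

Here the coding sequences could be identical while the wiring differs (GeneB switches regulators), which is the kind of network-level Δ the slide points to.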
What is a “Network” way of viewing data…
Nodes or Vertices may be
• Genes
• Gene products
• Hormones, signals
• Metabolites
• Publications
• Functional Sequence Elements
Edges or Lines may be
• Undirected vs. directed
• Weighted vs. unweighted.
A biological network can be expressed and manipulated in terms of “graph theory.” Combinatorial algorithms are needed to analyze graphs.
Could be…
• Co-expression networks
• Gene regulatory networks
• Cell-cell communication and signal transduction networks
• Phylogenetic relationships among genes, species, networks: orthology, paralogy, etc. (trees, clades, etc.)
• Gene Ontology or other directed acyclic graphs.
e.g. Alon U. 2003. Science 301: 1866; Barabasi AL. 2003. Linked. Plume Books.
Barabasi AL, Oltvai ZN. 2004. Nat. Rev. Genetics 5: 101
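The node and edge choices above (directed vs. undirected, weighted vs. unweighted) map directly onto a data structure. A minimal sketch, assuming an adjacency map in plain Python; the `Network` class, gene names, and weights are hypothetical and not taken from any of the tools discussed here.

```python
class Network:
    """Adjacency-map graph: each node maps to {neighbor: edge weight}."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}

    def add_edge(self, u, v, weight=1.0):
        # Record u -> v and make sure v exists as a node.
        self.adj.setdefault(u, {})[v] = weight
        self.adj.setdefault(v, {})
        if not self.directed:
            # Undirected edges are stored in both directions.
            self.adj[v][u] = weight

    def neighbors(self, node):
        return self.adj.get(node, {})

    def degree(self, node):
        # For a directed network this is the out-degree.
        return len(self.neighbors(node))


# Hypothetical directed, unweighted regulatory network.
regulatory = Network(directed=True)
regulatory.add_edge("TF1", "GeneA")
regulatory.add_edge("TF1", "GeneB")

# Hypothetical undirected, weighted co-expression network.
coexpression = Network(directed=False)
coexpression.add_edge("GeneA", "GeneB", weight=0.9)
coexpression.add_edge("GeneB", "GeneC", weight=0.4)

print(regulatory.degree("TF1"), regulatory.degree("GeneA"))
print(coexpression.neighbors("GeneB"))
```

The same structure can hold genes, metabolites, or publications as nodes; only the meaning of an edge and its weight changes.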
What is a “Network” way of viewing data… (continued)
Edges may reflect experimental correlation (can be undirected) vs. mechanistic relationships (directed).
Tightly connected modules might be found… A network module might be loosely analogous to a protein sequence module that is conserved, duplicated, and diverged. Similar modules might be seen across different tissues, species, etc.
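Correlation-based (undirected) edges and tightly connected modules can be sketched for co-expression data. This is a minimal illustration, assuming every gene has a non-constant expression profile over the same samples; the profiles, gene names, and the 0.8 threshold are hypothetical. “Modules” here are simply connected components of the thresholded graph, a deliberately crude stand-in for the clustering or community-detection methods used in practice.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.8):
    """Undirected edges between genes whose |correlation| meets the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def modules(genes, edges):
    """Connected components of the thresholded graph (crude module call)."""
    neighbors = {g: set() for g in genes}
    for g1, g2 in edges:
        neighbors[g1].add(g2)
        neighbors[g2].add(g1)
    seen, found = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            g = stack.pop()
            if g not in component:
                component.add(g)
                stack.extend(neighbors[g] - component)
        seen |= component
        found.append(component)
    return found


# Hypothetical expression profiles over four samples.
profiles = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.1, 6.0, 8.2],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
    "GeneD": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(profiles)
found = modules(profiles, edges)
print(edges)
print(found)
```

Strong negative correlations (GeneC vs. GeneA) also produce edges, which is one reason correlation edges are read as undirected associations rather than mechanisms.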
Collaborative, Integrative, and Comparative Bioinformatics
Data Storage & Collaborative Bioinformatics
• Existing knowledge (GeneKeyDB)
• Large molecular data sets (microarray data, proteome, etc.)
• Phenotype data (MuTrack)
• Genetic data (WebQTL, Williams et al., UTHSC)
Integrative Bioinformatics
• Need gene-centered data integration (via GeneKeyDB, BioFoundation) to collaborate, integrate, and COMPARE genotype & phenotype data sets, and to find differences in biological NETWORKS.
Comparative Bioinformatics
• Comparative, Boolean, and other operations on gene sets & networks; WebGestalt and Ingenuity are two examples.
• Data visualization, data mining & stats
• Comparative cladistic phylogenetic analysis
• Graph algorithms
• Sequence and network modularity
• Network analysis (CS, stats)
• Network modules: duplicated, diverged, converged
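Gene-centered integration means using the gene as the shared key across data sets. How GeneKeyDB and BioFoundation implement this is not described here; the sketch below only shows the generic idea, assuming every source has already been mapped to one common gene identifier. The function `integrate_by_gene`, the source names, and all values are hypothetical.

```python
def integrate_by_gene(**sources):
    """Merge gene-keyed data sets into one record per gene.

    Each keyword argument is a dict {gene_id: value}; the result maps every
    gene_id seen in any source to {source_name: value}.
    """
    merged = {}
    for source_name, data in sources.items():
        for gene_id, value in data.items():
            merged.setdefault(gene_id, {})[source_name] = value
    return merged


# Hypothetical data sets keyed by the same gene identifiers.
expression = {"GeneA": 2.3, "GeneB": -1.1}
phenotype = {"GeneA": "trait_1", "GeneC": "trait_2"}

merged = integrate_by_gene(expression=expression, phenotype=phenotype)
# Genes supported by both data types are natural starting points for comparison.
in_both = sorted(g for g, record in merged.items() if len(record) == 2)
print(merged)
print(in_both)
```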
WebGestalt: Web-based Gene Set Analysis Toolkit
http://bioinfo.vanderbilt.edu/webgestalt
Bing Zhang
Can upload gene sets based on
1) IDs (e.g. Affymetrix, LocusLink, protein IDs from chip, proteome, etc.)
2) Genome location
Or…
3) Gene Ontology (common biological process, molecular function, cellular location)
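Building a gene set from a genome location amounts to an interval-overlap query. A minimal sketch, assuming a hypothetical annotation table of (chromosome, start, end) per gene with inclusive coordinates; this illustrates the idea and is not WebGestalt's own code. The function `genes_in_region` and all coordinates are hypothetical.

```python
def genes_in_region(annotations, chrom, start, end):
    """Genes whose inclusive [gene_start, gene_end] overlaps chrom:start-end."""
    return sorted(
        gene
        for gene, (g_chrom, g_start, g_end) in annotations.items()
        if g_chrom == chrom and g_start <= end and g_end >= start
    )


# Hypothetical gene coordinates.
annotations = {
    "GeneA": ("chr2", 1000, 5000),
    "GeneB": ("chr2", 8000, 9000),
    "GeneC": ("chr7", 1000, 2000),
}
print(genes_in_region(annotations, "chr2", 4000, 8500))
print(genes_in_region(annotations, "chr2", 5500, 7000))
```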
Manipulate data as a set of genes or gene products.
RNA expression, proteome, genomics, statistical genetics, etc. all produce lists of genes that may function in a network.
1 of 3 things to do
Boolean operations on multiple sets, or retrieving orthologs.
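Boolean operations on gene lists correspond directly to set operations. A minimal sketch with Python sets, assuming three hypothetical gene lists from different experiments and a hypothetical ortholog table; WebGestalt's own interface is not reproduced here.

```python
# Hypothetical gene lists from three kinds of experiment.
array_hits = {"GeneA", "GeneB", "GeneC", "GeneD"}
proteome_hits = {"GeneB", "GeneC", "GeneE"}
qtl_candidates = {"GeneC", "GeneD", "GeneF"}

in_all_three = array_hits & proteome_hits & qtl_candidates  # AND
in_any = array_hits | proteome_hits | qtl_candidates  # OR
array_only = array_hits - proteome_hits - qtl_candidates  # NOT

# Hypothetical ortholog table (e.g. model-organism gene -> human gene).
orthologs = {"GeneA": "HumanA", "GeneC": "HumanC"}
# Genes without a known ortholog are skipped rather than guessed.
human_equivalents = {orthologs[g] for g in in_all_three if g in orthologs}

print(sorted(in_all_three), sorted(array_only), sorted(human_equivalents))
```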
2 of 3 things to do
Retrieve data and other IDs
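Retrieving other IDs is a lookup against a mapping table, with unmatched IDs reported rather than dropped silently. A minimal sketch, assuming a hypothetical one-to-one table from probe IDs to gene IDs; real mappings are often one-to-many, which this sketch does not handle. The function `map_ids` and all IDs are hypothetical.

```python
def map_ids(ids, id_table):
    """Translate IDs via a lookup table; also return IDs with no match."""
    mapped, unmapped = {}, []
    for identifier in ids:
        if identifier in id_table:
            mapped[identifier] = id_table[identifier]
        else:
            unmapped.append(identifier)
    return mapped, unmapped


# Hypothetical probe-to-gene table.
id_table = {"probe_001": "GeneA", "probe_003": "GeneC"}
mapped, unmapped = map_ids(["probe_001", "probe_002", "probe_003"], id_table)
print(mapped)
print(unmapped)
```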
3rd thing to do
“Unusual” properties across the set
e.g. Which GO categories (biological processes, molecular functions, and cellular locations) are in the set? Are there any that seem to occur more often than expected…
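“More often than expected” is commonly quantified with a hypergeometric test. The sketch below shows that test for a single GO category; it is an assumption about how such a question can be answered, not a statement of exactly what WebGestalt computes, and it omits the multiple-testing correction needed when many categories are tested. The function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, annotated_total, set_size, annotated_in_set):
    """Upper-tail hypergeometric p-value, P(X >= annotated_in_set).

    total_genes:      N, genes in the reference (background) list
    annotated_total:  K, background genes annotated with the category
    set_size:         n, genes in the uploaded set
    annotated_in_set: k, genes in the set annotated with the category
    """
    denominator = comb(total_genes, set_size)
    upper = min(set_size, annotated_total)
    numerator = sum(
        comb(annotated_total, i) * comb(total_genes - annotated_total, set_size - i)
        for i in range(annotated_in_set, upper + 1)
    )
    return numerator / denominator


# Toy numbers: 20 background genes, 5 annotated; all 4 genes in the set annotated.
print(enrichment_p_value(20, 5, 4, 4))
```

The expected count under random sampling is n·K/N; for example, with N = 20000, K = 200, and n = 100, about 1 annotated gene is expected, so finding 8 would stand out.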
Co-occurrence of genes and
publications (GRIF)
Protein Domains in set
Chromosome locations in set…
Pathways in set (1)
Pathways in set (2)
Ingenuity
• A commercial tool for manipulating graphs (networks).
• VU license: http://bioinfo.vanderbilt.edu/wiki/Ingenuity
• (Also some open-source tools: Cytoscape, GeNetViz, etc.)
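A typical graph manipulation in such tools is asking how two genes are connected. The sketch below does this with breadth-first search in plain Python; it is a generic illustration, not the API of Ingenuity, Cytoscape, or GeNetViz. The function `shortest_path` and the interaction network are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Fewest-edges path in an unweighted graph via breadth-first search.

    adjacency maps each node to an iterable of neighboring nodes.
    Returns the path as a list of nodes, or None if target is unreachable.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical undirected interaction network (each edge listed both ways).
interactions = {
    "GeneA": ["GeneB", "GeneC"],
    "GeneB": ["GeneA", "GeneD"],
    "GeneC": ["GeneA", "GeneD"],
    "GeneD": ["GeneB", "GeneC", "GeneE"],
    "GeneE": ["GeneD"],
    "GeneF": [],
}
print(shortest_path(interactions, "GeneA", "GeneE"))
print(shortest_path(interactions, "GeneA", "GeneF"))
```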
Use of the commercial tool Ingenuity by Dr. N. Deanne and Dr. Beauchamp
Pathways (3)
Bioinformatics Resource Center
• Developing a Bioinformatics Resource Center (BRC) that will consist of:
  • Training infrastructure and applied workshops
  • Support for faculty using existing tools and databases (caBIG, custom statistical packages, NCBI genomics, imaging, and molecular structure resources).
  • Collaborative IT
    • Establish accessible databases in shared cores and support faculty using these resources. …
  • Integrative IT
    • Web sites that integrate information from disparate data sets:
  • Comparative IT
    • Systems biology: comparing data across multiple platforms to identify new patterns (tissues and cells, molecular pathways, model organisms, toxins, etc.)
(taken from VUMC Strategic Plan).
Other systems…
• Construction projects that can be further shaped by your needs…
  • CollabCore and Lab Blogs
  • Genepedia, GeneKeyDB, BioFoundation
  • Extensions to WebGestalt
  • TFCAT, GeneCAT, CladeCAT, Pazar
Acknowledgments
Bing Zhang
Stefan Kirov
Leslie Galloway
Barbara Jackson
Betty Lou Alspaugh
Oakley Crawford
Suzanne Baktash
Xinxia Peng
Harold Shanafield
Sam Wang
Adam Tebbe
Shawn Ericson
Jeff Horner
A few collaborators…
Bonnie LaFleur
Shawn Levy
Phil Dexheimer
Michael Langston (CS collaborator)
Wyeth Wasserman
Dan Goldowitz and the TMGC
Rob Williams et al. (WebQTL, etc.)
Erich Baker
Dan Beauchamp
Natasha Deanne
Chad Johnson