How You Can Benefit from the Bioinformatics Resource

How can you benefit from the
Bioinformatics Resource?
Can (John) Bruce, Ph.D.
Associate Director
Bioinformatics Resource
Keck Biotechnology Laboratory
The Bioinformatics Core
• Created within Keck Lab upon request from Yale
School of Medicine, July 2007.
• Director: Hongyu Zhao, Ph.D.; Associate Directors: Can
Bruce, Ph.D., and Yong Kong, Ph.D.
• The facility is located at Sterling Hall of Medicine.
• Commercial software packages provided free by the
Core are available to Yale researchers 24/7.
Services
• Access to a large number of widely used
commercial and open-source bioinformatics
programs.
• Fee-based consultation services for well-defined
bioinformatics analyses.
• Collaborative projects requiring a longer-term
commitment of time and effort.
Available programs
• DNA/protein sequence analysis:
Lasergene and Gene Construction Kit.
• Pathway analysis: Ingenuity Pathway
Analysis and MetaCore.
• Protein structure modeling and
visualization: Sybyl.
• Mass spectrometry data analysis:
GPMAW.
• Pipelining programs: Pipeline Pilot and
VIBE.
Examples of Current Collaborations
• Pathway analysis on proteomics data (Yale/NIDA
Proteomics Center Project and Yale/NHLBI
Proteomics Center Project investigators)
• Development of an algorithm for identification of
phosphorylation sites from tandem mass spectrometry
data (E. Gulcicek in Keck Proteomics)
• Molecular modeling of MAP kinase-ligand
interactions (B. Turk in Pharmacology)
• Sequence analysis for defining an invention claim for
the Office of Collaborative Research
Microarray analysis software
• GeneSpring GX provides visualization and
advanced statistical analysis for gene
expression data.
• Partek Genomics Suite provides advanced
statistics and interactive data visualization
designed for gene expression analysis, exon
expression analysis, promoter tiling array
analysis, chromosomal copy number analysis,
and SNP analysis.
Sequence Analysis Software
• DNASTAR Lasergene, a comprehensive suite
of programs for analysis of DNA/RNA/protein
sequences including sequence editing,
sequence assembly, sequence alignment,
primer design, protein structure prediction,
and gene detection and annotation.
• Gene Construction Kit 2.5, a tool for
designing, drawing, and annotating DNA
sequences, especially plasmid constructs.
PIPELINING PROGRAMS
This pipeline from Pipeline Pilot takes a Swiss-Prot sequence from a
Web portal, then generates a results page with four tabs giving
summary data, a sequence features map, chemical structures of
substrates, and BLAST results.
PATHWAY ANALYSIS
• MetaCore (from GeneGo).
• Ingenuity Pathways Analysis 3.1 (from Ingenuity Systems).
• Both are integrated software suites for functional analysis.
• Based on a proprietary, manually curated database of human
protein-protein, protein-DNA and protein-compound interactions, metabolic and
signaling pathways, and the effects of bioactive molecules.
• MetaCore can be integrated with other software packages such as
GeneSpring, Resolver, Expressionist, Pipeline Pilot, EndNote, and
Cytoscape.
• Ingenuity can be integrated with GeneSpring, Partek Genomics Suite,
SAS JMP Genomics, and Spotfire.
Why Pathway Analysis?
Pathway Creation Algorithms in
MetaCore (1)
• Direct Interactions: draws direct interactions between selected
objects. No additional objects are added to the network.
• Self-regulatory Networks: finds the shortest directed paths
containing transcription factors between the genes in your gene
list (better used for a small number of targets).
• Expand by one (not suitable for large collections of targets).
• Auto expand: draws sub-networks around the selected objects,
stopping the expansion when the sub-networks intersect.
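To make the graph operations behind these options concrete, here is a minimal sketch on a toy directed network. The node names, the `NETWORK` dictionary, and all three function names are hypothetical illustrations; MetaCore's own implementation and database are proprietary and are not reproduced here. The shortest-path function is a plain breadth-first search and does not enforce the "containing transcription factors" condition.

```python
from collections import deque

# Hypothetical directed interaction network: source -> set of targets.
# Node names are placeholders, not entries from any curated database.
NETWORK = {
    "LigandA": {"ReceptorB"},
    "ReceptorB": {"KinaseC"},
    "KinaseC": {"TF_D"},
    "TF_D": {"GeneE", "GeneF"},
    "GeneE": set(),
    "GeneF": set(),
}


def direct_interactions(network, selected):
    """Keep only edges whose two ends are both selected (no objects added)."""
    chosen = set(selected)
    return sorted(
        (src, dst)
        for src in chosen
        for dst in network.get(src, ())
        if dst in chosen
    )


def shortest_directed_path(network, start, goal):
    """Breadth-first search; returns one shortest directed path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(network.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def expand_by_one(network, selected):
    """Add every node one interaction away from the selected objects."""
    chosen = set(selected)
    expanded = set(chosen)
    for src, targets in network.items():
        for dst in targets:
            if src in chosen or dst in chosen:
                expanded.update((src, dst))
    return expanded
```

In this toy network, expanding by one step from a well-connected node pulls in all of its neighbors at once, which suggests why that option scales poorly to large collections of targets.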
Pathway Creation Algorithms in
MetaCore (2)
• Analyze Network: creates a list of possible networks, ranked
according to how many objects in the network correspond to
the user's list of genes, how many nodes are in the network,
and how many nodes are in each smaller network.
• Analyze Transcription Network: similar to the above, but the
sub-networks created are centered on TFs.
• Analyze Networks (Transcription Factors): focuses on the
presence of TFs at end nodes.
• Analyze Networks (Receptors): focuses on the presence of
receptors at the end point of a network.
Analyze Network Algorithm
Generates sub-networks highly saturated with selected objects.
Sub-networks are ranked by a P-value and G-Score and interpreted
in terms of Gene Ontology.
Example: a proteomics experiment on the effect of drug infusion on
plasma proteins (P<1e-18).
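One common way to attach a P-value to "how saturated is this sub-network with my genes" is a hypergeometric tail test. The sketch below is a generic textbook calculation under that assumption; it is not MetaCore's scoring formula, which is not described in these slides, and the function and parameter names are illustrative.

```python
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement.

    population: all objects in the background database
    successes:  objects in the user's gene list
    draws:      nodes in the sub-network
    observed:   sub-network nodes that are also in the gene list
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total
```

A smaller value means the overlap is less likely to arise by chance, so sub-networks can be sorted in ascending order of this number.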
Analyze Networks (Transcription Factors) Algorithm
- an example
Favors network construction where the end-nodes of
transcriptionally regulated pathways are present in the original
gene list.
Example from an mRNA expression analysis data set comparing
healthy and lesion skin (P=7.2e-46).
Analyze Network (Receptors) Algorithm
- an example
Favors network construction where the end-point of a pathway leads
to a receptor (through "receptor binding") and the starting point of
a pathway (a transcription factor, ligand, etc.) is present in the
original gene list, regardless of the presence of the end-point
receptor in the list.
Transcription Regulation Algorithm
Generates sub-networks centered on transcription factors.
Sub-networks are ranked by a P-value and interpreted in terms of
Gene Ontology.
Example: 13 targets/14 nodes, P=7.3e-31
Immune response: Histamine H1 receptor
signaling in immune response (p=1e-4)
GeneGo process networks
WNT signaling (p=1e-5)
Disease biomarker enrichment
Network-disease associations
1) Carcinoma (72% coverage, p=3.3e-10)
2) Neoplasms, connective and soft tissue (42% coverage, p=8e-10)
Use of Pathway Analysis in
Candidate Gene Identification
• 1061 genes are located in the mapped region for the disease.
• 17 receptor ligand genes are important "input" nodes to pathways
formed by genes with changed expression: FGF2, WNT5A,
Tenascin-C, EGF, IL1RN, BDNF, TGF-beta2, FGF2, OSF-2,
CSPG4 (NG2), IL8, ENA-78, GCP2, SLIT2, SLIT3, Activin beta A,
Annexin I.
• Other up- or downregulated genes:
360 genes up- or downregulated by >2x
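A ">2x" cutoff of this kind can be sketched as a simple filter on expression ratios. The gene names and ratio values below are invented for illustration only, and `changed_genes` is a hypothetical helper, not a function from GeneSpring, Partek, or MetaCore.

```python
# Hypothetical expression ratios (treated / control); values are invented.
RATIOS = {"geneA": 2.5, "geneB": 0.4, "geneC": 1.3, "geneD": 0.5, "geneE": 2.0}


def changed_genes(ratios, fold=2.0):
    """Return genes changed by more than `fold` in either direction.

    Assumes every ratio is positive; a ratio below 1/fold counts as
    downregulated by more than `fold`.
    """
    return sorted(
        gene
        for gene, ratio in ratios.items()
        if ratio > fold or ratio < 1.0 / fold
    )
```

Because the comparison is strict, genes sitting exactly at 2x or 0.5x are excluded in this sketch; whether the boundary is included is a choice the analyst has to make explicitly.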
Pathway analysis narrows down the
number of candidate genes for disease
[Figure: interaction network. Node labels: ErbB2, PECAM1, DDX5,
BCAS3, microRNA1, RARalpha, MUL, VHR, WIP, NIK, Plakoglobin,
HEXIM1, Prohibitin, STAT5A, STAT3, Clathrin, PSME3, PSMC5;
FGF2, IL1RN; other up- or downregulated genes (360 genes up- or
downregulated by >2x).]
These genes, from the mapped region of interest, can form
interaction pathways going through the receptor ligands
identified by the first analysis.
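The narrowing step can be pictured as a reachability filter: keep only those genes from the mapped region that connect to one of the receptor ligands within a few interactions. The sketch below assumes an undirected interaction list and a step limit of two; the gene names, the `INTERACTIONS` list, the function names, and the step limit are all hypothetical, and the actual analysis was done in the pathway software rather than with code like this.

```python
from collections import deque

# Hypothetical undirected interactions; names are placeholders.
INTERACTIONS = [
    ("region1", "mid1"),
    ("mid1", "ligandX"),
    ("region2", "ligandY"),
    ("region3", "other1"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_steps(graph, start, targets, max_steps):
    """True if any target is reachable from start in at most max_steps edges."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return True
        if dist == max_steps:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return False


def candidates(graph, region_genes, ligands, max_steps=2):
    """Region genes that connect to at least one ligand within max_steps."""
    ligand_set = set(ligands)
    return sorted(
        gene
        for gene in region_genes
        if within_steps(graph, gene, ligand_set, max_steps)
    )
```

Genes that are absent from the interaction list simply drop out of the result, which is the same limitation the next slide raises about genes missing from the database.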
A caveat
Not every gene belongs to a pathway in the database…
Why Pathway Analysis Software?
• A learning tool
– Study a group of gene products.
• A data analysis tool.
– Which pathways are particularly affected?
– What disease has similar biomarkers?
• A hypothesis generation tool
– Can provide insight into the mechanism of regulation of your genes:
which is the likely causative agent for the observed changes, and
what is likely to happen as a result of these changes?
– Suggest effects of gene knock-ins or knock-outs.
– Suggest side effects of drugs.
– Can highlight new phenomena that need further investigation.
What does the program not explain?
Thank you.