Training material for the Pathway Studio Enterprise edition

Pathway Studio Workgroup/Enterprise training course
©2006 Ariadne Genomics. All Rights Reserved.
DAY 1
Technology overview
System architecture
Products
• Pathway Studio Desktop
• Pathway Studio Workgroup
• Pathway Studio Enterprise
Main functionality:
1) Data mining and pathway building
2) Analysis of high-throughput data
3) Text-mining and fact extraction
Ariadne Corporate Offering
Software solution for knowledge management and pathway analysis of high-throughput data
[Diagram: MedScan text-mining (1000 abstracts/min); proprietary data; public interaction data; knowledge databases; ResNet biological association networks; pathway building; pathway collection; analysis of high-throughput data]
Accomplishments (April, 2007)
188 publications using AGI software and the ResNet database:
• Gene expression microarray analysis (105)
• Pathway analysis (80)
• Disease mechanism (64)
• Human genetics (7)
• Publications by Ariadne authors (13)
• Text processing (9)
• Reviews (6)
• Databases (3)
• Drug discovery (16)
• Toxicogenomics (4)
Pathway Studio Workgroup client-server architecture
[Diagram: central database; read-only users; data curators; PSW administrator; third-party tools, in-house applications and API; SQL interface for bulk data management]
PathwayExpert Architecture
[Diagram: application server and database; read-only users and data editors via web browser; bioinformaticians via Pathway Studio; third-party tools, in-house applications and API; SQL interface for bulk data management]
“Everyone is an Expert” decentralized deployment schema
Hundreds or thousands of users, some with read-only roles and some with editor or publisher roles, access one central database via Pathway Studio and/or a web browser to analyze experiments, browse the pathway collection, do literature mining, and share data and analysis results.
“Bioinformatics service group” centralized deployment schema
A bioinformatics group serves scientists across the entire company by analyzing their experimental data and mining the literature. Analysis results are published to end users via a web browser interface.
End users
– View-only access via web browser to pathways and analysis networks annotated with experimental data, and links to PathwayExpert Web Services
– 1) Experimental data 2) Search requests
Bioinformatics group
– 1) Analysis of experimental data 2) Text-mining and pathway building
“Disease area” decentralized clusters deployment schema
Each disease area group has bioinformaticians, biologists and chemists working as a team focused on one disease
Cardiovascular group
Digestive disorders group
Cancer group
CNS group
Day 1
Introduction to MedScan technology
Ariadne MedScan Text-To-Knowledge Technology
Extracting biological association networks from text
[Diagram: MedScan (1000 abstracts/min) with output in RNEF XML; knowledge databases and the ResNet biological association networks; Pathway Studio to navigate the knowledgebase; pathway analysis in the ResNet database]
How does MedScan extract facts from text?
• Sentence in PubMed:
“Axin binds beta-catenin and inhibits GSK-3beta.”
• Identify proteins in the dictionary (in red on the original slide): Axin, beta-catenin, GSK-3beta
• Identify the interaction types (in black): binds, inhibits
Syntactic layer: Noun Phrase, Verb Phrase, Noun Phrase
Semantic layer: Protein (Axin), Relations (binds, inhibits), Protein (beta-catenin), Protein (GSK-3beta)
• Extracted facts:
Axin - beta-catenin (relation: Binding)
Axin -> GSK-3beta (relation: Regulation, effect: Negative)
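The sentence-to-facts steps above can be mimicked with a deliberately tiny sketch. It assumes a hypothetical three-protein dictionary and a hypothetical verb table, and it only handles the "A verb B and verb C" coordination in the example sentence; MedScan's real dictionaries, parser and semantic rules are proprietary and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical dictionary standing in for MedScan's protein name dictionary.
PROTEINS = {"Axin", "beta-catenin", "GSK-3beta"}

# Hypothetical verb table: interaction word -> (relation type, effect).
VERBS = {
    "binds": ("Binding", None),
    "inhibits": ("Regulation", "Negative"),
    "activates": ("Regulation", "Positive"),
}


@dataclass(frozen=True)
class Fact:
    source: str
    target: str
    relation: str
    effect: Optional[str] = None


def extract_facts(sentence: str) -> List[Fact]:
    """Toy extractor: the first protein is the subject; each interaction verb
    links the subject to the next protein that follows the verb."""
    tokens = [tok.strip(".,;") for tok in sentence.split()]
    facts: List[Fact] = []
    subject: Optional[str] = None
    pending = None  # (relation, effect) waiting for its object protein
    for tok in tokens:
        if tok in PROTEINS:
            if subject is None:
                subject = tok
            elif pending is not None:
                facts.append(Fact(subject, tok, *pending))
                pending = None
        elif tok in VERBS:
            pending = VERBS[tok]
    return facts


for fact in extract_facts("Axin binds beta-catenin and inhibits GSK-3beta."):
    print(fact)
```

The subject carries over across "and", which is why both facts start from Axin; a real parser derives this from the sentence structure rather than from word order.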
Describing MedScan
• Manually curated: dictionaries and grammar rules
• Fast: 14 million PubMed abstracts in 2 days on a modern PC
• Comprehensive: fact recovery rate > 90%
• Removes redundancy: 7,647,282 non-distinct relations => 1,000,000 distinct relations
• Accurate: false positive rate of 10%
• Customizable: dictionaries and patterns
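Redundancy removal, as described above, amounts to merging repeated mentions of the same fact into one distinct relation while keeping its supporting references. The sketch below assumes a hypothetical `Mention` record and treats Binding as undirected; it is an illustration, not Ariadne's actual procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Mention(NamedTuple):
    source: str
    target: str
    relation: str
    reference: str  # hypothetical document identifier, e.g. a PubMed ID


def collapse_mentions(mentions: Iterable[Mention]) -> Dict[Tuple[str, str, str], Set[str]]:
    """Merge repeated mentions of the same fact into one distinct relation,
    keeping the set of references that support it."""
    relations: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for m in mentions:
        if m.relation == "Binding":
            # Assumption: Binding is undirected, so A-B and B-A are the same fact.
            a, b = sorted((m.source, m.target))
        else:
            a, b = m.source, m.target
        relations[(a, b, m.relation)].add(m.reference)
    return dict(relations)


mentions = [
    Mention("Axin", "beta-catenin", "Binding", "PMID:1"),
    Mention("beta-catenin", "Axin", "Binding", "PMID:2"),
    Mention("Axin", "GSK-3beta", "Regulation", "PMID:1"),
    Mention("Axin", "GSK-3beta", "Regulation", "PMID:1"),
]
distinct = collapse_mentions(mentions)
print(len(mentions), "mentions ->", len(distinct), "distinct relations")
```

The size of each reference set is the kind of reference count that can later drive a "By reference count" display style or a confidence filter.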
MedScan Architecture
[Diagram: MedScan modules (entity recognizer, pattern matcher, semantic processor) performing entity detection and relationship extraction, with output in RNEF XML; dictionaries, patterns and rules customizable by the user; cartridges: Mammals, Yeast, Drosophila, C-elegans, Plants, Toxicology]
Future:
• New modules: ConceptScan
• New cartridges: Immunology, Clinical
Overview of MedScan Architecture
Input text
-> Preprocessor (dictionary-based; uses the protein names dictionary to identify proteins and small molecules) -> tagged sentences
-> Tokenizer -> sequence of words
-> Syntactic parser (context-free grammar; uses the grammar and lexicon) -> sentence structure
-> Semantic interpreter (rule-based ontological interpreter; uses extraction rules) -> semantic tree
-> Pattern matcher (uses extraction patterns) -> extracted facts
-> Converter -> database of relations
Grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
Rules are equivalent to an ontology.
MedScan Applications
[Diagram: MedScan processing sources such as PubMed, Google and open-access literature]
• Indexing the scientific literature: entity-based index, semantic index
• Extracting interactions to create databases for systems biology
• Automatic reader’s digest: document summary
Text-mining tools in Pathway Studio
• Tools -> Start MedScan Reader
– Web browser enhanced with MedScan technology
– Search PubMed and manually select abstracts for fact extraction
– Search Google Scholar and extract facts from top 100 hits
– Search Google and extract facts from top 30 hits
– Search Highwire and BioMed Central and extract facts from the individual full-text articles
• Tools -> MedScan: Extract pathways from text
– search PubMed
– from file
– from location
• Tools -> Update pathway
• Tools -> Pathway Reference summary
– Export to EndNote
MedScan Reader settings
1) Specifying the MedScan cartridge
2) Tracking favorite entities via highlighting
3) Filtering for favorite entities and relations
4) Filtering against entities and relations
Day 1
Ariadne ResNet database construction
ResNet Mammal Database
• Shipped with >1,000,000 unique relations, derived by MedScan, between proteins, metabolites, chemicals, cell processes and diseases
• ResNet physical interactions are manually curated
• 712 manually curated pathways
• Gene Ontology
• Optional pathway updates:
– >300 Regulome pathways
– >2500 Biological processes pathways
– >200 Cellular component pathways
– High-throughput interaction data
• Automatic curation of ResNet is possible, to remove redundancy and clean up false positives
Pathways collection in ResNet
• Canonical pathways (included, curated)
• Signaling line pathways (included, curated)
• Regulome pathways (optional, automatic)
• Biological processes pathways (optional, automatic)
• Cellular component pathways (optional, automatic)
• KEGG metabolic pathways (optional, imported)
• STKE (commercial)
• Metabolic vision (commercial)
• PathArt (commercial)
Ariadne databases for other organisms
All databases contain:
- Relations extracted by the organism-specific MedScan cartridge from organism-specific abstracts and full-text articles
- Entrez Gene protein annotation
- Protein interactions from Entrez Gene (including the BIND, HPRD, BioGRID and EcoCyc datasets)
- Gene Ontology annotation
Model Organism databases:
• ResNet Plant >400,000 relations, supports 6 plant species
– Optional entity co-occurrence data
– Additional protein physical interactions predicted by TAIR
• ResNet Drosophila
– Additional interactions from published high-throughput datasets
• ResNet C-elegans
– Additional interactions from published high-throughput datasets
• ResNet Yeast
– Additional interactions from published high-throughput datasets
• ResNet Bacteria (beta version)
– Additional interactions from published high-throughput datasets
Databases for non-model organisms, containing interactions predicted from the closest model organism, are available from: http://www.ariadnegenomics.com/support/downloads/databases/
Additional Commercial Datasets
• KEGG: >130 metabolic pathways from Kyoto University
• STKE: >70 pathways from AAAS
• Metabolic vision: >10,000 curated pathways for 587 organisms from Integrated Genomics Inc
• Hynet: adds over 100,000 new protein physical interactions to ResNet 5.0, from Prolexys Inc
• PathArt: >600 disease pathways from Jubilant Inc
Day 1
Pathway Studio maintenance, administration and technical support
Hardware requirements for Pathway Studio
• Pathway Studio desktop or workgroup client
– CPU: 2 GHz or more
– RAM: 512 MB or more
– Disk space for application: 500 MB
– Disk space for one local database: 2 GB
• Pathway Studio workgroup server
– 1 CPU for 1-5 concurrent users: >3.0 GHz
– 2 CPUs for 6-10 concurrent users: >3.0 GHz
– RAM for 1-5 concurrent users: >2 GB
– RAM for 6-10 concurrent users: >3 GB
– Disk space: 20 GB for the database
– Optimal disk configuration:
• for 1-5 concurrent users: 4 hard drives in RAID 0
• for 6-10 concurrent users: RAID 10 mode
Pathway Studio software requirements
• Pathway Studio desktop or workgroup client
– Microsoft Windows Server (2000, 2003), Windows XP (Professional), Windows Vista (Professional, Ultimate, Corporate)
• Pathway Studio workgroup server
– MS SQL Server 2000 or 2005 (Developer, Workgroup,
Standard or Enterprise Edition) on Windows 2000, Windows
2003 Server, Windows XP Professional
– Oracle 10g or later on any supported Oracle platform
including Windows 2003 Server, Linux, etc.
Connecting to the central workgroup database
Connecting to the server enterprise database
Database Index folder
• Database statistics
• Viewing entities in the list pane
• Viewing pathways
• Viewing groups
• Expression experiments folder
• Simulation model folder
PS Workgroup Admin console
User roles in Workgroup environment
• Administrator
• Editor – can edit public objects
• Publisher – can publish private pathways
• Regular user – can work only in their private space
Ask your PSW administrator to create your account and choose your role
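The four roles above can be thought of as sets of permissions. The sketch below is a hypothetical model only: the `Permission` names, the assumption that every role can work in its private space, and the assumption that the administrator holds every permission are illustrative and not taken from the PSW Admin console.

```python
from enum import Flag, auto


class Permission(Flag):
    PRIVATE_WORK = auto()   # create and edit objects in one's own private space
    EDIT_PUBLIC = auto()    # edit public (shared) objects
    PUBLISH = auto()        # publish private pathways to the shared space
    ADMINISTER = auto()     # manage accounts and roles


# Hypothetical mapping of the roles on the slide to permissions.
ROLE_PERMISSIONS = {
    "regular": Permission.PRIVATE_WORK,
    "editor": Permission.PRIVATE_WORK | Permission.EDIT_PUBLIC,
    "publisher": Permission.PRIVATE_WORK | Permission.PUBLISH,
    # Assumption: the administrator holds every permission.
    "administrator": (Permission.PRIVATE_WORK | Permission.EDIT_PUBLIC
                      | Permission.PUBLISH | Permission.ADMINISTER),
}


def can(role: str, permission: Permission) -> bool:
    """True if the given role includes the requested permission."""
    return permission in ROLE_PERMISSIONS[role]


print(can("editor", Permission.EDIT_PUBLIC))
print(can("regular", Permission.PUBLISH))
```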
Ariadne Technical Support
http://www.ariadnegenomics.com/products/support.html
Summary of the introduction slides
• MedScan technology
• Software architecture, hardware and software
requirements
• User roles
• ResNet database overview
• Ariadne’s technical support
Summary for the rest of the day
• Working with objects in database
• Working with pathway diagram and layout algorithms
• Database search in PS
• Build pathway tool and strategy
• Data import/export
• Pathways in ResNet
• Pathway comparison and statistical algorithms: Find groups/pathways
• Text-mining in PS
• Microarray analysis: data import options and algorithms
• Pathway kinetics simulation in PS
DAY 1
Pathway Building in Pathway Studio
• Manual
• Automatic using Graph navigation tools
• Using text-mining with MedScan
Viewing and editing pathways in Pathway Studio
• Viewing entities in the List Pane
• Entity and relation tables
• Show all references
• Pathway Reference summary
• Export protein list
• Display styles: By type, By effect, By reference count
• UI options:
– magnifier
– fit text to entities
– simple and full graph view
– fit to window
– rotate
– move
– zoom by rectangle
– advanced graph scaling
• Resizing nodes in the pathway pane
Finding entities and relations in the Pathway Studio database
• Quick search
• String search
• Search by attribute
• Build pathway tool
Viewing and editing entity/relation properties
Edit Entity property dialog, URN identifier
Links to external databases
Adding new properties, Declaring new properties in the database
Palette pane
• Making a figure legend for your publication
• Viewing group display styles
• Drag & drop entity icon into pathway pane
Images pane
• Drag & drop images into pathway pane
• Importing your own images
• Image properties
KEGG pathways layout
node cloning in pathway graph
• 131 metabolic pathways
• 20,972 connected proteins
Several methods for adding objects and relations to the Pathway pane
Adding objects:
• Drag & drop from the palette
• Drag & drop from the list pane
Adding relations:
• Connect selected entities button
• Enter a fact box
• Drag & drop from the list pane
Building pathways by manual curation in Pathway Studio
[Figure panels: In GenMAPP; In Pathway Studio]
Building pathways by manual curation in Pathway Studio
• Complex Nodes
• Adding components to Complex Nodes
[Figure panels: In Pathway Studio; In GenMAPP]
Questionnaire about the previous slides
• How many chemical reactions are there in the ResNet database?
• What is the default image for a transcription factor in PS?
• How many images for the cell membrane are available in PS?
• What is the quickest search in PS?
• What is the quickest way to add a relation to your pathway diagram?
DAY 1
Automatic Pathway Building using Graph navigation
Build pathway tool
Mining regulatory relations in database
Basic principle:
Regulatory interactions are mediated by the physical interaction network
– Regulomes
– Biological processes pathways
– Disease pathways
Build Pathway dialog
The main application of the Build pathway tool is to quickly find connections between entities of interest; therefore its button is available from all panes:
•Build pathway options
•Filtering by direction
•Number of steps
•Build pathway filter
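The dialog options above (number of steps, direction filter, relation filter) can be pictured as a breadth-first search over a graph of relations. This is a sketch under assumptions: the edge-list representation, the `shortest_path` function and its direction keywords are hypothetical, and Pathway Studio's own Build pathway implementation is not described in these slides.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (regulator, target, relation type)


def shortest_path(edges: Iterable[Edge], start: str, goal: str,
                  max_steps: int = 3, direction: str = "downstream",
                  allowed: Optional[Set[str]] = None) -> Optional[List[str]]:
    """Breadth-first search for the shortest path of at most max_steps relations.

    direction: "downstream" follows relations from regulator to target,
    "upstream" follows them backwards, "any" ignores direction.
    allowed: optional set of relation types to keep (the relation filter).
    """
    adjacency: Dict[str, List[str]] = {}
    for src, tgt, rel in edges:
        if allowed is not None and rel not in allowed:
            continue
        if direction in ("downstream", "any"):
            adjacency.setdefault(src, []).append(tgt)
        if direction in ("upstream", "any"):
            adjacency.setdefault(tgt, []).append(src)

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        if depth == max_steps:
            continue  # step limit reached; do not expand further
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append((neighbor, depth + 1))
    return None


edges = [("A", "B", "Binding"), ("B", "C", "Expression"), ("A", "C", "Regulation")]
print(shortest_path(edges, "A", "C"))
print(shortest_path(edges, "A", "C", allowed={"Binding", "Expression"}))
```

Filtering out the Regulation relation forces the longer two-step path, which mirrors how a stricter relation filter changes the result.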
Build pathway filters
• Using entity filters to answer
different biological questions
• Using relation filters to analyze different types of high-throughput data
• Filtering by properties
Build pathway Edit Results
• Display filtering
• Selecting results based on local connectivity
• IsNew column
Automatic layout options
• Direct force layout
– charges and springs
– Good to find hubs in the pathway
• Hierarchical layout
– Directed graph
– Good for metabolic pathways (KEGG, ERGO)
• Symmetric layout (Centric graph)
– Good for Expand pathway
• Cell localization layout (Circular and linear membrane)
Configurable:
– Cell localization annotation
– Organelle images layout
– Association of Cell localization value and Organelle image
• Dynamic layout
– Direct-force like with adjustable spring force
– Use cell localization if organelle
Regulome pathways: algorithm input
Regulome pathways: algorithm result
Building pathways by Data mining
converting regulatory network to protein physical interaction network for Cell Processes, Diseases, Regulomes
Disease networks
2300 diseases, 230 cancers in ResNet 5.0
Entities associated with Endothelial cells cancer in ResNet
Endothelial cells cancer network
Data-mining techniques and hints
• Different filter settings answer different biological questions: know what each relation type means
• Use the directional filter to perform upstream/downstream analysis
• Relax the search by including Regulation relations
• To mine for more specific relations, search for relations whose sentence includes “your focus keyword”
– Find relations mentioned in a certain tissue
– Find a specific mechanism: trans-activation, cleavage, etc.
• Filter by relation confidence using the Relation table to increase network confidence
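The keyword and confidence hints above can be sketched as a filter over relations that carry their supporting sentences. The `Relation` record, the `mine_relations` name and the use of the number of supporting sentences as a confidence proxy are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Relation(NamedTuple):
    source: str
    target: str
    rel_type: str
    sentences: Tuple[str, ...]  # supporting sentences extracted by MedScan


def mine_relations(relations: Iterable[Relation], keyword: Optional[str] = None,
                   min_references: int = 1) -> List[Relation]:
    """Keep relations with at least min_references supporting sentences and,
    if a keyword is given, at least one sentence containing it."""
    kept = []
    for rel in relations:
        if len(rel.sentences) < min_references:
            continue  # assumed confidence proxy: too few supporting sentences
        if keyword and not any(keyword.lower() in s.lower() for s in rel.sentences):
            continue  # no supporting sentence mentions the focus keyword
        kept.append(rel)
    return kept


# Hypothetical entities and sentences, for illustration only.
relations = [
    Relation("X", "Y", "Expression", ("X transactivates Y in liver cells.", "X induces Y.")),
    Relation("A", "B", "ProtModification", ("A cleaves B.",)),
    Relation("C", "D", "Expression", ("C transactivates D in the liver.",)),
]
print([(r.source, r.target) for r in mine_relations(relations, "liver")])
print([(r.source, r.target) for r in mine_relations(relations, "liver", min_references=2)])
```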
DAY 1
Build pathway settings for asking different biological questions
Finding major regulators among DE genes
[Figure: Build pathway settings ranked as the first (1), second (2) and third (3) choice for expression data]
Upstream analysis of DE genes and gene clusters
[Figure: Build pathway settings ranked as the first (1), second (2) and third (3) choice for expression data]
Analysis of proteomics co-IP data
Analysis of proteomics phosphoprofiling experiments
Analysis of metabolomics experiment
[Figure: numbered steps 1-3, starting with importing the metabolomics experiment]
Relaxing Build pathway settings
• Replace “Find only direct interactions” with “Find shortest path”
• Increase “Maximum number of steps” in “Find common regulators” or in “Find shortest path”
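Raising the maximum number of steps in Find common regulators can be illustrated with a reverse breadth-first search from each target. The function names, the (regulator, target) edge format and the `min_targets` cut-off are hypothetical; the sketch only shows why a larger step limit returns more candidate regulators.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def regulators_within(reverse_adj: Dict[str, List[str]], target: str, max_steps: int) -> Set[str]:
    """All entities that reach `target` through at most `max_steps` regulatory relations."""
    depth = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if depth[node] == max_steps:
            continue
        for regulator in reverse_adj.get(node, ()):
            if regulator not in depth:
                depth[regulator] = depth[node] + 1
                queue.append(regulator)
    depth.pop(target)
    return set(depth)


def common_regulators(edges: Iterable[Tuple[str, str]], targets: Iterable[str],
                      max_steps: int = 1, min_targets: int = 2) -> Dict[str, Set[str]]:
    """Map each regulator to the targets it reaches; keep those reaching >= min_targets."""
    reverse_adj: Dict[str, List[str]] = defaultdict(list)
    for regulator, target in edges:
        reverse_adj[target].append(regulator)
    reached: Dict[str, Set[str]] = defaultdict(set)
    for target in targets:
        for regulator in regulators_within(reverse_adj, target, max_steps):
            reached[regulator].add(target)
    return {reg: tgts for reg, tgts in reached.items() if len(tgts) >= min_targets}


# Hypothetical network: K regulates G1 directly and G3 through TF2.
edges = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("K", "TF2"), ("K", "G1")]
print(common_regulators(edges, ["G1", "G2", "G3"], max_steps=1))
print(common_regulators(edges, ["G1", "G2", "G3"], max_steps=2))
```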
Day 1
Pathway Building by text-mining
Non-melanoma skin cancer: >1,000,000 cases (<2,000 deaths) in the USA
MedScan Reader: PubMed search
Keep searching and adding relations. At the end, send the extracted relations to Pathway Studio.
MedScan Reader: Import the top 100 hits from a Google Scholar search: downloads the articles found and processes them with MedScan
MedScan Reader: Import the top 30 hits from a Google search: downloads the web pages found and processes them with MedScan
Full-text article found on HighWire Press with a “non-melanoma skin cancer” text search
“Non-melanoma skin cancer” literature network: result of text-mining by MedScan Reader
Every entity in this network was mentioned in the context of non-melanoma skin cancer
Protein interaction network for non-melanoma skin cancer using information from the entire ResNet
Compare this pathway with your experimental patient data
Text-mining techniques and hints: controlling the relevance of literature networks
• Searching full-text articles by keyword and then extracting facts with MedScan associates keywords with facts only loosely: you find all facts mentioned in the same article as your keywords
• Searching PubMed abstracts by keyword and then extracting facts with MedScan gives better relevance of the extracted facts to your keywords: you find all facts mentioned in the same abstract as your keywords
• Searching the sentences extracted by MedScan by keyword gives the highest relevance of the extracted facts to your keywords: you find all facts mentioned in the same sentence as your keywords
[Chart: Relevance vs. recovery (%) for keyword search at the full-text, abstract and sentence level]
DAY 1
Data Import/Export
Tools -> Import Protein List
• Choice of identifiers
• Lookup preview
• Paste and Load from file
• Import as New group of proteins
Tools -> Import Protein Network
• Choice of identifiers
• Lookup preview
• Paste and Load from file
• Import of Regulatory relations
Importing ChIP-on-chip data as PromoterBinding relations using Tools -> Import Protein Network
Import creates a new pathway with new relations
Database -> Import Wizard
• Importing from Internet
• Import formats and options
• Specifying source for entities and relations
• Specifying source folder for pathways
Database -> Export Wizard
• Exporting pathways
• Export filters
• Export strategy
• Exporting entity annotation in plain text format
DAY 1
Data management, pathway comparison, find groups/pathways
Working with groups in Pathway Studio
• Create group
• Add Entities to a group
• Add group as a node into pathway pane
• Select/Highlight by group
• Maintaining group hierarchy
Edit -> Combine Pathway
• Union
• Intersection
• Subtract
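Union, Intersection and Subtract behave like set operations on a pathway's entities and relations. The `Pathway` record below is hypothetical, and the rule that Subtract also drops relations touching a removed entity is an assumed design choice; Pathway Studio may treat dangling relations differently.

```python
from typing import FrozenSet, NamedTuple, Tuple

Relation = Tuple[str, str, str]  # (source, target, relation type)


class Pathway(NamedTuple):
    entities: FrozenSet[str]
    relations: FrozenSet[Relation]


def union(a: Pathway, b: Pathway) -> Pathway:
    return Pathway(a.entities | b.entities, a.relations | b.relations)


def intersection(a: Pathway, b: Pathway) -> Pathway:
    return Pathway(a.entities & b.entities, a.relations & b.relations)


def subtract(a: Pathway, b: Pathway) -> Pathway:
    # Assumption: removing b's entities from a also drops every relation of a
    # that touches a removed entity.
    entities = a.entities - b.entities
    relations = frozenset(r for r in a.relations - b.relations
                          if r[0] in entities and r[1] in entities)
    return Pathway(entities, relations)


p1 = Pathway(frozenset({"A", "B", "C"}),
             frozenset({("A", "B", "Binding"), ("B", "C", "Expression")}))
p2 = Pathway(frozenset({"B", "C", "D"}),
             frozenset({("B", "C", "Expression"), ("C", "D", "Binding")}))
print(sorted(union(p1, p2).entities), sorted(intersection(p1, p2).relations))
```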
Tools for pathway comparison in Pathway Studio
• Combine pathways
• Select
• Highlight
Statistical algorithms for pathway comparison in Pathway Studio
• Find Pathways
• Find Groups
• Gene Ontology analysis
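Find Pathways, Find Groups and Gene Ontology analysis report p-values for the overlap between your entity list and each group or pathway. A common way to compute such a p-value is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); whether Pathway Studio uses exactly this test is an assumption here, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(overlap: int, group_size: int, list_size: int, total: int) -> float:
    """Probability of seeing at least `overlap` members of a group (pathway) of
    `group_size` entities in a list of `list_size` entities drawn without
    replacement from `total` entities (one-sided hypergeometric test)."""
    upper = min(group_size, list_size)
    hits = sum(comb(group_size, i) * comb(total - group_size, list_size - i)
               for i in range(overlap, upper + 1))
    return hits / comb(total, list_size)


# Example: 12 members of a 50-entity pathway among 200 DE genes out of 20,000.
print(enrichment_p_value(12, 50, 200, 20000))
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-sized totals; the final division is the only floating-point step.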
DAY 2
Analysis of high-throughput data in Pathway Studio
Experiment types
• Gene expression
– Find major regulators
– Find biomarkers
– Gene clustering
• Metabolomics
– Find major metabolism regulators
– Combined analysis with gene
expression
• Proteomics
– Mass-spec protein level
– Finding major kinases/phosphatases for phosphoprofiles
Data model in ResNet database
Use different networks for different types of experimental data
• Interpretation of gene expression data: Expression, PromoterBinding
• Interpretation of proteomics data: DirectRegulation, ProtModification, Binding
• Interpretation of metabolomics data, biomarker prediction and validation: MolSynthesis, MolTransport
• Regulation …MORE…
Analysis of gene expression microarray data: import
and selection of responsive genes
• Data import
– Tab-delimited and Excel files
– Affymetrix CEL files (with RMA normalization)
– GenePix (GPR)
Result: Save the experiment in the Expression favorites
• Selection of responsive genes
– Find differentially expressed genes (significance analysis via t-test) for
analysis of two samples measured in multiple replicas
– Gene clustering via correlation networks (Pearson correlation)
– Find responsive genes in third-party software for statistical analysis of microarray data and import them as a list (Tools->Import protein list)
Result: save as group of genes in Groups folder
Analysis of gene expression microarray data:
Pathway Analysis
• Network analysis
– Identification of differentially expressed protein complexes and physical networks
• Build pathway: Find direct regulation, filter for physical interactions (Binding, DirectRegulation,
ProtModification)
• Build differentially expressed networks, filter by Binding (PS Enterprise only)
– Identification of major regulators and targets in expression network:
• Build pathway: Find direct regulation, filter for Expression and/or PromoterBinding interactions, use
hierarchical layout
• Find significant regulators (network enrichment analysis) filter by Expression, PromoterBinding (PS
Enterprise only)
Result: save as pathway
• Functional analysis
– Find groups/pathways
• Gene ontology analysis
• Comparative gene ontology analysis
– Build pathway: Find common targets, filter by CellProcess
– Find DE groups/pathways (Gene Set Enrichment analysis, GSEA)
Result: list of groups/pathways with p-values indicating the statistical significance of differential expression. Save as a group, as analysis results, or export to Excel
Most common workflow for microarray analysis of a disease in Pathway Studio
• Identify genes differentially expressed in the disease (DE genes)
• Identify genes known to be associated with the disease according to the literature, using Pathway Studio
• Identify DE genes that are linked to known disease genes, using Pathway Studio
• Report novel disease genes
Expression Data Import wizard
• Generic tab-delimited format
– Import any matrix of expression data containing expression values and/or p-values. Minimum requirement: one column with gene identifiers and one column with sample values
• Import of Affymetrix CEL files (RMA averaging)
• Import of Molecular Devices GenePix format with VERA & SAM normalization
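The minimum requirement of the generic tab-delimited format (one identifier column plus at least one sample column) can be checked with a small parser. This sketch assumes a header row and treats empty cells as missing values; the function name and those conventions are assumptions, not the wizard's documented behavior.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple


def parse_expression_matrix(text: str, id_column: int = 0
                            ) -> Tuple[List[str], Dict[str, Dict[str, Optional[float]]]]:
    """Parse a tab-delimited matrix with a header row, one gene-identifier
    column and at least one sample column; empty cells become None."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = [name for i, name in enumerate(header) if i != id_column]
    if not samples:
        raise ValueError("need at least one sample column besides the identifier column")
    matrix: Dict[str, Dict[str, Optional[float]]] = {}
    for row in reader:
        if len(row) <= id_column or not row[id_column].strip():
            continue  # skip blank lines and rows without an identifier
        values = [float(cell) if cell.strip() else None
                  for i, cell in enumerate(row) if i != id_column]
        matrix[row[id_column].strip()] = dict(zip(samples, values))
    return samples, matrix


# Illustrative values only.
text = "Gene\tControl\tTreated\nTP53\t1.5\t3.0\n\nEGFR\t2.0\t\n"
samples, matrix = parse_expression_matrix(text)
print(samples, matrix)
```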
Expression experiment viewer in Pathway Studio
• Experiment properties
• Gene identifier column: views, sorting, find
• Heat map scale
• Filter genes by value
• Filter genes by pathway
• Text view for expression matrix
• Create group from selection
Finding differentially expressed genes in Pathway Studio (significance analysis):
Two-sample t-test = Between groups t-test
Finds genes that are differentially expressed between two classes of samples measured independently on single-color microarrays. Examples: multiple replicas of one untreated sample (1) and multiple replicas of one treated sample (2); multiple replicas of one normal sample (1) and multiple replicas of one disease sample (2). Calculated p-values indicate the significance of the expression difference between replicas marked 1 and replicas marked 2.
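For one gene, the between-groups test above can be sketched with Student's pooled-variance t statistic. Whether Pathway Studio pools variances or uses Welch's correction is not stated, so the pooled form is an assumption; the function names are hypothetical. The p-value follows from the t distribution with the returned degrees of freedom, and `scipy.stats.ttest_ind` (with its default `equal_var=True`) computes the same statistic together with that p-value.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_by_label(values: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split one gene's replicas into class 1 and class 2 using the 1/2 sample labels."""
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    class2 = [v for v, lab in zip(values, labels) if lab == 2]
    return class1, class2


def two_sample_t(class1: Sequence[float], class2: Sequence[float]) -> Tuple[float, int]:
    """Student's t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(class1), len(class2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each class needs at least two replicas")
    pooled = ((n1 - 1) * variance(class1) + (n2 - 1) * variance(class2)) / (n1 + n2 - 2)
    t = (mean(class1) - mean(class2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# One gene: three untreated replicas (1) and three treated replicas (2).
untreated, treated = split_by_label([1, 2, 3, 4, 5, 6], [1, 1, 1, 2, 2, 2])
print(two_sample_t(untreated, treated))
```

A gene with identical values in every replica has zero pooled variance and raises `ZeroDivisionError`; such genes are usually filtered out before testing.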
Finding differentially expressed genes in Pathway Studio (significance analysis):
Paired samples t-test, usually for two-channel microarray platforms
Finds genes that are differentially expressed between two classes of samples when the comparison is performed within one experiment (two-color or two-channel microarray) but repeated multiple times. Each sample of the first class is marked by a positive integer, and the corresponding sample of the second class, measured on the same array, is marked by the negative integer with the same absolute value. Calculated p-values indicate the significance of the expression difference between the two sample classes.
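The +k/-k labeling scheme above determines which measurements are paired. The sketch below (hypothetical function names) pairs samples by label and computes the paired t statistic from the per-array differences; `scipy.stats.ttest_rel` computes the same statistic and its p-value from the two paired sequences.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def pair_by_labels(values: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Pair measurements using the labeling scheme: +k marks a first-class
    sample, -k marks its partner measured on the same array."""
    first = {lab: v for v, lab in zip(values, labels) if lab > 0}
    second = {-lab: v for v, lab in zip(values, labels) if lab < 0}
    return [(first[k], second[k]) for k in sorted(first.keys() & second.keys())]


def paired_t(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# One gene on three two-channel arrays: labels 1/-1, 2/-2 and 3/-3 share an array.
pairs = pair_by_labels([5, 3, 6, 3, 7, 5], [1, -1, 2, -2, 3, -3])
print(pairs, paired_t(pairs))
```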
Finding differentially expressed genes in Pathway Studio (significance analysis):
DE genes in multiple experimental log ratio samples
If you imported your data as pre-calculated log ratios of normalized expression values, use this test to find differentially expressed genes across multiple replicas. Calculated p-values indicate how far the ratio of a given gene deviates from the global mean of ratios across all genes and samples.
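One way to turn "how far a gene's ratio deviates from the global mean" into a p-value is a z-test of the gene's mean log ratio against the global mean and spread of all ratios. The normal approximation, the use of the population standard deviation and the function name are assumptions; the exact statistic Pathway Studio uses is not given on the slide.

```python
from math import erfc, sqrt
from statistics import mean, pstdev
from typing import Dict, List


def log_ratio_p_values(log_ratios: Dict[str, List[float]]) -> Dict[str, float]:
    """Two-sided p-value for how far each gene's mean log ratio lies from the
    global mean of all ratios (assumed normal approximation)."""
    all_values = [v for row in log_ratios.values() for v in row]
    mu = mean(all_values)
    sigma = pstdev(all_values)  # global spread across all genes and samples
    p_values = {}
    for gene, row in log_ratios.items():
        z = (mean(row) - mu) / (sigma / sqrt(len(row)))
        p_values[gene] = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_values


# Hypothetical log ratios for three genes, two replicas each.
print(log_ratio_p_values({"A": [0.0, 0.0], "B": [0.0, 0.0], "C": [2.0, 2.0]}))
```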
Gene expression clustering using Relevance network
Expression -> Build network from expression -> Pearson correlation
Parameters for Pearson correlation
Major parameters:
• Percent of genes to remove – removes the least variable genes and controls the number of vertices in the
graph. Keep the number of proteins in the network under 1000.
• Threshold – keeps only correlation links above the threshold. Controls the number of edges in the graph.
• Number of permutations – turns on automatic threshold calculation using randomized
expression samples.
• P-value – selects the correlation links least likely to arise by chance. Controls the number of edges in the graph.
Value 0.01 corresponds to 10% of all possible links; the number of possible links equals (number of vertices)².
Finding upstream regulators for a gene cluster using the Build pathway option Find common
regulators
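Find common regulators amounts to counting how many cluster genes each upstream entity regulates. A minimal sketch over a hypothetical list of (regulator, target) pairs. Pathway Studio queries the ResNet database instead, and `common_regulators` and `min_targets` are illustrative names.

```python
from collections import defaultdict


def common_regulators(relations, cluster, min_targets=2):
    """Rank regulators by how many genes of a cluster they regulate.

    relations: iterable of (regulator, target) pairs (hypothetical format)
    cluster:   set of gene names from one expression cluster
    """
    hits = defaultdict(set)
    for regulator, target in relations:
        if target in cluster:
            hits[regulator].add(target)
    ranked = [(reg, sorted(tg)) for reg, tg in hits.items() if len(tg) >= min_targets]
    # Most broadly connected regulators first; ties broken by name
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked
```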
Finding major transcription regulators among
differentially expressed genes
Use the Build pathway tool option
Find direct interactions, filtering
for PromoterBinding and Expression
relations, to reduce the complexity
of your differential expression
pattern.
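Filtering direct interactions by relation type can be sketched as follows. The sketch assumes relations are available as (source, target, type) triples with type names spelled as on the slide (`PromoterBinding`, `Expression`). The helper names are hypothetical.

```python
def direct_interactions(relations, genes, allowed_types=("PromoterBinding", "Expression")):
    """Keep relations of the allowed types whose both ends are in `genes`.

    relations: iterable of (source, target, relation_type) triples (hypothetical format)
    """
    return [(s, t, kind) for s, t, kind in relations
            if kind in allowed_types and s in genes and t in genes]


def top_transcription_regulators(edges):
    """Count distinct targets per source in the filtered network, largest first."""
    targets = {}
    for s, t, _ in edges:
        targets.setdefault(s, set()).add(t)
    return sorted(((s, len(ts)) for s, ts in targets.items()), key=lambda x: (-x[1], x[0]))
```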
Build pathway filter stringencies
Gene Expression:
• Promoter Binding > Expression > Regulation > Cooccurrence
• Protein > Complex > Functional Class
Metabolomics:
• MolSynthesis > Regulation
Proteomics:
• Direct Regulation > ProtModification > Binding >
Regulation
• Protein > Complex > Functional Class
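One way to use these orderings is to start with the most stringent relation types and relax the filter only if too few relations pass. The sketch below assumes that reading. The type names without spaces (for example `DirectRegulation`) and the helper `relax_until` are hypothetical, and the entity-type ordering (Protein > Complex > Functional Class) is left out for brevity.

```python
# Relation-type orderings from the slide, most stringent first (spellings assumed)
STRINGENCY = {
    "gene_expression": ["PromoterBinding", "Expression", "Regulation", "Cooccurrence"],
    "metabolomics": ["MolSynthesis", "Regulation"],
    "proteomics": ["DirectRegulation", "ProtModification", "Binding", "Regulation"],
}


def relax_until(relations, data_type, min_relations):
    """Widen the relation-type filter one level at a time until enough relations pass.

    relations: list of (source, target, relation_type) triples (hypothetical format)
    Returns (allowed_types, matching_relations) for the first level that suffices,
    or for the loosest level if none does.
    """
    order = STRINGENCY[data_type]
    for level in range(1, len(order) + 1):
        allowed = set(order[:level])
        hits = [r for r in relations if r[2] in allowed]
        if len(hits) >= min_relations:
            break
    return allowed, hits
```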
Questionnaire, Day 1
• What is the quickest Entity search in Pathway
Studio?
• What is the most comprehensive Entity
search in Pathway Studio?
• How do you create a group in Pathway Studio and
add entities to it?
• How do you build a pathway from the up-regulated
genes in your microarray experiment?
Workflow 1
Build pathway for EDG regulation
• Using the GenMAPP pathway as a guide, build the
EDG1 pathway in Pathway Studio:
– Find proteins for EDG1 pathway
– Find relations for EDG1 pathway
– Create additional relations missing from ResNet
database
– Arrange nodes by cell localization
– Save pathway as HTML for web publication
Workflow 2
Create a pathway containing groups and subpathways as nodes.
• Continue building the EDG pathway by
adding sub-pathways and groups
• Complete the pathway by a text-mining
search with filtering
Workflow 3
Find drugs regulating kinases
• Find kinases in the database with connectivity >0
– Search by attribute for Functional class = Kinase and
Connectivity >0
• Find drugs regulating these kinases
– Expand pathway from kinases with filter by small molecules
– Select drugs in the expanded pathway
– Select neighbors for drugs
– Copy selection into a new pathway
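The steps of Workflow 3 can be mirrored on a small in-memory graph. This is a sketch under assumptions: connectivity is taken as the number of relations an entity participates in, drugs are entities of a hypothetical type `SmallMol`, and only drug-to-kinase relations are kept. None of these names come from the Pathway Studio API.

```python
def drugs_regulating_kinases(entities, relations):
    """Sketch of Workflow 3 on an in-memory graph (hypothetical data model).

    entities:  dict name -> {"type": ..., "functional_class": ...}
    relations: list of (source, target) pairs
    Returns dict drug -> sorted list of kinases it regulates.
    """
    # Connectivity = number of relations an entity takes part in (assumption)
    degree = {}
    for s, t in relations:
        degree[s] = degree.get(s, 0) + 1
        degree[t] = degree.get(t, 0) + 1
    kinases = {n for n, attrs in entities.items()
               if attrs.get("functional_class") == "Kinase" and degree.get(n, 0) > 0}
    # Expand from the kinases, keeping only small-molecule regulators (drugs)
    drugs = {}
    for s, t in relations:
        if t in kinases and entities.get(s, {}).get("type") == "SmallMol":
            drugs.setdefault(s, set()).add(t)
    return {d: sorted(ks) for d, ks in drugs.items()}
```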
Workflow 4
Find biological processes regulated by proteins
involved in prostate cancer
• Find prostate cancer disease node
• Find proteins regulating prostate cancer
• Find cell processes affected by these proteins
• Sort found processes by number of prostate
cancer protein regulators
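Workflow 4 ends by sorting cell processes by the number of prostate cancer regulators that affect them. A minimal sketch, assuming relations as (source, target) pairs and a hypothetical entity-type map with the labels `Protein`, `Disease` and `CellProcess`. The function name is illustrative.

```python
def processes_by_disease_regulators(relations, node_type, disease):
    """Rank cell processes by how many regulators of `disease` also regulate them.

    relations: list of (source, target) pairs (hypothetical format)
    node_type: dict name -> entity type label (hypothetical labels)
    """
    disease_proteins = {s for s, t in relations
                        if t == disease and node_type.get(s) == "Protein"}
    counts = {}
    for s, t in set(relations):  # count each regulator-process pair once
        if s in disease_proteins and node_type.get(t) == "CellProcess":
            counts[t] = counts.get(t, 0) + 1
    # Processes with the most disease regulators first; ties broken by name
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```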
Day 2
Advanced workflows in Pathway Studio
Workflow 1. Comparative Gene Ontology analysis
(Folberg’s experiment)
Import of CEL files
1) Calculation of the differentially expressed genes
2) Creating a group from DE genes
3) Finding statistically significant GO groups
4) Creating a pathway from GO groups
5) Comparing two lists of GO groups
6) Finding DE genes in GO groups
Comparing lists of the differentially
expressed GO groups rather than
DE genes is more sensitive when
comparing the responses in two cell
lines, patients and other samples.
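Step 3 requires a significance test for each GO group. The slide does not name the test. This sketch assumes the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) that is widely used for GO enrichment, with no multiple-testing correction. Function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, group_size, de_count, total_genes):
    """P(X >= overlap) when drawing de_count genes from total_genes,
    of which group_size belong to the GO group."""
    upper = min(group_size, de_count)
    return sum(comb(group_size, k) * comb(total_genes - group_size, de_count - k)
               for k in range(overlap, upper + 1)) / comb(total_genes, de_count)


def significant_groups(go_groups, de_genes, all_genes, alpha=0.05):
    """Return {group: p} for GO groups enriched in DE genes at the given alpha."""
    universe = set(all_genes)
    de = set(de_genes) & universe
    result = {}
    for name, members in go_groups.items():
        members = set(members) & universe
        p = hypergeom_enrichment_p(len(members & de), len(members), len(de), len(universe))
        if p <= alpha:
            result[name] = p
    return result
```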
Comparing lists of the differentially expressed GO groups rather than DE genes is more
sensitive when comparing the responses in two cell lines, patients and other samples
Figure labels: "Subtracting genes"; "No significant groups"; "6 genes"
Two groups of genes differentially expressed during growth in 3D culture vs. flat
culture for aggressive and non-aggressive tumors are selected
Figure: non-aggressive and aggressive tumors, each grown in flat and 3D culture (labels "no growth", "growth"); 1. Genes of interest; 2. Groups of interest
Comparative GO group analysis of aggressive vs. non-aggressive uveal melanoma
Open DE GO groups from aggressive tumors
Compare with DE GO groups
from non-aggressive tumors
Select GO groups related to your experimental goals
(cell adhesion DE groups unique to aggressive tumors)
These groups are significant in aggressive melanoma when
we compare its growth in 3D matrix vs. flat culture
These groups are NOT significant in non-aggressive
melanoma when we compare its
growth in 3D matrix vs. flat culture
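Selecting groups significant in one tumor type but not the other is a set operation on the two lists of significant GO groups. A minimal sketch, assuming each list is available as a mapping from GO group to p-value; `compare_group_lists` and the 0.05 cutoff are illustrative.

```python
def compare_group_lists(groups_a, groups_b, alpha=0.05):
    """Split GO groups by the condition(s) in which they are significant (p <= alpha).

    groups_a, groups_b: dict GO group -> p-value for two conditions (hypothetical input)
    """
    sig_a = {g for g, p in groups_a.items() if p <= alpha}
    sig_b = {g for g, p in groups_b.items() if p <= alpha}
    return {
        "only_a": sorted(sig_a - sig_b),
        "only_b": sorted(sig_b - sig_a),
        "both": sorted(sig_a & sig_b),
    }
```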
A network of genes differentially expressed in aggressive uveal melanoma and involved in
cell adhesion
23 SP1 targets among DE genes in the cell adhesion network unique to aggressive uveal melanoma
during 3D growth
Supporting evidence for the role of SP1 in melanoma aggressiveness
Workflow 2. Three methods to find biological
processes affected by DE genes
1) Find groups from Biological processes Gene Ontology classification
2) Find pathways indicating biological processes
3) Build pathway option Find common targets filtering for Cell Process
Includes:
- Finding proteins using Search by attribute (cell localization) and then
determining their biological processes
Workflow 3. Three ways to find biomarkers in
Pathway Studio
• By text-mining
– Extract pathways from text: PubMed Search for your Disease
• By data-mining
– Search for disease of interest in the database
– Use Build Pathway: Expand option to find Disease biomarkers
• By gene expression data analysis
– Identify Differentially expressed genes
– Use Build pathway: Direct interaction option to find proteins that are
downstream of many DE genes. These proteins are most likely
biomarkers according to your expression data (See also next slide)
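The expression-data route to biomarkers ranks proteins by how many DE genes lie directly upstream of them. A minimal sketch over hypothetical (regulator, target) pairs; `candidate_biomarkers` and `min_regulators` are illustrative names, not Pathway Studio features.

```python
from collections import defaultdict


def candidate_biomarkers(relations, de_genes, min_regulators=2):
    """Rank proteins by how many distinct DE genes directly regulate them.

    relations: iterable of (regulator, target) pairs (hypothetical format)
    """
    upstream = defaultdict(set)
    for regulator, target in relations:
        if regulator in de_genes:
            upstream[target].add(regulator)
    ranked = [(t, len(r)) for t, r in upstream.items() if len(r) >= min_regulators]
    # Proteins downstream of the most DE genes first; ties broken by name
    return sorted(ranked, key=lambda x: (-x[1], x[0]))
```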
Workflow 4. Building disease network using Build
pathway tool
Includes:
• Finding disease of interest in the database
• Finding proteins contributing to disease
• Finding biomarkers for a disease
• Building disease networks using:
– Build pathway: Find direct interactions for proteins regulating
the disease
– Build pathway: Expand pathway for protein biomarkers
– Combining two pathways
– Layout by cell localization
- Text-mining: updates?
Workflow 5. Building a pathway by text-mining for Li-Fraumeni syndrome
Includes:
• Creating new local database
• Use of Search PubMed option (Db import)
• Consolidation of the db (db updates / groups)
• Understanding the major protein players in Li-Fraumeni syndrome
• Understanding regulators / targets / cell
processes associated with Li-Fraumeni
syndrome