GX 10 - MedBlog
GeneSpring GX 10 for Gene Expression Analysis
• Agilent Bioinformatics & GeneSpring overview
• GX 10 Guided & Advanced Data Analysis
• Practice & Discussion
February 2009
Antoni Wandycz
Elise Chang
Agilent Technologies
GeneSpring
January 2009
Agilent Bioinformatics Suite
• DNA: ‘DNA Analytics’ (ChIP, Methyl, CGH)
• Transcriptome (RNA): ‘GX 10’ (miRNA, QPCR, Exon)
• Proteome (Protein): ‘GX 11’
• Metabolome: ‘GX 11’
• GeneSpring Workgroup: Data storage & Computation; Share & Collaborate
History and Future of GeneSpring
Timeline (2004 to 2009):
• 2004: Agilent acquires Silicon Genetics
• GX 7.3.1 released
• Agilent acquires Stratagene
• GX 9 development on avadis platform
• 2008: GX 9
• 2009: GX 10, GX 11
GX 10:
• GX 7.3 functions
• miRNA, Exon, QPCR analysis
• Pathway analysis
• Support for eArray
GeneSpring GX: Multiple-Platform Compatibility
• Agilent Feature Extraction files (>FE v8.5)
• Affymetrix CEL, CHP
• Illumina BeadStudio (>v3.1)
• ABI SDS, RQ Manager (for QPCR)
• Custom formats (all 1- and 2-color microarrays)
• .GPR files from Axon scanners (GenePix software)
GeneSpring GX 10 – Key features
• Guided Workflows
• New Applications - miRNA, QPCR, Exon & more in future
• Project-based organization & Translation-on-the-fly
• Biological Context - Pathway Analysis, GSEA, GO, IPA, etc.
• Customization - Scripting in Jython and R
GX 10 Key features: Guided Workflows
Pre-determined steps:
• Normalization
• QC
• Statistics
• GO
• Pathways
Project-based organization
GeneSpring GX 10: Translation (Chap 3 in GX 10 manual)
• Comparing Platforms
e.g. Affymetrix vs. Agilent vs. Spotted
• Comparing Species
e.g. Mouse vs. Human, using a Homology Table (NCBI’s HomoloGene)
• Comparing Applications
e.g. Gene Expression & QPCR or miRNA
GX 10 Key features: Translation
Compare platforms, applications, species
Homology table
displayed
Venn Diagram
Compare experiments from different platforms,
applications, & species
GX 10 Key features: Biological Context
GO Analysis (Function, Process, Location)
GSEA (Gene Set Enrichment Analysis)
Pathway Analysis
GeneSpring GX 10: Gene Ontology (GO) Analysis
Likelihood that your genes of interest fell into a GO category just by chance
HELP is always available
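The "just by chance" likelihood can be illustrated with a one-sided Fisher's exact (hypergeometric) test, which a later slide names as the test behind GO enrichment. The sketch below is not GeneSpring's implementation; the function name, argument names, and counts are hypothetical and chosen only to show the arithmetic.

```python
from math import comb

def enrichment_p_value(n_total, n_in_category, n_selected, n_overlap):
    """One-sided hypergeometric tail: P(overlap >= n_overlap) by chance.

    n_total       -- entities on the array (background)
    n_in_category -- background entities annotated with the GO category
    n_selected    -- entities in the list of interest
    n_overlap     -- list entities that carry the GO category
    """
    max_overlap = min(n_in_category, n_selected)
    total_ways = comb(n_total, n_selected)
    tail_ways = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail_ways / total_ways

# Toy counts for illustration only (not from the Cardiogenomics data):
# 10 genes on the array, 4 in the category, 3 selected, 2 of those in the category.
p = enrichment_p_value(10, 4, 3, 2)
```

A small p-value means the overlap is larger than random selection would usually produce; with many GO categories tested, the p-values would still need a multiple-testing correction.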
Pathway Analysis in GX 10:
Two types of Pathway Analysis in GX 10:
1. ‘Pathway Analysis’ Tool
Building networks of related entities
8 Pathway Interaction Databases and NLP
2. ‘Find Significant Pathways’ Tool
Entity-list enrichment with known pathways
(Step 8 in Guided Workflow)
BioPax format pathways (.owl)
Overlay Networks with Expression Data/Conditions
Cellular Location Overlay of Network
‘Find Similar Pathways’ Tool
Significant enrichment of my genes in particular pathways?
Analysis performed on all
pathways imported into GX 10
Significant pathways are added to
experiment
e-Seminars & Workshops www.genespring.com
Recorded Seminars:
1. Introduction to GX 10
2. Analysis of miRNA & GE data
3. Analysis of QPCR & GE data
4. Alternative Splicing
5. Pathway Analysis
Getting Started in GeneSpring GX 10
Advanced Workflow: (To Find Differentially Expressed Genes)
Affymetrix Files
Cardiogenomics dataset: Affymetrix data
Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump.
The most common cause of CHF is damage to the heart muscle from insufficient oxygen. This is usually due to narrowing of the coronary arteries, which supply blood to the heart.
Idiopathic cardiomyopathy results in a weakened heart due to an unknown cause.
Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.
Cardiogenomics dataset: Affymetrix data
Experimental Goal:
To identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic, or non-failing heart conditions.
              Male        Female
Non-failing   2 samples   2 samples
Idiopathic    2 samples   2 samples
Ischemic      2 samples   2 samples
CEL files generated by Affymetrix GCOS
Experimental Setup in GeneSpring
SAMPLE  GENDER  CHF ETIOLOGY
1       Female  Idiopathic
2       Female  Idiopathic
3       Male    Idiopathic
4       Male    Idiopathic
5       Female  Ischemic
6       Female  Ischemic
7       Male    Ischemic
8       Male    Ischemic
9       Female  Non-failing
10      Female  Non-failing
11      Male    Non-failing
12      Male    Non-failing
Gender Interpretation
Condition 1: Female (Samples 1, 2, 5, 6, 9, 10)
Condition 2: Male (Samples 3, 4, 7, 8, 11, 12)
CHF Etiology Interpretation
Condition 1: Idiopathic (Samples 1, 2, 3, 4)
Condition 2: Ischemic (Samples 5, 6, 7, 8)
Condition 3: Non-failing (Samples 9, 10, 11, 12)
Gender/CHF Etiology Interpretation
Condition 1: Female/Idiopathic (Samples 1, 2)
Condition 2: Male/Idiopathic (Samples 3, 4)
Condition 3: Female/Ischemic (Samples 5, 6)
Condition 4: Male/Ischemic (Samples 7, 8)
Condition 5: Female/Non-failing (Samples 9, 10)
Condition 6: Male/Non-failing (Samples 11, 12)
The selected Interpretation determines how the samples are displayed in the various views and the comparisons that are made in analyses such as statistics.
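The way an interpretation groups samples by one or more parameters can be sketched in a few lines of Python. This only illustrates the grouping idea using the sample sheet from this slide; `build_interpretation` is a hypothetical helper, not a GeneSpring function or scripting API.

```python
from collections import defaultdict

# Sample sheet from the slide: samples 1-12 with two experimental parameters.
genders = ["Female", "Female", "Male", "Male"] * 3
etiologies = ["Idiopathic"] * 4 + ["Ischemic"] * 4 + ["Non-failing"] * 4
samples = {
    i + 1: {"Gender": g, "CHF Etiology": e}
    for i, (g, e) in enumerate(zip(genders, etiologies))
}

def build_interpretation(samples, parameters):
    """Group sample ids into conditions defined by the chosen parameters."""
    conditions = defaultdict(list)
    for sample_id in sorted(samples):
        # A condition is the combination of parameter values, e.g. "Male/Ischemic".
        name = "/".join(samples[sample_id][p] for p in parameters)
        conditions[name].append(sample_id)
    return dict(conditions)

gender = build_interpretation(samples, ["Gender"])
etiology = build_interpretation(samples, ["CHF Etiology"])
combined = build_interpretation(samples, ["Gender", "CHF Etiology"])
```

Here `gender` has two conditions, `etiology` three, and `combined` six, matching the three interpretations listed on the slide.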
GeneSpring GX 10 Vocabulary
Project – collection of experiments
Entity – gene, probe, probeset, exon, etc.
Interpretation – samples that are grouped together based on
conditions.
Technology – A file containing information on array design
and biological information (annotation) for all the entities on
the array
Biological Genome – a collection of all major annotations
(NCBI) for any organism; essential for Generic/Custom arrays
lacking annotations
Getting Started in GeneSpring
Cardiogenomics Experiment:
Transcriptional profiling to learn more about molecular
mechanisms underlying Congestive Heart Failure (CHF)
Sample Data:
Myocardial samples from patients with normal hearts and
Ischemic & Idiopathic cardiomyopathies (3 Etiologies)
Variables:
Gender (2) and Etiology (3)
Technology:
Affymetrix U133Plus2 array
Getting Started: Create New Project
From Startup screen OR from File/New Project
Getting Started with Advanced Analysis
Experiment Type: Affymetrix Expression (3 Affy choices!)
Workflow Type: Advanced Analysis
Select Data for Experiment
Select ‘Choose Files’ to load data files found on your computer.
Note:
‘Choose Samples’ option is
used when creating
experiments with samples
already loaded into GX 10
Sample Upload
Summarization Algorithms in GX 10 for CEL Files
Summarization of
Affymetrix probes
and baseline
transformation of
probeset values.
Summarization algorithms in GX 10
In addition to different calculations, the algorithms differ in the order in which Normalization and Summarization are performed.
ALGORITHM  BACKGROUND SUBTRACTION  NORMALIZATION  PROBE SUMMARIZATION
RMA        PM based                Quantile       Log (PM)
MAS5       PM-MM based             Scaling        One-step Tukey Biweight
PLIER      PM-MM based             Quantile       Log (PM)
LiWong     PM-MM based             Quantile       Linear (PM)
GCRMA      PM-MM based             Quantile       Log (PM)
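Quantile normalization, which most of these algorithms use, forces every array to share the same intensity distribution. The following is a minimal pure-Python sketch of the general technique, assuming equal-length arrays and breaking ties by position; it is not the code GeneSpring runs, and the intensities are made-up numbers.

```python
def quantile_normalize(arrays):
    """Quantile-normalize intensity lists (one list of probe values per array)."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each value with the cross-array mean for its rank.
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two toy arrays with three probes each (illustrative values only).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain the same set of values (the rank means), and each probe keeps its rank within its own array.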
Preprocessing of Affymetrix Arrays
CEL files are the raw data files that contain signal values for individual probes.
CEL files are preprocessed to generate one value per probeset.
Preprocessing steps are:
1. Background subtraction
2. Normalization
3. Summarization of probeset values
Different preprocessing algorithms are available.
[Diagram: Array hybridization & scanning → DAT file → image analysis → CEL file; CEL file + CDF file → CHP file (GCOS / AGCC)]
BoxWhisker plot: Summary of Normalized Intensities
Advanced Workflow Experiment Setup
Experiment Grouping
Specify parameters/conditions
Experiment Grouping
The experimental parameters are added in this window.
For each array, the particular parameter value (condition) is also specified.
Values can be added manually or loaded from a saved file (circled in red on the slide).
Advanced Workflow Experiment Setup
Create Interpretation
In the Guided Workflow, only
one interpretation is
automatically provided.
Here, users can create
multiple interpretations
Grouping and Interpretation
2 experimental variables: CHF Etiology and Gender
For this experiment, 3 interpretations could be created:
1) Gender
2) CHF Etiology (Ischemic, Idiopathic, non-failing)
3) CHF Etiology and Gender: This interpretation is
automatically created in the Guided Workflow.
Example: Gender Only
Creating Interpretations: step 2 of 3
Creating Interpretations: step 3 of 3
Advanced Analysis Workflow: Quality Control
QC on Samples and Probes
automatically performed in
Guided Workflow
Users can specify settings
beyond those available in
Guided Workflow
Quality Control on Samples
Filter by Expression
Advanced Analysis Workflow: Analysis
Statistical Analysis
Filter on Volcano Plot
(both Stats and Fold Change)
Fold Change
Clustering
Find Similar Entities
Filter on Parameters
PCA
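Filtering on a volcano plot combines a statistical cutoff with a fold-change cutoff. A rough sketch of that combined filter follows; the default cutoffs, field names, and entity values are assumptions made for illustration, not GeneSpring defaults or results from the dataset.

```python
import math

def volcano_filter(entities, p_cutoff=0.05, fold_cutoff=2.0):
    """Return ids of entities that pass both the p-value and fold-change cutoffs."""
    log2_cutoff = math.log2(fold_cutoff)
    return [
        e["id"]
        for e in entities
        # Keep entities that are significant AND changed enough in either direction.
        if e["p_value"] <= p_cutoff and abs(e["log2_fold_change"]) >= log2_cutoff
    ]

# Made-up values for illustration only.
entities = [
    {"id": "probe_a", "p_value": 0.001, "log2_fold_change": 1.8},
    {"id": "probe_b", "p_value": 0.200, "log2_fold_change": 2.5},
    {"id": "probe_c", "p_value": 0.010, "log2_fold_change": -1.2},
    {"id": "probe_d", "p_value": 0.030, "log2_fold_change": 0.4},
]
passing = volcano_filter(entities)
```

Working on the log2 scale treats up- and down-regulation symmetrically, which is why the fold-change test uses an absolute value.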
Getting Started with Guided Workflow
Experiment Type: Agilent Single-color
Workflow Type: Guided Workflow
Sample Upload
BoxWhisker plot: Summary of Normalized Intensities
GeneSpring GX 10: Important Menu options:
Project:
Import/Export project zip
Tools:
Script Editor/ R Editor
Import BioPAX pathways
GS7 data migration
Options…
Annotations:
Update Technology Annotations
Create Biological Genome
Update Pathway Interactions
Help:
License Manager
Update Product
Pathway Analysis
To use the ‘Find Significant Pathways’ Tool:
1. Download BioPax format (.owl) pathways from www.biopax.org to your computer
2. Import the .owl pathways into GX 10 using the ‘Import BioPax pathways’ option in the Tools menu
3. From the Workflows menu (in the right margin of GX 10), select ‘Find Similar Pathways’ and choose your Entity List of interest
Performing Pathway Analysis in GX 10:
1. In the Annotations menu, select ‘Update Pathway Interactions’ from Agilent Server
2. Before you choose an organism, GX 10 must first create a Pathway Database Infrastructure. This may take more than 10 minutes
3. Once the Infrastructure database is complete, go back to Annotations/Update Pathway Interactions and choose your preferred organism. This may take more than 20 minutes
4. From the Workflows menu (in the right margin of GX 10), select ‘Pathway Analysis’ to begin building networks
Updating Annotations:
Chap 3 in GX 10 pdf manual, pg. 51
Option 1: Update from Agilent Server
Option 2: Update from file
Option 3 (new in GX 10): Update directly from NCBI from within GX (Biological Genome)
GeneSpring GX 10: Reference pages in Manual
Creating/Updating Technologies & Annotations:
Chapter 3 in GX 10 pdf manual, pg. 51
From 1) Agilent server; 2) file; 3) NCBI (Biological Genome)
GS7 to GS10 Data Migration:
Chapter 4 in GX 10 manual, pg. 71 and in Quick Start Guide
Translation:
Chapter 3.3 in pdf manual (pg 63)
Thank you
www.genespring.com
Technical Support 24 hours/5 days per week
[email protected]
1-800-227-9770 (option 6, 2)
[email protected]
[email protected]
[email protected] (Genomics)
Automated GX 7 Migration Tool (Chapter 4 in GX 10 manual)
Step 1: Prepare for GS7 Migration. The tool automatically prepares data for migration (for a large number of samples, this step takes time).
Step 2: Select the GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values, and classifications will be automatically migrated.
Step 3: Open the project with the name corresponding to the GX 7 genome to see the migrated data. Note that if the genome was assigned a project in GX 7, that project name will be the name of the project in GX 10 instead of the name of the GX 7 genome.
GX 10: Biological Context
GO Analysis (Function, Process, Location)
GSEA (Gene Set Enrichment Analysis)
GSA (Gene Set Analysis)
Pathway Analysis (Interaction DB)
Find Similar Entity Lists
Find Significant Pathways
(BioPax.org)
Link to Ingenuity’s IPA
NLP (mine literature)
GSEA
GSEA interrogates genome-wide expression profiles from
samples belonging to two different classes (e.g. normal
and tumor) and determines whether genes in an a priori
defined gene set correlate with class distinction
Reference: Subramanian et al. Gene set enrichment
analysis: A knowledge-based approach for
interpreting genome-wide expression profiles.
PNAS. September 30, 2005, 10.1073
GSEA Method
1. Rank genes based on the correlation between their expression intensities and class distinction
• Genes that differ most in their expression between the two classes will appear at the top and bottom of the list
• The assumption is that genes related to the phenotypic distinction of the classes will tend to be found at the top and bottom of the list
2. Calculate an enrichment score (ES) to reflect the degree of overrepresentation of genes in a particular gene set at the top and bottom of the entire ranked list
3. Derive a p-value for the ES to estimate its significance level
4. Adjust the p-value for multiple testing
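Step 2 can be made concrete with a short sketch of the weighted running-sum enrichment score described by Subramanian et al. The code assumes the genes are already ranked (step 1) and covers only the ES itself, not the permutation p-value or multiple-testing adjustment of steps 3 and 4. The function name, gene names, and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set over a ranked gene list.

    ranked_genes -- gene ids ordered from most positively to most negatively correlated
    scores       -- ranking metric for each gene, in the same order
    gene_set     -- set of gene ids to test
    weight       -- exponent applied to |score| for genes in the set
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_misses = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0 or n_misses == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        # Step up for a gene in the set, step down for a gene outside it.
        running += abs(s) ** weight / hit_norm if hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best

# Toy ranked list (illustrative values only).
ranked_genes = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(ranked_genes, scores, {"A", "B"})
es_bottom = enrichment_score(ranked_genes, scores, {"E", "F"})
```

A set concentrated at the top of the list gives a positive ES and a set concentrated at the bottom gives a negative ES; significance would then be estimated by recomputing the ES after permuting the class labels.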
Gene Set Enrichment Analyses
How is performing GSEA or GSA on GO gene sets different from doing GO Analysis on a list of differentially expressed genes?
• Statistical analysis can miss genes with small changes relative to noise that, as a group, can have a significant impact on the observed difference in phenotype
– Use the All Entities list as input for GSEA or GSA
• Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see if gene sets are associated with the phenotypic class distinction
– Enrichment in GO Analysis is done with Fisher’s Exact test, while GSEA/GSA is done with a type of running-sum statistic
• Users can specify any Entity List as a gene set in GeneSpring GX
Identifiers Necessary for GSEA
Technology must contain Gene Symbol
Columns that must be marked in custom technology to perform
GSEA:
• Annotation file must contain a column (Column X) containing Gene Symbol
– Column X must be marked “Gene Symbol”
– Select “Gene Symbol” mark from the drop-down menu while creating
Custom technology.
Gene Sets
GSEA/GSA can use either Broad lists or any Entity Lists in GeneSpring
Broad Institute has defined five categories of gene sets:
• C1- Grouped based on cytogenetic location.
• C2- Functional lists. ~1000 gene lists corresponding to pathways or functional processes (for example, genes that are both involved in inflammatory response can be in the same list)
• C3- Regulation lists. Grouped according to promoter analysis. Genes are regulated by the same motif (the transcription factor may or may not be known), including cases where genes simply share the same binding motif and are therefore assumed to be co-regulated.
• C4- Proximity to known oncogenes and tumor suppressors. For example, all the neighbors of BRCA.
• C5- GO gene sets. Each category is represented as a gene set, except for very broad categories such as Biological Process and categories with fewer than 10 genes
Key Differences Between GSEA and GSA
The two algorithms share the same idea, but differ in the way they determine which gene sets are significantly enriched
• GSA uses the "maxmean" statistic: this is the mean of the positive or negative part of gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that the method used in GSA is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.
• GSA uses a somewhat different null distribution for estimation of false discovery rates: it does "restandardization" of the genes, in addition to the permutation of samples (done in GSEA)
• GSA can also handle more than two conditions (a limitation of GSEA)
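The "maxmean" statistic is simple enough to write out. The sketch below follows the description on this slide: average the positive parts and the negative parts of the gene scores over the whole set, then keep whichever is larger in absolute value. Returning the result with its sign is a choice made here for illustration, and the scores are made-up numbers.

```python
def maxmean(gene_scores):
    """Maxmean statistic for the per-gene scores of one gene set."""
    n_genes = len(gene_scores)
    if n_genes == 0:
        raise ValueError("gene set is empty")
    # Both averages divide by the full set size, not by the count of
    # positive or negative genes.
    mean_positive = sum(s for s in gene_scores if s > 0) / n_genes
    mean_negative = sum(-s for s in gene_scores if s < 0) / n_genes
    # Keep the larger part; negate it when the negative side wins.
    return mean_positive if mean_positive >= mean_negative else -mean_negative

# Toy per-gene scores (illustrative values only).
up_set = maxmean([2.0, 1.0, -0.5, 0.0])
down_set = maxmean([-3.0, 1.0])
```

Because each part is averaged over the full set, a few strongly changed genes in one direction are not cancelled out by weak changes in the other direction.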