Welcome to the Broad Institute
Basic features for portal users
Agenda - Basic features
• Overview
– features and navigation
• Browsing data
– Files and Samples
• Gene Summary pages
• Performing Analyses on the portal
– Co-expression, differential expression, GSEA
• Managing your shelf
Overview - portal home page
http://www.humanimmunology.org/cchi
Overview - organization
The portal data is organized around 4 main concepts
• Laboratories (aka projects)
• Studies (aka experiments)
• Data Sets
• Files
Access control is organized around
• Users
• Groups
Overview - Labs and Studies
• Laboratories
– Have defined ‘curator’ groups and ‘reader’ groups
– Contain zero or more studies
• Studies
– Represent a collection of data assembled to answer a
question
– Contain zero or more datasets
– ‘Reader’ groups are a subset of their Lab’s reader groups
Overview - Datasets and Files
• Datasets
– All data is of one type (gene expression, CN, etc)
– Multiple datasets of the same type are OK
– contain zero or more files
– ‘reader’ groups are a subset of their Study’s reader groups
• Files
– The basic unit of data in the portal
– May be any format
– Files in unrecognized formats cannot be analyzed, but can still be shared and downloaded
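The containment and access rules above can be sketched as a small data model. This is an illustrative sketch only, not the portal's implementation: the class names, field names, group names, and the `check_reader_groups` helper are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    data_type: str                                  # e.g. "gene expression" or "CN"
    reader_groups: set = field(default_factory=set)
    files: list = field(default_factory=list)       # zero or more files


@dataclass
class Study:
    name: str
    reader_groups: set = field(default_factory=set)
    datasets: list = field(default_factory=list)    # zero or more datasets


@dataclass
class Laboratory:
    name: str
    curator_groups: set = field(default_factory=set)
    reader_groups: set = field(default_factory=set)
    studies: list = field(default_factory=list)     # zero or more studies


def check_reader_groups(lab):
    """Hypothetical helper: list every violation of the subset rule."""
    problems = []
    for study in lab.studies:
        # A study's readers must be a subset of its lab's readers.
        if not study.reader_groups <= lab.reader_groups:
            problems.append(f"study {study.name}: readers exceed lab {lab.name}")
        for ds in study.datasets:
            # A dataset's readers must be a subset of its study's readers.
            if not ds.reader_groups <= study.reader_groups:
                problems.append(f"dataset {ds.name}: readers exceed study {study.name}")
    return problems


# Made-up names, for illustration only.
lab = Laboratory("Lab A", curator_groups={"lab-a-curators"},
                 reader_groups={"consortium", "lab-a"})
study = Study("Study 1", reader_groups={"lab-a"})
study.datasets.append(
    Dataset("expr-1", "gene expression", reader_groups={"lab-a"}, files=["expr.gct"])
)
lab.studies.append(study)
print(check_reader_groups(lab))
```

Because each level can only narrow the reader groups of the level above it, the check only ever needs to compare a child with its direct parent.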
Overview - Laboratories/projects
http://www.humanimmunology.org/cchi
Browsing data
• Sharing Data
– Any data file that is visible to you can also be downloaded
– Filter data types with the checkboxes on top
• Page Info
– At the top of most pages - brief help for the page
• My Shelf
– Save datasets to your shelf for later (re)use
Browsing data
Browsing Samples
• Interactive browser of sample annotations
• Filter samples based on phenotypic information provided
• Thumb-scrollers for numeric data
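Filtering samples on a numeric annotation, as the thumb-scrollers do, amounts to a range test per sample. A minimal sketch, assuming samples are held as dictionaries; the `filter_by_range` function, the `age` annotation, and the values are hypothetical and not taken from the portal.

```python
def filter_by_range(samples, annotation, low, high):
    """Keep samples whose numeric annotation lies in [low, high]."""
    kept = []
    for sample in samples:
        value = sample.get(annotation)
        # Samples that lack the annotation never match the filter.
        if value is not None and low <= value <= high:
            kept.append(sample)
    return kept


# Made-up annotations, for illustration only.
samples = [
    {"id": "s1", "age": 23},
    {"id": "s2", "age": 41},
    {"id": "s3", "age": 35},
    {"id": "s4"},
]
selected = filter_by_range(samples, "age", 30, 45)
print([s["id"] for s in selected])
```

The same kind of range selection is what defines Class 1 and Class 2 in the analysis exercises later on.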
Exercise 1. Browse the portal
1. Go to the portal in a web browser
http://www.humanimmunology.org/cchi
2. Login/register if needed
3. Click on the ‘BROWSE’ menu item, then the ‘DATA’ submenu
4. Uncheck the ‘Sample annotation’ and ‘undefined’ filter checkboxes
5. Click on the ‘BROWSE’ menu item again, then the ‘SAMPLES’ submenu
6. Select a dataset to browse
7. Experiment with filtering options
Gene Summary Pages
• Provide an overview of the information about a gene
• Heatmaps showing expression in the datasets that you can see
• Gene description (from Entrez), links to COSMIC
• Optional
– Display summaries of mutations (if any are loaded in the portal)
– Display plot of copy number by expression (requires paired CN and expression samples and linking IDs)
Gene Search
• Enter a gene name in the search box on the home page or near
the menus
• Multiple hits indicate multiple species (we’ll make this more explicit in a later version)
Gene Summary Pages
Exercise 2. Review your favorite gene
1. Enter a gene name in the search box
e.g. EGFR, FGFR3
2. Click a gene name on the results page
3. Review the gene summary page
Performing Analyses
• The portal is built to allow non-computational biologists to
perform many common analyses
– Look for co-expressed genes
– Look for differentially expressed genes
– Look for gene set enrichment
• Analyses are performed by a GenePattern server using its
modules
Co-expression -> Gene Neighbors
Diff. Expression -> Comparative Marker Selection
Gene Set enrichment -> GSEA
Performing Analyses - details
• Analysis parameter defaults are set by the portal curator
– These are set portal-wide
• To change the parameters and/or assumptions, download the
data and analyze it in GenePattern directly
• Detailed descriptions of the analyses, how to run them, and
default parameters are available on the help menu
– Text tutorials for all
– Video tutorials for some
Performing Analyses - help
Co-expression
• Find genes with similar gene expression profiles to a particular gene
• You provide a gene and select a dataset
• An analysis is launched to detect the 20 most correlated genes in the
dataset using Pearson Correlation
• The analysis displays a heat map
– This is a Java applet; you must tell your browser to ‘allow’ it when asked, or you will not see it
– The heat map viewer can be ‘popped’ out of the browser to allow you
to see more detail
– Menus (on the viewer) provide numerous other options to explore
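The idea behind the co-expression analysis can be sketched in a few lines: score every other gene by its Pearson correlation with the query gene and keep the top k. This is a sketch under stated assumptions, not the GenePattern Gene Neighbors module itself; the function names, the gene names, and the expression values are invented, and ranking by signed (rather than absolute) correlation is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def gene_neighbors(expression, query, k=20):
    """Return the k genes most correlated with `query`, best first."""
    target = expression[query]
    scores = [
        (gene, pearson(target, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Made-up expression values (one list entry per sample).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.2, 7.9],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
neighbors = gene_neighbors(expression, "GENE_A", k=2)
for gene, r in neighbors:
    print(gene, round(r, 3))
```

Here `GENE_C` is perfectly anti-correlated with the query, so it ranks last under the signed ordering assumed above.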
Co-expression
Co-expression results
Exercise 3. Find co-expressed genes
1. Go to the portal home page
2. Select the ‘Analyses’ menu
3. Select the GeneNeighbors button, click ‘Next step’
4. Enter a gene name (e.g. EGFR), click ‘Select Gene Symbol’
5. Click the gene name (if needed), click ‘Select Data Set’
6. Select ‘YFV_2008…’, click ‘Select Probe’
7. Click ‘Run Analysis’
Differential expression
• Find genes whose expression levels differ between 2 conditions
• Select a dataset, then define 2 classes based on the sample
annotations
• An analysis is launched to detect the 20 top ranked genes in
each direction using 2-sided SNR (median) and 1000
permutations
• The analysis displays a heat map and a table with the genes
and their significance
– This heatmap is just an image, not an applet
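The scoring described above can be sketched for a single gene as a median-based signal-to-noise ratio with a two-sided permutation test. This is a simplified sketch, not the Comparative Marker Selection module: the function names and the expression values are invented, and the real module may apply further adjustments (for example to the standard deviations) that are not reproduced here.

```python
import random
import statistics


def snr(class1, class2, use_median=True):
    """Signal-to-noise ratio: difference of class centers over summed spreads."""
    center = statistics.median if use_median else statistics.mean
    spread = statistics.stdev(class1) + statistics.stdev(class2)
    if spread == 0:
        return 0.0
    return (center(class1) - center(class2)) / spread


def permutation_p_value(class1, class2, n_perm=1000, seed=0):
    """Two-sided p-value: how often shuffled labels score at least as extreme."""
    rng = random.Random(seed)
    observed = abs(snr(class1, class2))
    pooled = list(class1) + list(class2)
    n1 = len(class1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two classes at random
        if abs(snr(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up expression values for one gene in two classes of samples.
high = [9.1, 8.7, 9.5, 9.0]
low = [5.2, 4.8, 5.5, 5.1]
score = snr(high, low)
p_value = permutation_p_value(high, low)
print(round(score, 2), p_value)
```

Running the same scoring over every gene and sorting by score gives the top-ranked genes in each direction: the highest scores for Class 1, the lowest for Class 2.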
Differential expression
Differential expression results
Exercise 4. Find differentially expressed genes
1. Go to the portal home page
2. Select the ‘Analyses’ menu
3. Select the Comparative Marker Selection button, click ‘Next step’
4. Create a Sample Set, Select ‘YFV_2008…’, click ‘Create Sample Set’
5. For Class 1, Click ‘Tcell activation’ and the range 0.49-1.6
6. For Class 2, Click ‘Tcell activation’ and the range 9-12.1
7. Enter a name and description
8. Click ‘Run Analysis’
9. Open results from ‘My Shelf’ when complete
Gene Set Enrichment Analysis
• Sometimes no individual genes are significantly
differentially expressed
• We improve statistical power by comparing gene
sets
• Example: human diabetes
– No single gene significant
– GSEA was used to assess enrichment of 149
gene sets including 113 pathways from internal
curation and GenMAPP, and 36 tightly coexpressed clusters from a compendium of
mouse gene expression data.
These GSEA results appeared in Mootha et al., Nature Genetics 15 June 2003, vol. 34 no. 3, pp. 267–273.
Figure: skeletal muscle biopsies, normal vs. diabetic
• Rank genes according to their “correlation”
with the class of interest.
• Test if a gene set (e.g., a GO category, a
pathway, a different class signature),
“enriches” any of the classes.
• Use the Kolmogorov-Smirnov score to measure enrichment.
Figure: phenotype, ordered marker list, and gene set G (Subramanian et al., PNAS 2005)
Figure: Enrichment: KS-score. Enrichment score S plotted against gene list order index, with each gene marked as a hit (member of G) or a miss (non-member of G); the maximum is the enrichment score ES (Mootha et al., Nature Genetics 2003)
Figure: Enrichment: KS-score. Enrichment score S against gene list order index for an un-enriched gene set and an enriched gene set, each with its maximum enrichment score ES marked
Every hit goes up by 1/NH
Every miss goes down by 1/NM
The maximum height provides the enrichment score
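The running-sum rule above can be sketched directly: walk down the ranked gene list, step up by 1/NH at each hit and down by 1/NM at each miss, and take the maximum height as the enrichment score. This is the simple unweighted form described here, with invented gene names; the `enrichment_score` function is a hypothetical helper, and the GSEA method of Subramanian et al. (2005) uses a weighted variant that is not reproduced in this sketch.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return (ES, running-sum profile) for an unweighted KS-style score."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)                       # NH: genes in the set
    n_miss = len(ranked_genes) - n_hit      # NM: genes not in the set
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    running = 0.0
    profile = []
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        profile.append(running)
    return max(profile), profile


# Made-up ranked list: g1 is most correlated with the class of interest.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]

es_enriched, profile_enriched = enrichment_score(ranked, {"g1", "g2", "g4"})
es_scattered, profile_scattered = enrichment_score(ranked, {"g2", "g5", "g7"})
print(round(es_enriched, 3), round(es_scattered, 3))
```

A set whose members cluster at the top of the list climbs quickly before the misses pull the sum back down, so it scores higher than a set scattered through the list. Because the up-steps and down-steps each total 1, the running sum always returns to zero at the end of the list.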
Performing GSEA
• Like differential expression, select a dataset and define classes
• GSEA uses the c2 curated gene sets representing metabolic and
signaling pathways (http://www.broadinstitute.org/gsea/msigdb)
GSEA Results
Exercise 5. GSEA
1. Go to the portal home page
2. Select the ‘Analyses’ menu
3. Select the GSEA button, click ‘Next step’
4. Create a Sample Set, Select ‘YFV_2008…’, click ‘Create Sample Set’
5. For Class 1, Click ‘neutralizing antibody titer’ and the range 482-1280
6. For Class 2, Click ‘neutralizing antibody titer’ and the range 20-280
7. Enter a name and description
8. Click ‘Run Analysis’
9. Open results from ‘My Shelf’ when complete
Managing ‘My Shelf’
http://www.humanimmunology.org/cchi
Exercise 6. Review your shelf
1. Click on the ‘My Shelf’ button at the top right
2. Click on the ‘Analyses’ tab
– Review the analyses you did earlier
– Revisit the results
3. Click on the ‘Sample Sets’ tab
– Review the Sample Sets you created for CMS and GSEA
4. Click on the ‘Profile’ tab
– Review your email and group memberships
Review of Basic Features
• Overview
– features and navigation
• Browsing data
– Files and Samples
• Gene Summary pages
• Performing Analyses on the portal
– Co-expression, differential expression, GSEA
• Managing your shelf