bchm628_lect2_14
Working with gene lists: Finding data using GEO & BioMart
June 5, 2014
Analyzing a gene list
With hundreds of genes but a limited budget and limited lab personnel,
you need to prioritize the gene list to select candidate genes for follow-up
Pick ones that are “interesting”
Known to be involved in other related processes but not (yet) in your process of interest
Has protein features which suggest a function in your process, but it has not been characterized
No known function or domain, but it shows up in other, related high-throughput experiments, suggesting a key role in your process of interest
Our approach
Analyzing gene lists by:
1. Finding overlap with other high-throughput experiments
2. Finding additional information using BioMart
   1. Mouse/human homologs
   2. Protein domain content
   3. GO classification
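The overlap step in this approach can be sketched in a few lines of Python. This is only an illustration, not the method used in the class demo: the function name overlap, the experiment names and the gene IDs geneA, geneB and geneX are made-up placeholders (only C04F5.7 appears in these slides), and it assumes every list uses the same type of gene ID.

```python
def overlap(my_genes, other_experiments):
    """For each of my genes, list the other experiments that also report it."""
    mine = {g.strip() for g in my_genes if g.strip()}
    hits = {}
    for name, genes in other_experiments.items():
        for gene in mine & set(genes):
            hits.setdefault(gene, []).append(name)
    # Rank by number of supporting experiments, ties broken by gene ID.
    return sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical gene lists; all IDs must be of the same type to be comparable.
my_list = ["C04F5.7", "geneA", "geneB"]
published = {
    "expt1": ["geneA", "C04F5.7"],
    "expt2": ["C04F5.7", "geneX"],
}

ranked = overlap(my_list, published)
for gene, expts in ranked:
    print(gene, len(expts), ",".join(expts))
```

Genes that recur in several related experiments rise to the top of the ranking, which is one way to flag "interesting" candidates. If the lists use different ID types, they would first need converting to a common one.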
GEO (Gene Expression Omnibus)
GEO Datasets
Curated gene expression datasets
Because the datasets are curated, there is a backlog of experiments that haven’t yet made it into the database
Can search for experiments and conduct differential gene expression queries on some datasets
Can download datasets & do offline analyses
GEO Profiles
Profiles of expression data for individual genes
Why search GEO?
What other experiments have been done that are similar to yours? Search GEO Datasets
How do my genes of interest behave in other large-scale experiments? Search GEO Profiles
GEO Profile search
Search on a gene name (e.g. C04F5.7)
GEO Dataset search
Searching for “C. elegans” returns 4434 datasets
GEO Dataset searches
Query                           Total datasets   C. elegans datasets
C. elegans                      4434             4072
C. elegans AND response         131              121
C. elegans AND host response    5                5
C. elegans AND immune           24               20
C. elegans AND antimicrobial    109              94
Once a dataset is identified
Download data
SOFT format: tab-delimited data
Issues:
Data are not necessarily processed into experiment/control ratios
If starting with raw data, you may not be able to replicate exactly what the authors did, or may lack the expertise/software to generate a list of differentially expressed (DE) genes
Look for supplementary data from the publication
Authors usually provide a list of all DE genes
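As a rough illustration of working with a downloaded SOFT file, the sketch below reads the tab-delimited table and computes log2(experiment/control) ratios. It rests on several assumptions: that metadata lines start with ^, ! or #, that the probe ID column is called ID_REF, and that the values are linear-scale intensities rather than already log-transformed. The sample column names control and infected and the data rows are invented for the example, and a single ratio per probe is not a substitute for a proper DE analysis with replicates.

```python
import csv
import io
import math


def read_soft_table(handle):
    """Return (header, rows) from SOFT-style text, skipping metadata lines."""
    # Assumption: lines beginning with ^, ! or # are metadata, not data.
    data_lines = [ln for ln in handle if ln.strip() and ln[0] not in "^!#"]
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    return header, list(reader)


def log2_ratios(header, rows, control_col, treated_col, id_col="ID_REF"):
    """Map each row ID to log2(treated / control), skipping unusable values."""
    i, c, t = (header.index(x) for x in (id_col, control_col, treated_col))
    ratios = {}
    for row in rows:
        try:
            control, treated = float(row[c]), float(row[t])
        except ValueError:
            continue  # non-numeric entries such as "null"
        if control > 0 and treated > 0:
            ratios[row[i]] = math.log2(treated / control)
    return ratios


# Invented miniature file standing in for a real download.
example = io.StringIO(
    "^DATASET = GDS_example\n"
    "!dataset_table_begin\n"
    "ID_REF\tIDENTIFIER\tcontrol\tinfected\n"
    "probe1\tC04F5.7\t100\t400\n"
    "probe2\tgeneA\t50\tnull\n"
    "!dataset_table_end\n"
)
header, rows = read_soft_table(example)
ratios = log2_ratios(header, rows, "control", "infected")
print(ratios)
```

With a real file, open it with open(path) in place of the io.StringIO object and check the header for the actual sample column names before choosing which columns to compare.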
Choice of dataset for comparison
In class demo
BioMart – EBI Ensembl
Use a series of menus
Data source – organism (genes, variation, etc.)
Filters – reduce the number of results
Attributes – what data to return
Can set up very precise and multilayered queries
Can query across multiple organisms
Simple query:
Given a list of gene IDs, you can obtain attributes or sequences for the entire list
Tools
ID converter – very useful, easy to use
Two sites for BioMart access
www.biomart.org
Database journal issue on BioMart
Filtering in BioMart
Attributes in BioMart
BioMart
Filters
C. elegans genes with a human homolog
Only genes with at least a specified number of isoforms
Protein-coding genes with a transmembrane domain
Attributes
Entrez Gene IDs, WormBase IDs, Affy IDs
Sequence data
transcript, protein, UTRs, flanking regions, etc.
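The same data source / filters / attributes structure appears in the XML queries that BioMart servers accept, so a menu-built query can also be written programmatically. The sketch below only builds the XML text and does not contact any server. The dataset name celegans_gene_ensembl, the filter name ensembl_gene_id, the attribute names and the IDs ID_1 and ID_2 are assumptions or placeholders; the real internal names should be taken from the Filters and Attributes menus of the BioMart site being used.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (nothing is sent anywhere)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated results
        "header": "1",        # include column names
        "uniqueRows": "1",    # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filters restrict which genes are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes choose which columns come back.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# All names below are assumptions; ID_1 and ID_2 are placeholder gene IDs.
xml_query = build_biomart_query(
    dataset="celegans_gene_ensembl",
    filters={"ensembl_gene_id": ",".join(["ID_1", "ID_2"])},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

The resulting string would normally be submitted as the query parameter of a BioMart server's web service; the web interface can also display the XML for a query built through the menus, which is a convenient way to check the names used here.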
BioMart
In class demo
Today’s exercise
Compare the current dataset from the PLoS Pathogens paper with data from a different dataset
Identify & retrieve additional information about C. elegans genes using BioMart