Public data and tool repositories
Section 2
Genome Browsers
Problems from last section
1. Query Entrez Gene with the following two queries separately
and then explain the differences between the two results using a
logical NOT operation:
a) tyrosine kinase[Gene Ontology] AND human[Organism]
b) cd00192[Domain Name] AND human[Organism]
2. Retrieve the APP gene record from NCBI and use the Display
dropdown menu to display Conserved Domain Links. Use the
ids of the listed domains to query Entrez Gene for records with
the same domains.
3. Use the SNP Geneview link at NCBI to identify coding SNPs in
the APP gene. Which SNP is missing from this display which
was present in the Ensembl APP protein record?
4. Use the Homologene link at NCBI to identify possible functional
orthologs for human APP. How does this list compare to the
Ensembl list of orthologs that we reviewed previously?
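Sketch for problem 1: one way to express the logical NOT comparison is to wrap the two queries in parentheses and join them with NOT, then run the result in each direction. The snippet below only builds query strings and NCBI E-utilities ESearch URLs; it sends no request. The helper names are illustrative, and it is an assumption that the field tags from the slide are accepted by ESearch exactly as the web search form accepts them.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (db, term and retmax are standard parameters).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_not(query_a: str, query_b: str) -> str:
    """Query for records matched by query_a but not by query_b."""
    return f"({query_a}) NOT ({query_b})"


def esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Build an ESearch URL for the term; nothing is fetched here."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# The two queries from problem 1.
query_a = "tyrosine kinase[Gene Ontology] AND human[Organism]"
query_b = "cd00192[Domain Name] AND human[Organism]"

a_not_b = entrez_not(query_a, query_b)  # annotated by GO term, lacking the domain
b_not_a = entrez_not(query_b, query_a)  # carrying the domain, lacking the GO term

print(a_not_b)
print(esearch_url(a_not_b))
```

Pasting either combined query into the Entrez Gene search box should list the records that appear in one result set but not the other.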
Review of last section
example: human APP gene
1. NCBI Entrez databases
a) Constructing queries
b) Gene, Nucleotide and Protein
c) RefSeq
2. EBI/Ensembl
a) Finding genes
b) Viewing Genes, Transcripts, Exons, Proteins
and SNPs
3. Common id and data formats
This section
1. Genome assembly and
genome browsers
2. Promoter/enhancer analysis
example
3. More information
Genome Build Process
1. Organism sequence data is assembled
into contiguous pieces (contigs)
2. Contigs are mapped to genomic features
and the coordinate system is assigned
3. Unmapped sequence data may be assigned to
artificial chromosomes
4. Assembly is improved as more sequence
data becomes available
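Toy sketch of steps 2 and 3: ordered contigs receive coordinates along their chromosome, and contigs with no mapping are collected on an artificial chromosome. This is a simplified illustration, not how any real assembly pipeline works. The `Contig` fields, the fixed gap size and the name `chrUn` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Contig:
    name: str
    length: int
    chrom: Optional[str]  # None when the contig could not be mapped
    order: int = 0        # rank of the contig along its chromosome


def lay_out(contigs: List[Contig], gap: int = 100,
            unplaced: str = "chrUn") -> Dict[str, Tuple[str, int, int]]:
    """Assign (chrom, start, end) to each contig; 0-based, end-exclusive."""
    by_chrom: Dict[str, List[Contig]] = {}
    for c in contigs:
        # Unmapped contigs are grouped on the artificial chromosome.
        by_chrom.setdefault(c.chrom or unplaced, []).append(c)

    placed: Dict[str, Tuple[str, int, int]] = {}
    for chrom, members in by_chrom.items():
        offset = 0
        for c in sorted(members, key=lambda m: m.order):
            placed[c.name] = (chrom, offset, offset + c.length)
            offset += c.length + gap  # leave a fixed-size gap between contigs
    return placed


# Invented contigs, for illustration only.
layout = lay_out([
    Contig("ctgB", 500, "chr1", order=2),
    Contig("ctgA", 1000, "chr1", order=1),
    Contig("ctgX", 300, None),
])
print(layout)
```

Re-running the layout with more mapped contigs changes the coordinates, which is why positions are only comparable within one genome build.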
Entrez Genome Project
Genome Browsers
1. Make millions of sequences available
through easily accessible, user-friendly
interfaces
2. Provide genomic sequence, exon structure,
mRNA sequence, EST and SNP data via
web-based text search interfaces
3. Options available for local installs
Commonly Used Browsers
1. The Entrez Map Viewer
2. The EBI/Ensembl browser
3. The UCSC genome browser
NCBI Map Viewer
1. Integrates feature identity information with
whole genome view
2. Allows one to view and search an
organism's complete genome
3. Displays chromosome maps
4. User can zoom into progressively greater
levels of detail, down to the sequence data
for a region of interest.
5. Focuses more on individual sequences
Ex: Looking at the APP gene in the NCBI Map Viewer
EBI/Ensembl Browser
1. Provides access to sequence data from ~40
organisms
2. Includes the human genome sequence and data
from all the commonly used experimental
organisms
3. Displays the location of genes, variations and
other sequence features within genomes
4. Greatest strengths:
a) browsing of large genomic contigs
b) comparative genomic features
Ex: Looking at the APP gene in the EBI/Ensembl Browser
UCSC Genome Browser
Strength is genome position-based data aggregation:
1. Data positioned on “best” genome build and
organised into “tracks”
2. Outside data tracks
a) Genome builds
b) Genes, known and predicted
c) mRNA
d) Expression and regulation
e) Variations and repeats
3. Inside data tracks
a) Known Genes
b) Comparative genomics
4. Custom tracks
Ex: Looking at the APP gene in the UCSC Genome Browser
APP Upstream Region (15 kb)
Ex: Extracting and aligning human and mouse APP upstream regions
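Sketch of the two steps in this example: working out which interval is "upstream" of a gene, and scoring an existing pairwise alignment in sliding windows. The function names, the 0-based half-open coordinate convention and the toy sequences are assumptions for illustration. The coordinates are invented and are not the real APP position. The alignment itself is assumed to come from an external tool.

```python
def upstream_interval(tss: int, strand: str, length: int = 15_000):
    """0-based, half-open interval covering `length` bases upstream of the TSS.

    `tss` is the 0-based position of the first transcribed base.
    On the minus strand, upstream lies at higher coordinates.
    The minus-strand end is not clamped to the chromosome length.
    """
    if strand == "+":
        return max(0, tss - length), tss
    if strand == "-":
        return tss + 1, tss + 1 + length
    raise ValueError("strand must be '+' or '-'")


def reverse_complement(seq: str) -> str:
    """Needed to read a minus-strand upstream region in gene orientation."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def window_identity(aln_a: str, aln_b: str, window: int = 50):
    """Fraction of identical, non-gap columns in each sliding window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    return [sum(match[i:i + window]) / window
            for i in range(len(match) - window + 1)]


print(upstream_interval(20_000, "+"))
print(upstream_interval(20_000, "-"))
print(window_identity("ACGTACGT", "ACGTTCGA", window=4))
```

Windows with high identity in the upstream region are candidates for conserved regulatory elements, which is the idea behind the conservation tracks.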
Promoter/enhancer analysis approaches
1. Same gene, multiple species
a) Assumed evolutionary conservation of non-coding regions
b) Can use pairwise or multiple alignment method
c) Examples:
i. Precomputed: UCSC conservation tracks
ii. Dynamic: e.g., rVista
2. Different genes, same species
a) Typical output as co-expressed clusters from microarray data
b) Looking for over-represented, small binding sites
c) Much better results if looking for a pattern or clustering of
multiple sites
d) Motif-finding algorithm, e.g., MEME
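Sketch of the "different genes, same species" idea: count short words (k-mers) that occur in more promoters of a co-expressed cluster than in a background set. This is plain k-mer counting, not the MEME algorithm, which fits probabilistic motif models. The function names, the pseudocount and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def kmer_presence(seqs: List[str], k: int) -> Counter:
    """For each k-mer, count how many sequences contain it at least once."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(cluster: List[str], background: List[str],
                   k: int = 6, pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """Rank k-mers by presence frequency in the cluster relative to background."""
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudo) / (len(cluster) + pseudo)
        bg_freq = (bg[kmer] + pseudo) / (len(background) + pseudo)
        scores[kmer] = fg_freq / bg_freq
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy promoters, for illustration only.
cluster = ["TTGACGTCAA", "CCTGACGTCG", "ATGACGTCTT"]
background = ["AAAAAAAAAA", "CCCCCCCCCC", "ACACACACAC", "GTGTGTGTGT"]

ranked = enriched_kmers(cluster, background, k=6)
for kmer, ratio in ranked[:3]:
    print(kmer, round(ratio, 2))
```

A single short word will often be over-represented by chance, which is why the slide recommends looking for patterns or clusters of multiple sites.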
Tutorials
1. NCBI
• Field Guide
• Information and tutorials
• Science Primer
2. EBI
• 2Can Tutorials
3. UCSC
• Genome Browser User’s Guide
4. Bulk Downloads
• Bulk Downloads Tutorial
IN CLASS EXERCISE
1. Do all three browsers show the same
number of transcript variants for: APP,
EGFR, TP53?
2. How many SNPs appear in the 5’ UTR of
APP?
3. What is the lowest conservation score in
APP exon 2?