UCSC Genome Browser Tools - CS273a


UCSC Genome Browser Tutorial
http://genome.ucsc.edu/
http://genome-test.cse.ucsc.edu/
The UCSC Toolset & Portal
to the Human Genome
• Genome Browser
• Table Browser
“I was blind and now I can see”
http://cs273a.stanford.edu
1
UCSC Genome Browser
[version9a]
http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
2
Genome Browser helps visualize genome annotation
• Raw genome sequence is of limited use without functional annotation.
GCTCTGAGATCTCCCTCCGGCTCCTTGGCCCGGGACTTTCTGCGCCCTGA
Exon
• The Genome Browser is a tool for visualizing genome annotation.
3
The UCSC Homepage: http://genome.ucsc.edu
Figure callouts: navigate; general information; specific information (new features, current status, etc.)
4
The Genome Browser Gateway
start page choices, December 2006
Make your Gateway choices:
1. Select Clade
2. Select species: search 1 species at a time
3. Assembly: the official backbone DNA sequence
Practically speaking, there is no such thing as "a genome"; there is only a genome assembly. Assemblies update frequently. Think moving target...
5
Everything in Genomics is a Moving Target
• The genomes
• Their annotations
• The portals
• Our understanding of biology
Conclusion: write code that can be run... and rerun, and rerun, and rerun, and rerun.
6
The Genome Browser Gateway
start page, basic search
7
The Genome Browser Gateway
start page choices, December 2006
Make your Gateway choices:
1. Select Clade
2. Select species: search 1 species at a time
3. Assembly: the official backbone DNA sequence
4. Position: location in the genome to examine
5. Image width: how many pixels in display window; 5000 max
6. Configure: make fonts bigger + other choices
8
The Genome Browser Gateway
start page, basic search
Position field (4): text/ID searches
Use this Gateway to search by:
• Gene names, symbols
• Chromosome number: chr7, or region: chr11:1038475-1075482
• Keywords: kinase, receptor
• IDs: NP, NM, OMIM, and more…
See lower part of page for help with format
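The two position formats above (a whole chromosome such as chr7, or a region such as chr11:1038475-1075482) are easy to check in a script before pasting them into the Gateway. The following is a minimal sketch, not the browser's own parser; the names `Position` and `parse_position` are hypothetical, and the assumption that coordinates may contain thousands separators is mine.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    chrom: str
    start: Optional[int]  # None when only a chromosome name was given
    end: Optional[int]


# Assumed format: "chrN" or "chrN:start-end"; commas in numbers are tolerated.
_POSITION_RE = re.compile(r"^(chr[0-9A-Za-z_]+)(?::([\d,]+)-([\d,]+))?$")


def parse_position(text: str) -> Position:
    """Parse a chromosome or region string into its parts."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome or region: {text!r}")
    chrom, start, end = match.groups()
    if start is None:
        return Position(chrom, None, None)
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not exceed end")
    return Position(chrom, start_i, end_i)


region = parse_position("chr11:1038475-1075482")
whole_chrom = parse_position("chr7")
```

Gene names, keywords, and IDs are deliberately rejected here, since only the Gateway itself can resolve those to a location.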
9
The Genome Browser Gateway
sample search for Human TP53
Sample search: human, March 2006 assembly, tp53
• Select from results list
• ID search may go right to a viewer page, if unique
10
Overview of the whole Genome Browser page (mature release)
Genome viewer section
Groups of data:
• Mapping and Sequencing Tracks
• Genes and Gene Prediction Tracks
• mRNA and EST Tracks
• Expression and Regulation
• Comparative Genomics
• Variation and Repeats
• ENCODE Tracks
11
Different species, different tracks, same software
• Species may have different data tracks
• Layout, software, functions the same
12
Sample Genome Viewer image, TP53 region
base position
STS markers
Known genes
RefSeq genes
GenBank seqs
17 species compared
single species compared
SNPs
repeats
13
Visual Cues on the Genome Browser
Tick marks: a single location (STS, SNP)
[Figure: gene diagram, left to right: 3' UTR, exon, <<<, exon, <<<, exon, 5' UTR]
Intron, and direction of transcription: <<< or >>>
Track colors may have meaning. For example, in the Known Gene track:
• If there is a corresponding PDB entry: black
• If there is a corresponding NCBI Reviewed seq: dark blue
• If there is a corresponding NCBI Provisional seq: light blue
For some tracks, the height of a bar indicates increased likelihood of an evolutionary relationship (conservation track)
14
Options for Changing Images: Upper Section
Figure callouts: walk left or right; click to zoom 3x and re-center; zoom in; specify a position; zoom out; fonts, window, more
• Change your view or location with controls at the top
• Use “base” to get right down to the nucleotides
• Configure: to change font, window size, more…
15
Annotation Track display options
Figure callouts: enforce changes; links to info and/or filters; change track view
• Some data is ON or OFF by default
• Menu links to info about the tracks: content, methods
• You change the view with pulldown menus
• After making changes, REFRESH to enforce the change
16
Annotation Track options, defined
• Hide: removes a track from view
• Dense: all items collapsed into a single line
• Squish: each item = separate line, but 50% height + packed
• Pack: each item separate, but efficiently stacked (full height)
• Full: each item on separate line
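Because the display modes are plain keywords, a view can also be described as data and turned into a link that is easy to rerun. The sketch below assumes that the browser's `hgTracks` page accepts `db`, `position`, and `trackName=mode` query parameters; the helper name `browser_url` and the track names `knownGene` and `rmsk` are illustrative assumptions, not taken from the slides.

```python
from urllib.parse import urlencode

# The five display modes defined on this slide.
VISIBILITY_MODES = ("hide", "dense", "squish", "pack", "full")


def browser_url(db, position, tracks,
                base="https://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a Genome Browser link from an assembly, a position, and a
    mapping of track name -> display mode (all values are assumptions)."""
    for track, mode in tracks.items():
        if mode not in VISIBILITY_MODES:
            raise ValueError(f"unknown display mode for {track}: {mode!r}")
    params = {"db": db, "position": position, **tracks}
    return f"{base}?{urlencode(params)}"


# hg18 is the March 2006 human assembly; the region is the one used earlier.
url = browser_url("hg18", "chr11:1038475-1075482",
                  {"knownGene": "pack", "rmsk": "dense"})
```

Keeping the view in a script rather than in clicks fits the "write code that can be rerun" advice: the same link can be regenerated when the assembly changes.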
17
Reset, Hide, Configure or Refresh to change settings
Figure callouts: enforce any changes (hide, full, squish…); reset, back to defaults; start from scratch
• You control the views
• Use pulldown menus
• Configure options page
18
Annotation Track options, if altered…
Important point: the browser remembers!
• Session information (the position you were examining)
• Track choices (squish, pack, full, etc.)
• Filter parameters (if you changed the colors of any items, or the subset to be displayed)
…are all saved on your computer. When you come back in a couple of days to use it again, these will still be set. You may, or may not, intend this.
To clear your “cart” or parameters, click default tracks
OR
19
Saved Sessions
20
Click Any Viewer Object for Details
Click the item: a new web page opens
Example: click your mouse anywhere on the TP53 line
Many details and links to more data about TP53
21
Click annotation track item for details pages
Details pages may include:
• informative description
• other resource links
• links to sequences
• microarray data
• mRNA secondary structure
• protein domains/structure
• homologs in other species
• Gene Ontology™ descriptions
• mRNA descriptions
• pathways
Not all genes have this much detail. Different annotation tracks carry different data.
22
Get DNA, with Extended Case/Color Options
• Use the DNA link at the top
• Plain or Extended options
• Change colors, fonts, etc.
23
Get Sequence from Details Pages
Click a track, go to Sequence section of details page
Figure callouts: click the line; click the item; sequence section on detail page
24
Accessing the BLAT tool
BLAT = BLAST-like Alignment Tool
• Rapid searches by INDEXING the entire genome
• Works best with high similarity matches
• See documentation and publication for details: Kent, WJ. Genome Res. 2002. 12:656
25
BLAT tool overview:
www.openhelix.com/sampleseqs.html
• Make choices
• Paste one or more sequences, or upload
• Limits: DNA 25000 bases; protein 10000 aa; 25 total sequences
• Submit

26


BLAT results, with links
Results with demo sequences, settings default; sort = Query, Score
Figure callouts: sorting; go to alignment detail; go to browser/viewer
• Score is roughly a count of matches (reduced by mismatches and gaps): the higher the number, the better the match
• Click browser to go to Genome Browser image location (next slide)
• Click details to see the alignment to genomic sequence (2nd slide)
27
BLAT results, browser link
Figure callouts: click to flip frame; query
• From browser click in BLAT results
• A new line with your Sequence from BLAT Search appears!
• Watch out for reading frame! Click - - - > to flip frame
• Base position = full and zoomed in enough to see amino acids
28
BLAT results, alignment details
• Your query
• Genomic match, color cues
• Side-by-side alignment (yours, genomic)
29
Understand BLAT’s Limitations
• BLAT was designed to rapidly align sequence from one genome back to itself (e.g., EST/cDNA data)
• It can, and at times does, miss clear hits
• BLAT allows for a single mismatch, but it also removes k-mers with excessive counts for efficiency
• Not suitable for cross-species mapping
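To see why indexing is fast and why dropping over-represented k-mers can cost a clear hit, consider a toy index. This is a simplified sketch of the general idea (index non-overlapping genome k-mers, look up every query k-mer), not BLAT's actual code; the function names, the tiny genome string, and the `max_count` threshold are all made up for illustration.

```python
from collections import defaultdict


def build_index(genome, k, max_count):
    """Index non-overlapping k-mers of the genome, then drop any k-mer
    that occurs more than max_count times (the efficiency shortcut)."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return {kmer: hits for kmer, hits in index.items()
            if len(hits) <= max_count}


def seed_hits(index, query, k):
    """Look up every overlapping k-mer of the query.
    Returns (query_offset, genome_offset) pairs for exact seed matches."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


genome = "ACGTACGTTTGACCA"  # made-up sequence
strict = build_index(genome, k=4, max_count=1)
lenient = build_index(genome, k=4, max_count=2)

unique_seed = seed_hits(strict, "GTTGAC", 4)   # TTGA is kept in the index
missed = seed_hits(strict, "ACGT", 4)          # ACGT occurs twice: dropped
found = seed_hits(lenient, "ACGT", 4)
```

With the strict threshold the query ACGT finds nothing even though it matches the genome exactly, which is the kind of missed hit the slide warns about.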
30
Bunch More Goodies – Click Around
31
Bibliography:
http://genome.ucsc.edu/goldenPath/pubs.html
• The UCSC Genome Browser Database: update 2008, update 2007, and earlier.
• UCSC Genome Browser Tutorial
• UCSC Genome Browser: Deep support for molecular biomedical research
• The UCSC Known Genes, 2006.
• The UCSC Gene Sorter, 2007.
• Piloting the Zebrafish Genome Browser, 2006.
32
UCSC Genome Browser
[version9a]
33
Genome Browser Database
visualize; search & download
Underlying database (MySQL):
• Primary table: positions, names, etc.
• Auxiliary table: related data
34
The Table Browser
Open browser
Open browser
http://genome.ucsc.edu/
35
Table Browser: Choose Genome
Choose Genome
Task: in the Human genome (hg16), search for simple repeats at a chromosome 4 location with copy number more than 10, and download the sequence.
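Since the underlying database is relational, the running task (simple repeats in a chromosome 4 region with copy number above 10) amounts to a filtered SELECT. The sketch below uses an in-memory SQLite table as a stand-in for the real MySQL database; the simplified schema, the helper name `filter_repeats`, and every row of data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: a "primary table" holding positions and names.
conn.execute(
    "CREATE TABLE simpleRepeat ("
    "chrom TEXT, chromStart INTEGER, chromEnd INTEGER, "
    "name TEXT, copyNum REAL)"
)
# Made-up rows, not real annotation data.
conn.executemany(
    "INSERT INTO simpleRepeat VALUES (?, ?, ?, ?, ?)",
    [
        ("chr4", 3076603, 3076667, "trf", 21.3),
        ("chr4", 3100000, 3100030, "trf", 4.5),
        ("chr4", 9000000, 9000090, "trf", 30.0),
        ("chr7", 500, 560, "trf", 15.0),
    ],
)


def filter_repeats(conn, chrom, start, end, min_copies):
    """Return repeats overlapping [start, end) on chrom with
    copyNum strictly greater than min_copies."""
    query = (
        "SELECT chrom, chromStart, chromEnd, copyNum FROM simpleRepeat "
        "WHERE chrom = ? AND chromEnd > ? AND chromStart < ? "
        "AND copyNum > ? ORDER BY chromStart"
    )
    return conn.execute(query, (chrom, start, end, min_copies)).fetchall()


hits = filter_repeats(conn, "chr4", 3000000, 3200000, 10)
```

The region and the copy-number threshold play the same roles as the Table Browser's region and filter settings in this task.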
36
Table Browser: Choose Table to Search
Choose Data Table
37
Table Browser: Describe Table
Describe table
38
Table Browser: Choose Region to Search
Choose Region to Search
39
Table Browser: Upload Locations to Search
Paste
Upload
40
Table Browser: Filter to Refine Search
Create Filter
Submit Filter
41
Table Browser: Output Data
Output data
42
Table Browser: Output Formats
Text Fields
Output formats
43
Table Browser: Fasta Sequence Output
Sequence
44
Table Browser: Database Format Outputs
Database
45
Table Browser: Custom Track Output
Custom Track
46
Table Browser: Hyperlinks Output
Hyperlinks
47
Table Browser: Obtaining Output
Entering an output file name downloads the results as a file; leaving it blank shows the output in the browser (exception: custom track).
Data Summary
48
Table Browser: Output configuration
Sequence Format
Get Sequence
49
Table Browser: Intersecting Data
2nd Table
Any Overlap
Intersect
Submit
Task: find simple repeats (copy number > 10) within known genes and download the sequence.
50
Table Browser: Intersecting Data Narrows Search
Filtered simple repeats
Summary
Filtered simple repeats, intersected (overlapping) with known genes
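The "any overlap" intersection can be sketched in a few lines, which also shows why it narrows the result set. This is an illustrative brute-force version under the assumption of half-open (chrom, start, end) intervals; it is not how the Table Browser implements intersection, and all coordinates and names are made up.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) intervals share at least one base.
    Intervals are treated as half-open: start inclusive, end exclusive."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_any(items, others):
    """Keep each item that overlaps at least one interval in others."""
    return [item for item in items
            if any(overlaps(item, other) for other in others)]


# Made-up filtered repeats and gene spans.
repeats = [("chr4", 100, 160), ("chr4", 900, 950), ("chr7", 120, 150)]
genes = [("chr4", 50, 500), ("chr7", 1000, 2000)]

in_genes = intersect_any(repeats, genes)
```

Only the first repeat survives: the second lies outside any gene span, and the third is on a chromosome where the gene span does not reach it.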
51
Table Browser: Downloading Sequence Data
Sequence Format
Get Sequence
52
Table Browser: Correlating Data Tables
Get Results
Correlate 2 Datasets
53
Custom Tracks: Table Browser Searches
Create Track
54
Custom Tracks: Name and Configure Track
Name Track: SRepeatKGenes
Describe Track: Intersection …
Choose default view in browser
Download track file to desktop
55
Custom Tracks: Open Track in Genome Browser
Open Details
Compare
“…caused by an expanded, unstable trinucleotide repeat…”
56
Custom Tracks: Track in Table Browser
Custom tracks are also available for filtering and intersections in the Table Browser
57
Custom Tracks: User-generated Data in Track
Custom Track How-to
Custom Tracks Link
58
Custom Tracks: Four Steps to Create Track
Four steps to create a custom track:
• Define track characteristics
• Define browser characteristics
• Format your data
• Upload and view your track
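The first three steps can be sketched as a small function that writes the text to paste or upload. It assumes the simple BED layout (chrom, start, end, name) preceded by a `browser position` line and a `track` line; the helper name `custom_track`, the description text, and the feature coordinates are placeholders, and only the track name SRepeatKGenes comes from the earlier slide.

```python
def custom_track(name, description, position, features, visibility="pack"):
    """Return custom-track text: browser line, track line, then BED rows.
    features is an iterable of (chrom, start, end, label) tuples."""
    lines = [
        f"browser position {position}",               # browser characteristics
        f'track name="{name}" description="{description}" '
        f"visibility={visibility}",                    # track characteristics
    ]
    for chrom, start, end, label in features:          # formatted data
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


text = custom_track(
    name="SRepeatKGenes",
    description="example description",  # placeholder, not from the slides
    position="chr4:1-1000",             # made-up region
    features=[("chr4", 100, 160, "repeat1"), ("chr4", 400, 430, "repeat2")],
)
```

The fourth step, uploading and viewing, is done through the browser's custom track page; the FAQ on data formats describes the other formats that are accepted.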
59
Custom Tracks: Submit Track
Submit File
Copy and paste small or simple tracks
http://genome.ucsc.edu/FAQ/FAQformat
60
Custom Tracks: Track Appears in Genome Browser
61
Custom Tracks: Track Characteristics
Default view of custom track is “pack”
Default view of other tracks set
62
Custom Tracks: Track Appears in Table Browser
Custom Track also appears in Table Browser
63
Custom Tracks from Outside Sources
Contributed Track
Custom Tracks Link
64
Bibliography:
http://genome.ucsc.edu/goldenPath/pubs.html
• The UCSC Table Browser, 2004.
• Bejerano et al., Nature Methods, 2005.
• The UCSC Proteome Browser
• Phylogenomic Resources at the UCSC Genome Browser
65