Information Encoding in Biological Molecules: DNA and


The UCSC (University of California
Santa Cruz) Genome Browser
“the golden path”
genome.ucsc.edu
Stephen Baird
Apoptosis Research Centre
Children’s Hospital of Eastern Ontario
[email protected]
Lecture/Lab 7.2
Jim Kent
• Produced the first assembly of the human genome as a graduate student, using his program GigAssembler
• Catalog of software includes:
– blat - Fast alignment of similar sequences.
– autoSql - Create SQL and C code for permanently storing a structure in a database and loading it back into memory, based on a specification file.
– ameme - Find motifs in DNA sequence.
– 40 other command-line programs for the genome browser
– The Intronerator - Look at C. elegans genes and splicing patterns.
– cis-Site Seeker - Look for regulatory regions in RNA or DNA sequences.
UCSC Genome Gateway Structure
Custom tracks
Genome browser
Table browser
Your sequence
BLAT
Database
in silico PCR
Downloadable data files
Gene sorter
Proteome browser
Public MySQL server
Your query
The UCSC Home page: genome.ucsc.edu
UCSC Genome Browser Gateway
- start page, basic search
Overview of the whole Genome Browser page
Genome viewer section
Groups of data
Mapping and Sequencing Tracks
Genes and Gene Prediction Tracks
mRNA and EST Tracks
Expression and Regulation
Comparative Genomics
Variation and Repeats
ENCODE Regions and Genes
ENCODE Transcript Levels
ENCODE Chromatin Immunoprecipitation, Chromosome, Chromatin and DNA Structure, Variation
Configure Tracks – Spliced ESTs, Microarray Expression, Repeats, etc.
Known Genes
Spliced ESTs
By UCSC
Simple Repeats
“Known Gene” details page for Clock gene
Gene Description
Links to Tools/DBs
UniProt Description
Links to output
Sequence
Microarray data
mRNA secondary structure
Protein domains/structure
Homologs
Gene Ontology™ (GO)
mRNA descriptions
Pathways
Proteome Browser
Genome Browser
Superfamily
Domain Db
Genome Gateway Help/User’s Guide
BLAT – BLAST-Like Alignment Tool
In Silico PCR
“Gene Sorter” and “Table Browser”
• Query the database by filtering and cross-referencing all of its data tables to output sequence, genomic positions or text data.
• What is in all the tables?
– genome.ucsc.edu/goldenPath/gbdDescriptions.html
Gene Sorter
• “display a sorted table of genes that are related to one another”
• EXAMPLE 1: Make a list of genes for membrane proteins that are highly expressed in pancreatic islet cells, as a starting point for exploring the role of autoimmunity in Type 1 Diabetes.
Gene Sorter - Configure
Gene Sorter - Filter
Gene Sorter - Output
Sequence – genomic, protein or mRNA
Text – tab-delimited
Gene Sorter - To Try Now
• EXAMPLE 2: Find genes expressed predominantly in the mouse adrenal gland that have human ‘homologs’. Get the sequence data and examine the expression of the human orthologs.
• Enter any gene to start.
• In configure menu: (a) expand tissue selection of GNF Atlas 2 to “median of replicas”, (b) click on human homologs.
• In filter menu: (a) set adrenal gland minimum box to 2.5, (b) look at results and set maximum box of other commonly expressed tissues to 0.5.
• Complete solution in notes 7.2 UCSC.
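The filter step in Example 2 amounts to a minimum threshold on one tissue and a maximum threshold on the others. A minimal Python sketch of that logic follows. The gene names and expression values are made up for illustration (they are not GNF Atlas 2 data), and `filter_genes` is a hypothetical helper, not part of the Gene Sorter. For simplicity the sketch applies the maximum to every other tissue, whereas in the Gene Sorter you set it only on the tissues you choose.

```python
# Hypothetical expression table: gene -> {tissue: expression value}.
# All values are invented for illustration only.
EXPRESSION = {
    "geneA": {"adrenal gland": 3.1, "liver": 0.2, "brain": -0.4},
    "geneB": {"adrenal gland": 2.7, "liver": 1.8, "brain": 0.1},
    "geneC": {"adrenal gland": 1.0, "liver": 0.3, "brain": 0.2},
}


def filter_genes(expression, target, minimum, other_maximum):
    """Return genes whose value in `target` is at least `minimum`
    and whose value in every other tissue is at most `other_maximum`."""
    selected = []
    for gene, tissues in expression.items():
        # A gene with no measurement for the target tissue never passes.
        if tissues.get(target, float("-inf")) < minimum:
            continue
        others = (value for tissue, value in tissues.items() if tissue != target)
        if all(value <= other_maximum for value in others):
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    # Thresholds taken from the slide: minimum 2.5, maximum 0.5.
    print(filter_genes(EXPRESSION, "adrenal gland", 2.5, 0.5))
```

With the invented data above, only geneA passes: geneB is too high in liver and geneC is too low in adrenal gland.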
Table Browser
Groups as in Browser
Tracks within Group
Filter fields in Table and connecting Tables
Intersect non-connecting Tables by position
RESET!
Table Browser – table schema
Table Browser – Example
• EXAMPLE 3: Find CpG islands in known genes on the last part of chromosome 22 of the human genome. Obtain the gene sequences as one FASTA record per region.
Change to
Table Browser – CpG Example
Click on ‘intersection’
Set group to ‘Expression and Regulation’ and track to ‘CpG Islands’
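Intersecting two tables by position keeps the regions of one table that overlap a region of the other. The Python sketch below illustrates the idea under stated assumptions: coordinates are 0-based and half-open (start inclusive, end exclusive), "intersect" means any overlap, and all coordinates and names are invented. `Region`, `overlaps` and `intersect` are hypothetical names for this sketch; the Table Browser offers further overlap options not modelled here.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(primary: List[Region], secondary: List[Region]) -> List[Region]:
    """Keep primary regions that overlap at least one secondary region."""
    return [p for p in primary if any(overlaps(p, s) for s in secondary)]


# Invented coordinates for illustration only.
GENES = [
    Region("chr22", 1000, 5000, "gene1"),
    Region("chr22", 8000, 9000, "gene2"),
]
CPG_ISLANDS = [
    Region("chr22", 900, 1200, "cpg1"),
    Region("chr22", 6000, 6500, "cpg2"),
]

if __name__ == "__main__":
    for region in intersect(GENES, CPG_ISLANDS):
        print(region.name, region.chrom, region.start, region.end)
```

This quadratic scan is fine for a handful of regions. For whole-chromosome tables a sorted sweep or an interval index would be the usual choice.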
Table Browser – CpG Example
Table Browser – CpG Example
Copy and paste sequences, or set up an ‘output file’ in the Table Browser
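If you copy sequences by hand, it can help to normalize them into one FASTA record per region yourself. The sketch below shows one way to do that in Python. The header layout (`name chrom:start-end`) and the function name `fasta_record` are assumptions made for this example and do not reproduce the Table Browser's own header format.

```python
def fasta_record(name, chrom, start, end, sequence, width=60):
    """Format one region as a FASTA record with lines of at most `width` bases.

    The header layout is an assumption for this sketch, not the
    format produced by the Table Browser.
    """
    header = f">{name} {chrom}:{start}-{end}"
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


if __name__ == "__main__":
    # Invented region and sequence for illustration only.
    print(fasta_record("gene1", "chr22", 1000, 1010, "ACGTACGTAC", width=4), end="")
```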
Table Browser – Example To Try
• EXAMPLE 4: Find trinucleotide repeats of more than 10 copies within mRNA sequence on human chromosome 4. How many are there? How many are linked to known disease genes?
• Hints
– Period = 3, copies > 10.
– Intersect tables and custom track.
– Tables: knownGene, simpleRepeats, spDisease
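The hints for Example 4 combine a field filter (period = 3, copies > 10) with a positional intersection. The Python sketch below mirrors that two-step logic on invented records. The `Repeat` fields, the helper names and all coordinates are assumptions for illustration. "Within mRNA" is approximated here as any overlap with an mRNA interval, and the disease-gene step is left out.

```python
from typing import List, NamedTuple, Tuple


class Repeat(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    period: int  # length of the repeated unit
    copies: float


def select_repeats(repeats: List[Repeat], chrom: str,
                   period: int = 3, min_copies: float = 10) -> List[Repeat]:
    """Step 1: filter on chromosome, period and copy number (strictly greater)."""
    return [r for r in repeats
            if r.chrom == chrom and r.period == period and r.copies > min_copies]


def in_any_interval(repeat: Repeat, intervals: List[Tuple[str, int, int]]) -> bool:
    """Step 2: True if the repeat overlaps any (chrom, start, end) interval."""
    return any(repeat.chrom == c and repeat.start < e and s < repeat.end
               for c, s, e in intervals)


# Invented records for illustration only.
REPEATS = [
    Repeat("chr4", 100, 145, 3, 15.0),  # passes the filter, inside an mRNA
    Repeat("chr4", 300, 324, 3, 8.0),   # too few copies
    Repeat("chr4", 500, 560, 2, 30.0),  # dinucleotide, wrong period
    Repeat("chr4", 900, 960, 3, 20.0),  # passes the filter, outside any mRNA
    Repeat("chr5", 100, 145, 3, 15.0),  # wrong chromosome
]
MRNA_INTERVALS = [("chr4", 50, 200), ("chr4", 400, 600)]

if __name__ == "__main__":
    candidates = select_repeats(REPEATS, "chr4")
    hits = [r for r in candidates if in_any_interval(r, MRNA_INTERVALS)]
    print(len(candidates), "repeats pass the filter;", len(hits), "overlap an mRNA")
```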
VisiGene
- in situ mRNA and protein images in mice and frogs
Data Downloads
...
- from download link on homepage
Example: simpleRepeats table
Public MySQL Server
See the Data and Downloads FAQ:
Direct MySQL access to data
http://genome.ucsc.edu/FAQ/FAQdownloads#download29
Command from local MySQL client:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A
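The same client command can be assembled from a script so that a query is passed on the command line. The Python sketch below only builds and prints the command; it does not connect to the server. The host, user and `-A` option come from the slide, while `-D` (database) and `-e` (execute a statement) are standard `mysql` client options. The database name `hg19`, the table name `simpleRepeat` and the column names are assumptions for illustration and should be checked against the table descriptions before use.

```python
import shlex

HOST = "genome-mysql.cse.ucsc.edu"  # host as given on the slide
USER = "genome"                     # user as given on the slide


def repeat_query(table, chrom, period, min_copies, limit=5):
    """Build a SELECT for repeats; table and column names are assumptions.

    Values are interpolated directly, so pass only trusted constants.
    """
    return (
        f"SELECT chrom, chromStart, chromEnd, period, copyNum FROM {table} "
        f"WHERE chrom = '{chrom}' AND period = {period} "
        f"AND copyNum > {min_copies} LIMIT {limit}"
    )


def mysql_command(database, query):
    """Return the argument list for the mysql client (nothing is executed)."""
    return ["mysql", f"--user={USER}", f"--host={HOST}", "-A",
            "-D", database, "-e", query]


if __name__ == "__main__":
    # "hg19" and "simpleRepeat" are assumed names for this sketch.
    command = mysql_command("hg19", repeat_query("simpleRepeat", "chr4", 3, 10))
    print(shlex.join(command))
```

Printing the command first, rather than running it, makes it easy to inspect the quoting before pasting it into a shell.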