Genome Browsers

UCSC (Santa Cruz, California) and
Ensembl (EBI, UK)
Eukaryotic Genomes:
Not only collections of genes
• Protein coding genes
• RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
• Structural DNA (centromeres, telomeres)
• Regulation-related sequences (promoters, enhancers, silencers,
• Parasite sequences (transposons)
• Pseudogenes (non-functional gene-like sequences)
• Simple sequence repeats
Eukaryotic Genomes:
High fraction non-coding DNA
Source: Mattick, NRG, 2004
Blue: Prokaryotes
Black: Unicellular eukaryotes
Other colors: Multicellular eukaryotes (red = vertebrates)
Human Genome
• 3 billion basepairs (3Gb)
• 22 chromosome pairs + X and Y chromosomes
• Chromosome length varies from ~50Mb to
• About 22000 protein-coding genes
– compare with ~14000 for the fruit fly and ~19000 for
the nematode C. elegans
Human genome
Source: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002)
Only 1.2% codes for proteins, 3.5-5% is under selection
Long introns, short exons
Large spaces between genes
More than half consists of repetitive DNA
Variation Along Genome sequence
• Nucleotide usage varies
along chromosomes
– Protein coding regions tend to
have high GC levels
• Genes are not equally
distributed across the
– Housekeeping genes are generally in
gene-dense areas
– Gene-poor areas tend to have
many tissue specific genes
Source: Ensembl
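The variation in nucleotide usage along a chromosome can be made concrete by computing GC content in consecutive windows. This is a minimal sketch, assuming the sequence is available as a plain DNA string; the function name `gc_windows` and the toy sequence are illustrative and not taken from Ensembl.

```python
def gc_windows(seq, window, step=None):
    """Return (start, gc_fraction) for consecutive windows along seq.

    start is a 0-based offset into seq; windows that would run past
    the end of the sequence are skipped.
    """
    step = step or window
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, gc / window))
    return result


# Toy sequence (made up): an AT-rich stretch, a GC-rich stretch, a mixed one.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
profile = gc_windows(toy, window=10)
for start, fraction in profile:
    print(start, fraction)
```

On real chromosomes the window would typically be thousands of bases wide, and ambiguous bases (N) would need separate handling.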
Chromosome organisation
Source: Lodish (4th edition)
DNA packed in chromatin
Active genes in less dense chromatin (beads-on-a-string)
Non-active genes often in densely packed chromatin (30-nm fiber)
Gene regulation by changing chromatin density, methylation/acetylation of the
Limited availability of chromatin information in genome browsers (post-translational
modifications are currently under investigation with ChIP-on-chip experiments)
Genome browsers
Genome Browsing
With the UCSC Genome Browser
UCSC Genome browser
Choose a species, an assembly and a gene
Gene search results
Genome browser
Genomic Datatypes (Tracks)
Transcription data rather complicated
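Many browser tracks can be exchanged as BED files, in which each line gives a chromosome, a start and an end. BED coordinates are 0-based and half-open, while the position box in the browser shows 1-based, inclusive coordinates. The sketch below shows the conversion under those conventions; the names `BedFeature`, `parse_bed_line` and `browser_position` are hypothetical helpers, and the feature itself is made up.

```python
from typing import NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


def parse_bed_line(line):
    """Parse the first three (optionally four) tab-separated BED columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else ""
    return BedFeature(chrom, start, end, name)


def browser_position(feature):
    """Format a feature as a 1-based, inclusive position string."""
    return f"{feature.chrom}:{feature.start + 1}-{feature.end}"


# Made-up example line, not a real annotation.
feature = parse_bed_line("chr7\t999\t2000\texample_feature")
print(browser_position(feature))    # chr7:1000-2000
print(feature.end - feature.start)  # length in bases: 1001
```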
Browser → Gene record
Gene record
Gene record (2)
Gene record (3)
Gene record (4)
“best hit”
Gene record (5)
Genomic elements
• Genome browsers can be used to examine
other things
– Genomic sequence conservation
– Pseudogenes
– Duplications and deletions of chromosome segments
(Copy Number Variations, CNVs)
Genomic Sequence Conservation
• Not only protein coding parts are conserved in evolution
• Conserved non-coding genomic sequences can be
involved in gene regulation (enhancers, silencers,
• With the UCSC browser one can examine genomic
Genomic Conservation (UCSC)
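The browser's conservation tracks are computed with dedicated statistical methods on multi-species alignments. As a much simpler stand-in, the sketch below scores each column of a small alignment by the fraction of other sequences that agree with the reference row. The function name `column_identity` and the toy alignment are assumptions made for illustration only.

```python
def column_identity(alignment):
    """Per column, the fraction of non-reference rows matching the reference.

    alignment is a list of equal-length strings; the first row is the
    reference. Gap characters ('-') never count as a match.
    """
    reference = alignment[0]
    others = alignment[1:]
    scores = []
    for i, ref_base in enumerate(reference):
        matches = sum(
            1 for seq in others if ref_base != "-" and seq[i] == ref_base
        )
        scores.append(matches / len(others))
    return scores


# Toy alignment (made up): row 0 is the reference species.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAAC",
    "ACG-AC",
]
scores = column_identity(toy_alignment)
print([round(s, 2) for s in scores])
```

Columns with a score near 1 in a non-coding region are the kind of candidate regulatory element the slide refers to.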
• Pseudogenes “look” like (are homologous to) protein-coding genes, but are non-functional
• Two types:
– Unprocessed pseudogenes (loss of function)
– Processed pseudogenes (mRNAs that are reverse-transcribed and inserted into
the genome → they lack introns and sometimes have a poly(A) tail)
• The UCSC browser contains various pseudogene databases:
– Yale pseudogenes (both types of pseudogenes)
– Vega pseudogenes (both types of pseudogenes)
– Retroposed genes (only processed pseudogenes)
Pseudogenes (UCSC)
Copy Number Variation
• People do not only vary at the nucleotide level
(SNPs); pieces of the genome can also be present in
a varying number of copies (Copy Number
Polymorphisms (CNPs) or Copy Number
Variants (CNVs))
• When there are genes in the CNV areas, this
can lead to variations in the number of gene
copies between individuals
• With the UCSC browser CNVs can be examined
Copy Number Variation (UCSC)
Finding a sequence in the genome
BLAT – Search page
BLAT - Results
BLAT – “Details”
BLAT – “Browser”
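BLAT is fast because it looks up short words (k-mers) of the query in an index of the genome instead of scanning the whole sequence. The sketch below is not BLAT itself: it only illustrates the seed-lookup idea with an exact k-mer index and a check of both strands. All names (`build_index`, `find_exact`) and the toy genome are made up for the example.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_index(genome, k):
    """Map every k-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def find_exact(query, genome, index, k):
    """Return (position, strand) for each exact occurrence of query.

    The first k bases of the query act as the seed; each seed hit is
    then verified over the full query length. Positions are 0-based
    offsets on the forward strand. The query must be at least k long.
    """
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for pos in index.get(q[:k], []):
            if genome[pos:pos + len(q)] == q:
                hits.append((pos, strand))
    return sorted(hits)


# Toy genome (made up) containing the query once on each strand.
genome = "TTGACCGTAAGCTTACGGTCAA"
k = 4
index = build_index(genome, k)
print(find_exact("GACCGTAA", genome, index, k))
```

A real aligner also tolerates mismatches and gaps and stitches hits across introns, which this sketch does not attempt.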
Genome browsers
Genome Browsing
With the Ensembl Genome
Ensembl Genome browser
The Human Genome
MapView – Overview chromosome
ContigView – Zooming in (compare UCSC)
ContigView (2)
GeneView – Gene record
TransView - mRNA Transcript
TransView - mRNA Transcript (2)
Alternative Transcripts
Source: Wikipedia
GeneView - Show Alternative Transcripts
GeneSpliceView - Alternative Transcripts
Single Nucleotide Polymorphisms (SNPs)
• Sequence variations within a species
• Similar to mutations, but are simultaneously
present in the population, and generally have
little effect
• Are used as genetic markers (e.g. a genetic
disease may be associated with a SNP)
• Ensembl offers a nice SNP view
GeneView - Show SNPs
GeneSNPView - SNPs
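The idea of a SNP as a position where individuals of one species differ can be sketched by comparing aligned sequences column by column. This is a toy illustration, assuming equal-length, already aligned sequences; the function name `variable_sites` and the sequences are made up and do not come from Ensembl.

```python
from collections import Counter


def variable_sites(sequences):
    """Return (position, allele_counts) for every column with >1 allele.

    Positions are 0-based offsets into the aligned sequences.
    """
    sites = []
    for pos, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        if len(counts) > 1:
            sites.append((pos, dict(counts)))
    return sites


# Four made-up individuals sequenced over the same six bases.
individuals = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTAT"]
snps = variable_sites(individuals)
for pos, alleles in snps:
    print(pos, alleles)
```

Real variation databases additionally record allele frequencies per population, which is what makes a SNP usable as a genetic marker.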
GeneView - Show Protein
ProtView - Protein
ProtView - Protein Sequence
ProtView – Search proteins with the same
DomainView – Proteins with a certain domain
(InterPro = SMART + Pfam + others)
ProtView - Find Proteins In the Same
Protein Family
FamilyView – Alignments of homologous
Finding Human Genes
Finding a human gene (2)
Blast (2)
UCSC vs Ensembl: Which is better?
• They more or less contain the same information
• UCSC is a bit easier to use
• Ensembl gives more detailed information and
more flexible data export
• Other small differences in data (e.g. UCSC has
more extensive genomic conservation data)
• Use whichever you are familiar with!