Transcript Document

InterPro/PROSITE
UCSC Genome Browser
Exercise 3
Turning information into knowledge
• The outcome of a sequencing project is masses of raw data
• The challenge is to turn this raw data into biological knowledge
• A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be passed
From sequence to function

• Nature tends to innovate rather than invent
• Proteins are composed of functional elements: domains and motifs
  - Domains are structural units that carry out a certain function
  - The same domains are shared between different proteins
  - Motifs are shorter sequences with a certain biological activity
http://www.ebi.ac.uk/interpro/
InterPro

• An integrated documentation resource for protein families, domains and sites
• Groups signatures describing the same protein family or domain
• Combines a number of databases that use different methodologies to derive protein signatures:
  - UniProt: UniProtKB/Swiss-Prot, TrEMBL, UniRef, UniParc
  - PROSITE: a documented DB of domains, families and functional sites
  - Pfam: a DB of protein families represented by multiple sequence alignments (MSAs)
Member databases
• Sequence-motif methods: protein signature DBs with different focus
• Sequence-cluster methods: hierarchically clustered sequence/structure DBs
InterPro search
http://www.expasy.ch/prosite/
PROSITE
• A method for determining the function of uncharacterized translated protein sequences
• Consists of a DB of annotated, biologically important sites/patterns/motifs/signatures/fingerprints
PROSITE
• Entries are represented with patterns or profiles

Pattern: [AC]-A-[GC]-T-[TC]-[GC]

Profile:
Position   1      2      3      4      5
A          0.66   1      0      0      .
T          0      0      0      1      .
C          0.33   0      0.66   0      .
G          0      0      0.33   0      .

Profiles are used in PROSITE when the motif is relatively divergent, and it is difficult to represent as a pattern
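The profile idea can be sketched in a few lines of Python. This is a minimal illustration, not how PROSITE builds its profiles: the three aligned sequences below are hypothetical, chosen only so that their per-column frequencies reproduce the first four columns of the table above, and the function name column_frequencies is an assumption. Note that 2/3 prints as 0.67 here, while the slide shows it as 0.66.

```python
from collections import Counter

def column_frequencies(aligned, alphabet="ATCG"):
    """Return {residue: [frequency in each column]} for equal-length sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    profile = {residue: [] for residue in alphabet}
    # zip(*aligned) walks the alignment column by column
    for column in zip(*aligned):
        counts = Counter(column)
        for residue in alphabet:
            profile[residue].append(counts[residue] / len(aligned))
    return profile

# Hypothetical alignment (not from the slides)
aligned = ["AACT", "AACT", "CAGT"]
profile = column_frequencies(aligned)
for residue, freqs in profile.items():
    print(residue, " ".join(f"{f:.2f}" for f in freqs))
```

A pattern keeps only which residues are allowed at each position, whereas the profile also records how often each one occurs, which is why profiles cope better with divergent motifs.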
Scanning PROSITE
• Query: sequence. Result: all patterns found in the sequence
• Query: pattern. Result: all sequences which adhere to this pattern
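Both query directions can be imitated with regular expressions. The sketch below is an illustration only and is not the PROSITE scanning tool: prosite_to_regex handles just a simplified subset of the pattern syntax (dash-separated positions, [..] allowed residues, {..} forbidden residues, x for any residue, (n) or (n,m) repeats, and the < and > anchors), and the function names, the second pattern and the toy sequences are all hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)

def scan_sequence(sequence, patterns):
    """Query = sequence: return {pattern name: [(1-based start, match), ...]}."""
    hits = {}
    for name, pattern in patterns.items():
        regex = re.compile(prosite_to_regex(pattern))
        # finditer reports non-overlapping matches only
        found = [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
        if found:
            hits[name] = found
    return hits

def scan_database(pattern, sequences):
    """Query = pattern: return the names of all sequences containing it."""
    regex = re.compile(prosite_to_regex(pattern))
    return [name for name, seq in sequences.items() if regex.search(seq)]

# Hypothetical toy data; only the first pattern comes from the slides
patterns = {"slide_pattern": "[AC]-A-[GC]-T-[TC]-[GC]", "toy_pattern": "G-x(2)-T"}
sequences = {"seq1": "TTAACTTGGG", "seq2": "GGGGTTTT", "seq3": "CAGTCC"}

print(scan_sequence(sequences["seq1"], patterns))
print(scan_database(patterns["slide_pattern"], sequences))
```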
Patterns with a high probability of occurrence
• Entries describing commonly found post-translational modifications or compositionally biased regions
• Found in the majority of known protein sequences
• High probability of occurrence
PROSITE sequence query
PROSITE pattern query
UCSC Genome Browser
UCSC Genome Browser Gateway
Reset all settings of previous user
UCSC Genome Browser
query results
UCSC Genome Browser
Annotation tracks
Base position
UCSC Genes
UTR
RefSeq
mRNA (GenBank)
Vertebrate conservation
Single species compared
SNPs
Repeats
Intron
Exon
Gene direction
UCSC Gene
UCSC Genome Browser - movement
Zoom x3 +
Center
UCSC Genome Browser – Base view
Annotation track options
dense
squish
pack
full
Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title
Sickle-cell anemia distribution
Malaria distribution
BLAT

• BLAT = BLAST-Like Alignment Tool
• BLAT is designed to find sequences with >95% similarity for DNA and >80% for protein
• Rapid search, achieved by indexing the entire genome
Good for:
1. Finding the genomic coordinates of a cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
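The speed gained by indexing the genome can be illustrated with a toy k-mer index. This is only a sketch of the general idea, assuming an index built from non-overlapping k-mers of the genome and a lookup of every overlapping k-mer of the query; the real BLAT adds extension and stitching stages that are not shown, and the function names, the tiny k and the sequences are hypothetical.

```python
from collections import defaultdict

def build_index(genome, k):
    """Index non-overlapping k-mers of the genome: k-mer -> list of start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) hits."""
    seeds = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            seeds.append((q, g))
    return seeds

def best_diagonal(seeds):
    """Return the offset (genome_pos - query_pos) supported by the most seeds."""
    counts = defaultdict(int)
    for q, g in seeds:
        counts[g - q] += 1
    return max(counts, key=counts.get) if counts else None

# Hypothetical toy data: the query is a slice of the genome starting at position 5
genome = "ACGTTGCAATGCCGTAGGCTAAGC"
query = genome[5:17]
k = 4  # far smaller than any realistic value; chosen for the toy example

index = build_index(genome, k)  # built once, reused for every query
seeds = find_seeds(query, index, k)
print(seeds, best_diagonal(seeds))
```

The index is built once per genome, so each new query costs only a series of dictionary lookups rather than a scan of the whole genome.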
BLAT on UCSC Genome Browser
BLAT Results
Match
Non-match (mismatch/indel)
Indel boundaries
BLAT Results on the browser
Getting DNA sequence of region