Tutorial_7 (2014)

Tutorial 5
Motif discovery
Agenda
Motif discovery
• MEME
Creates motif PSSM de-novo (unknown motif)
• MAST
Searches for a PSSM in a DB
• TOMTOM
Searches for a PSSM in motif DBs
Cool story of the day:
How NOT to be a bioinformatician
Motif – definition
Motif
a widespread pattern with a biological significance.
Sequence motif
PTB (RNA binding protein)
UCUU
CAP (DNA binding protein)
TGTGAXXXXXXTCACAXT
Sequence motif – definition
Motif
a nucleotide or amino-acid sequence pattern that is widespread
and has a biological significance
PSSM - position-specific scoring matrix
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    3/6  1/6  2/6  0    0
D    0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E    0    0    4/6  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    2/6  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    3/6  0    0
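The frequency matrix can be recomputed directly from the six aligned sites. The following is a minimal Python sketch, not the code of any particular tool; the function name frequency_matrix is chosen here for illustration. It counts residues column by column and divides by the number of sites, with no pseudocounts or background correction.

```python
from collections import Counter
from fractions import Fraction

# The six aligned motif occurrences shown on the slide.
sites = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]

def frequency_matrix(aligned_sites):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(aligned_sites)
    columns = zip(*aligned_sites)  # transpose: one tuple of residues per position
    return [
        {res: Fraction(count, n) for res, count in Counter(col).items()}
        for col in columns
    ]

matrix = frequency_matrix(sites)
for pos, col in enumerate(matrix, start=1):
    # Fractions print in lowest terms, e.g. 3/6 is shown as 1/2.
    print(pos, {res: str(freq) for res, freq in sorted(col.items())})
```

Every column sums to 1, which is a quick sanity check for a matrix built this way.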
Can we find motifs using multiple
sequence alignment (MSA)?
YES!
NO
Local multiple sequence alignment is a
hard problem to solve
Motif search: from de-novo motifs to
motif annotation
gapped motifs
Large DNA data
http://meme.sdsc.edu/
MEME
MEME – Multiple EM* for Motif finding
http://meme.sdsc.edu/
• Motif discovery from unaligned sequences - genomic or
protein sequences
• Flexible model of motif presence (Motif can be absent in
some sequences or appear several times in one sequence)
*Expectation-maximization
MEME - Input
Input file
(fasta file)
How many
times in each
sequence?
Range of
motif
lengths
How many
motifs?
How
many
sites?
MEME - Output
Motif E-value
MEME – Sequence logo
Motif E-value
Motif length
Number of
appearances
A graphical representation of the sequence motif
MEME – Sequence logo
High information content = High confidence
The relative sizes of the letters indicate their frequency in the
sequences
The total height of the letters depicts the information content
of the position, in bits of information.
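The height of a logo column can be worked out by hand. The sketch below uses the common definition of information content as log2(alphabet size) minus the Shannon entropy of the column. It is an illustration under that assumption: it ignores the small-sample correction that some logo tools apply, and the function names are made up for this example.

```python
import math

def information_content(column, alphabet_size):
    """Bits of information for one column of residue frequencies."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy

def letter_heights(column, alphabet_size):
    """Height of each letter: its frequency times the column's information."""
    ic = information_content(column, alphabet_size)
    return {res: p * ic for res, p in column.items()}

# A fully conserved DNA position carries the maximum of 2 bits.
conserved = {"T": 1.0}
# A position where all four bases are equally likely carries 0 bits.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(information_content(conserved, 4))
print(information_content(uniform, 4))
print(letter_heights({"A": 0.5, "G": 0.5}, 4))
```

For protein motifs the same function applies with alphabet_size=20, where the maximum is log2(20), about 4.32 bits.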
MEME – Sequence logo
Multilevel Consensus
Patterns can be presented as regular
expressions
[AG]-x-V-x(2)-{YW}
[] - Either residue
x - Any residue
x(2) - Any residue in the next 2 positions
{} - Any residue except these
Examples: AYVACM, GGVGAA
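A pattern written in this notation can be translated mechanically into a regular expression. The Python sketch below handles only the four element types listed above ([], x, x(n) and {}); the helper name pattern_to_regex is hypothetical, and ranges such as x(2,4) are deliberately not covered.

```python
import re

def pattern_to_regex(pattern):
    """Translate a dash-separated motif pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(2)" into its core "x" and repeat count "2".
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            piece = core                     # either residue: same syntax in regex
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            piece = re.escape(core)          # a literal residue
        if count:
            piece += "{" + count + "}"
        parts.append(piece)
    return "".join(parts)

regex = pattern_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)
for candidate in ["AYVACM", "GGVGAA", "AYVACY"]:
    print(candidate, bool(re.fullmatch(regex, candidate)))
```

The first two candidates are the examples from the slide; the third ends in Y, which the final {YW} element excludes.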
MEME – motif alignment
Sequence
names
Position in
sequence
Strength of
match
Motif within
sequence
MEME – motif locations
Motif location in
the input
sequence
Overall strength of
motif matches
What can we do with motifs?
• MAST - Search for them in
non annotated sequence
databases (protein and
DNA).
• TOMTOM - Find the protein
which binds the DNA
motifs.
MAST
MAST
http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi
• Searches for motifs (one or more) in sequence
databases:
– Like BLAST, but with motifs as input
– Similar to iterations of PSI-BLAST
• Profile defines strength of match
– Multiple motif matches per sequence
• MEME uses MAST to summarize results:
– Each MEME result is accompanied by the MAST result for
searching the discovered motifs on the given sequences.
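The idea of searching a sequence with a profile can be illustrated with a sliding-window scan. This is a simplified sketch and not MAST's actual scoring or statistics: it assumes a uniform background, a small pseudocount, and a toy DNA motif (TGTGA), and all function names are invented for the example.

```python
import math

def log_odds_matrix(freq_columns, alphabet, pseudocount=0.01):
    """Convert frequency columns into log2 odds against a uniform background."""
    background = 1.0 / len(alphabet)  # assumption: all letters equally likely
    total = 1.0 + pseudocount * len(alphabet)
    return [
        {
            a: math.log2(((col.get(a, 0.0) + pseudocount) / total) / background)
            for a in alphabet
        }
        for col in freq_columns
    ]

def scan(sequence, matrix):
    """Yield (start, score) for every window as wide as the motif."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        yield start, sum(col[ch] for col, ch in zip(matrix, window))

def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])

# Toy motif: a perfectly conserved TGTGA, one frequency dict per position.
motif = [{"T": 1.0}, {"G": 1.0}, {"T": 1.0}, {"G": 1.0}, {"A": 1.0}]
scores = log_odds_matrix(motif, "ACGT")
hit = best_hit("AATGTGACC", scores)
print(hit)
```

The scan reports positions as 0-based offsets. A real search tool would also convert scores into p-values or E-values and handle letters outside the alphabet, which this sketch does not.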
MAST - Input
Database
Input file
(motifs)
If you wish to use motifs
discovered by MEME
Input
motifs
MAST - Output
Presence of the motifs in a given database
MAST – Output
(another example, global view)
TOMTOM
TOMTOM
http://meme.sdsc.edu/meme/doc/tomtom.html
• Searches one or more query DNA motifs
against one or more databases of target
motifs, and reports for each query a list of
target motifs, ranked by p-value.
• The output contains results for each query, in
the order that the queries appear in the input
file.
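Comparing a query motif against a target motif amounts to sliding one matrix along the other and scoring the overlapping columns. The sketch below is an illustration only, not TOMTOM's implementation: it uses the average Euclidean distance between overlapping columns and does not compute the p-values by which TOMTOM ranks targets. The names and the min_overlap parameter are assumptions made for this example.

```python
import math

ALPHABET = "ACGT"

def column_distance(col_a, col_b):
    """Euclidean distance between two columns of letter frequencies."""
    return math.sqrt(
        sum((col_a.get(x, 0.0) - col_b.get(x, 0.0)) ** 2 for x in ALPHABET)
    )

def best_alignment(query, target, min_overlap=2):
    """Return (offset, mean distance) of the best ungapped alignment.

    Query position i is compared with target position i + offset.
    """
    best = None
    first = -(len(query) - min_overlap)
    last = len(target) - min_overlap
    for offset in range(first, last + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        dist = sum(column_distance(a, b) for a, b in pairs) / len(pairs)
        if best is None or dist < best[1]:
            best = (offset, dist)
    return best

# Toy motifs: the query TGT is contained in the target AATGTC at offset 2.
query = [{"T": 1.0}, {"G": 1.0}, {"T": 1.0}]
target = [{"A": 1.0}, {"A": 1.0}, {"T": 1.0}, {"G": 1.0}, {"T": 1.0}, {"C": 1.0}]
result = best_alignment(query, target)
print(result)
```

Averaging over the overlap tends to favour short overlaps, which is one reason a real motif comparison tool needs a proper statistical model on top of the column score.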
TOMTOM - Input
Input
motif
Background
frequencies
Database
TOMTOM - Output
Input
motif
Matching
motifs
TOMTOM – Output
Wrong input (RNA sequence of RNA binding protein NOVA1)
“OK” results
MAST vs. TOMTOM
             MAST                 TOMTOM
Comparison   Profile against DB   Profile against profile
DB           General DBs          Known motif DBs
Cool Story of the day
How NOT to be a bioinformatician