w0506_class5

Introduction to Bioinformatics Tutorial no. 5
MEME – Discovering motifs in sequences
MAST – Searching for motifs in databanks
TRANSFAC – The Transcription Factor DB
WebLogo - Input
Aligned sequences (e.g. output of ClustalW)
http://weblogo.berkeley.edu
RUN!
WebLogo - Output
Genes:
Proteins:
MEME
http://meme.sdsc.edu/
Motif discovery from unaligned sequences
  Genomic or protein sequences
Identifies profile motifs
  Multiple motifs for any input
Flexible model of motif presence
  Motif can be absent in some sequences
  Can appear several times in one sequence
MEME Input
Email address
Multiple input sequences
Range of motif lengths
How many motifs?
How many times in each sequence?
How many sites?
MEME Output (1)
Motif length
Like BLAST
Number of times
“Position-Specific Probability Matrix” = Motif Profile
Most popular symbols
Divergence of each motif position from background
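A minimal sketch of how the quantities labelled on this slide relate to each other: a position-specific probability matrix built from aligned motif instances, the most popular symbol per position, and each position's divergence from background in bits. The four example sites, the function names, and the uniform background of 0.25 are assumptions for illustration; MEME's own fitting procedure and statistics are more involved.

```python
import math
from collections import Counter


def pspm(sites, alphabet="ACGT"):
    """Column-wise symbol frequencies for equal-length aligned sites."""
    width = len(sites[0])
    matrix = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        matrix.append({s: counts[s] / len(sites) for s in alphabet})
    return matrix


def consensus(matrix):
    """Most popular symbol in each column."""
    return "".join(max(col, key=col.get) for col in matrix)


def information_content(col, background=0.25):
    """Relative entropy (bits) of one column against a uniform background (assumed)."""
    return sum(p * math.log2(p / background) for p in col.values() if p > 0)


# Hypothetical motif instances, not taken from any real MEME run.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
m = pspm(sites)
print(consensus(m))  # TGACTCA
print([round(information_content(c), 2) for c in m])
```

A fully conserved DNA position scores 2 bits here, and a position that matches the background scores 0.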
MEME Output (2)
Position in sequence
Strength of match
Sequence names
Reverse complement (genomic input only)
Motif within sequence
MEME Output (3)
Motif instance
Overall strength of motif matches
Original sequence lengths
MAST

Searches for motifs (one or more) in sequence databases:
  Like BLAST but motifs for input
  Similar to iterations of PSI-BLAST
Profile defines strength of match
  Multiple motif matches per sequence
  Combined E value for all motifs
MEME uses MAST to summarize results:
  Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
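To make "profile defines strength of match" concrete, here is a rough sketch of sliding a motif profile along a sequence and scoring every window with log-odds values. The three-column matrix, the pseudocount of 0.01, the uniform background, and the function names are all assumptions for illustration. This is not MAST's actual scoring code, and it does not compute p-values or the combined E value.

```python
import math


def log_odds(matrix, background=0.25, pseudo=0.01):
    """Turn column probabilities into log2 odds scores against the background."""
    scored = []
    for col in matrix:
        total = 1.0 + pseudo * len(col)  # renormalise after adding pseudocounts
        scored.append({s: math.log2(((p + pseudo) / total) / background)
                       for s, p in col.items()})
    return scored


def best_match(sequence, scores):
    """Return (start, score, window) for the highest-scoring window."""
    width = len(scores)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[ch] for col, ch in zip(scores, window))
        if best is None or score > best[1]:
            best = (start, score, window)
    return best


# Hypothetical 3-column profile; the sequence must contain only A, C, G, T.
profile = [
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 0.0, "C": 0.0, "G": 0.75, "T": 0.25},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
start, score, window = best_match("CCTGACC", log_odds(profile))
print(start, window)  # 2 TGA
```

Windows that agree with the profile get positive scores, while windows containing symbols the profile never saw are penalised heavily.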
MAST Input
Email address
Consider matched sequence length
Motif file (e.g. MEME output)
Database (like BLAST)
E value threshold
MAST Output (1)
Link to GenBank
Matched accession
Match E value
Length of sequence
MAST Output (2)
Motif diagram
MAST Output (3)
Position of each instance
Motif and orientation
Matched parts of sequence
P value of instance
Motif ‘consensus’
TRANSFAC
Database of eukaryotic DNA transcription regulation:
Individual regulatory sites (SITES table)
  Genes to which they belong
  Proteins which bind them
Proteins which bind sites (FACTORS table)
  Cellular source of protein
  Nucleotide motif profile for binding
  Some grouping and classification
Classification of factors (CLASS table)
Position-specific matrices for select factors (MATRIX table)
Cell localization (CELL table)
Searching TRANSFAC
www.gene-regulation.com
Search a single table
  By identifier, factor name, gene name
  By species, author
Browse your way from table to table
Search within a sequence
  MatInspector, TFScan (EMBOSS package)
TRANSFAC Factor
DT  Date; author
FA  Factor name
GE  Encoding gene
SF  Structural features
CP  Cell specificity (positive)
CN  Cell specificity (negative)
EX  Expression pattern
FF  Functional features
IN  Interacting factors
MX  Matrix
BS  Binding SITE
DR  External databases
References:
RN  Reference no.
RX  MEDLINE ID
RA  Reference authors
RT  Reference title
RL  Reference data
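The two-letter field codes above lend themselves to a simple line-based parser. The sketch below assumes each line starts with a two-letter code followed by its value, that `XX` lines are separators, and that `//` ends the record. The sample record is made up for illustration and is not a real TRANSFAC entry.

```python
# Field names as listed on the slide (a subset is enough for the demo).
FIELD_NAMES = {
    "FA": "Factor name",
    "GE": "Encoding gene",
    "CP": "Cell specificity (positive)",
    "CN": "Cell specificity (negative)",
    "RN": "Reference no.",
    "RA": "Reference authors",
}


def parse_record(text):
    """Collect the values of each two-letter field; repeated fields become lists."""
    record = {}
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        code, value = code.strip(), value.strip()
        if len(code) != 2 or code in ("XX", "//") or not value:
            continue  # skip separators, the terminator, and malformed lines
        record.setdefault(code, []).append(value)
    return record


# Hypothetical record, invented for this example.
sample = """FA  ExampleFactor
XX
GE  EXG1
XX
CP  liver
CP  kidney
XX
RN  [1]
RA  Doe J.
//"""

record = parse_record(sample)
for code, values in record.items():
    print(FIELD_NAMES.get(code, code), "=", "; ".join(values))
```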
TRANSFAC Matrix
Accession
Position-Specific Matrix
Consensus (IUPAC subset symbols)
Statistical basis
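A consensus written in IUPAC symbols can be turned into a regular expression and used for a quick scan of a sequence. The sketch below uses the standard IUPAC nucleotide codes; the consensus `TGASTCA`, the test sequence, and the function names are assumptions chosen for illustration. A consensus search is cruder than scoring against the full matrix, because every allowed symbol counts equally.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(consensus, sequence):
    """Return (start, matched text) for every match, overlapping ones included."""
    body = "".join(IUPAC[s] for s in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical consensus and sequence.
hits = find_sites("TGASTCA", "AATGACTCATTTGAGTCAGG")
print(hits)  # [(2, 'TGACTCA'), (11, 'TGAGTCA')]
```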
TRANSFAC Site (1)
DNA or RNA
Accession number
Gene
Sequence of regulatory element
Gene region
Position range of factor binding site
TRANSFAC Site (2)
Binding factor accession
Organism
Cellular source
External links
Factor name
Binding ‘quality’:
  1  functionally confirmed
  2  binding of pure protein
  3  immunologically characterized extract
  4  via known binding sequence
  5  extract protein binding to bona fide element
  6  unassigned
Methods of identifying site
TRANSFAC Factor (1)
FA: Factor name
AC: Accession number
HO: Homologs
SY: Other names
OS: Organism
OC: Taxonomy
SZ: Size
CL: Classification
SQ: Amino acid sequence
TRANSFAC Factor (2)
Protein sequence reference
Features and positions
Cell specificity
Structural features
Question
A biologist at your university has found 15 target genes that
she thinks are co-regulated. She gives you 15 upstream
regions, each 50 base pairs long, in FASTA format (file
DNASample50.txt), and asks you to identify the motif and, if
possible, the potential regulating protein. She tells you the
sequences are from Homo sapiens, and her intuition is that
the motif is of length 8. She wants you to suggest only the
best possible candidate motif.
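Before submitting the file to MEME, it can help to confirm that it really holds 15 sequences of 50 bases each. The sketch below is one possible check; the function names and the tiny inline demo sequences are assumptions, and the real contents of DNASample50.txt are not reproduced here.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; names are the first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else "seq%d" % (len(records) + 1)
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def check_input(records, expected_count=15, expected_length=50):
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    problems = []
    if len(records) != expected_count:
        problems.append("expected %d sequences, found %d" % (expected_count, len(records)))
    for name, seq in records.items():
        if len(seq) != expected_length:
            problems.append("%s has length %d" % (name, len(seq)))
        bad = set(seq) - set("ACGTN")
        if bad:
            problems.append("%s has non-DNA symbols: %s" % (name, "".join(sorted(bad))))
    return problems


# Tiny made-up input, checked against matching (non-default) expectations.
demo = ">gene1 upstream\nACGTACGT\n>gene2 upstream\nTTGACTCA\n"
records = read_fasta(demo)
print(check_input(records, expected_count=2, expected_length=8))  # []
```

For the file in the question, the call would be `check_input(read_fasta(open("DNASample50.txt").read()))` with the default expectations.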
Question
After you have run all the programs, your biologist friend
confesses that she is not sure her intuition about the
motif length was correct. Re-run the tool without
specifying the motif length. Do you get the same results?
Determine a potential DNA-binding protein using TRANSFAC.