Gene regulatory networks

Assessment of sequence
alignment
Lecture 10
1
Introduction
• The dot plot matrix, a visual sequence-matching
tool:
– Basics of the dot plot
– Examples of dot plots matching 2 sequences
– Tandem repeats: self-matching
– Inverted repeats: genetic palindromes
2
Sequence alignment Analysis
• In order to measure the degree of similarity
between sequences, they must first be aligned
to maximise the matching score:
Example 1
I am from Cork
I am not from Cork
****
(4 matches out of 18; based on length of bottom string)
Example 2
I am ---- from Cork
I am not from Cork
**** **********
(14 matches out of 18; based on length of bottom string)
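The scoring idea in Example 2 can be sketched in a few lines of Python. This is only an illustration: the function name `count_matches` is made up for this sketch, spaces are counted as ordinary characters (an assumption), and the gapped string is written as exactly 18 characters so that it lines up with the bottom string.

```python
def count_matches(top: str, bottom: str) -> int:
    """Count positions where both strings carry the same character.

    A gap character '-' never counts as a match.
    """
    return sum(1 for a, b in zip(top, bottom) if a == b and a != "-")


bottom = "I am not from Cork"      # 18 characters
gapped = "I am ----from Cork"      # 18 characters, 4 gap symbols inserted

matches = count_matches(gapped, bottom)
print(f"{matches} matches out of {len(bottom)}")
```

Without the gaps, the words after "I am" slide out of register, so far fewer positions agree. The exact count for the ungapped case depends on whether spaces and chance matches are included.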
3
The Dot plot
• A better way of doing this is to represent the
two sequences as a table or matrix, where one
sequence labels the rows and the other the
columns. The dot plot matrix is a visual way of
seeing the alignment between two sequences:
– The first sequence (query sequence) represents the
rows and the other sequence (subject sequence)
represents the columns.
– Every cell (row/column pair) is checked for a match
and, if there is one, the cell is marked.
– This will show all areas of both sequences where
matches occur.
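The construction just described can be sketched as follows. The function names `dot_plot` and `render` and the two short DNA strings are illustrative assumptions, not part of the lecture material.

```python
def dot_plot(query: str, subject: str) -> list[list[bool]]:
    """One row per query residue, one column per subject residue.

    A cell is True where the row residue equals the column residue.
    """
    return [[q == s for s in subject] for q in query]


def render(query: str, subject: str) -> str:
    """Draw the matrix as text, marking matches with '*' and the rest with '.'."""
    matrix = dot_plot(query, subject)
    header = "  " + subject
    rows = [q + " " + "".join("*" if cell else "." for cell in row)
            for q, row in zip(query, matrix)]
    return "\n".join([header] + rows)


# Hypothetical example sequences, chosen only to show a broken diagonal.
print(render("GATTACA", "GATCACA"))
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree residue by residue.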
4
Dot plot
• Consider the following:
– Diagonal lines represent alignments
(matches)
– Horizontal lines between aligned
sequences indicate that gaps are required
(where a gap indicates a
deletion/insertion)
• This has four "potential" aligned
sequences:
– D->Y
– H->N
– R->O
– O->H
• The longest runs of alignment are:
– D->Y and H->N
(Figure adapted from Lesk 2008.)
• Do you think, assuming this
represented DNA/AA sequences, that
"gaps" should be used to join these
sequences?
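Reading diagonal runs off a dot plot can also be done programmatically. The sketch below is a minimal illustration; the function name `diagonal_runs` is made up, and the two strings are an assumption (the slide does not state which sequences the figure compares), chosen because they produce segments running D->Y, R->O, O->H and H->N.

```python
def diagonal_runs(query: str, subject: str, min_len: int = 2):
    """Return (row, column, text) for every unbroken diagonal run of matches
    that is at least min_len residues long."""
    n, m = len(query), len(subject)
    runs = []
    for offset in range(-(n - 1), m):        # offset = column - row
        row = max(0, -offset)
        col = row + offset
        run_start = None                      # row where the current run began
        while row < n and col < m:
            if query[row] == subject[col]:
                if run_start is None:
                    run_start = row
            else:
                if run_start is not None and row - run_start >= min_len:
                    runs.append((run_start, run_start + offset,
                                 query[run_start:row]))
                run_start = None
            row += 1
            col += 1
        # Close a run that reaches the edge of the matrix.
        if run_start is not None and row - run_start >= min_len:
            runs.append((run_start, run_start + offset, query[run_start:row]))
    return sorted(runs)


# Assumed example strings (not given on the slide).
for row, col, text in diagonal_runs("DOROTHYHODGKIN", "DOROTHYCROWFOOTHODGKIN"):
    print(row, col, text)
```

The two long runs sit on different diagonals, so joining them into one alignment requires a gap in the shorter sequence. That is the situation the question above asks about.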
5
Dot plot Matrix
• This allows us to visualise areas of local
alignment as opposed to global alignment.
• One of the main purposes is to find domains /
motifs that match. This could be useful for
many reasons, e.g. locating a promoter factor
binding site.
• Can you think of any others (refer to previous
lectures)?
6
Dot plot: the previous examples
• Klug 7th ed., p. 403: there are sample DNA sequences, which we
will match via the BLAST program on the NCBI website:
• Go to the Exploring Genomics part of the book's website and
cut and paste the sequences into the query and subject
"windows".
• You must ensure that you set the search low [discussed in
next lecture].
• Run the BLAST program [a tool that aligns sequences and measures the
alignment (discussed in next lecture)].
7
8
Refer to saved web page
Dot plot: example 1
9
Dot plot: example 1
10
Dot plot: Example 2
11
Sequence Matching: Example 2
12
Dot plot for Tandem Repeats
• The human genome has many tandem repeats:
small sequences of nucleic acids (bases) or amino
acids that are repeated. They are ubiquitous in
genomes and can comprise 50% of a genome
(Richard 2008).
• They can be used as genealogical markers.
• They can be used to determine specific regions of interest, e.g.
introns.
• They play a significant part in evolution (Gemayel 2010).
• An example of a protein with multiple repeats is
human mucin (Baxevanis 2005 p. 297).
13
Dot plot of tandem repeats
14
Tandem repeat as a sequence
Tandem repeat 1
ABRACADABRACADABRA
ABRACADABRACADABRA
Tandem repeat 2
ABRACADABRACADABRA
ABRACADABRACADABRA
15
Tandem repeat dot plot
• To determine if there are tandem repeats, the
sequence is compared with itself (refer to table 1).
• The more diagonals, the more repeats.
• The diagonals at the bottom left compare the
start with the finish.
• The unbroken main diagonal shows that both
sequences are the same.
• The lines are symmetrical around the main
diagonal.
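A self-comparison of this kind can be sketched as below, using the ABRACADABRACADABRA string from the earlier slide. The function name `longest_run_per_offset` and the threshold of 2 are illustrative assumptions. Because the plot is symmetrical, only the diagonals on one side of the main diagonal are scanned.

```python
def longest_run_per_offset(seq: str) -> dict[int, int]:
    """For each off-diagonal of the self dot plot (offset k compares seq[i]
    with seq[i + k]), return the length of its longest unbroken run."""
    runs = {}
    for k in range(1, len(seq)):
        best = current = 0
        for i in range(len(seq) - k):
            current = current + 1 if seq[i] == seq[i + k] else 0
            best = max(best, current)
        runs[k] = best
    return runs


seq = "ABRACADABRACADABRA"
runs = longest_run_per_offset(seq)

# Offsets whose diagonal holds a run of at least 2 matches: candidate repeat periods.
repeat_offsets = [k for k, length in runs.items() if length >= 2]
print(repeat_offsets)   # [7, 14]
```

The offsets 7 and 14 correspond to the repeat unit ABRACAD shifted by one and two copies. The run at offset 14 is the short diagonal in the corner of the plot that compares the start of the sequence with its end.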
16
Genetic Palindromes
• A palindrome is a word that is spelt the same from right to left as from left
to right. This will give an "X" shaped dot plot (try: eye; navan; never odd or even
...).
• Remember that left to right is 5' to 3' on the primary strand and right to left is 5' to 3'
on the complementary strand. Alternatively, it means a match between a strand and
its reverse complement.
• 2 possible types of "genetic palindromes" [the difference from a word palindrome being that the
left-to-right read is on one strand while the right-to-left read is on its complementary
strand]:
– Restriction enzyme recognition sites, such as that of EcoRI:
5' GAATTC 3'
3' CTTAAG 5'
– Inverted repeats: on different segments; each repeat reads the same (GTGAG) but in opposite directions. An example is
the promoter region for the CAP protein in the lac operon:
5' GTGAGnnnCTCAC 3'
3' CACTCnnnGAGTG 5'
• What will the dot plot for the above 2 sequences look like?
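The "match between a strand and its reverse complement" test can be sketched as follows. The function names are illustrative, and treating the unspecified base `n` as its own complement is an assumption made only so that the spacer can be carried through.

```python
# Complement table; 'N'/'n' (any base) is mapped to itself by assumption.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the other strand read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_genetic_palindrome(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    return reverse_complement(seq) == seq


print(is_genetic_palindrome("GAATTC"))          # EcoRI site
print(reverse_complement("GTGAG"))              # the second half of the inverted repeat
print(is_genetic_palindrome("GTGAGnnnCTCAC"))   # inverted repeat with spacer
```

A dot plot of a sequence against its reverse complement shows such regions as diagonal runs, in the same way that a self dot plot shows tandem repeats.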
17
Using dot plot and BLAST
• Refer to fig. 9.19 of Understanding Bioinformatics,
which shows how the dot plot and BLAST
can be used to find longer alignments.
18
Exam Question
• Describe how to construct a dot plot matrix
for the alignment of DNA/AA sequences and
explain how it can be used to check for the
presence of different types of repeating
sequences.
19
References
• Baxevanis, A.D. (2005) Bioinformatics: A Practical Guide to the Analysis
of Genes and Proteins, chapter 11. Wiley.
• Klug, W.S. (2010) The Essentials of Genetics, 7th ed. Pearson
Education.
• Gemayel, R. et al. (2010) Variable tandem repeats accelerate
evolution of coding and regulatory sequences. Annu Rev Genet 44:
445-477.
• Richard, G.F. (2008) Comparative genomics and molecular dynamics
of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 72(4):
686-727.
• More general dot plot information: introduction to dotplot.
• Inverted repeats and dotplot.
20