Genomics of sensory systems

Lecture #4: Comparing genes
9/14/09
This week
- Homework #2 due on Wed
  - Email with questions
  - Email me answers or hand them in in class
- Wed: I will be at Dept of Biology retreat
  - Lecture will be given by Kelly O’Quin, expert in phylogenetics
  - He will go over homework, so it must be done before class
Questions for today
0. More BLAST
1. Where do we get high quality gene sequences?
2. How do genes evolve?
3. How do we compare genes?
How to find genes
- Start with genes which are known from model organisms
- Use these to pull out genes from genomes
- Compare genes to learn about sensory evolution
Blast - Genbank
- What database do you want to search?
- What do you want to compare?
- What program do you want to do the searching?
Types of blast queries
Query                   Database                Type
Nucleotide              Nucleotide              Blastn, Megablast, Discont megablast
Protein                 Protein                 Blastp, Psi-blast, Phi-blast
Translated nucleotide   Protein                 Blastx
Protein                 Translated nucleotide   Tblastn
Translated nucleotide   Translated nucleotide   Tblastx
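The query/database combinations above can be read as a simple lookup. A minimal sketch in Python; the dictionary name, the key strings, and the function `programs_for` are hypothetical and only mirror the rows of the table, they are not part of any BLAST tool:

```python
# Lookup of BLAST program families by (query type, database type).
# Names and key strings are illustrative assumptions, not a BLAST API.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "megablast", "discontiguous megablast"],
    ("protein", "protein"): ["blastp", "psi-blast", "phi-blast"],
    ("translated nucleotide", "protein"): ["blastx"],
    ("protein", "translated nucleotide"): ["tblastn"],
    ("translated nucleotide", "translated nucleotide"): ["tblastx"],
}


def programs_for(query_type, database_type):
    """Return the programs listed for this query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program listed for {key}")
    return BLAST_PROGRAMS[key]


print(programs_for("Protein", "Translated nucleotide"))
```

For example, a protein query against a translated nucleotide database returns the single entry `tblastn`.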
Defaults
Database
Program
Confirm
Nucleotide BLAST = DNA nucleotide query vs nucleotide database
Choices for programs
- Megablast
  - Highly similar sequences, >95%
  - Word length 28
- Discontiguous megablast
  - Pretty similar seqs
  - Word length 11
- Blastn
  - Dissimilar seqs
  - Word length 11
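Why does a longer word length suit only highly similar sequences? A rough sketch, assuming the search starts from exact word matches between query and subject (discontiguous megablast relaxes this, which the sketch does not model). The function `shared_words` is a hypothetical helper, and the two 12-base sequences are the mutation example used later in this lecture:

```python
def shared_words(query, subject, word_length):
    """Count query words of the given length that occur exactly in the subject."""
    subject_words = {
        subject[i:i + word_length]
        for i in range(len(subject) - word_length + 1)
    }
    return sum(
        1
        for i in range(len(query) - word_length + 1)
        if query[i:i + word_length] in subject_words
    )


query = "ATGCCGTGACGT"
subject = "ATGCCTTGACGT"  # differs from the query at one position

for word_length in (4, 7):
    print(word_length, shared_words(query, subject, word_length))
```

With a single mismatch in 12 bases, short words still find exact matches on either side of the change, while every 7-base word overlaps the mismatch and none matches. The same logic, scaled up, is why a word length of 28 only works for very similar sequences.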
Translated blast = protein query vs translated database
BLAST a genome
Request ID
AWJ4D4B7012
BLASTing is fun
- This is meant to be enjoyable
- Be a genome explorer
  - Find out what kind of data is out there
  - Find out what kind of data isn’t there
QUESTIONS?????
Q1.
- There is so much data in Genbank. How do you find GOOD data?
- Example
  - Bovine rhodopsin: 1st G protein-coupled receptor to be sequenced
Search Genbank with text
49 entries
Bovine opsin
Bovine rhodopsin
Searching for genes
- Searching by text is fraught with peril
  - Genbank has too many links
  - Pulls up many things that are not what you want
- BLAST is a better approach
- NCBI has also made records which combine all similar sequences into one
NCBI has done some of the work
- They have hand-curated data for some species to make a set of reference sequences
  - Nucleotide sequences: NM_xxxxxx
  - Protein sequences: NP_xxxxxx
  - For human rhodopsin: NM_000539 and NP_000530
- These are the gold standard for sequences
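Reference sequence accessions are written with an underscore between the prefix and the number (NM_ for mRNA, NP_ for protein). A small sketch for telling the two apart; the regular expression, the dictionary, and the function name are assumptions for illustration and cover only the two prefixes mentioned here:

```python
import re

# Only the two prefixes from this lecture; other RefSeq prefixes exist.
REFSEQ_PREFIXES = {"NM": "mRNA (nucleotide)", "NP": "protein"}

# Prefix, underscore, digits, and an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^(NM|NP)_(\d+)(?:\.(\d+))?$")


def classify_accession(accession):
    """Return the molecule type for an NM_/NP_ accession, or None if it does not match."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES[match.group(1)]


for accession in ("NM_000539", "NP_000530", "NM000539"):
    print(accession, classify_accession(accession))
```

The last call shows that an accession typed without the underscore is not recognized by this pattern.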
Homologene
Homologs
- Two genes which arise in the common ancestor of two organisms and are passed down
- Implies genes perform same function in two organisms
- Therefore they can be compared to learn about evolution
These 4 primates have many genes which are homologs and have been passed down from the primate ancestor
Human
Chimp
Macaque
Bushbaby
Homologene search for rhodopsin
Homologene
Three primary sequence portals:
1. NCBI
2. Ensembl - European Bioinformatics Institute (EBI)
3. DNA Data Bank of Japan (DDBJ)
Select just genes
Scroll down to find the gene you want
Location
Links to transcript and protein
Orthologues are predicted and linked
OMIM - Online Mendelian Inheritance in Man
Good places to find genes
- Model organisms: NCBI homologene
- Genes from models and other organisms: Sanger Ensembl gene families
  - NOTE: These are often predicted from genome sequences
  - If there is a sequence in NCBI homologene, it may be different from (and more accurate than) the Sanger predictions
- OMIM is a good reference
Q2. How do genes change through time?
- Change in actual sequence
  - Mutation
  - Recombination
- Change in frequency of a sequence
  - Selection - “survive” better
  - Drift - get passed on by chance
  - Migration - move between populations
Mutation vs selection
- Mutation = sequence change
  ATGCCGTGACGT
  ATGCCTTGACGT
- Selection/drift/migration = sequence frequency changes across a number of individuals
  [Figure: two snapshots of a population of 15 sequences. First panel: 14 ATGTG, 1 ATGTT. Second panel: 5 ATGTG, 10 ATGTT.]
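The frequency change in the population figure can be made concrete by counting. A minimal sketch; the counts (14 ATGTG and 1 ATGTT in the first panel, 5 ATGTG and 10 ATGTT in the second) are read off the slide, while the function and variable names are hypothetical:

```python
from collections import Counter


def allele_frequencies(population):
    """Return the fraction of individuals carrying each sequence."""
    counts = Counter(population)
    total = len(population)
    return {sequence: n / total for sequence, n in counts.items()}


# Populations as counted from the two panels of the slide.
first_panel = ["ATGTG"] * 14 + ["ATGTT"] * 1
second_panel = ["ATGTG"] * 5 + ["ATGTT"] * 10

for name, population in (("first", first_panel), ("second", second_panel)):
    freqs = allele_frequencies(population)
    print(name, {seq: round(f, 2) for seq, f in sorted(freqs.items())})
```

No new sequence appears between the two panels: the ATGTT variant already exists, and only its share of the population changes. That is the distinction the slide draws between mutation and selection, drift or migration.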
Evolution as tinkerer
- Changes are typically small
- Mutation is source of new sequence
  - Not all mutations are created equal
  - Some occur more often than others
- Other forces shift frequency of particular sequence
Triplet amino acid code
F, phe  TTT    S, ser  TCT    Y, tyr   TAT    C, cys   TGT
F, phe  TTC    S, ser  TCC    Y, tyr   TAC    C, cys   TGC
L, leu  TTA    S, ser  TCA    O, stop  TAA    J, stop  TGA
L, leu  TTG    S, ser  TCG    B, stop  TAG    W, trp   TGG

L, leu  CTT    P, pro  CCT    H, his   CAT    R, arg   CGT
L, leu  CTC    P, pro  CCC    H, his   CAC    R, arg   CGC
L, leu  CTA    P, pro  CCA    Q, gln   CAA    R, arg   CGA
L, leu  CTG    P, pro  CCG    Q, gln   CAG    R, arg   CGG

I, ile  ATT    T, thr  ACT    N, asn   AAT    S, ser   AGT
I, ile  ATC    T, thr  ACC    N, asn   AAC    S, ser   AGC
I, ile  ATA    T, thr  ACA    K, lys   AAA    R, arg   AGA
M, met  ATG    T, thr  ACG    K, lys   AAG    R, arg   AGG

V, val  GTT    A, ala  GCT    D, asp   GAT    G, gly   GGT
V, val  GTC    A, ala  GCC    D, asp   GAC    G, gly   GGC
V, val  GTA    A, ala  GCA    E, glu   GAA    G, gly   GGA
V, val  GTG    A, ala  GCG    E, glu   GAG    G, gly   GGG
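The triplet code can be written as a lookup table and used to translate a coding sequence. A minimal sketch of the standard code; the compact 64-letter string is a common shorthand (codons ordered T, C, A, G at each position, `*` for stop), and the DNA string in the example is made up to spell the conserved MNGTEG motif rather than taken from a real gene:

```python
BASES = "TCAG"
# One letter per codon, in TTT, TTC, TTA, TTG, TCT, ... GGG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate DNA codon by codon from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical coding sequence chosen for illustration only.
print(translate("ATGAACGGTACCGAAGGT"))
```

The sketch does not stop at a stop codon; it simply writes `*` and continues, which keeps it short but differs from how a ribosome behaves.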
Mutation causes nucleotide change
- What about AA sequence?
- Synonymous change
  - Syn = same
  - AA stays same
- Nonsynonymous change
  - Not same
  - AA changes
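Whether a nucleotide change is synonymous can be checked by translating the codon before and after. A short sketch using the standard code; the function name is hypothetical. The first example, CCG to CCT, is the second codon of the mutation example shown earlier, assuming the reading frame starts at the first base:

```python
BASES = "TCAG"
# Standard code, one letter per codon in TTT ... GGG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid stays the same."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


print(classify_change("CCG", "CCT"))  # pro -> pro
print(classify_change("TTT", "TTA"))  # phe -> leu
```

Both examples change only the third base of the codon, yet one keeps the amino acid and the other does not, so the position of the change alone does not decide the outcome.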
Amino acid (AA) types
- Non-polar: A, F, G, I, L, M, P, V, W
- Polar: N, Q, S, T, Y
- Charged, +: H, K, R
- Charged, -: D, E
- Other: C
Often changing AA within a group does not affect protein function
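The grouping above can be turned into a quick check of whether a substitution stays within a group. A minimal sketch; the group memberships are the ones listed on the slide, and the names `AA_GROUPS`, `GROUP_OF` and `same_group` are hypothetical:

```python
# Amino acid groups as listed on the slide (one-letter codes).
AA_GROUPS = {
    "non-polar": "AFGILMPVW",
    "polar": "NQSTY",
    "charged +": "HKR",
    "charged -": "DE",
    "other": "C",
}

# Reverse lookup: amino acid letter -> group name.
GROUP_OF = {aa: group for group, members in AA_GROUPS.items() for aa in members}


def same_group(aa1, aa2):
    """True if both amino acids fall in the same group of the slide's classification."""
    return GROUP_OF[aa1.upper()] == GROUP_OF[aa2.upper()]


for pair in (("I", "V"), ("R", "K"), ("S", "K")):
    print(pair, same_group(*pair))
```

Other classifications of amino acids exist and would give different answers for some pairs; this sketch follows the slide's grouping only.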
Selection
- Stabilizing selection - Acts to keep protein function the same
  - Synonymous change more frequent than nonsynonymous
- Amino acid changes within a group are much more common than changes between groups
  - Nonpolar → nonpolar
  - Polar → polar
Similarity matrix
A = alanine
C = cysteine
D = aspartic acid
E = glutamic acid
F = phenylalanine
G = glycine
H = histidine
Comparing sequences
- Can do at either nucleotide or AA level
- Gather sequences from a bunch of different organisms
- Need to align them so that sites which perform the same function can be compared
Aligning sequences
- Sequences may differ in length
  - Often have differences at amino- or carboxy-terminus of the protein
  - Need a way to align parts of protein that are performing the same function
Example - RH2 opsin in fishes
Goldfish   MNGTEGNNFYVPLSNR
Medaka     MENGTEGKNFYIPMNNR
Zebrafish  MNGTEGSNFYIPMSNR
Killifish  MGYGPNGTEGNNFYIPMSNK
Trout      MQNGTEGSNFYIPMSNR
Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
Cod        MRMEANGTEGKNFYIPMSNR
Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
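To see how sequences of different length can be lined up, here is a sketch of pairwise global alignment by dynamic programming (Needleman-Wunsch). This is a swapped-in textbook technique, not necessarily the program used to make the alignment in this lecture, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions. It aligns the zebrafish and cod fragments listed above:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


zebrafish = "MNGTEGSNFYIPMSNR"
cod = "MRMEANGTEGKNFYIPMSNR"
aligned_zebrafish, aligned_cod, total = needleman_wunsch(zebrafish, cod)
print(aligned_zebrafish)
print(aligned_cod)
print("score:", total)
```

Several alignments can share the best score, so the exact placement of the gaps near the amino-terminus depends on how ties are broken in the traceback. Aligning many sequences at once needs a multiple-alignment program; this sketch handles only two.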
Align sequences
Zebrafish  M--------NGTEGSNFYIPMSNR
Trout      M------Q-NGTEGSNFYIPMSNR
Medaka     M------E-NGTEGKNFYIPMNNR
Cod        M----RMEANGTEGKNFYIPMSNR
Halibut    MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon  MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish   M--------NGTEGNNFYVPLSNR
Killifish  M---GYG-PNGTEGNNFYIPMSNK
           *        *****.***:*:.*:
* identical
: conserved
. semi-conserved
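Once sequences are aligned, each column can be compared across species. A minimal sketch that marks only the fully identical columns (the `*` symbol); it does not reproduce the `:` and `.` marks, which depend on amino acid similarity groups defined by the alignment program. The dictionary and function names are hypothetical, and the rows are the aligned sequences above:

```python
ALIGNMENT = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Trout":     "M------Q-NGTEGSNFYIPMSNR",
    "Medaka":    "M------E-NGTEGKNFYIPMNNR",
    "Cod":       "M----RMEANGTEGKNFYIPMSNR",
    "Halibut":   "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish":  "M--------NGTEGNNFYVPLSNR",
    "Killifish": "M---GYG-PNGTEGNNFYIPMSNK",
}


def identity_line(rows):
    """Return '*' for columns where every row has the same residue, ' ' otherwise."""
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*rows)
    )


line = identity_line(list(ALIGNMENT.values()))
for name, row in ALIGNMENT.items():
    print(f"{name:<10} {row}")
print(f"{'':<10} {line}")
print("identical columns:", line.count("*"), "of", len(line))
```

The starred columns fall on the initial M and on most of the NGTEG...NFY...P...N stretch, matching the `*` positions in the consensus line of the slide.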