Transcript benfey_ch10

Chapter 10
Comparative Genomics
Insights gained through comparison of
genomes from different species
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
Contents
- History
- Synteny
- Conservation and function
- Sequence similarity searches
- Gene finding
- Regulatory sequence identification
- Interaction mapping
- Genes and evolution
History
- The Human Genome Project decided to sequence smaller genomes as a warm-up for the human genome
- This resulted in the sequencing of:
  - Many bacteria
  - Model organism genomes: yeast, C. elegans, Arabidopsis, Drosophila
- Comparison of these genome sequences provided the basis for the field of “comparative genomics”
Early comparative genomics
- Comparative genomics before full genome sequences were available:
  - Genome size: compared DNA content among species
  - Single-copy and repetitive DNA: measured using hybridization kinetics
  - Found that the amount of repetitive DNA differed greatly among species
Synteny
- Synteny: genes that occupy the same relative positions on two different chromosomes
- Genetic and physical maps are compared between species, or between chromosomes of the same species
- Closely related species generally have a similar order of genes on their chromosomes
- Synteny can be used to identify genes in one species based on their map position in another
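A minimal sketch of how conserved gene order could be detected, assuming each species' map is given as an ordered list of gene names and that orthologous genes share a name. The function name, the gene names and the `min_len` parameter are illustrative and do not come from the slides.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of shared genes that are adjacent and collinear in both maps.

    A run may be in the same orientation in both species or inverted in one.
    """
    shared = set(order_a) & set(order_b)
    genes_a = [g for g in order_a if g in shared]
    genes_b = [g for g in order_b if g in shared]
    pos_b = {gene: i for i, gene in enumerate(genes_b)}

    blocks, current, direction = [], [], 0
    for gene in genes_a:
        if current:
            step = pos_b[gene] - pos_b[current[-1]]
            # Extend the block only if the gene is the next neighbour in B
            # and the orientation matches the rest of the block.
            if step in (1, -1) and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            if len(current) >= min_len:
                blocks.append(current)
        current, direction = [gene], 0
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: g1-g3 are inverted in species B, g5-g6 are collinear.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g3", "g2", "g1", "x", "g5", "g6", "g4"]
print(syntenic_blocks(species_a, species_b))
```

Genes without a counterpart in the other species (here `x`) are ignored before adjacency is checked, which is one of several reasonable choices.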
Synteny of grass genomes
- Synteny among crop genomes: rice, maize and wheat
- Rice, the smallest genome, is in the center
- Wheat, the largest, is the outer circle
- Genes found in similar places on chromosomes are indicated
Synteny of sequenced genomes
- When sequences from the mouse and human genomes are compared:
  - Regions of remarkable synteny are found
  - Genes are in almost identical order for long stretches along the chromosome
[Figure: Human Chr 14 and Mouse Chr 14]
Mouse/human synteny
Comparing sequenced genomes
- Comparison of genomic sequences from different species can help identify:
  - Gene structure
  - Gene function
  - Regulatory sequences
  - Interactions between gene products
Evolution and sequence conservation
- Genome comparisons are based on the observation that conservation implies function
- If there are no constraints on a DNA sequence:
  - Random mutations will accumulate
  - Over tens of millions of years these random mutations will make two related sequences different
Function and sequence conservation
- However, if there are constraints:
  - e.g. the DNA codes for a protein
  - or a transcription factor binds the DNA
- Then sequence similarity will remain when related sequences are compared
- Basic rule when comparing two related sequences:
  - Sequence conservation = functional importance
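One simple way to put a number on conservation is percent identity between two sequences that have already been aligned. The sketch below assumes gaps are written as `-`; the function name and the toy sequences are made up for illustration and are not real genes.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned (non-gap) positions at which two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy example: a constrained region stays similar, an unconstrained one drifts.
constrained = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
unconstrained = percent_identity("ACGTTAGCCA", "ATGATCGACA")
print(constrained, unconstrained)
```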
Orthologs and Paralogs
- When comparing sequences from different genomes, two types of closely related sequences must be distinguished:
  - Orthologs are genes in different species that descend from a single gene in their common ancestor
  - Paralogs are genes in the same species that were created through gene duplication events
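The slides do not say how orthologs are told apart from paralogs in practice. One commonly used heuristic, shown here only as a sketch, is the reciprocal best hit: two genes are candidate orthologs if each is the other's highest-scoring match across the two genomes. The score table and gene names below are hypothetical.

```python
def reciprocal_best_hits(scores):
    """scores maps (gene_in_species_1, gene_in_species_2) to a similarity score.

    Returns the sorted pairs in which each gene is the other's best match.
    """
    best_for_1, best_for_2 = {}, {}
    for (g1, g2), score in scores.items():
        if g1 not in best_for_1 or score > scores[(g1, best_for_1[g1])]:
            best_for_1[g1] = g2
        if g2 not in best_for_2 or score > scores[(best_for_2[g2], g2)]:
            best_for_2[g2] = g1
    return sorted((g1, g2) for g1, g2 in best_for_1.items() if best_for_2[g2] == g1)


# Hypothetical scores: mA2 resembles hA but is not its best match, so it is
# left unpaired, as would be expected for a duplicate (paralog) of mA.
scores = {
    ("hA", "mA"): 95,
    ("hA", "mA2"): 80,
    ("hB", "mA2"): 60,
    ("hB", "mB"): 90,
    ("hB", "mA"): 50,
}
print(reciprocal_best_hits(scores))
```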
Orthologs and Paralogs
[Figure: diagram relating genes A, A’, A’’ and B, B’, B’’]
Sequence similarity and gene function
- Sequence comparisons that implicate function are widely used:
  - To determine whether a newly sequenced cDNA or genomic region encodes a gene of known function
  - Search for a similar sequence in other species (or in the same species)
Homology searches
- Search databases of DNA sequences
- Use computer algorithms to align sequences
  - Perfect matches between sequences are not required
  - Insertions, deletions and base changes are allowed
- Most commonly used algorithms:
  - BLAST
  - FASTA
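To illustrate what it means for an alignment to tolerate insertions, deletions and base changes, here is a small Smith-Waterman local alignment score. This is not how BLAST or FASTA work internally (both use faster heuristics); it is the exact dynamic-programming method they approximate. The scoring values are arbitrary assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell is never negative: a local alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The second sequence carries a one-base insertion; the score drops by one
# gap penalty relative to a perfect eight-base match.
print(local_alignment_score("ACGTACGT", "ACGTTACGT"))
```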
Homology search example
- The sea squirt, Ciona intestinalis, makes a coat primarily of cellulose
- A BLAST search was performed on the Ciona genome using an Arabidopsis endoglucanase gene involved in cellulose synthesis
- Extensive homology was found with a Ciona gene flanked by genes found in Drosophila and human
- It is postulated that the Ciona endoglucanase gene may have arisen by lateral gene transfer
Discovery of endoglucanase gene in sea squirt genome
[Figure labels: Arabidopsis Korrigan; transporter, endoglucanase, splicing factor; C. intestinalis cDNA; C. elegans and Drosophila; human]
Homology search for the mouse genome
- Homology search of all genes in the mouse genome:
  - 27% in other metazoans
  - 29% in other eukaryotes
  - 6% in other chordates
  - 14% in other mammals
  - Less than 1% rodent-specific
Problems of genome annotation
- Identifying genes and regulatory regions in sequenced genomes is challenging
- Open reading frames (ORFs) are usually a good indication of genes
- The problem: it is difficult to determine which ORFs belong to a gene
  - Many mammalian genes have small exons and large introns
- Regulatory sequences are even more difficult to identify
Computational approaches to gene identification
- Computer programs analyze genomic sequence
  - GRAIL, GeneFinder
  - Look for ORFs, splice sites, poly(A) addition sites, etc.
- Predict gene structure
- Frequently wrong
  - Usually miss exons at the beginning or end of a gene
  - Or predict an exon that does not really exist
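The ORF-scanning step can be sketched in a few lines. This is not the algorithm used by GRAIL or GeneFinder; it only looks for ATG-to-stop stretches on the forward strand and knows nothing about splice sites or poly(A) signals, which is one reason ORFs alone are not enough in intron-rich genomes. The function name and `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence with one ORF (ATG AAA TTT TAG) in the third reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))
```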
How genome comparisons help
- When comparing genomes of different species:
  - Genes normally have the same exon/intron structure
  - Look for conserved ORFs in both genomes
- Frequently permits accurate identification of genes
  - Fugu/human comparison found >1000 genes
  - Mouse/human comparison indicates only about 30,000 genes in the genome
Sequence comparison example
- Comparison of the human and mouse spermidine synthase genes
  - Revealed an additional intron in the human gene that is not found in the mouse homologue
[Figure: human and mouse gene structures; 5,500 bp]
Identifying small RNAs
- Growing evidence that small RNAs can regulate gene expression
- Small RNAs are 20-25 bases long
- Conservation between genomes suggests functionality
- Example: small RNAs conserved in Arabidopsis and rice
Regulatory sequence identification
- A large portion of the genome contains regulatory information
- Regulatory sequences include:
  - Cis-regulatory elements: tell genes when and where to turn on
  - Basal transcription machinery binding sites
  - Enhancers
- Can be 5’ of the gene, 3’ of the gene or in an intron
Regulatory sequences
[Figure: gene diagram labeled 5’, TATA and 3’]
Finding regulatory sequences
- Regulatory sequences are difficult to identify using computer programs
- The problem: most enhancer sequences have yet to be identified
  - They are usually short: 6-10 base pairs
  - Those that are known are usually degenerate
    - They can differ in one or more base pairs
    - and still bind the cognate transcription factor
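Degeneracy is usually written with IUPAC ambiguity codes, for example N for any base. The sketch below, which assumes that notation, scans a sequence for a degenerate motif; the E-box consensus CANNTG is used as a familiar example and the test sequence is made up. Because such motifs are short, matches like these occur by chance very often, which is exactly the difficulty described above.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(motif, seq):
    """Return 0-based start positions of every match of a degenerate motif."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead reports overlapping sites as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


# CACGTG and CAGCTG differ in two positions but both match CANNTG.
print(find_motif("CANNTG", "GGCACGTGTTCAGCTGAA"))
```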
Comparisons to identify regulatory elements
- Comparisons of genomes of different species can identify regulatory elements
- Change in intergenic regions and introns is usually more rapid than in coding regions
- Nevertheless, regulatory elements tend to be conserved
- Conserved regions are called “phylogenetic footprints”
Phylogenetic footprint
- Identifying conserved regulatory regions usually requires comparing genomes of closely related species
  - If too distantly related, it is very difficult to find conservation
- Nevertheless, mouse/human sequence comparison has revealed many conserved cis-regulatory elements
Mouse/human comparison
Using multiple species for phylogenetic footprinting
- The location of regulatory sequences can also be found by comparing several related sequences
  - Multiple alignments are performed
  - Better able to home in on important regions
- Conservation alone is not enough; the importance of elements must be validated
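As a rough sketch of how a multiple alignment narrows down candidate elements, the function below slides a window along an existing alignment and reports windows in which every column is identical in all species. It assumes the alignment has already been computed and is supplied as equal-length strings; the window size, threshold and toy sequences are assumptions. Any hit is only a candidate and would still need experimental validation.

```python
def conserved_windows(alignment, window=6, min_identity=1.0):
    """Return start columns of windows whose fraction of fully conserved
    columns is at least min_identity. Gap columns never count as conserved."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = [
        len(set(column)) == 1 and column[0] != "-"
        for column in zip(*alignment)
    ]
    hits = []
    for start in range(length - window + 1):
        fraction = sum(conserved[start:start + window]) / window
        if fraction >= min_identity:
            hits.append(start)
    return hits


# Toy three-species alignment with an invariant core (TGACGTCA) at columns 4-11.
alignment = [
    "ACGTTGACGTCAAC",
    "TCGATGACGTCATC",
    "GCTTTGACGTCAGA",
]
print(conserved_windows(alignment))
```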
Interaction mapping
- Protein-protein interactions include:
  - The transfer of information in a genetic pathway
  - Scaffolding to tether other proteins
  - Enzymatic reactions
  - Large molecular machines such as motors
Rosetta Stone
- Observation: in some species, interacting proteins are encoded by a single gene
- In other species the same proteins are encoded by two genes
- A systematic search through sequenced genomes for these relationships should identify proteins that interact
- Called the “Rosetta Stone” approach
Rosetta Stone example
- Yeast has a single protein, topoisomerase II
- In E. coli the equivalent is two proteins: gyrase A and gyrase B
- Suggests gyrase B and gyrase A interact
[Figure: yeast topoisomerase II shown alongside E. coli gyrase B and gyrase A]
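The search can be sketched if each protein is summarized as a set of sequence families (domains) it contains. That representation, the function name and the family labels `gyrA-like` and `gyrB-like` are assumptions made for this example; a real search would start from sequence alignments rather than ready-made labels.

```python
from itertools import combinations


def rosetta_stone_pairs(fused_genome, split_genome):
    """Predict interacting protein pairs in split_genome.

    Both arguments map a protein name to the set of families it contains.
    A pair is predicted to interact when its two non-overlapping proteins are
    both covered by one multi-family protein in fused_genome.
    """
    predictions = set()
    for fused_name, families in fused_genome.items():
        if len(families) < 2:
            continue  # a single-family protein cannot link two partners
        for p, q in combinations(sorted(split_genome), 2):
            fam_p, fam_q = split_genome[p], split_genome[q]
            if fam_p and fam_q and fam_p.isdisjoint(fam_q) and fam_p | fam_q <= families:
                predictions.add((p, q, fused_name))
    return sorted(predictions)


yeast = {
    "topoisomerase II": {"gyrB-like", "gyrA-like"},
    "other yeast protein": {"kinase"},
}
e_coli = {
    "gyrase A": {"gyrA-like"},
    "gyrase B": {"gyrB-like"},
    "other E. coli protein": {"kinase"},
}
print(rosetta_stone_pairs(yeast, e_coli))
```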
Rosetta Stone
[Figure labels: Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii]
Higher level comparisons
- Comparisons between genomes are not just for better identifying genes and regulatory sequences
- Evolution of adaptive traits occurs through:
  - Evolution of new genes
  - Changes in when and where genes are expressed
- Thus comparisons of the genes found in genomes can provide information about mechanisms of evolution
Genes and genomes
- Comparison of total gene numbers in sequenced genomes:
  - Smaller than originally expected
  - Example: the human genome was thought to have 100,000 genes
  - Now thought to be closer to 30,000-35,000 genes
- Suggests that many new functions arise through changes in gene expression
  - Old genes are used in new ways
Selective expansion of genes
- Although comparisons show less difference in gene numbers than expected
- Striking differences are still seen in the sizes of some gene families
- Examples:
  - The roundworm C. elegans has a large number of nuclear receptor genes
  - Drosophila has a large number of zinc-finger transcription factors
  - Plants have no G-protein coupled receptors
What is the difference between man and ape?
- Man and chimpanzee have a genome-wide similarity of greater than 95%
- What accounts for the differences between the species?
- A recent study suggests they are due to specific gene expression differences
- Striking differences were found only in the brain
Human/ape gene expression comparisons
[Figure: three panels, each comparing human, chimp and rhesus; labeled values 1.3, 1.0 and 5.5]
Trait-to-gene
- Methods are being developed to identify genes involved in adaptive traits
- Example: “Trait-to-gene”
- Underlying reasoning:
  - Organisms that have a particular trait either share related genes
  - Or have developed new genes to perform the same function
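The first half of that reasoning (shared genes) can be sketched as a presence/absence comparison: keep the gene families found in every genome that shows the trait and in none that lacks it. This is a simplified illustration and not the published Trait-to-gene program; the strict all-or-none rule, the function name and the family labels are assumptions, and the rule would miss cases where different species evolved new genes for the same function.

```python
def trait_associated_families(genomes, with_trait):
    """genomes maps species name to the set of gene families it contains.

    Returns the families present in all species in with_trait and absent
    from every other species.
    """
    trait_sets = [fams for species, fams in genomes.items() if species in with_trait]
    other_sets = [fams for species, fams in genomes.items() if species not in with_trait]
    if not trait_sets:
        return set()
    candidates = set.intersection(*trait_sets)
    for fams in other_sets:
        candidates -= fams
    return candidates


# Hypothetical genomes: species 4 lacks the trait and lacks family COG3.
genomes = {
    "Species 1": {"COG1", "COG2", "COG3"},
    "Species 2": {"COG1", "COG3"},
    "Species 3": {"COG1", "COG3", "COG4"},
    "Species 4": {"COG1", "COG2"},
    "Species 5": {"COG1", "COG3", "COG5"},
}
has_trait = {"Species 1", "Species 2", "Species 3", "Species 5"}
print(trait_associated_families(genomes, has_trait))
```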
Relating traits to genes
[Figure: Species 1, 2, 3 and 5 are each shown with Trait A and a gene; Species 4 is shown with neither; label: COG 3]
Trait-to-gene
- Comparisons were made of bacterial genomes
  - Many genomes are needed
- Looked for genes involved in flagellar function
  - Identified 43 of 45 known genes
  - Found 5 additional genes that the program predicted should be involved in flagellar function
  - Knocked out 3 and found that 2 resulted in bacteria with defective flagella
Trait-to-gene
[Figure: swim plates labeled B. subtilis 168, yqeW, yuxH and B. subtilis 168]
Overnight growth at 37°C. Swim medium (LB + 0.25% agar).
Similar results at 20°C (4 days) and 30°C (2 days).
The goal of comparative genomics
Summary
- Synteny = similar relative positions of genes on chromosomes
- Conservation = function
- Homology searches
- Gene structure prediction
- Regulatory sequence identification
- Interaction mapping
- Genes and evolution