Transcript benfey_ch10
Chapter 10
Comparative Genomics
Insights gained through comparison of
genomes from different species
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
Contents
History
Synteny
Conservation and function
Sequence similarity searches
Gene finding
Regulatory sequence identification
Interaction mapping
Genes and evolution
History
The Human Genome Project decided to sequence smaller genomes as a warm-up for the human genome
Resulted in sequencing:
Many bacteria
Model organism genomes
Yeast, C. elegans, Arabidopsis, Drosophila
Comparison of these genome sequences provided the basis for the field of “Comparative Genomics”
Early comparative genomics
Comparative genomics before full genome sequences were available:
Genome size
Compared DNA content among species
Single-copy and repetitive DNA
Used hybridization kinetics
Found that the amount of repetitive DNA differed greatly among species
Synteny
Synteny: genes that are in the same relative
position on two different chromosomes
Genetic and physical maps compared between
species
Or between chromosomes of the same species
Closely related species generally have a similar order of genes on their chromosomes
Synteny can be used to identify genes in one species based on their map position in another
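As a rough illustration of how conserved gene order can be detected, the Python sketch below scans the gene order of one chromosome for runs whose orthologs sit next to each other, in the same or inverted order, in a second species. It assumes orthologs have already been assigned and that each gene in species B is represented only by its index along the chromosome; the function name `syntenic_blocks` and the `min_len` threshold are hypothetical choices, not a published synteny method.

```python
def syntenic_blocks(order_a, position_b, min_len=3):
    """Return runs of species-A genes whose orthologs are adjacent in species B.

    order_a: gene names in chromosomal order in species A.
    position_b: dict mapping gene name -> index along the chromosome in species B
        (genes without an ortholog in B are simply absent from the dict).
    A run continues while each next gene sits exactly one position away in B,
    always in the same direction (+1 = same order, -1 = inverted block).
    """
    blocks, current, step = [], [], 0
    for gene in order_a:
        if gene in position_b and current:
            diff = position_b[gene] - position_b[current[-1]]
            if abs(diff) == 1 and step in (0, diff):
                current.append(gene)  # extend the collinear run
                step = diff
                continue
        # The run is broken: keep it if it is long enough, then restart.
        if len(current) >= min_len:
            blocks.append(current)
        current, step = ([gene] if gene in position_b else []), 0
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical example: g1-g3 keep their order, g4 has moved, g5-g7 are inverted.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
position_b = {"g1": 10, "g2": 11, "g3": 12, "g4": 40, "g5": 7, "g6": 6, "g7": 5}
print(syntenic_blocks(order_a, position_b))  # [['g1', 'g2', 'g3'], ['g5', 'g6', 'g7']]
```

Allowing both +1 and -1 steps treats an inverted segment as still syntenic, since the genes remain neighbors even though their orientation has flipped.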
Synteny of grass genomes
Synteny among crop genomes: rice, maize and wheat
Rice, the smallest genome, is in the center
Wheat, the largest, is the outer circle
Genes found in similar places on the chromosomes are indicated
Synteny of sequenced genomes
When sequences from the mouse and human genomes are compared:
Regions of remarkable synteny are found
Genes are in almost identical order for long stretches along the chromosome
[Figure: human chromosome 14 compared with mouse chromosome 14]
Mouse/human synteny
Comparing sequenced genomes
Comparison of genomic sequences from
different species can help identify:
Gene structure
Gene function
Regulatory sequences
Interactions between gene products
Evolution and sequence
conservation
Genome comparisons are based on the observation that conservation = function
If there are no constraints on a DNA sequence:
Random mutations will accumulate
Over tens of millions of years, these random mutations will make two related sequences different
Function and sequence
conservation
However, if there are constraints,
e.g. the DNA codes for a protein
or a transcription factor binds the DNA,
then there will be sequence similarity when related sequences are compared
Basic rule when comparing two related sequences:
Sequence conservation = functional importance
Orthologs and Paralogs
When comparing sequences from different genomes,
we must distinguish between two types of closely related sequences:
Orthologs are genes in two different species that descend from a single gene in their common ancestor (they diverged through speciation)
Paralogs are genes in the same species that were created through gene duplication events
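A widely used practical heuristic for proposing ortholog pairs (a heuristic, not the definition itself) is the reciprocal best hit: gene a in species A and gene b in species B are paired only if each is the other's highest-scoring match. The sketch below assumes pairwise similarity scores, for example from BLAST, are already available as a dictionary; the function name, gene names and scores are hypothetical.

```python
def reciprocal_best_hits(scores):
    """Pair genes that are each other's best match between two species.

    scores: dict mapping (gene_in_A, gene_in_B) -> similarity score (higher = better).
    Returns a sorted list of (gene_in_A, gene_in_B) reciprocal best pairs.
    """
    best_in_b = {}  # gene in A -> its best-scoring gene in B
    best_in_a = {}  # gene in B -> its best-scoring gene in A
    for (a, b), s in scores.items():
        if a not in best_in_b or s > scores[(a, best_in_b[a])]:
            best_in_b[a] = b
        if b not in best_in_a or s > scores[(best_in_a[b], b)]:
            best_in_a[b] = a
    # Keep a pair only if the best hit is reciprocal.
    return sorted((a, b) for a, b in best_in_b.items() if best_in_a.get(b) == a)


# Hypothetical scores: A1 and A2 are paralogs in species A; B1 and B2 are genes in species B.
scores = {("A1", "B1"): 90, ("A2", "B1"): 70, ("A2", "B2"): 60}
print(reciprocal_best_hits(scores))  # [('A1', 'B1')]
```

A2 gets no partner here: its best hit, B1, prefers A1. This shows how paralogs complicate ortholog assignment and why reciprocal best hits can leave some genes unpaired.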
Orthologues and Paralogues
[Figure: genes A, A’, A’’ and B, B’, B’’ illustrating orthologous and paralogous relationships]
Sequence similarity and gene
function
Sequence comparisons that implicate function are widely used
to determine whether a newly sequenced cDNA or genomic region encodes a gene of known function:
Search for similar sequences in other species (or in the same species)
Homology searches
Search databases of DNA sequences
Use computer algorithms to align sequences
Don’t require perfect matches between
sequences
Allow for insertions, deletions and base changes
Most commonly used algorithms:
BLAST
FASTA
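BLAST and FASTA are fast heuristics; the exact method they approximate is dynamic-programming local alignment (Smith-Waterman). The minimal Python sketch below computes only the best local alignment score, using illustrative scoring values (match +2, mismatch -1, gap -2) that are assumptions, not BLAST's defaults. It shows how insertions, deletions and base changes are tolerated instead of requiring a perfect match.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a new local alignment start anywhere; gaps model indels.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGTACGT", "ACGTCGT"))  # 12
```

The score of 12 comes from aligning ACGTACGT with ACGT-CGT: seven matches (+14) and one gap (-2), which beats any ungapped alignment of the two sequences.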
Homology search example
The sea squirt Ciona intestinalis makes a coat composed primarily of cellulose
A BLAST search was performed on the Ciona
genome using an Arabidopsis endoglucanase
gene involved in cellulose synthesis
Extensive homology was found with a Ciona
gene flanked by genes found in Drosophila and
human
It is postulated that the Ciona endoglucanase
gene may have arisen by lateral gene transfer
Discovery of an endoglucanase gene in the sea squirt genome
[Figure: diagram labeled Arabidopsis Korrigan, C. intestinalis cDNA, C. elegans and Drosophila, and Human, showing transporter, endoglucanase and splicing factor genes]
Homology search for the mouse
genome
Homology search of all genes in the mouse genome:
27% in other metazoans
29% in other eukaryotes
6% in other chordates
14% in other mammals
Less than 1% rodent-specific
Problems of Genome annotation
Identifying genes and regulatory regions in
sequenced genomes is challenging
Open reading frames (ORFs) are usually a good indication of genes
Problem: it is difficult to determine which ORFs belong to a given gene
Many mammalian genes have small exons and large introns
Regulatory sequences are even more difficult to identify
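A bare-bones ORF scan shows why ORFs alone are an imperfect guide. The sketch below reports ATG-to-stop stretches in the three forward reading frames only; it ignores the reverse strand, splicing and alternative start codons, all simplifications assumed here, and `find_orfs` with its `min_codons` cutoff is a hypothetical helper. In a gene with small exons and large introns, each exon carries only a fragment of the coding sequence, so a scan like this cannot tell which pieces belong together.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with an exclusive end; the codon count
    compared against min_codons includes both the ATG and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```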
Computational approaches to
gene identification
Computer programs such as GRAIL and GeneFinder analyze genomic sequence
They look for ORFs, splice sites, poly(A) addition sites, etc.
and predict gene structure
Predictions are frequently wrong:
They often miss exons at the beginning or end of a gene
Or predict exons that don’t really exist
How genome comparisons help
When comparing genomes of different species:
Genes normally have the same exon/intron structure
Look for ORFs conserved in both genomes
This frequently permits accurate identification of genes
Fugu/human comparison found more than 1,000 genes
Mouse/human comparison indicates only about 30,000 genes in the genome
Sequence comparison example
Comparison of the human and mouse spermidine
synthase genes
Revealed an additional intron in the human gene that is
not found in the mouse homologue
[Figure: human and mouse spermidine synthase gene structures, labeled 5,500 bp]
Identifying small RNAs
There is growing evidence that small RNAs can regulate gene expression
Small RNAs are 20-25 nucleotides long
Conservation between genomes suggests function
Example: small RNAs conserved in Arabidopsis and rice
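A crude first filter for conserved small RNA candidates is to list short sequences that occur exactly in both genomes. The sketch below assumes a word length of 21, inside the 20-25 nucleotide range above, and exact matches on one strand only; real small RNA searches also allow mismatches and examine the precursor, which this does not attempt. `shared_kmers` is a hypothetical helper.

```python
def shared_kmers(seq_a, seq_b, k=21):
    """Return, sorted, every length-k word that occurs in both sequences.

    Exact matches on the given strand only; no mismatches are allowed.
    """
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return sorted(kmers_a & kmers_b)


# Toy example with k=4 so the shared word is easy to see.
print(shared_kmers("AAACGTTT", "GGCGTTGG", k=4))  # ['CGTT']
```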
Regulatory sequence
identification
A large portion of the genome contains regulatory information
Regulatory sequences include:
Cis-regulatory elements, which tell genes when and where to turn on
Binding sites for the basal transcription machinery
Enhancers
These can be 5’ of the gene, 3’ of the gene or in an intron
Regulatory sequences
[Figure: regulatory sequences of a gene from 5’ to 3’, including the TATA box]
Finding regulatory sequences
Regulatory sequences are difficult to identify using computer programs
Problem: most enhancer sequences have yet to be identified
They are usually short: 6-10 base pairs
Those that are known are usually degenerate:
they can differ in one or more base pairs
and still bind the cognate transcription factor
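Degenerate sites are often written as a consensus in IUPAC ambiguity codes (for example W = A or T). The Python sketch below turns such a consensus into a regular expression and reports every position where it matches, including overlapping matches. The consensus TATAWAW is used only as an illustrative TATA-box-like pattern, and `find_motif` is a hypothetical helper. Because a short degenerate pattern matches many places by chance in a large genome, scans like this return many false positives, which is part of why regulatory sequences are hard to find computationally.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(seq, consensus):
    """Return 0-based positions where seq matches a degenerate IUPAC consensus."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


print(find_motif("GGTATAAAAGGCTATATATCC", "TATAWAW"))  # [2, 12]
```

The two hits differ at both W positions (TATAAAA versus TATATAT) yet both match the same consensus, mirroring how variant sites can still bind the same factor.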
Comparisons to identify
regulatory elements
Comparisons of genomes of different species can identify regulatory elements
Sequence change in intergenic regions and introns is usually more rapid than in coding regions
Nevertheless, regulatory elements tend to be conserved
Conserved regions are called “phylogenetic footprints”
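A phylogenetic footprint can be approximated by sliding a fixed-size window along a pairwise alignment and keeping the windows whose percent identity is high. The sketch below assumes the two sequences are already aligned, with '-' marking gaps; the window size and the 80% cutoff are arbitrary illustrative values, and `conserved_windows` is a hypothetical helper.

```python
def conserved_windows(aln_a, aln_b, window=10, min_identity=0.8):
    """Report (start, identity) for alignment windows at or above min_identity.

    aln_a, aln_b: aligned sequences of equal length; '-' marks a gap.
    A column counts as a match only if both bases are identical and not gaps.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [x == y and x != "-" for x, y in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


hits = conserved_windows("AAAAACCCCCGGGGG", "TTTTTCCCCCGGGGG", window=5, min_identity=1.0)
print([start for start, _ in hits])  # [5, 6, 7, 8, 9, 10]
```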
Phylogenetic footprint
Identifying conserved regulatory regions usually requires comparing genomes of closely related species
If the species are too distantly related, conservation is very difficult to find
Nevertheless, mouse/human sequence comparison has revealed many conserved cis-regulatory elements
Mouse/human comparison
Using multiple species for
Phylogenetic footprinting
The location of regulatory sequences can also be found by comparing several related sequences
Multiple alignments are performed
This makes it easier to home in on important regions
Conservation alone is not enough: the importance of the elements must be validated
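With several species, a stricter criterion becomes usable: keep only alignment columns that are identical in every sequence. The sketch below assumes a precomputed multiple alignment given as equal-length strings and reports runs of fully conserved, ungapped columns; `conserved_runs` and `min_len` are hypothetical. Such runs are only candidate elements until their importance is validated.

```python
def conserved_runs(alignment, min_len=4):
    """Find runs of columns that are identical and ungapped in every sequence.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Returns half-open (start, end) column intervals of length >= min_len.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have equal length")
    runs, start = [], None
    for col in range(length + 1):  # the extra step closes a run at the end
        conserved = (
            col < length
            and len({s[col].upper() for s in alignment}) == 1
            and alignment[0][col] != "-"
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs


alignment = ["ACGTTGCAAT", "ACGTAGCAAT", "ACGTTGCAAC"]
print(conserved_runs(alignment, min_len=4))  # [(0, 4), (5, 9)]
```

Each added sequence can only remove columns from the fully conserved set, which is why more species help narrow the search to important regions.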
Interaction mapping
Protein-protein interactions include:
The transfer of information in a genetic
pathway
Scaffolding to tether other proteins
Enzymatic reactions
Large molecular machines such as motors
Rosetta Stone
Observation: in some species, interacting proteins are encoded by a single gene
In other species, the same proteins are encoded by two separate genes
A systematic search through sequenced genomes for these relationships should identify proteins that interact
This is called the “Rosetta Stone” approach
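The search can be sketched under simplifying assumptions: alignment hits are already available showing which region of a fused protein from one genome each separate protein from another genome aligns to, and two partners are predicted to interact if they cover distinct, essentially non-overlapping parts of the same fused protein. The hit format, the `rosetta_stone_pairs` name, the `max_overlap` tolerance and the coordinates in the example are all hypothetical; published implementations add further filters that are not reproduced here.

```python
from itertools import combinations


def rosetta_stone_pairs(hits, max_overlap=10):
    """Predict interacting protein pairs from gene fusions.

    hits: list of (fused_protein, partner_protein, start, end) tuples meaning that
        partner_protein aligns to residues start..end of fused_protein.
    Returns sorted partner pairs that align to essentially non-overlapping
    regions of the same fused protein.
    """
    by_fused = {}
    for fused, partner, start, end in hits:
        by_fused.setdefault(fused, []).append((partner, start, end))
    predictions = set()
    for partners in by_fused.values():
        for (p1, s1, e1), (p2, s2, e2) in combinations(partners, 2):
            overlap = min(e1, e2) - max(s1, s2)  # negative means a gap between hits
            if p1 != p2 and overlap <= max_overlap:
                predictions.add(tuple(sorted((p1, p2))))
    return sorted(predictions)


# Hypothetical coordinates: gyrase B and gyrase A each match one part of topoisomerase II.
hits = [
    ("yeast_topoII", "gyrB", 1, 400),
    ("yeast_topoII", "gyrA", 420, 1200),
]
print(rosetta_stone_pairs(hits))  # [('gyrA', 'gyrB')]
```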
Rosetta Stone example
The yeast protein topoisomerase II
is equivalent to two proteins in E. coli:
gyrase A and gyrase B
This suggests that gyrase A and gyrase B interact
[Figure: yeast topoisomerase II aligned with E. coli gyrase B and gyrase A]
Rosetta stone
[Figure: examples from Escherichia coli, Haemophilus influenzae and Methanococcus jannaschii]
Higher level comparisons
Comparisons between genomes are not just for better identification of genes and regulatory sequences
Evolution of adaptive traits occurs through:
Evolution of new genes
Changes in when and where genes are expressed
Thus comparisons of the genes found in genomes can provide information about the mechanisms of evolution
Genes and genomes
Comparison of total gene numbers in sequenced genomes:
Numbers are smaller than originally expected
Ex: the human genome was thought to have 100,000 genes
Estimates are now closer to 30,000-35,000 genes
This suggests that many new functions arise through changes in gene expression
Old genes are used in new ways
Selective expansion of genes
Although comparisons show less difference in gene numbers than expected,
there are still striking differences in the sizes of some gene families
Example:
The roundworm C. elegans has a large number of nuclear receptor genes
Drosophila has a large number of zinc-finger transcription factors
Plants have no G-protein coupled receptors
What is the difference between man and ape?
Man and chimpanzee have a genome-wide similarity of greater than 95%.
What accounts for the differences between the species?
A recent study suggests they are due to specific gene expression differences.
Striking differences were found only in the brain
Human/ape gene expression
comparisons
[Figure: three panels comparing gene expression in human, chimp and rhesus; values shown: 1.3, 1.0 and 5.5]
Trait-to-gene
Methods are being developed to identify genes involved in adaptive traits
Example: “Trait-to-gene”
Underlying reasoning:
Organisms that have a particular trait either share related genes
or have developed new genes to perform the same function
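One way to turn this reasoning into a computation is a presence/absence filter over gene families such as COGs (Clusters of Orthologous Groups): keep the families present in every species that has the trait and absent from every species that lacks it. This strict all-or-nothing rule is an assumption made for brevity; with real data many genomes are needed and some exceptions must be tolerated. The function name, species and family IDs below are hypothetical.

```python
def trait_associated_genes(genomes, has_trait):
    """Gene families present in every species with the trait and absent from all others.

    genomes: dict mapping species -> set of gene-family IDs in its genome.
    has_trait: dict mapping species -> True if the species has the trait.
    """
    with_trait = [genomes[s] for s in genomes if has_trait[s]]
    without_trait = [genomes[s] for s in genomes if not has_trait[s]]
    if not with_trait:
        return set()
    shared = set.intersection(*with_trait)  # in every species with the trait
    return shared - set().union(*without_trait)  # and in none without it


genomes = {
    "species1": {"COG1", "COG2", "COG3"},
    "species2": {"COG2", "COG3"},
    "species3": {"COG1", "COG3", "COG4"},
    "species4": {"COG1", "COG2"},
}
has_trait = {"species1": True, "species2": True, "species3": True, "species4": False}
print(trait_associated_genes(genomes, has_trait))  # {'COG3'}
```

This strict rule only finds shared related genes; the second case above, new genes that perform the same function, would be missed because unrelated genes fall into different families.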
Relating traits to genes
[Figure: species 1, 2, 3 and 5 have trait A and a gene assigned to COG 3; species 4 has neither]
Trait-to-gene
Comparisons were made of bacterial genomes
Many genomes are needed
Looked for genes involved in flagellar function
Identified 43 of the 45 known genes
Found 5 additional genes that the program predicted to be involved in flagellar function
Knocked out 3 of these; 2 of the knockouts resulted in bacteria with defective flagella
Trait-to-gene
[Figure: swim assays of B. subtilis 168 and strains labeled yqeW and yuxH]
Overnight growth at 37°C. Swim medium (LB + 0.25% agar).
Similar results at 20°C (4 days) and 30°C (2 days).
The goal of comparative genomics
Summary
Synteny = similar relative positions of genes
on chromosomes
Conservation = function
Homology searches
Gene structure prediction
Regulatory sequence identification
Interaction mapping
Genes and evolution