Gen677_Lecture1_Intro


1
Evolutionary genomics can now be applied beyond ‘model’ organisms
Technological advances bring genomics to the study of ecology & evolution
But genomics has also made apparent the need for
incorporating evolution into basic biological study
Knowing how and why characters evolve (e.g. which residues of a protein
are under constraint) informs on function (e.g. which residues are important)
2
Types of questions & comparisons in evolutionary genomics
Given several whole-genome sequences, we can compare:
* Genome size, organization (chromosomes/plasmids), structure
* Gene/ncRNA content: number of genes, duplicates, size of gene families, etc
* Sequence differences related to: gene evolution, regulatory evolution
* RNA & protein abundance across species, for all RNAs/proteins
Ultimately, many of us are interested in genomic features under selection:
* Which genomic features are restricted from varying? Why?
* Which genomic features are least restricted from varying? Why?
* Which genomic features are/were involved in adaptation?
3
The How and the Why of Evolution
The HOW:
• We can observe what characters are different within and between species
• Using phylogeny, we can often reconstruct the common ancestral state
Together these can inform on the history of changes
The WHY: often much more challenging
• The goal is to understand the forces that drive changes or restrict change
• Often this means looking for known signatures of evolution (e.g. selection)
4
“Nothing in Biology Makes Sense Except in the Light of Evolution”
T. Dobzhansky
‘Nothing in [comparative genomics] Makes Sense Except
in the Light of the [phylogenetic tree]’
A. Gasch bastardization
5
Primer on Phylogeny
[Figure: the same tree of taxa a to e drawn two ways]
Cladogram: shows structure of the tree only
Phylogram: shows structure AND distance between nodes
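One common way to write these two views down is the Newick text format. The sketch below is an illustration only and not part of the lecture: the topology and branch lengths are invented, and `to_cladogram` is a hypothetical helper name.

```python
import re

# Phylogram: topology plus branch lengths (all values invented for illustration).
phylogram = "((a:0.10,b:0.20):0.05,(c:0.30,(d:0.10,e:0.10):0.20):0.05);"

def to_cladogram(newick: str) -> str:
    """Drop every ':length' annotation, keeping only the branching structure."""
    return re.sub(r":[0-9.eE+-]+", "", newick)

cladogram = to_cladogram(phylogram)
print(cladogram)  # ((a,b),(c,(d,e)));
```

Going the other way is not possible: a cladogram carries no distance information to restore.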
6
Primer on Phylogeny
3 incarnations of the SAME unrooted tree
[Figure: three drawings of one unrooted tree of taxa a to e]
Unrooted means we do not know the history (i.e. where the common ancestor is)
7
Primer on Phylogeny
3 incarnations of the SAME rooted tree
[Figure: three drawings of one rooted tree of taxa a to e, with known outgroup z and common ancestor *]
Rooted means DO know history (i.e. where the common ancestor * is)
Now distance also corresponds to ‘time’ (molecular clock theory)
or order of events
8
Reconstructing the Ancestral State
[Figure: rooted tree of taxa a to e with known outgroup z; * marks the common ancestor]
9
Reconstructing the Ancestral State
[Figure: the same rooted tree with known outgroup z and common ancestor *; a branch point is labeled "Speciation event"]
In a species tree, each bifurcation represents a species split
10
Reconstructing the Ancestral State
[Figure: the same rooted tree with internal nodes numbered 1 to 5, common ancestor *, known outgroup z, and a Time/Distance axis]
Given a tree, we can reconstruct the ancestral state at each node.
Usually work by Parsimony = smallest number of changes to explain the tree
(i.e. the simplest explanation)
11
Reconstructing the Ancestral State
Species can utilize:
a: Glucose, Galactose, Lactose
b: Glucose, Galactose, Lactose
c: Glucose, Galactose, Lactose, Fructose
d: Glucose, Galactose, Fructose
e: Glucose, Galactose, Fructose
z: Glucose, Lactose, Fructose
[Figure: rooted tree with internal nodes numbered 1 to 5, common ancestor *, and a Time/Distance axis]
Given a tree, we can reconstruct the ancestral state at each node.
Usually work by Parsimony = smallest number of changes to explain the tree
12
Reconstructing the Ancestral State
Species can utilize:
a: Glucose, Galactose, Lactose
b: Glucose, Galactose, Lactose
c: Glucose, Galactose, Lactose, Fructose
d: Glucose, Galactose, Fructose
e: Glucose, Galactose, Fructose
z: Glucose, Lactose, Fructose
[Figure: the same tree with inferred ancestral states at the internal nodes: GGL, GGLF, GGLF, GGF, and GGLF or GLF at the common ancestor *]
(GGLF = Glucose, Galactose, Lactose, Fructose; GLF lacks Galactose)
Events:
1. Last common ancestor could have been either GGLF or GLF
2. Galactose utilization might have been gained
3. No change in states
4. Fructose utilization was lost
5. Lactose utilization was lost
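The parsimony reasoning on this slide can be sketched with the bottom-up pass of Fitch's algorithm, a standard parsimony method that the lecture does not name. This is a minimal sketch under assumptions: the topology `((((a,b),c),(d,e)),z)` is read off the node numbering of the figure and may not match the slide exactly, each sugar is treated as an independent present/absent trait, and the names `TREE`, `USES` and `fitch` are hypothetical.

```python
# Assumed topology: ((((a,b),c),(d,e)),z), with z as the known outgroup.
TREE = (((("a", "b"), "c"), ("d", "e")), "z")

# Sugar utilization per species, as listed on the slide.
USES = {
    "a": {"glucose", "galactose", "lactose"},
    "b": {"glucose", "galactose", "lactose"},
    "c": {"glucose", "galactose", "lactose", "fructose"},
    "d": {"glucose", "galactose", "fructose"},
    "e": {"glucose", "galactose", "fructose"},
    "z": {"glucose", "lactose", "fructose"},
}

def fitch(node, trait):
    """Return (possible states at this node, minimum number of changes below it)."""
    if isinstance(node, str):                 # leaf: the state is observed
        return {trait in USES[node]}, 0
    left_states, left_cost = fitch(node[0], trait)
    right_states, right_cost = fitch(node[1], trait)
    shared = left_states & right_states
    if shared:                                # children agree: no change needed
        return shared, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_states | right_states, left_cost + right_cost + 1

results = {t: fitch(TREE, t) for t in ("glucose", "galactose", "lactose", "fructose")}
for trait, (root_states, changes) in results.items():
    print(trait, sorted(root_states), changes)
```

Under these assumptions galactose is the only trait whose root set contains both states, which corresponds to the "GGLF or GLF" ambiguity at the last common ancestor. Assigning a single state to every internal node would need a second, top-down pass that is omitted here.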
13
Making inferences based on phylogeny:
Confidence in your inferences depends on how much you trust your tree
Methods of phylogeny construction:
* Neighbor joining: organize species based on similarity score
- computationally the simplest but can be misleading
especially if species are of variable evolutionary distances
(“variable branch lengths”)
* Parsimony: simplest tree to explain the observed data
- simplest to model, can be computationally intensive without ‘heuristics’
* Maximum likelihood: requires specific models of evolution
- computationally very intensive, need specific models
* Bayesian:
- can be computationally intensive, need specific models
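Distance-based methods such as neighbor joining start from a table of pairwise distances between species. The sketch below shows only that first step, using the simple p-distance (fraction of differing sites); it is not neighbor joining itself. The sequences are invented toy data and `p_distance` is a hypothetical helper name.

```python
from itertools import combinations

# Toy aligned sequences (invented); all must have the same length.
alignment = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAT",
    "c": "ACGTTCGTAT",
    "d": "TCGTTCGAAT",
}

def p_distance(s: str, t: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t)) / len(s)

# One distance per unordered pair of species.
distances = {
    (p, q): p_distance(alignment[p], alignment[q])
    for p, q in combinations(sorted(alignment), 2)
}
for pair, d in distances.items():
    print(pair, d)
```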
** Methods of phylogeny construction are beyond this course, but there are
several excellent courses on campus that cover this
14
Making inferences based on phylogeny:
Confidence in your inferences depends on how much you trust your tree
Most methods have some way of assessing confidence at each node
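[Figure: rooted tree of taxa a to e with outgroup z and common ancestor *; nodes are labeled with bootstrap values 1, 0.6, 1, 0.4, 1]
Bootstrapping: Remake the tree 1,000 times, each time using a resampled
subset of the data (typically alignment columns drawn with replacement),
and see how many times you get the same node.
High bootstrap value (>0.6): a value of 0.6 means that node was observed
in 60% of remade trees.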
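The resampling idea behind bootstrapping can be sketched as follows. This is a simplified illustration under assumptions, not a real phylogenetic bootstrap: no tree is rebuilt, and "the node exists" is replaced by a crude proxy (two sequences are strictly closer to each other than to anything else). The alignment is invented and all function names are hypothetical.

```python
import random

def hamming(s, t):
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(s, t))

def are_closest(aln, x, y):
    """Crude stand-in for 'x and y form a node': each is strictly closer
    to the other than to any third sequence."""
    d_xy = hamming(aln[x], aln[y])
    others = [k for k in aln if k not in (x, y)]
    return all(d_xy < hamming(aln[x], aln[k]) and d_xy < hamming(aln[y], aln[k])
               for k in others)

def bootstrap_support(aln, x, y, n_reps=1000, seed=0):
    """Fraction of resampled alignments in which x and y still group together."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    hits = 0
    for _ in range(n_reps):
        # Draw as many columns as the alignment has, with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}
        hits += are_closest(resampled, x, y)
    return hits / n_reps

alignment = {  # invented toy data
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAT",
    "c": "ACGTTCGTTT",
    "d": "TCGTTCGAAT",
}
support_ab = bootstrap_support(alignment, "a", "b")
print(f"support for (a,b): {support_ab:.2f}")
```

A real analysis would rebuild the full tree for every replicate with the chosen tree-building method and record which nodes reappear.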
15
Making inferences based on phylogeny:
Confidence in your inferences depends on how much you trust your tree
Most methods have some way of assessing confidence at each node
Often represent the consensus tree
[Figure: the tree of taxa a to e with outgroup z, labeled with bootstrap values (1, 0.6, 0.4, 1, 1), shown beside its consensus tree]
Collapse nodes without high confidence
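Collapsing a low-confidence node means deleting it and attaching its children directly to its parent, which leaves an unresolved multi-way split. A minimal sketch, assuming a tree stored as nested `(support, children)` tuples with species names as leaves; the data structure, the function name `collapse`, and the placement of the support values on the example tree are all hypothetical.

```python
def collapse(node, threshold):
    """Return a copy of the tree in which internal nodes with support below
    `threshold` are removed and their children attached to the parent."""
    if isinstance(node, str):          # leaf: nothing to collapse
        return node
    support, children = node
    kept = []
    for child in children:
        child = collapse(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            kept.extend(child[1])      # weak node: promote its children
        else:
            kept.append(child)
    return (support, kept)

# Example tree; which node carries which support value is assumed.
tree = (1.0, [(1.0, [(0.6, [(1.0, ["a", "b"]), "c"]), (0.4, ["d", "e"])]), "z"])

consensus = collapse(tree, threshold=0.6)
print(consensus)
```

The threshold is a parameter, so the same function covers a stricter cutoff by passing a higher value.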
16
Making inferences based on phylogeny:
Confidence in your inferences depends on how much you trust your tree
Most methods have some way of assessing confidence at each node
[Figure: the same tree with nodes labeled by posterior probabilities (1, 0.6, 0.4, 1, 1)]
Bayesian methods use a different approach
Posterior Probability:
Typically don’t trust nodes with <90% posterior probability.
17
Making inferences based on phylogeny:
Confidence in your inferences depends on how much you trust your tree
Most methods have some way of assessing confidence at each node
Consensus tree
[Figure: the tree of taxa a to e with outgroup z, labeled with posterior probabilities (1, 0.6, 0.4, 1, 1), shown beside its consensus tree]
Collapse nodes with <0.9 posterior prob.
18
Species tree vs. gene/protein tree
Trees can be very different, since genes can have their own histories
Very important to know the difference between the trees!
a. Gene tree is based on a set of orthologous genes (i.e. genes in different species descended from a common ancestral gene)
Often (but certainly not always) the gene tree is similar to the species tree
b. Species tree is meant to represent the historical relationship between species.
Want to build on characters that reflect time since divergence:
In the genomic age, often use as many genes as possible (hundreds to thousands)
to generate a species tree: Phylogenomics
19
Phylogenomics: Using Whole-genome information to reconstruct
the Tree of Life
Several approaches:
1. Concatenate many gene sequences and treat as one
Use a ‘super matrix’ of variable sequence characters
2. Construct many separate trees, one for each gene, and then compare
Often construct a ‘super tree’ that is built from all single trees
3. Incorporate non-sequence characters like synteny, intron structure, etc.
The goal is to use many characters, of many different types,
to avoid being misled about the
relationships between species.
Now recognized that different regions of the genome can have distinct histories.
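Approach 1 (concatenation into a 'super matrix') can be sketched as joining the per-gene alignments end to end for each species. This is a minimal sketch under assumptions: the sequences are invented, the function name `concatenate` is hypothetical, and filling a missing gene with gap characters is one common convention rather than something the lecture specifies.

```python
def concatenate(gene_alignments, taxa, missing="-"):
    """Join per-gene alignments end to end; a taxon missing from a gene
    gets a run of gap characters of that gene's length."""
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        gene_length = len(next(iter(aln.values())))
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * gene_length)
    return supermatrix

gene1 = {"a": "ACGT", "b": "ACGA", "c": "ACTA"}
gene2 = {"a": "GGC", "c": "GAC"}          # gene not found in taxon b
matrix = concatenate([gene1, gene2], taxa=["a", "b", "c"])
print(matrix)
```

Because every gene is forced onto one tree, this approach cannot show genes whose histories differ; that is what approach 2 (separate trees per gene) is for.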
20
A few other key basic concepts:
Selection acts on phenotypes, based on their fitness cost/advantage, to affect
the population frequencies of the underlying genotypes.
In the case of DNA sequence:
• Neutral substitutions = no effect on fitness, so not acted on by selection
Given a ~constant mutation rate, can convert the # of substitutions into
time of divergence since speciation = molecular clock theory.
• Deleterious substitutions = fitness cost
* These are removed by purifying (negative) selection
• Advantageous substitutions = fitness advantage
* These alleles are enriched through adaptive (positive) selection
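The molecular clock conversion mentioned for neutral substitutions can be written as a one-line calculation. This is a minimal sketch assuming a strict clock with one constant rate and no correction for repeated changes at the same site; the function name and the example numbers are illustrative and not taken from the lecture.

```python
def divergence_time(subs_per_site: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, under a strict molecular clock.

    Both lineages accumulate changes after the split, so the observed
    distance between them is 2 * rate * time.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return subs_per_site / (2 * rate_per_site_per_year)

# Illustrative values only: 2% divergence, 1e-9 substitutions per site per year.
years = divergence_time(subs_per_site=0.02, rate_per_site_per_year=1e-9)
print(f"{years:.3g} years")
```

The factor of 2 is the step that is easy to forget: the substitutions separating two species occurred along both branches leading back to their common ancestor.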
21