Comprehensive Whole Genome Bacterial Phylogeny

An SVD-Based Phylogenetic Method
for Creating Comprehensive
Whole Genomes
Michael W. Berry
University of Tennessee, Knoxville
Gary W. Stuart
Indiana State University
Outline
• Statement of Problem
• SVD-based Approach
• Consensus Trees
• Correlated Peptide Motifs
• Benchmarks
• n-gram Issues
• Alternative Encodings (Factorizations)
• Model Observations and Feedback
• Collaborative Software Infrastructure
• References
7/21/2015 11:18 AM
Berry and Stuart 2003
2
Problem
• Comprehensive (character-by-character) analysis of complete genomes using standard phylogenetic methods is not tractable.
• Standard approaches: parsimony, maximum likelihood, Bayesian inference
• Potentially misleading similarities due to stochastic (neutral) evolution, convergent evolution, or horizontal gene transfer.
Problem
• Need for determining the similarity of
species based on the similarity of
characters (and their combinations)
within genomic sequences.
• Approach: automatic generation of
phylogenetic trees from protein
sequences.
• Goal: expose (ancestral) relationships
of corresponding species.
Overview of Approach
• Construct a (sparse) tetrapeptide-by-protein matrix from whole bacterial/mitochondrial genomes.
• Factor the matrix via truncated SVD (Lanczos method) to obtain vector representations of peptides and proteins in a high-dimensional space.
• Use cosines to derive pairwise similarities of protein vectors for gene tree generation; also compare species vectors.
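The first and third bullets above can be illustrated at toy scale. The sketch below is a minimal Python illustration, not the authors' code: it counts overlapping tetrapeptides per protein as a sparse dictionary and takes the cosine between two raw count vectors. It skips the SVD step entirely, and the protein fragments are hypothetical.

```python
import math
from collections import Counter

def tetrapeptide_counts(seq: str, n: int = 4) -> Counter:
    """Count every overlapping n-gram (default: tetrapeptide) in a protein sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical protein fragments (not taken from the NCBI dataset).
p1 = tetrapeptide_counts("MKTLSGGEAQR")
p2 = tetrapeptide_counts("AETLSGGEAQRV")
p3 = tetrapeptide_counts("MPHVNVGTIGH")
print(round(cosine(p1, p2), 3), round(cosine(p1, p3), 3))  # 0.707 0.0
```

In the actual method the cosines are taken between reduced (SVD) vectors rather than raw counts, which is what lets proteins with no tetrapeptide in common still receive a nonzero similarity.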
Details of Approach
• 53 whole bacterial genomes and 1 mitochondrial genome were downloaded from the National Center for Biotechnology Information (NCBI), yielding 134,155 protein sequences.
• The frequency of each of the 160,000 ($20^4$) possible tetrapeptides in each protein was recorded and stored in a sparse column-compressed matrix format.
• The p-LANSO code (C++/MPI) was used to obtain a 571-dimensional dominant singular subspace for the matrix (in 1,950 iterations).
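One way to realize the 160,000-row matrix is to map each tetrapeptide to a row index with a base-20 encoding and to store the counts in compressed sparse column (CSC) form. The residue ordering `AMINO_ACIDS` and the choice to skip n-grams containing non-standard letters such as X are assumptions made for illustration; the slides do not say how the input to p-LANSO was laid out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def peptide_row(pep: str) -> int:
    """Map an n-gram to a row in 0..20**n - 1 via base-20 positional encoding."""
    row = 0
    for aa in pep:
        row = row * 20 + AA_INDEX[aa]
    return row

def build_csc(proteins, n=4):
    """Return (col_ptr, row_idx, values) for a peptide-by-protein count matrix.

    Column j holds the n-gram counts of proteins[j]; n-grams containing
    non-standard residues (e.g. X) are skipped.
    """
    col_ptr, row_idx, values = [0], [], []
    for seq in proteins:
        counts = {}
        for i in range(len(seq) - n + 1):
            pep = seq[i:i + n]
            if all(aa in AA_INDEX for aa in pep):
                r = peptide_row(pep)
                counts[r] = counts.get(r, 0) + 1
        for r in sorted(counts):  # CSC convention: row indices sorted within a column
            row_idx.append(r)
            values.append(counts[r])
        col_ptr.append(len(row_idx))  # end of this column's entries
    return col_ptr, row_idx, values

print(build_csc(["AAAAA", "ACDEX"]))  # ([0, 1, 2], [0, 443], [2, 1])
```

Column j's nonzeros are `row_idx[col_ptr[j]:col_ptr[j+1]]`, so only the tetrapeptides actually present in a protein cost any storage.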
Details of Approach
• Storage of peptide and protein vectors may be a concern (1.3 GB for the 571-dimensional subspace).
• Protein similarities were computed via COSDIST (written in C) using a complete range of dominant (right) singular vectors:
$A \approx A_k = U_k \Sigma_k V_k^T$, where $k = 10, \ldots, 571$
• Phylogenetic trees are generated by applying the NEIGHBOR program (PHYLIP suite) to the resulting distance matrices.
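A sketch of the distance-matrix step, under two stated assumptions: each protein (or species) is represented by a row of a reduced factor such as $V_k \Sigma_k$, and the distance is taken as $1 - \cos$. The slides do not give COSDIST's exact formula. Truncating every vector to its first k coordinates gives the matrix for subspace dimension k, and the output follows PHYLIP's square distance-matrix layout (taxon count on the first line, 10-character names) that NEIGHBOR reads from its `infile`. The species vectors are hypothetical.

```python
import math

def cosine_distance_matrix(vectors):
    """Pairwise distances d_ij = 1 - cos(x_i, x_j) for nonzero, equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            dij = 1.0 - dot / (norms[i] * norms[j])
            d[i][j] = d[j][i] = max(dij, 0.0)  # clamp tiny negative rounding error
    return d

def phylip_square_matrix(names, d):
    """Format a square distance matrix as PHYLIP input (names padded/cut to 10 chars)."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, d):
        lines.append(f"{name[:10]:<10}" + "".join(f" {x:.6f}" for x in row))
    return "\n".join(lines) + "\n"

# Hypothetical 3-dimensional species vectors; keep only the first k coordinates.
vecs = {"Ecol": [0.9, 0.1, 0.3], "Styp": [0.8, 0.2, 0.1], "Mtub": [0.1, 0.9, 0.4]}
k = 2
names = list(vecs)
d = cosine_distance_matrix([v[:k] for v in vecs.values()])
out = phylip_square_matrix(names, d)
print(out)
```

Repeating this for every k from 10 to 571 yields one distance matrix, and hence one NEIGHBOR tree, per subspace dimension.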
Details of Approach
• A complete set of 562 trees was constructed (one per value of k).
• A consensus tree was created using subspaces of dimension 471 and higher; this C-tree produced 7 clusters of bacteria, of which only 1 was supported by fewer than 50% of the trees (100 in total).
• Correlated peptide (copep) motifs were constructed by identifying dominant tetrapeptide components in the left singular vectors (columns of $U_k$).
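The C-tree step rests on counting how often each cluster (the set of taxa below an internal node) recurs across the trees. A minimal majority-rule sketch, assuming each tree has already been reduced to its list of clusters; the taxa and trees are hypothetical, and this is not PHYLIP CONSENSE itself. Clusters that occur in more than half of the trees are always mutually compatible, so they can be assembled into a single consensus tree.

```python
from collections import Counter

def cluster_support(trees):
    """Count, for every cluster, how many trees contain it.

    Each tree is an iterable of clusters (sets of taxon names).
    """
    support = Counter()
    for tree in trees:
        support.update({frozenset(c) for c in tree})  # count each cluster once per tree
    return support

def majority_clusters(trees, threshold=0.5):
    """Clusters found in more than `threshold` of the trees (majority rule)."""
    support = cluster_support(trees)
    n = len(trees)
    return {c: s for c, s in support.items() if s / n > threshold}

trees = [
    [{"Ecol", "Styp"}, {"Ecol", "Styp", "Ypes"}, {"Hpyl", "Cjej"}],
    [{"Ecol", "Styp"}, {"Styp", "Ypes"}, {"Hpyl", "Cjej"}],
    [{"Ecol", "Ypes"}, {"Ecol", "Styp", "Ypes"}, {"Hpyl", "Cjej"}],
]
for cluster, count in majority_clusters(trees).items():
    print(sorted(cluster), f"{count}/{len(trees)}")
```

The support counts printed here play the role of the per-clade numbers on the consensus tree figures (out of 100 or 200 trees there).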
Consensus Tree (100 dims)
[Figure: consensus tree of 54 taxa built from subspaces V472–V571, with clade support values out of 100 trees; one placement is marked "misclassified". Group labels: G (low GC gram-positive bacteria), P (beta/gamma proteobacteria subgroup), C, B, A (chlamydias, spirochaetes, alpha/epsilon proteobacteria subgroup), M, and T (thermophilic archaebacteria).]
Consensus Tree Convergence
(53 genome bacterial dataset)
[Figure: two consensus trees of 79 prokaryotes (Sept. 2003), labeled bAZm3len187-386 and bAZm3lex240-439, with clade support values out of 200 trees. Labeled clades: Actinobacteridae, Alphaproteobacteria, Xanthomonadaceae+Pseudomonas, Enterobacteriaceae, Vibrio, Cyanobacteria, Pasteurellaceae, Streptococcaceae, Bacillaceae, Clostridia, Mycoplasmataceae, Campylobacterales, Chlamydiaceae, Spirochaetales, and a thermophilic group.]
Dimensionality: bAZm3len
[Figure: "Tree Convergence with Increased Dimensions: len". Symmetric distance (y-axis, 0–140) versus included dimensions (x-axis, 0–384), with two series: distance to consensus tree and distance to adjacent tree.]
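The symmetric distance plotted above is presumably the symmetric-difference (Robinson–Foulds) distance between trees: the number of splits found in one tree but not the other. A sketch under that assumption, with each unrooted tree given as a list of clusters (one side of each internal edge); splits are normalized so that both sides of the same edge compare equal. The taxa are hypothetical.

```python
def normalize_split(side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    ref = min(taxa)
    return frozenset(taxa) - side if ref in side else side

def nontrivial_splits(clusters, taxa):
    """Keep only informative splits (each side holds at least two taxa)."""
    splits = set()
    for c in clusters:
        s = normalize_split(c, taxa)
        if 2 <= len(s) <= len(taxa) - 2:
            splits.add(s)
    return splits

def symmetric_distance(tree_a, tree_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    return len(nontrivial_splits(tree_a, taxa) ^ nontrivial_splits(tree_b, taxa))

taxa = {"A", "B", "C", "D", "E"}
t1 = [{"A", "B"}, {"A", "B", "C"}]
t2 = [{"A", "C"}, {"D", "E"}]
print(symmetric_distance(t1, t2, taxa))  # 2
```

"Distance to adjacent tree" would compare the trees for dimensions k and k+1, and "distance to consensus tree" would compare each tree with the C-tree.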
Details of Approach
• Using a threshold (e.g., 0.025), tetrapeptides were selected from a chosen left singular vector and then aligned by overlap (3 of the 4 amino acids shared between consecutive peptides).
• The longest contiguous string of peptides was then used as a BLAST query to identify example proteins, or matched to a dominant protein family member.
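The thresholding and 3-of-4 overlap chaining can be sketched as follows, using the U90 values from the sample copep definition in this deck. Assumptions: weights are compared by absolute value (the sign of a singular vector is arbitrary), when several peptides could extend a chain the heaviest is taken, and the extra entry `AQRV` is hypothetical, included only to show the threshold filtering it out.

```python
def select_peptides(vector, threshold=0.025):
    """Tetrapeptides whose weight in a left singular vector exceeds the threshold."""
    return {pep: w for pep, w in vector.items() if abs(w) > threshold}

def chain_peptides(peptides, start):
    """Greedily extend `start` with peptides that overlap it in 3 of 4 residues."""
    by_prefix = {}
    for pep in peptides:
        by_prefix.setdefault(pep[:3], []).append(pep)
    chain, current, used = start, start, {start}
    while True:
        candidates = [p for p in by_prefix.get(current[1:], []) if p not in used]
        if not candidates:
            return chain
        current = max(candidates, key=lambda p: abs(peptides[p]))  # take the heaviest
        used.add(current)
        chain += current[-1]  # each overlapping peptide adds one new residue

def longest_chain(peptides):
    """Longest contiguous string obtainable from any selected start peptide."""
    return max((chain_peptides(peptides, s) for s in peptides), key=len)

# U90 components from the sample copep definition; "AQRV" is a hypothetical
# below-threshold entry added only to show the filter.
u90 = {"TLSG": 0.03328761, "LSGG": 0.02878336, "SGGE": 0.02962949,
       "GGEA": 0.02548466, "GEAQ": 0.02741430, "EAQR": 0.03030539,
       "AQRV": 0.01000000}
print(longest_chain(select_peptides(u90)))  # TLSGGEAQR
```

The resulting string is the kind of query that would then be submitted to BLAST.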
Details of Approach
• Sample Copep Motif Definition
(4-gram model of 160,000 peptides and
136,447 proteins; threshold=0.025)
U90 left singular vector …
Peptide chain 9 (start = TLSG):
TLSG 0.03328761
LSGG 0.02878336
SGGE 0.02962949
GGEA 0.02548466
GEAQ 0.02741430
EAQR 0.03030539
TLSGGEAQR
More on Copep Motifs
• Proteins are grouped into families based on sequence similarity; sequences shared within a family can be called motifs.
• Identify peptides that appear very frequently together in the same proteins.
• Co-occurrence of rare peptides reflects common ancestry or functional requirements of the protein family.
• Search for correlated peptides in a contiguous string, similar to BLAST (Basic Local Alignment Search Tool) but ignoring gaps.
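Counting how many proteins contain both peptides of a pair is one direct way to find peptides that appear very frequently together. A brute-force sketch (quadratic in the number of distinct peptides per protein, so only for toy inputs); for a binary peptide-by-protein matrix B these counts equal the off-diagonal entries of $B B^T$. The sequences are hypothetical fragments.

```python
from collections import Counter
from itertools import combinations

def peptide_set(seq, n=4):
    """Distinct n-grams occurring anywhere in a protein sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def cooccurrence(proteins, n=4):
    """For each unordered peptide pair, count the proteins that contain both."""
    counts = Counter()
    for seq in proteins:
        counts.update(combinations(sorted(peptide_set(seq, n)), 2))
    return counts

proteins = ["TLSGGE", "AETLSGGEAQ", "MPHVNV"]  # hypothetical fragments
counts = cooccurrence(proteins)
print(counts.most_common(3))
```

The SVD route avoids forming these pair counts explicitly: the left singular vectors of A are eigenvectors of $A A^T$, the weighted analogue of this co-occurrence matrix.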
Copep Motifs Identified
A copep 45: Elongation factor TU chain A (gi 4699821 E. coli)
U45
U46
SKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAARAFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVKNMITGAAQMDGA
ILVVAATDGPMPQTREHILLGRQVGVPYIIVFLNKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALEGDAEWEAKILELAGFLDSYIPEP
ERAIDKPFLLPIEDVFSISGRGTVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGRAGENVGVLLRGIKREEIERGQVLAKPGTIKPHTK
FESEVYILSKDEGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMVVTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLS
B copep 46: ABC excinuclease uvrA (gi 1573215 H. influenzae)
MENIDIRGARTHNLKNINLTIPRNKLVVITGLSGSGKSSLAFDTLYAEGQRRYVESLSAYARQFLSLMEKPDVDSIEGLSPAISIEQKSTSHNPRSTVGTI
TEIYDYLRLLFARVGEPRCPDHNVPLTAQTISQMVDKVLSLPEDSKMMLLAPVVKNRKGEHVKILENIAAQGYIRARIDGEICDLSDPPKLALQKKHTIEV
VVDRFKVRSDLATRLAESFETALELSGGTAIVAEMDNPKAEELVFSANFACPHCGYSVPELEPRLFSFNNPAGACPTCDGLGVQQYFDEDRVVQNPTISLA
GGAVKGWDRRNFYYYQMLTSLAKHYHFDVEAPYESLPKKIQHIIMHGSGKEEIEFQYMNDRGDVVIRKHPFEGILNNMARRYKETESMSVREELAKNISNR
PCIDCGGSRLRPEARNVYIGRTNLPIIAEKSIGETLEFFTALSLTGQKAQIAEKILKEIRERLQFLVNVGLNYLSLSRSAETLSGGEAQRIRLASQIGAGL
VGVMYVLDEPSIGLHQRDNERLLNTLIHLRNLGNTVIVVEHDEDAIRAADHIIDIGPGAGVHGGQVIAQGNADEIMLNPNSITGKFLSGADKIEIPKKRTA
LDKKKWLKLKGASGNNLKNVNLDIPVGLFTCVTGVSGSGKSTLINDTLFPLAQNALNRAEKTDYAPYQSIEGLEHFDKVIDINQSPIGRTPRSNPATYTGL
FTPIRELFAGVPEARARGYNPGRFSFNVRGGRCEACQGDGVLKVEMHFLPDVYVPCDQCKGKRYNRETLEIRYKGKTIHQVLDMTVEEAREFFDAIPMIAR
KLQTLMDVGLSYIRLGQSSTTLSGGEAQRVKLATELSKRDTGKTLYILDEPTTGLHFADIKQLLEVLHRLRDQGNTIVVIEHNLDVIKTADWIVDLGPEGG
SGGGQIIATGTPEQVAKVTSSHTARFLKPILEKP
C copep 55: ABC transporter YLIA (gi 9978033 E. coli)
U55
MKKGTPLPHSDELDAGNVLAVENLNIAFMQDQQKIAAVRNLSFSLQRGETLAIVGESGSGKSVTALALMRLLEQAGGLVQCDKMLLQRRSREVIELSEQNA
AQMRHVRGADMAMIFQEPMTSLNPVFTVGEQIAESIRLHQNASREEAMVEAKRMLDQVRIPEAQTILSRYPHQLSGGMRQRVMIAMALSCRPAVLIADEPT
TALDVTIQAQILQLIKVLQKEMSMGVIFITHDMGVVAEIADRVLVMYQGEAVETGTVEQIFHAPQHPYTRALLAAVPQLGAMKGLDYPRRFPLISLEHPAK
QAPPIEQKTVVDGEPVLRVRNLVTRFPLRSGLLNRVTREVHAVEKVSFDLWPGETLSLVGESGSGKSTTGRALLRLVESQGGEIIFNGQRIDTLSPGKLQA
LRRDIQFIFQDPYASLDPRQTIGDSIIEPLRVHGLLPGKDAAARVAWLLERVGLLPEHAWRYPHEFSGGQRQRICIARALALNPKVIIADEAVSALDVSIR
GQIINLLLDLQRDFGIAYLFISHDMAVVERISHRVAVMYLGQIVEIGPRRAVFENPQHPYTRKLLAAVPVAEPSRQRPQRVLLSDDLPSNIHLRGEEVAAV
SLQCVGPGHYVAQPQSEYAFMRR
Perfect matches between the dominant tetrapeptides and the chosen family member are shown in bold. Matches to the “Walker A” box (8 aa) and “C” motifs (12 aa) of the NBFs of these ABC proteins (B and C) are underlined.
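The perfect-match highlighting can be reproduced by scanning a family member for the selected tetrapeptides and merging overlapping hits into ungapped segments. A sketch, assuming the peptide set from the U90 example; the test fragment is copied from the uvrA sequence above, which contains TLSGGEAQR twice.

```python
def motif_segments(seq, peptides, n=4):
    """Maximal ungapped segments of `seq` covered by hits from `peptides`.

    Returns half-open (start, end) intervals; overlapping or adjacent hits merge.
    """
    hits = [i for i in range(len(seq) - n + 1) if seq[i:i + n] in peptides]
    segments = []
    for i in hits:  # hits are already in increasing order
        if segments and i <= segments[-1][1]:
            segments[-1][1] = max(segments[-1][1], i + n)  # extend current segment
        else:
            segments.append([i, i + n])  # gap: start a new segment
    return [tuple(s) for s in segments]

MOTIF = {"TLSG", "LSGG", "SGGE", "GGEA", "GEAQ", "EAQR"}
fragment = "RSAETLSGGEAQRIRLASQIG"  # from the uvrA sequence (copep 46)
for start, end in motif_segments(fragment, MOTIF):
    print(start, end, fragment[start:end])  # 4 13 TLSGGEAQR
```

Because no gaps are allowed, a single substitution inside the motif splits it into two shorter segments, unlike a gapped BLAST alignment.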
SVD Benchmarks
• Environment: Sun Enterprise 4500 SMP (14 400MHz UltraSPARC nodes, 10GB RAM, 0.5TB Disk); serial execution, 1950 Lanczos iter., residual errors $\|r_i\|_2 \le 10^{-8}$

Ngram | Peptides | Proteins | Factors | Mult(A,A’) | Time (h:m:s)
3 | $20^3$ = 8,000 | 134,155 | 615 | 3,181 (A), 2,566 (A’) | 12:25:03
4 | $20^4$ = 160,000 | 136,447 | 560 | 3,071 (A), 2,511 (A’) | 40:50:08

$\|r_i\|_2 = \left(\|A v_i - \sigma_i u_i\|_2^2 + \|A^T u_i - \sigma_i v_i\|_2^2\right)^{1/2}$
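The benchmarks use Lanczos iterations; the sketch below swaps in plain power iteration on $A^T A$, a simpler and slower method, to obtain the dominant singular triplet of a small dense matrix, then evaluates the residual $\|r_i\|_2$ as defined on this slide. The all-ones starting vector is an assumption and would fail if it happened to be orthogonal to the dominant right singular vector.

```python
import math

def matvec(A, x):
    """A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def dominant_triplet(A, iters=200):
    """Largest singular triplet (sigma, u, v) by power iteration on A^T A."""
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        nw = norm(w)
        v = [t / nw for t in w]  # renormalize to avoid overflow
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [t / sigma for t in Av]
    return sigma, u, v

def residual(A, sigma, u, v):
    """||r||_2 = (||A v - sigma u||^2 + ||A^T u - sigma v||^2)^(1/2)."""
    r1 = [a - sigma * b for a, b in zip(matvec(A, v), u)]
    r2 = [a - sigma * b for a, b in zip(rmatvec(A, u), v)]
    return math.sqrt(norm(r1) ** 2 + norm(r2) ** 2)

A = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy 3 x 2 matrix, singular values 3 and 1
sigma, u, v = dominant_triplet(A)
print(sigma, residual(A, sigma, u, v))
```

A triplet is accepted once its residual falls below the tolerance, which is the role of the $10^{-8}$ and $10^{-6}$ bounds quoted in the benchmark environments.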
SVD Benchmarks
• Environment: Sun Enterprise 4500 SMP (14 400MHz UltraSPARC nodes, 10GB RAM, 0.5TB Disk); 2-processor execution, 300 Lanczos iter., residual errors $\|r_i\|_2 \le 10^{-8}$

Ngram | Peptides | Proteins | Factors | Mult(A,A’) | Time (h:m:s)
5 | $20^5$ = 3,200,000 | 832 | 103 | 507 (A), 404 (A’) | 03:45:31

Storage: $U_{103}$ = 2.6GB, $V_{103}$ = 684KB
Dataset: Completely sequenced vertebrate mitochondrial genomes with 13 protein coding genes; 64 vertebrates × 13 genes = 832 proteins
SVD Benchmarks
• Environment: 32-node Linux/Dell cluster (2.4 GHz Dual Pentium IV Xeon processors with 2GB RAM and 2.73 GB Disk); 1-processor execution, 350 Lanczos iter., residual errors $\|r_i\|_2 \le 10^{-6}$

Ngram | Peptides | Proteins | Factors | Mult(A,A’) | Time (h:m:s)
5 | $20^5$ = 3,200,000 | 832 | 124 | 599 (A), 475 (A’) | 00:17:28

Storage: $U_{124}$ = 3.0GB, $V_{124}$ = 806KB
Dataset: Completely sequenced vertebrate mitochondrial genomes with 13 protein coding genes; 64 vertebrates × 13 genes = 832 proteins
SVD Benchmarks
• Environment: 32-node Linux/Dell cluster (2.4 GHz Dual Pentium IV Xeon processors with 2GB RAM and 2.73 GB Disk); 1-processor execution, 1500 Lanczos iter., residual errors $\|r_i\|_2 \le 10^{-6}$; A is 0.24% dense

Ngram | Peptides | Proteins | Factors | Mult(A,A’) | Time (h:m:s)
4 | $20^4$ = 160,000 | 206,240 | 463 | 2,427 (A), 1,964 (A’) | 8:15:01

Storage: A = 1.3GB, $U_{462}$ = 565MB, $V_{462}$ = 728MB
Dataset: 9-genome eukaryotic nuclear gene collection (10-07-03)
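The storage figures can be checked against a simple model in which a dense factor occupies rows × k × 8 bytes. The sketch assumes 8-byte floating-point values (the slides do not state the layout); read as binary units, the results come close to the reported $U_{124}$ ≈ 3.0GB, $V_{124}$ ≈ 806KB, $U_{462}$ ≈ 565MB and $V_{462}$ ≈ 728MB.

```python
def dense_factor_bytes(rows, k, bytes_per_value=8):
    """Bytes for a dense rows-by-k factor (e.g. U_k or V_k), assuming float64 values."""
    return rows * k * bytes_per_value

# (label, number of rows, number of factors k) taken from the benchmark slides
for label, rows, k in [("U124", 3_200_000, 124), ("V124", 832, 124),
                       ("U462", 160_000, 462), ("V462", 206_240, 462)]:
    b = dense_factor_bytes(rows, k)
    print(f"{label}: {b:,} bytes = {b / 2**10:,.0f} KiB = "
          f"{b / 2**20:,.1f} MiB = {b / 2**30:.2f} GiB")
```

The peptide factor U dominates for the 5-gram runs because it has one row per possible pentapeptide ($20^5$ rows), while V has only one row per protein.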
9 Nuclear Eukaryotic Genomes
[Figure: consensus tree of Mmus (Mouse), Rnor (Rat), Hsap (Human), Frub (Fish), Dmel (Fly), Agam (Mosquito), Cele (Nematode), Scer (Yeast Fungus) and Pfal (Protozoa), built with PHYLIP-Neighbor over 10–463 dimensions and combined with PHYLIP-Consense; clade support values range from 445 to 454 of the 454 trees.]
Stuart et al. (unpublished)
SVD History (Software)
• Early History (Part I)
• Serial Fortran for dense and complex matrices
G.H. Golub and P.A. Businger (CACM 12,
1969); G.H. Golub and C. Reinsch (Num. Mat.
12, 1970)
• Serial Fortran-77 Lanczos for symmetric
eigenvalue problem developed by Parlett
(Berkeley) and his students (circa 1988)
SVD History (Software)
• Early History (Part II)
• Sparse singular value decomposition version in
Fortran-77 written by Berry in 1990
• C version of SVD code written by T. Do (MS
‘92, UTK-CS) as LAS2 in SVDPACKC
• C++ version of SVD code written by H. Tang
(MS ’99, UTK-CS) within GTP
SVD History (Software)
• Recent History
• Parallel Fortran/MPI version for eigenvalue
problem written by K. Wu and H. Simon
(NERSC)
• Parallel SVD version for NOW (Fortran/MPI)
(D. Martin ’00)
• Parallel C++/MPI implementation of SVD
within GTP for NOW (D. Martin ’01)
• Java implementation of Lanczos SVD within
GTP (L. Wo ‘02)
3-gram versus 4-gram
• 4-grams are rarer (lower probability): the occurrence of a tetrapeptide in 2 or 3 proteins is significant (function or ancestry).
• 3-grams tend to be noisier (they occur by chance); however, divergent proteins in which only 2 or 3 of 4 amino acids match may still share tripeptides.
• Some protein families could be missed with 4-grams; experiments are still needed.
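A back-of-the-envelope check of the rarity argument, assuming independent residues drawn uniformly from the 20 amino acids (real composition is not uniform) and a hypothetical protein length of 300: a given tripeptide is expected about 0.037 times per protein, roughly once in 27 proteins, whereas a given tetrapeptide is expected about 0.0019 times, roughly once in 540 proteins.

```python
def expected_hits(length, n, alphabet=20):
    """Expected occurrences of one specific n-gram in a random sequence of `length` residues.

    Assumes independent, uniformly distributed residues (a simplification).
    """
    return max(length - n + 1, 0) / alphabet ** n

for n in (3, 4, 5):
    e = expected_hits(300, n)  # 300 residues: hypothetical typical protein length
    print(f"n={n}: {e:.2e} expected hits, about 1 protein in {1 / e:,.0f}")
```

Each extra residue divides the chance rate by about 20, which is why sharing a tetrapeptide carries far more signal than sharing a tripeptide, at the cost of missing more divergent family members.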
Sparse QR Factorization
• Factor $PA = WR$ and $PA^T = QS$, where P reflects column pivoting on the sparse matrices $A$ and $A^T$.
• $W_{m \times k}$ and $Q_{n \times k}$ define the k coordinates of each peptide and protein, respectively.
• W and Q can be produced implicitly with no RAM constraint; cosine similarities for protein vectors can be computed in RAM; columns of W must be written to disk for subsequent motif analysis.
• Collaboration with Shakhina Pullatova (Tenn)
and P.W. Stewart (Maryland).
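A dense, rank-k illustration of column-pivoted QR via modified Gram–Schmidt, written in the common form $A\Pi = WR$ with $\Pi$ the column permutation. This is not the sparse implementation referred to above; it only shows how the rows of W give k coordinates per row of A (peptides), and applying the same routine to $A^T$ would give protein coordinates in Q.

```python
import math

def pivoted_qr(A, k):
    """Rank-k column-pivoted QR, A[:, perm] ~= W R, by modified Gram-Schmidt.

    A is a dense list of m rows with n entries each. Returns W (m x r, orthonormal
    columns), R (r x n, upper trapezoidal in pivoted column order) and perm,
    where r <= min(k, n) is the number of steps completed.
    """
    m, n = len(A), len(A[0])
    k = min(k, n)
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # work column-wise
    perm = list(range(n))
    R = [[0.0] * n for _ in range(k)]
    Q = []  # orthonormal columns, one per completed step
    for step in range(k):
        # Pivot: bring the remaining column with the largest norm to position `step`.
        p = max(range(step, n), key=lambda j: sum(x * x for x in cols[j]))
        cols[step], cols[p] = cols[p], cols[step]
        perm[step], perm[p] = perm[p], perm[step]
        for row in R[:step]:  # earlier R entries must follow the column swap
            row[step], row[p] = row[p], row[step]
        r = math.sqrt(sum(x * x for x in cols[step]))
        if r < 1e-12:
            break  # remaining columns are numerically zero: rank < k
        q = [x / r for x in cols[step]]
        R[step][step] = r
        for j in range(step + 1, n):  # remove the q-component from later columns
            R[step][j] = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [a - R[step][j] * b for a, b in zip(cols[j], q)]
        Q.append(q)
    W = [[Q[c][i] for c in range(len(Q))] for i in range(m)]  # one row per row of A
    return W, R[:len(Q)], perm

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W, R, perm = pivoted_qr(A, 2)
print(perm)  # [1, 0]: the second column has the larger norm
```

Unlike the SVD, this factorization needs no iteration to convergence, which is part of its appeal for very large sparse matrices.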
Nonnegative Matrix Factors
• Factor $A \approx WH = \sum_k W_k H^k$, where the columns of W ($W_k$) define feature vectors (potential motifs), $H^k$ is the k-th row of H, and the columns of H are projections of proteins on the basis spanned by the feature vectors. W and H are nonnegative matrices.
• A regularization parameter ($\lambda$) is used to balance error reduction with enforcement of smoothness (sparsity of H):
$\min_{H_j} \left\{ \|A_j - W H_j\|_2^2 + \lambda \|H_j\|_2^2 \right\}$
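A minimal sketch of a nonnegative factorization with the penalty above, using Lee–Seung-style multiplicative updates; this is a swapped-in choice, since the slides do not name the algorithm. The update $H \leftarrow H \circ (W^T A) \oslash (W^T W H + \lambda H)$ follows from splitting the gradient of $\|A - WH\|_F^2 + \lambda\|H\|_F^2$ into positive and negative parts. The small `eps`, the random initialization and the test matrix are assumptions.

```python
import random

def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(r) for r in zip(*X)]

def nmf(A, k, lam=0.1, iters=500, eps=1e-9, seed=0):
    """Nonnegative A ~= W H minimizing ||A - WH||_F^2 + lam * ||H||_F^2.

    The penalty on H equals the column-wise sum of lam * ||H_j||_2^2.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        # H <- H * (W^T A) / (W^T W H + lam H); stays nonnegative
        H = [[h * a / (d + lam * h + eps) for h, a, d in zip(hr, ar, dr)]
             for hr, ar, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        # W <- W * (A H^T) / (W H H^T)
        W = [[w * a / (d + eps) for w, a, d in zip(wr, ar, dr)]
             for wr, ar, dr in zip(W, num, den)]
    return W, H

def frob_error(A, W, H):
    """Frobenius norm of the residual A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]  # toy nonnegative matrix
W0, H0 = nmf(A, 2, lam=0.01, iters=0)  # random starting point only
W, H = nmf(A, 2, lam=0.01, iters=500)
print(frob_error(A, W0, H0), frob_error(A, W, H))
```

Because every update multiplies by a nonnegative ratio, W and H never acquire negative entries, which is what allows each column of W to be read as an additive motif.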
Nonnegative Matrix Factors
• Each dimension of the vector space (each $W_k$) can be interpreted as a motif, and the original protein set as linear combinations of motifs: a sum-of-parts factorization.
• Collaboration with F. Shahnaz (Tenn),
V.P. Pauca (Wake Forest), and R.J.
Plemmons (Wake Forest).
Model Observations
• Novel prokaryotic relationships have been suggested by individual trees and C-trees; these may be partly explained by recently discovered horizontal gene transfer events.
• Copep motifs are novel in that they are not derived from local alignments; the threshold is arbitrary and used only for presentation.
• The singular subspace dimension needs to be expanded to reveal C-tree convergence properties, which is a computational challenge.
IBP (Internet Backplane Protocol)
• The Internet Backplane Protocol (IBP) is
middleware for managing and using remote
storage.
• It was invented to support logistical
networking in large-scale, distributed systems
and applications; logistical networking refers
to the global scheduling and optimization of
data movement, storage and computation
based on a model that takes into account all
the network's underlying physical resources.
• IBP provides a mechanism for using
distributed storage for logistical purposes.
IBP (Internet Backplane Protocol)
Principal architects:
Micah Beck
Jim Plank
Logistical Computing
and Internetworking
(LoCI) Laboratory
UTK/CS
exNode Mobility
• XML-based data structure/serialization;
creates a portable soft-link
• Allows for replication, flexible decomposition of
data.
• Allows for error-correction
• Arbitrary metadata
exNode Data Sharing
exNodes are pointers to IBP allocations.
Sample use: a 2GB file of 136,553 protein vectors (each containing 530 elements) was generated at UTK and uploaded to IBP depots on UTK servers.
1GB, 0.67GB, and 0.56GB portions (with slight overlaps) were successfully downloaded at ISU using an exNode file sent via email.
1GB benchmark: 46.6 sec
Collaborative Data Sharing
500-600 dim. singular subspace shared
between researchers (IBP-mail):
[Figure: the sender stores the data to the L-Bone and sends the exNode to the receiver over SMTP; the receiver downloads from the L-Bone (the download can be partial).]
Worldwide Text/Data Mining
What’s Ahead?
• Examine the relative effectiveness of different n-gram sizes (n = 3, 4, 5); n = 5 specifies $20^5$ = 3,200,000 peptides; multiple gram sizes in 1 dataset?
• Study the effects of using different frequency weights (binary, log-entropy, TF-IPF) for matrix formulation.
• Better estimation of a suitable factor space (number of singular triplets) is needed.
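Of the weights listed, log-entropy has a standard form in the vector-space retrieval literature: $a_{ij} = \log(1 + f_{ij})\,G_i$ with $G_i = 1 + \sum_j p_{ij}\log p_{ij} / \log n$, $p_{ij} = f_{ij} / \sum_j f_{ij}$, and n proteins. A sketch applying it to a dense peptide-by-protein count matrix; a peptide spread evenly over all proteins gets weight 0, and one confined to a single protein keeps global weight 1. Binary and TF-IPF weighting are not shown.

```python
import math

def log_entropy(counts):
    """Log-entropy weighting of a dense peptide-by-protein count matrix.

    Rows are peptides, columns are proteins (n > 1 columns required):
      a_ij = log(1 + f_ij) * G_i,  G_i = 1 + sum_j p_ij log(p_ij) / log(n),
      p_ij = f_ij / sum_j f_ij.
    """
    n = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        # sum_j p_ij log p_ij is <= 0; entries with f_ij = 0 contribute nothing
        plogp = sum((f / total) * math.log(f / total) for f in row if f > 0) if total else 0.0
        g = 1.0 + plogp / math.log(n)  # 0 for an even spread, 1 for a single protein
        weighted.append([math.log1p(f) * g for f in row])
    return weighted

print(log_entropy([[1, 1, 1, 1], [3, 0, 0, 0]]))
```

The global factor $G_i$ plays the same role as the rarity argument for 4-grams: peptides that occur everywhere carry little information about relationships between proteins.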
What’s Ahead?
• Apply the method to large, natural datasets based on genomes of widely different sizes and sequences using logistical networking.
• Compare performance with alignment-based methods on Clusters of Orthologous Groups (COGs) from prokaryotes.
• Include wildcard matching in amino acids: AxAA, AAxA.
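The wildcard patterns AxAA and AAxA can be generated by masking one interior residue of each tetrapeptide, so that peptides differing only at that position share a key. A sketch; the mask character `x` follows the slide's notation, and using these keys as extra matrix rows would be a hypothetical extension.

```python
def wildcard_keys(pep, positions=(1, 2), wildcard="x"):
    """Gapped variants of a tetrapeptide with one interior residue masked.

    For "TLSG" this yields "TxSG" (pattern AxAA) and "TLxG" (pattern AAxA).
    """
    return [pep[:i] + wildcard + pep[i + 1:] for i in positions]

print(wildcard_keys("TLSG"))  # ['TxSG', 'TLxG']
print(set(wildcard_keys("TLSG")) & set(wildcard_keys("TASG")))  # {'TxSG'}
```

With 20 possible residues at the masked position, each wildcard key pools up to 20 exact tetrapeptides, so the number of rows per pattern falls from $20^4$ to $20^3$.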
A Few References …
IBP/eXnode: http://loci.cs.utk.edu
Vector Space Models:
Berry, Drmac, and Jessup (1999). Matrices, Vector Spaces,
and Information Retrieval. SIAM Review 41:335-362.
General Approach:
Stuart and Berry (2003). A Comprehensive Whole Genome
Bacterial Phylogeny Using Correlated Peptide Motifs Defined in
a High Dimensional Vector Space. J. of Bioinformatics and
Computational Biology 1(3):475-493.
Stuart, Moffett, and Leader (2002). Integrated gene and
species phylogenies from unaligned whole genome protein
sequences. Bioinformatics 18:100-108.
Stuart, Moffett, and Leader (2002). A comprehensive
vertebrate phylogeny using vector representations of protein
sequences from whole genomes. Mol. Biol. and Evol. 19:554-562.
SIAM Data Mining 2004
Hyatt Orlando, Kissimmee, FL
April 22-24, 2004
http://www.siam.org/meetings/sdm04