Access the full text.
Sign up today, get an introductory month for just $19.
(1996)
Efficient Metropolis jumping rules. Pp. 599–608 in
B. Mau, M. Newton, B. Larget (1999)
Bayesian Phylogenetic Inference via Markov Chain Monte Carlo MethodsBiometrics, 55
R. Kass, A. Raftery (1995)
Bayes factors and model uncertainty
K. Hayasaka, T. Gojobori, S. Horai (1988)
Molecular phylogeny and evolution of primate mitochondrial DNA.Molecular biology and evolution, 5 6
K. McCabe, G. Khan, Y. Zhang, E. Mason, E. McCabe (1995)
Amplification of bacterial DNA using highly conserved sequences: automated analysis and potential for molecular triage of sepsis.Pediatrics, 95 2
L. Tierney (1994)
Markov Chains for Exploring Posterior DistributionsAnnals of Statistics, 22
D. Bhattacharya, L. Medlin (1995)
THE PHYLOGENY OF PLASTIDS: A REVIEW BASED ON COMPARISONS OF SMALL‐SUBUNIT RIBOSOMAL RNA CODING REGIONSJournal of Phycology, 31
MIKE HENDY, reviewing editor
S. Jeffery (1979)
Evolution of Protein Molecules
J. Felsenstein (1978)
The number of evolutionary treesSystematic Biology, 27
(1999)
The molecular population dynamics of HIV-1
C. Simon, L. Nigro, J. Sullivan, K. Holsinger, A. Martín, A. Grapputo, A. Franke, C. Mcintosh (1996)
Large differences in substitutional pattern and evolutionary rate of 12S ribosomal RNA genes.Molecular biology and evolution, 13 7
Kazuko Tamura, M. Nei (1993)
Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.Molecular biology and evolution, 10 3
B. Mau, M. Newton (1997)
Phylogenetic Inference for Binary Data on Dendograms Using Markov Chain Monte CarloJournal of Computational and Graphical Statistics, 6
B. Korber, B. Foley, T. Leitner (1997)
Human retroviruses and AIDS 1997
D. Schaid (1999)
Mathematical and Statistical Methods for Genetic AnalysisAmerican Journal of Human Genetics, 64
N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller (1953)
Equation of state calculations by fast computing machinesJournal of Chemical Physics, 21
M. Kuhner, Jon Yamato, J. Felsenstein (1995)
Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling.Genetics, 140 4
J. Thorne, H. Kishino, I. Painter (1998)
Estimating the rate of evolution of the rate of molecular evolution.Molecular biology and evolution, 15 12
Ziheng Yang (1995)
A space-time process model for the evolution of DNA sequences.Genetics, 139 2
W. Navidi, G. Churchill, A. Haeseler (1993)
Phylogenetic inference: linear invariants and maximum likelihood.Biometrics, 49 2
Karen Rudolph, Alan Parkinson, Carolyn Black, Andleonard Mayer (1993)
Evaluation of polymerase chain reaction for diagnosis of pneumococcal pneumoniaJournal of Clinical Microbiology, 31
(2001)
Accepted February
W. Schwartz (1982)
L. Margulis, Symbiosis in Cell Evolution. Life and its Environment on the Early Earth. XXII + 479 S., 104 Abb., 59 Tab. San Francisco 1981. W. H. Freeman and Co. £ 16.40 (Board), £ 9.20 (Paper)Journal of Basic Microbiology, 22
Adam Richman, Joshua Kohn (1999)
Self-incompatibility alleles from Physalis: implications for historical inference from balanced genetic polymorphisms.Proceedings of the National Academy of Sciences of the United States of America, 96 1
A. Rzhetsky, T. Sitnikova (1996)
When is it safe to use an oversimplified substitution model in tree-making?Molecular biology and evolution, 13 9
D. Loftsgaarden, C. Quesenberry (1965)
A nonparametric estimate of a multivariate density functionAnnals of Mathematical Statistics, 36
(1996)
Applications of molecular systematics: the state of the field and a look to the future. Pp
Vivek Nerurkar, Hien Nguyen, W. Dashwood, P. Hoffmann, Chaoquan Yin, D. Morens, Andrew Kaplan, R. Detels, Richard Yanagihara (1996)
HIV type 1 subtype E in commercial sex workers and injection drug users in southern Vietnam.AIDS research and human retroviruses, 12 9
I. Verdinelli, L. Wasserman (1995)
Computing Bayes Factors Using a Generalization of the Savage-Dickey Density RatioJournal of the American Statistical Association, 90
W. Feller (1959)
An Introduction to Probability Theory and Its Applications
J. Huelsenbeck, B. Larget, D. Swofford (2000)
A compound poisson process for relaxing the molecular clock.Genetics, 154 4
J. Sinsheimer, J. Lake, R. Little (1996)
Bayesian hypothesis testing of four-taxon topologies using molecular sequence data.Biometrics, 52 1
B. Larget (2000)
Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees
L. Margulis, D. Goode (1982)
Book-Review - Symbiosis in Cell Evolution - Life and its Environment on the Early Earth, 12
M. Kuhner, Jon Yamato, J. Felsenstein (1998)
Maximum likelihood estimation of population growth rates based on the coalescent.Genetics, 149 1
G. Myers, B. Korber, J. Berzofsky, G. Pavlakis, Rebecca Smith (1997)
Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences
R. Durbin, S. Eddy, A. Krogh, G. Mitchison (1998)
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
Molecular systematics. 2nd edition. Sinauer
D. Freedman, W. Feller (1967)
An Introduction to Probability Theory and Its ApplicationsEconometrica, 35
T. Leitner, David ESCANILLAt, Christer FRANZENt, M. Uhlén, J. Albert (1996)
Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis.Proceedings of the National Academy of Sciences of the United States of America, 93 20
Shuying Li, D. Pearl, Hani Doss (2000)
Phylogenetic Tree Construction Using Markov Chain Monte CarloJournal of the American Statistical Association, 95
W. Navidi, Gary Churchill, A. Haeseler (1991)
Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants.Molecular biology and evolution, 8 1
T. Jukes (1969)
CHAPTER 24 – Evolution of Protein Molecules
F. Ayala, E. Barrio, J. Kwiatowski (1996)
Molecular clock or erratic evolution? A tale of two genes.Proceedings of the National Academy of Sciences of the United States of America, 93 21
Ziheng Yang, B. Rannala (1997)
Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method.Molecular biology and evolution, 14 7
David Relman, Thomas Schmidt, Alvin Gajadhar, Mitchell Sogin, John Cross, Kristine Yoder, O. Sethabutr, Peter Echeverria (1996)
Molecular phylogenetic analysis of Cyclospora, the human intestinal pathogen, suggests that it is closely related to Eimeria species.The Journal of infectious diseases, 173 2
S. Whelan, N. Goldman (1999)
Distributions of statistics used for the comparison of models of sequence evolution in phylogeneticsMolecular Biology and Evolution, 16
(1998)
Theory of probability. Oxford classic texts in the physical sciences. 3rd edition
W. Hastings (1970)
Monte Carlo Sampling Methods Using Markov Chains and Their ApplicationsBiometrika, 57
J. Lake (1988)
Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequencesNature, 331
(1996)
Phylogenetic inferences
W. Gilks, S. Richardson, D. Spiegelhalter (1996)
Markov Chain Monte Carlo in practica
J. Huelsenbeck, B. Rannala (1997)
Phylogenetic methods come of age: testing hypotheses in an evolutionary context.Science, 276 5310
P. Green (1995)
Reversible jump Markov chain Monte Carlo computation and Bayesian model determinationBiometrika, 82
N. Caldararo (2001)
The evolution of HIVAmerican Journal of Human Biology, 13
Abstract We develop a reversible jump Markov chain Monte Carlo approach to estimating the posterior distribution of phylogenies based on aligned DNA/RNA sequences under several hierarchical evolutionary models. Using a proper, yet nontruncated and uninformative prior, we demonstrate the advantages of the Bayesian approach to hypothesis testing and estimation in phylogenetics by comparing different models for the infinitesimal rates of change among nucleotides, for the number of rate classes, and for the relationships among branch lengths. We compare the relative probabilities of these models and the appropriateness of a molecular clock using Bayes factors. Our most general model, first proposed by Tamura and Nei, parameterizes the infinitesimal change probabilities among nucleotides (A, G, C, T/U) into six parameters, consisting of three parameters for the nucleotide stationary distribution, two rate parameters for nucleotide transitions, and another parameter for nucleotide transversions. Nested models include the Hasegawa, Kishino, and Yano model with equal transition rates and the Kimura model with a uniform stationary distribution and equal transition rates. To illustrate our methods, we examine simulated data, 16S rRNA sequences from 15 contemporary eubacteria, halobacteria, eocytes, and eukaryotes, 9 primates, and the entire HIV genome of 11 isolates. We find that the Kimura model is too restrictive, that the Hasegawa, Kishino, and Yano model can be rejected for some data sets, that there is evidence for more than one rate class and a molecular clock among similar taxa, and that a molecular clock can be rejected for more distantly related taxa. Introduction Reconstruction of evolutionary relatedness among biological entities is a powerful tool in evolutionary biology and health care provision. For example, identification of bacterial pathogens and HIV strains by evolutionary relatedness may greatly increase the efficiency of therapeutic interventions (Rudolph et al. 1993 ; McCabe et. al. 1995 ; Nerurkar et al. 1996 ; Relman et al. 1996 ; Crandall 1999 ). Incorrect evolutionary models and reconstruction methods may lead to inconsistent results or may include unrealistic constraints on the process, sacrificing model accuracy in favor of computational ease and speed (Rzhetsky and Sitnikova 1996 ; Swofford et al. 1996 ; Durbin et al. 1998 ). Likelihood ratio tests for evolutionary models (for a review, see Huelsenbeck and Rannala 1997 ) can be remiss in that the topology space of evolutionary relatedness is discrete, data are sparse, parameter estimates may lie on the boundaries, and standard likelihood asymptotics may not apply (Navidi, Churchill, and von Haeseler 1991 , 1993; Goldman 1993 ; Sinsheimer, Lake, and Little 1996 ; Lange 1997 ; Whelan and Goldman 1999 ). Using Markov chain Monte Carlo (MCMC) methods (Gilks, Richardson, and Spiegelhalter 1996 ) to approximate posterior distributions allows us to broach evolutionary model selection in a computationally feasible manner, with topology determination as an application of reversible jump MCMC (Green 1995 ). Although MCMC methods have previously been used in the reconstruction of evolutionary relatedness (Kuhner, Yamato, and Felsenstein 1995, 1998 ; Rannala and Yang 1996 ; Mau and Newton 1997 ; Yang and Rannala 1997 ; Larget and Simon 1999 ; Mau, Newton, and Larget 1999 ; Li, Pearl, and Doss 2000 ), our methods differ from these in being fully Bayes with a proper, yet nontruncated and uninformative, prior in modeling assumptions, in likelihood computation, in proposal kernels, and in the range of hypotheses tested. The Bayesian hypothesis-testing approach we propose in this paper provides a framework to simultaneously infer evolutionary relationships and test a large set of modeling hypotheses, of which we illustrate only a few. In the Materials and Methods section, we describe the data upon which evolutionary relatedness is determined and models for reconstructing evolutionary trees, we introduce a reversible jump MCMC approach to estimate these relationships, and we show that Bayes factor comparison of evolutionary models is possible using vague but proper priors and can be used without conditioning on a particular topology. To illustrate, in the Results section, we compare several hierarchical evolutionary models, examine the appropriateness of a molecular clock, and test the existence of multiple rate classes. Materials and Methods Evolutionary Relationships and Models Data and Evolutionary Relationships We examined aligned deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequences to determine the relatedness among N organisms. Letting i index the organism and j index the site along a given sequence, each position in the data Xij contained either a nucleotide base or an alignment gap. For simplicity, we first removed all insertion/deletion sites from these alignments to end up with ordered nucleotide sequences of length l, such that Xij ∈ (A, G, C, T/U) for all i = 1,2, …, N and j = 1,2, …, l. We assumed that nucleotide sites were independent and identically distributed (iid) within a set of sites evolving under the same evolutionary constraints (rate class r). Consequently, the likelihood of observing a given pattern X1jrX2jr m⃛ XNjr within r was multinomially distributed, where the probability was determined by an unknown bifurcating topology τ describing the evolutionary relatedness of the organisms, a set of branch lengths tb ∈ T for b = 1, 2, …, 2N − 3, and a Markovian model for evolutionary change along this topology (Sinsheimer, Lake, and Little 1996 ). The set T did not necessarily maintain consistent definition between different topologies. As a result, N-taxon topologies were nonnested models, each supported on separate parameter spaces T(τ). Models of Evolution A popular class of evolutionary models are continuous-time Markov chain models, parameterized in terms of a 4 × 4 infinitesimal rate matrix of nucleotide change Q and branch lengths that correspond to the expected number of changes between nodes per site. The matrix Q satisfies the condition Q1 = 0, leaving 12 nonnegative, off-diagonal parameters in the most general form of Q. The transition matrix \[\mathit{P}(\mathit{t})\ {=}\ \mathit{e^{tQ}}\ {=}\ {\{}\mathit{p}_{\mathit{s}_{0}\mathit{s}_{1}}(\mathit{t}){\}}\ (1)\] defines the transition probabilities from state s0 to state s1 where s0, s1 ∈ (A, G, C, T/U) in time t. The data allow only for the estimation of the product tQ, so without loss of generality, we constrain Trace(Q) = −1. We explore three nested evolutionary models which reduce the parameterization of Q. The most general, TN93 (Tamura and Nei 1993 ), allows for differing evolutionary rates between purine-to-purine transitions (α), pyrimidine-to-pyrimidine transitions (γ), and transversions (purine-to-pyrimidine or pyrimidine-to-purine; β) and allows the general stationary distribution of the nucleotides (π) to vary subject to the constraints πm ≥ 0, Σ πm = 1 for m ∈ (A, G, C, T/U) and detailed balance, Qπ = π. The resulting infinitesimal rate matrix is where the minus sign in each row represents minus the sum of the remaining elements in that row. Letting Trace(Q) = −1 leads to β = [1 − α(πA + πG) − γ(πC + πT)]/2 and (α, γ) ∈ [0, 1) × [0, 1). TN93 is a generalization of the HKY85 model (Hasegawa, Kishino, and Yano 1985 ), where α = γ, and of the K80 model (Kimura 1980 ), where πm = 1/4 and α = γ. Previous evolutionary reconstructions from nucleic acid data using some MCMC methods fix the stationary distribution at either an empirical estimate from the observed data (Li, Pearl, and Doss 2000) or at values determined by preliminary MCMC sampling (Mau, Newton, and Larget 1999 ). Like Larget and Simon (1999) , we have not adopted either of these approaches. Empirical estimates give equal weight to all taxa and may therefore be biased when taxon selection oversamples certain subgroups, while fixing parameters can lead to underestimation of the variance of other parameters. Instead, our MCMC approach samples all model parameters. For all three models, the position of the root, the most recent common ancestor (MRCA) of all N taxa, is not estimable without further parameter restrictions (Felsenstein 1981 ), such as a molecular clock. If we can identify an outgroup taxon on the same branch as the root, a molecular clock among the remaining N − 1 taxa is a nested submodel in our framework and thus can be tested. A molecular clock allows for a computationally advantageous parameterization (Mau, Newton, and Larget 1999 ) and reduces by half the number of branch lengths to be estimated. We extend the HKY85 parameterization to a mixture model containing R infinitesimal rate matrices Qr and R sets of branch lengths Tr, where r = 1, …, R, R is the number of different site classes present in the data, and each site in the data is assigned a priori to belong to class r. We choose HKY85 as an example for comparison with previous work (Yang 1995 ; Larget and Simon 1999 ) and note that such mixture models are easily implemented for TN93 or K80 as well. This mixture model is applicable when the reading frame of the DNA sequence is known and rate assignments are based on codon position in the reading frame or when data from different genes known to evolve under different selective pressures are combined. The model is a generalization of the Bayesian computation of Larget and Simon (1999) , in which they estimate multiple Qr matrices but not multiple branch lengths, and of the work of Yang (1995) , where he assumes that the branch lengths between different classes are scalar multiples. Multiple Qr matrices allow for different transition/transversion ratios and stationary distributions across classes, and multiple Tr sets allow for varying rates of evolution both across classes and between species. Yang's (1995) scalar multiple branch lengths assume that the relative rates of evolution between classes are constant across species. Bayesian Computation Priors Priors must remain proper to estimate Bayes factors. We employ flat or vague but completely proper priors over the entire parameter space (τ, θ(τ)), where θ(τ) = (π, α, γ, T, μ) and μ is a hyperparameter to help define the prior for tb in T. When we employ multiple site classes, α = (α1, …, αR), π = (π1, …, πR), T = (T1, …, TR), and μ= (μ1, …, μR). We assume a prior for all parameters that is independent of topology such that q(θ(τ)|τ) = q(θ) and that all components of θ are, a priori, independent. For the TN93 model, we set Except for branch lengths, these prior probabilities are uninformative. The uniform distribution on τ is over the discrete space of (2N − 5)!/2N−3(N − 3)! possible topologies for N taxa (Felsenstein 1978 ). Branch lengths are iid given μ. We take μ to have expectation 1 and variance 10, as, a priori, we know little about its tendencies. The prior for tb, supported on [0, ∞), is vague but integrable. The usual Jeffreys' prior on tb ∈ [0, ∞) is 1/tb (Jeffreys 1998 ). This prior is not integrable and precludes testing a molecular clock. The inverse-gamma density allows computation of the reciprocal moments of μ, which are also needed to test for a molecular clock (see appendix B). Computation We employ a Metropolis-within-Gibbs (Tierney 1994 ) sampler using reversible model jumping (Green 1995 ) among topologies. The parameter space dimension remains constant across topology models, although interpretation of portions of T is model-dependent. Each new state of the Markov chain is proposed via a Gibbs cycle. In each step within the cycle, a single parameter block is updated conditional on the remaining blocks using a Metropolis-Hastings algorithm (Metropolis et al. 1953 ; Hastings 1970 ). We use the following update cycle: Previous MCMC approaches have updated τ and T simultaneously (Larget and Simon 1999 ; Mau, Newton, and Larget 1999 ); however, these approaches consider, at most, proportional rates of evolution between rate classes. Here, we include an extra T-only block to improve mixing within different sets of branch lengths for each class. We give our transition kernels for each Metropolis-Hastings step in appendix A. Where possible, we employ transition kernels that are symmetric to decrease computational complexity and are supported on the same bounded or discrete space as the kernel's underlying parameters to increase acceptance probabilities. Each run of our MCMC chain consists of 500,000 full update cycles, and we disregard the first 100,000 steps as burn-in. For the starting state, we draw τ, μ, T|μ, α, and γ directly from the prior distributions and set π equal to observed nucleotide frequencies in each class. We estimate functions of the chain's posterior by subsampling every 40 steps after burn-in. Multiple chains are run to insure adequate convergence. We use D = Σtb, representing the total divergence between all taxa, μ, α, γ, and π, to assess convergence within and across topologies. These parameters retain their interpretation as the sampler moves between topologies and may be used effectively to monitor how well the MCMC sampler is performing (Brooks and Guidici 1999 ). We calculate the likelihood of the data given τ, T, α, γ, and π by integrating out the unknown states of the internal nodes using the pruning algorithm of Felsenstein (1981) . Model Comparisons We make comparisons among models using Bayes factors (Kass and Raftery 1995 ). The Bayes factor in favor of model M1 against model M0, given the data Y, can be expressed as where f(Y|Mi) = ∫ f(Y|θi, Mi)q(θi) dθi, θi are the parameters under model Mi, f(·) are sampling densities, and q(·) are priors. Different densities are distinguished by their arguments in a common abuse of notation. If model M0 is nested within another model M1 such that the parameter space of M1 is θ1 = (ω, ϕ) and the parameter space of M0 is θ0 = (ω0, ϕ) where ω0 is a known constant with q0(ϕ) ∝ q1(ω = ω0, ϕ), then B10 may be estimated via posterior simulation of M1 using the Savage-Dickey ratio (Verdinelli and Wasserman 1995 ), where q(ω = ω0|M1) is the prior and p(ω = ω0|Y, M1) is the posterior of ω, both evaluated at ω0. There exist several methods for estimating the posterior density of ω from an MCMC simulation, including nonparametric kernel density estimation methods and multivariate normal approximations. The priors induced by restrictions of the infinitesimal rate matrices and by restrictions of the number of classes are derivable. The restrictions and induced priors are presented in the next two sections. In contrast, the derivation of the priors induced by the molecular-clock restrictions is more involved. Analytical results for some situations are presented in appendix B, and a general numerical approximation is presented in the Induced Priors on Coalescent Height Differences section, below. Restrictions on Evolutionary Rates The HKY85 model is a restriction of TN93, and the K80 model is a restriction of HKY85. To test the appropriateness of the restricted models, we generate a posterior sample of the joint (α, γ, π) using our MCMC sampler under our most general TN93 model. We then estimate the Bayes factors in favor of TN93 against HKY85 and in favor of TN93 against K80. We then generate a posterior sample of the joint (α, π) under HKY85 to estimate the Bayes factor in favor of HKY85 against K80. We approximate the posterior densities of (α, γ, π) and (α, π) using a normal approximation with the estimated posterior mean and posterior covariance evaluated at the joint restriction α = γ and at (πm = 1/4, α = γ) (in the former case) and at πm = 1/4 (in the latter case). We directly calculate the appropriate prior densities at these restrictions. When testing α = γ, we recall that this restriction is equivalent to α − γ = 0 and that the difference of two Uniform[0, 1) random variables is triangularly distributed on [−1, 1] with a density of 1 at the restriction (Feller 1971 ). We then form the respective Bayes factors using equation (6) . Multiple Classes In data sets putatively containing multiple site classes, we estimate the Bayes factors in favor of multiple Qr matrices under HKY85 by first generating a posterior sample of (α1, …, αR, π1, …, πR) using our MCMC sampler. We approximate the posterior density at the restriction (α1 = … = αR, π1 = … = πR) using a normal approximation based on the posterior mean and posterior covariance of the sample reparameterized as where the elements of πr are πm,r, r ∈ (1, …, R), m ∈ (A, G, C, U/T), evaluated at ψ = (0, …, 0). We form the Bayes factor by dividing this posterior density estimate by the prior evaluated at the joint restriction. The induced prior equals q(α1 − αR, …, αR−1 − αR) × q(π1 − πR, …, πR−1 − πR) by prior independence. We identify that where U and W are two independent, multidimensional, random variables. We evaluate q(U − W) at U − W =0 using the convolution integral for the difference of two random variables. This results in \[\mathit{q}({\alpha}_{1}\ {-}\ {\alpha}_{\mathit{R}}\ {=}\ 0,\ {\ldots}{\,},\ {\alpha}_{\mathit{R}{-}1}\ {-}\ {\alpha}_{\mathit{R}}\ {=}\ 0)\ {=}\ 1\ (9)\] given our prior on α. This calculation does not depend on the number of classes R. Following a similar derivation, \[\mathit{q}({\pi}_{1}\ {-}\ {\pi}_{\mathit{R}}\ {=}\ 0,\ {\ldots}{\,},\ {\pi}_{\mathit{R}{-}1}\ {-}\ {\pi}_{\mathit{R}}\ {=}\ 0)\ {=}\ 6^{\mathit{R}{-}1}.\ (10)\] Molecular-Clock Restrictions To test the appropriateness of a molecular clock, we condition on the posterior mode topology, identify a known outgroup, and reparameterize the branch lengths in terms of coalescent height differences, Δij. These parameters measure the difference in the sums of the branch lengths between two contemporary taxa i and j and their MRCA. Under a molecular clock, Δij = 0. Given the outgroup R in the topology illustrated in figure 1 , a molecular clock constrains Each constraint may be considered marginally as a diagnostic to identify portions of the topology which violate or support a molecular clock, or all constraints may be examined jointly. The ability to simultaneously consider each constraint marginally is an advantage of our framework over previous molecular-clock tests and allows for testing of a local molecular clock (Hillis, Mable, and Moritz 1996 ; Huelsenbeck, Larget, and Swofford 2000 ) within a subset of taxa. Taken jointly, half of the constraints in equation (11) are redundant, so we can reduce the joint restriction dimensionality by employing a conditioning argument to show that It is straightforward to extend the conditioning argument to larger trees. Induced Priors on Coalescent Height Differences For two- or three-taxon rooted topologies and for any arbitrary pair of taxa considered marginally in an N-taxon topology given an outgroup, we derive exact expressions for the ordinate of the induced prior q(Δij = 0) in appendix B. To consider constraints jointly for N > 3, we estimate η̂, the ordinate of the induced joint prior q(Δij = 0), via simulation. We draw n = 50,000 samples of μ and T|μ directly from our priors and form the appropriate Δij from T. We calculate η̂ using the multidimensional density estimator where Δ(k)ij is the kth sample, d is the dimension of Δij, w is the radius of a hypersphere in ℜd, ωd = πd/2wd/Γ[(d/2) + 1] is its volume, and ‖·‖ is the Euclidean norm. For each simulation, we fix w at its smallest value such that the hypersphere contains at least \(\sqrt{\mathit{n}}\) of the simulation sample (Loftsgaarden and Quesenberry 1965 ). If the true density q(Δij) is locally linear in the neighborhood of 0, then this method is unbiased. For N = 2 taxa, q(Δ) does not satisfy the locally linear condition, as the mode of Δ is 0 (see appendix B). For N ≥ 3, the mode is no longer centered at 0, and the approximation's bias decreases. Conditioning on ωd and recalling that η̂ is the sum of independent Bernoulli random variables, the finite sample variance is approximated by As a diagnostic for these simulations and the estimator, we compare η̂ with the analytic results developed in appendix B for n = 2 and 3. For n = 2, we calculate η̂ = 1.0 ± 0.1 (estimate ± SD), and the exact result equals 0.955. For n = 3, η̂ = 0.87 ± 0.02, while the exact result equals 0.897. To evaluate the estimator in higher dimensions, we drew 50,000 samples from a 12-dimensional multivariate N(1, I), where 1 = (1, …, 1)t and I is the identity matrix, and obtained η̂ = 3.5 × 10−8 ± 0.5 × 10−8, while the theoretical density is 4.0 × 10−8. These results return the theoretical densities to within the same order of magnitude and show only small simulation error or bias. Results To make our inference methods more concrete, we examined four data sets: (1) simulated data, (2) representative organisms from across all living kingdoms (Tree of Life [TOL]), (3) primates, and (4) different HIV isolates. Each of these data sets illustrates different aspects of Bayesian inference. The simulated data demonstrated that a molecular clock will be accepted when it is actually present. The primate data were used to test for multiple rates and to test restrictions on the infinitesimal rate matrix without conditioning on topology. The TOL data, the primates, and the HIV isolates demonstrated the versatility of the Bayesian method in testing the molecular-clock hypothesis. The TOL data also demonstrated that our MCMC implementation is practical for as many as 15 taxa. Simulated Data To insure that our methods would support a molecular clock if one were present, we simulated sequences of length 1,500 under a molecular clock for four contemporary taxa (A, B, C, D) and an outgroup (R) using the topology in figure 1 . We imposed a molecular clock by assigning branch lengths so that the evolutionary distances from MRCAs and contemporary taxa were equal. The approximate posterior density of Δij = 0 was 1,106.3. A log10B10 value of 0 implies that both models are equally likely, while values greater than 2 represent very strong evidence in support of the general model and values less than −2 represent very strong evidence in support of the restricted model (Kass and Raftery 1995 ). The induced prior for Δij in this topology was 0.52, yielding a log10B10 value of −3.32, which favors a molecular clock. Tree of Life The TOL data set consisted of 15 16S ribosomal RNA sequences (rRNA) (Lake 1988 ). There were 1,039 aligned nucleotides after removal of gaps, and πobs = (0.2408, 0.3157, 0.2464, 0.1971)t. The species were drawn from four major classes of living organisms: eukaryotes, eubacteria, halobacteria, and eocytes, and also included the chloroplastic sequence from a eukaryote, Zea mays (chl.). Figure 2 (left) shows the modal topology (86% ± 3%, posterior probability mean ± SD determined from 10 independent chains) and the conditional posterior mean branch lengths estimated under the TN93 model. The model correctly clustered the eukaryotes, eocytes, halobacteria, and eubacteria into their appropriate monophyletic groups (clades) based on organism morphology and clustered the chloroplastic sequence in the eubacterial clade. This result is consistent with the endosymbiotic hypothesis of the origins of eukaryotic cellular organelles (Margulis 1981 ) and has been demonstrated previously using rRNA (Lake 1988 ; Bhattacharya and Medlin 1995 ). Table 1 lists the marginal posterior means and standard errors of α, γ, π, μ, and D under TN93, HKY85, and K80. Primates The primate data comprised a portion of the mitochondrial DNA from a human, a chimpanzee, a gorilla, an orangutan, a gibbon, a macaque, a squirrel monkey, a tarsier, and a lemur (Brown et al. 1982 ; Hayasaka, Gojobori, and Horai 1988 ) and had previously been analyzed using MCMC methodology (Yang and Rannala 1997 ; Larget and Simon 1999 ). There were 888 sites after removal of alignment gaps, and πobs = (0.3219, 0.1076, 0.3044, 0.2660)t. Figure 3 illustrates the two dominant topologies seen in the posterior of all models and the conditional posterior mean branch lengths under the TN93 model. In both topologies, the sampler properly clusters the apes, monkeys, and prosimians. Table 1 gives evolutionary parameters and divergence estimates and their standard errors. The posterior distribution of topologies was model-dependent, with the relationship between humans, chimps, and gorillas varying. Under TN93, humans and chimps were topographically the most closely related among the three species (90% ± 3%). Similarly, under HKY85, the posterior mean was 84% ± 2% using one rate class and 92% ± 3% using four rate classes. Under K80, the posterior mean was 90% ± 3%. However, under an even more restrictive model proposed by Jukes and Cantor (1969) (JC69) in which α = γ = β and πm = 1/4, we found that chimps and gorillas were topographically the most closely related (88% ± 5%). Unconditional on topology, the distance (expected number of changes per site) between humans and chimps was 0.41 ± 0.05 (posterior mean ± posterior SD) and the distance between chimps and gorillas was 0.52 ± 0.05 under TN93. Under JC69, these distances were 0.40 ± 0.05 and 0.45 ± 0.05, respectively. HIV The HIV data contained the complete HIV genomes of two subtype D isolates, eight subtype B isolates, and one ADI subtype recombinant, MAL (Korber et al. 1997 ). The subtype B isolates JRCSF and JRFL were collected from the same patient. There were 7,969 sites after removing all gaps in the aligned genomes, and πobs = (0.3698, 0.2365, 0.1708, 0.2229)t. Figure 4 displays the two topologies that account for virtually 100% of the posterior. These topologies were drawn with their conditional posterior mean branch lengths under TN93. One internal branch within the subtype B clade was approximately zero. At zero, the two shown topologies become equivalent. The sampler placed JRFL and JRCSF as nearest neighbors and correctly clustered the D and B subtypes. Table 1 gives the estimated evolutionary parameters and divergence. TN93, HKY85, and K80 Comparison The log10 Bayes factors for all examples and models are given in table 2 . TN93 was strongly supported by the TOL and HIV examples when comparing restrictions of α, γ, and π; all log10B10 values were ≥3. Support for TN93 over HKY85 was less conclusive when restrictions of α and γ were compared for the primate example, for which the log10B10 value was 0.3. HKY85 was strongly supported over K80 when restrictions of π were compared. Multiple Classes in the Primates The mitochondrial sequences in the primate data set comprised the coding region for two individual protein subunits with known reading frames and a transfer RNA (tRNA) portion (Brown et al. 1982 ; Hayasaka, Gojobori, and Horai 1988 ). Following Yang (1995) , we divided the data into four classes, one for the tRNA (194 nt in length) and the remaining three for the first, second, and third codon positions in the protein subunits with lengths 232, 231, and 231 nt. In table 3 , we report the posterior means and the posterior standard deviations of the parameters α π, μ, and D for each class under HKY85. The total divergence D serves as a surrogate for reporting all branch lengths T. The posterior of the ratio Di/Dj estimates the relative rates of evolution between classes i and j. Between the first and second codon positions, we found a posterior mean ratio of 0.49 (0.06 SD), between the first and third we found a posterior mean ratio of 10.2 (2.2), and between the first position and tRNA we found a posterior mean ratio of 0.62 (0.07). These estimates are comparable to those determined by Yang (1995) and include measures of uncertainty. An approximate 10-fold increase in mutation was observed in the third codon position compared to the first, consistent with the increased redundancy of the genetic code in the third position. Furthermore, the evolutionary rate parameters α and stationary distributions π between classes were quite disparate. The log10 Bayes factor in favor of multiple α and π across all four classes was 43.7. Testing a Molecular Clock We examined the appropriateness of a molecular clock conditional on the modal topology under TN93. We chose the eocyte clade as the outgroup for TOL, as using this clade offered the least support against a molecular clock. The ADI recombinant was assumed to be the outgroup for the HIV example. For the primate example, the lemurs represented the outgroup. In table 4 , we list the posterior means and standard deviations of minimum sufficient sets of molecular-clock constraints for these examples. There were 9 constraints for HIV, 7 for the primates, and 11 for the TOL mode topologies. We further give the number of nodes traversed between the taxa and their MRCA, each Δij's marginal prior density evaluated at 0 as determined by these numbers of nodes (appendix B), and log10B10 against a molecular clock for each two-taxon comparison. Examined univariately, all Δij's supported a molecular clock within the B subtype clade for HIV, with the exception of the JRCSF-JRFL constraint. Between clades, a molecular clock was strongly rejected (SF2-ELI constraint, log10B10 = 10.8). Within the primates, a molecular clock was weakly supported by each constraint within the anthropoids (apes and monkeys) (all log10B10 ≤ −0.5) but rejected between the anthropoids and prosimians (log10B10 = 1.0). Also in table 4 , we calculate the joint posterior using a multivariate normal approximation and prior densities via simulation evaluated at 0, and by taking the ratio of these values, we report the joint log10B10 for each data set. The TOL and HIV examples offered strong support against a molecular clock (log10B10 = 31.8 and 12.3, respectively), while the primates offer weaker support against a molecular clock (log10B10 = 1.3). We examined three subsets of our examples identified as interesting by the marginal diagnostics: (1) the eight subtype B isolates, (2) the anthropoids, and (3) the eukaryotes. Table 4 displays the corresponding joint log10 posterior and prior densities and log10B10 for these subsets. As an illustration, figure 2 (right) plots the marginal posterior and prior distributions of the three coalescent height differences among the eukaryotes. The eukaryotes continued to offer strong support against a molecular clock (log10B10 = 14.0), while the much more closely related B subtype isolates and anthropoids offered strong support in favor of a local molecular clock (log10B10 = −3.7 and −2.4, respectively). Discussion We propose a reversible jump MCMC algorithm for sampling from the posterior distribution of topologies and other parameters used to model the relatedness among organisms. Individual topologies are separate statistical models. While evolutionary parameters retain definition across these models, some branch lengths do not. For a fixed number of organisms, the dimension of the parameter space spanned by the branch lengths within a topology model remains constant, making reversible model jumps convenient. In allowing the sampler to explore the posterior across topology models, we overcome a shortfall of traditional analysis used to compare different continuous-time Markov evolutionary models. One can use a likelihood ratio test by maximizing the likelihood of the general model and the likelihood of the restricted model conditional on the same topology; however, the topologies that maximize the likelihood may differ under the two models. Then, general and restricted evolutionary models are no longer nested, and formal inference under a likelihood ratio test is no longer possible. In effect, our reversible jump MCMC sampler integrates out the nonnested portions of the parameter space. The Bayesian approach also allows us to effectively incorporate the uncertainty in the topology into the variance of the parameter estimates. Frequentist inference is forced to condition on topology and therefore underestimates the uncertainty. The TOL example offers strong evidence against the universal appropriateness of a molecular clock; however, the anthropoids and subtype B isolates demonstrate that a local molecular clock for closely related taxa is a reasonable model. This finding is quite insensitive to prior choice. A molecular clock was originally employed in MCMC methods for evolutionary reconstruction to reduce computation (e.g., Mau and Newton 1997 ; Yang and Rannala 1997 ; Mau, Newton, and Larget 1999 ; Li, Pearl, and Doss 2000 ), but numerous examples of restriction violations exist (Ayala, Barrio, and Kwiatowski 1996 ; Leitner et al. 1996 ; Hillis, Mable, and Moritz 1996 ; Simon et al. 1996 ; Holmes, Pybus, and Harvey 1999 ; Richman and Kohn 1999 ). Larget and Simon (1999) show that eliminating the molecular clock by doubling the number of estimable branch lengths does not produce an intractable problem; we extend the computation to allow for multiple sets of branch lengths that are not constrained by a molecular clock. In doing so, we further the frameworks of Thorne, Kishino, and Painter (1998) and Huelsenbeck, Larget, and Swofford (2000) in several important ways. Thorne, Kishino, and Painter (1998) introduce a Bayesian approach that does not impose a molecular clock by first assuming that the true relationship of the taxa under study is known with complete certainty and employing empirical Bayes priors that use the data twice. However, they do not provide a statistical test of the appropriateness of a molecular clock. Huelsenbeck, Larget, and Swofford (2000) continue to assume that the true relationship is known and formulate a likelihood ratio test that may become difficult to interpret when the data are sparse. We overcome both shortfalls by providing a framework to statistically test the appropriateness of a molecular clock while not having to condition on a known a priori topology and simultaneously make inference about the appropriate parameterization of the infinitesimal rate matrix. Additionally, pairwise diagnostic Bayes factors we propose allow the researcher to conveniently identify portions of an evolutionary history that violate a molecular clock and portions that support a local molecular clock. On the other hand, Huelsenbeck, Larget, and Swofford (2000) allow for the estimation of divergence times, while our approach does not eliminate the confounding of time and evolutionary rate. To provide both large and small jumps among so many branch lengths, we choose a 50/50 mixture of two transition kernels—the first updating all branch lengths simultaneously using a reflective normal driver with small variance, and the second randomly selecting and updating one branch length using a driver with large variance. This mixture removed initially poor convergence in the HIV data set that had small branch lengths as compared with the other two examples. We find quick convergence and sufficient mixing for up to at least 15-taxon topologies without a molecular clock. Mike Hendy, Reviewing Editor 1 Keywords: phylogenetics Markov chain Monte Carlo nested hypothesis testing Bayes factors 2 Address for correspondence and reprints: Janet S. Sinsheimer, Department of Human Genetics, UCLA School of Medicine, Los Angeles, California 90095-7088. janet@sunlab.ph.ucla.edu. Table 1 Parameter Estimates for the Tree of Life (TOL), Primates, and HIV Under the TN93, HKY85, and K80 Models View Large Table 2 Log10 Bayes Factors in Favor of the More General Evolutionary Model Against a Nested Model for the Tree of Life (TOL), Primates, and HIV View Large Table 3 Parameter Estimates When Fitting Four Site Classes Using the Primate Example Under the HKY85 Model View Large Table 4 Molecular-Clock Estimates for the Tree of Life (TOL), Primates, and HIV View Large View largeDownload slide Fig. 1.—Topology used for simulation under a molecular clock. Taxon R is assigned as the outgroup (having diverged before the remaining taxa) to allow rooting at an arbitrary position along taxon R's branch View largeDownload slide Fig. 1.—Topology used for simulation under a molecular clock. Taxon R is assigned as the outgroup (having diverged before the remaining taxa) to allow rooting at an arbitrary position along taxon R's branch View largeDownload slide Fig. 2.—The Tree of Life modal (87% ± 3%) topology under TN93 (left). Branch lengths are drawn to scale. Plots of the marginal posterior (solid line), normal approximation to the posterior (dashed), and prior (dotted) for the three molecular-clock height differences (Δij) within the Eukaryote clade are shown (right). The data do not support a molecular-clock restriction, as the posterior densities are less than the prior density at Δij = 0 View largeDownload slide Fig. 2.—The Tree of Life modal (87% ± 3%) topology under TN93 (left). Branch lengths are drawn to scale. Plots of the marginal posterior (solid line), normal approximation to the posterior (dashed), and prior (dotted) for the three molecular-clock height differences (Δij) within the Eukaryote clade are shown (right). The data do not support a molecular-clock restriction, as the posterior densities are less than the prior density at Δij = 0 View largeDownload slide Fig. 3.—The two dominant topologies for primates under TN93 using one rate class. The complete displayed topology has a posterior probability of 90% ± 3%, while the alternate clade accounts for the remaining 10%. Branch lengths are drawn to scale View largeDownload slide Fig. 3.—The two dominant topologies for primates under TN93 using one rate class. The complete displayed topology has a posterior probability of 90% ± 3%, while the alternate clade accounts for the remaining 10%. Branch lengths are drawn to scale View largeDownload slide Fig. 4.—Two topologies account for 100% of the posterior for HIV under TN93. Branch lengths are drawn to scale. The two topologies converge as the circled internal branches approach zero View largeDownload slide Fig. 4.—Two topologies account for 100% of the posterior for HIV under TN93. Branch lengths are drawn to scale. The two topologies converge as the circled internal branches approach zero View largeDownload slide Fig. 5.—Local and global pendant leaf algorithms. View largeDownload slide Fig. 5.—Local and global pendant leaf algorithms. We thank James Lake for supplying the aligned sequences used in the TOL example and Karin Dorman and John Boscardin for their helpful criticism. M.A.S. was supported by a predoctoral fellowship from the Howard Hughes Medical Institute. J.S.S. was partially supported by USPHS grants AI28697 and CA16042. literature cited Ayala, F. J., E. Barrio, J. Kwiatowski. 1996. Molecular clock or erratic evolution?. A tale of two genes. Proc. Natl. Acad. Sci. USA. 93: 11729–11734 Google Scholar Bhattacharya, D., L. Medlin. 1995. The phylogeny of plastids: a review based on comparisons of small-subunit ribosomal RNA coding regions. J. Phycol. 31: 489–498 Google Scholar Brooks, S. P., P. Guidici. 1999. Convergence assessment for reversible jump MCMC simulationsPp. 733–742 in J. Bernardo, J. Berger, A. P. Dawid, and A. F. M. Smith, eds. Bayesian Statistics 6. Oxford University Press, Cambridge, Mass Google Scholar Brown, W. M., E. M. Prager, A. Wang, A. C. Wilson. 1982. Mitochondrial DNA sequences of primates, tempo and mode of evolution. J. Mol. Evol. 18: 225–239 Google Scholar Crandall, K. A. ed. 1999. The evolution of HIVJohns Hopkins University Press, Baltimore, Md Google Scholar Durbin, R., S. Eddy, A. Krogh, G. Mitchinson. 1998. Biological sequence analysis: probabilistic models of proteins and nucleic acidsCambridge University Press, Cambridge, England Google Scholar Feller, W.. 1971. An introduction to probability theory and its applicationsVol. 2, 2nd edition. John Wiley and Sons, New York Google Scholar Felsenstein, J.. 1978. The number of evolutionary trees. Syst. Zool. 27: 27–33 Google Scholar ———.1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376 Google Scholar Gelman, A., G. O. Roberts, W. R. Gilks. 1996. Efficient Metropolis jumping rulesPp. 599–608 in J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds. Bayesian Statistics 5. Oxford University Press, Oxford, England Google Scholar Gilks, W. R., S. Richardson, D. J. Spiegelhalter. 1996. Markov chain Monte CarloChapman and Hall, New York Google Scholar Goldman, N.. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36: 182–198 Google Scholar Green, P. J.. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 82: 711–732 Google Scholar Hasegawa, M., H. Kishino, T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: 160–174 Google Scholar Hastings, W. K.. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 57: 97–109 Google Scholar Hayasaka, K., K. T. Gojobori, S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol. Biol. Evol. 5: 626–644 Google Scholar Hillis, D. M., B. K. Mable, C. Moritz. 1996. Applications of molecular systematics: the state of the field and a look to the futurePp. 515–543 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass Google Scholar Holmes, E. C., O. G. Pybus, P. H. Harvey. 1999. The molecular population dynamics of HIV-1Pp. 177–207 in K. A. Crandall, ed. The evolution of HIV. Johns Hopkins University Press, Baltimore, Md Google Scholar Huelsenbeck, J. P., B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science. 276: 227–232 Google Scholar Huelsenbeck, J. P., B. Larget, D. Swofford. 2000. A compound Poisson process for relaxing the molecular clock. Genetics. 154: 1879–1892 Google Scholar Jeffreys, H.. 1998. Theory of probabilityOxford classic texts in the physical sciences. 3rd edition. Oxford University Press, New York Google Scholar Jukes, T., C. Cantor. 1969. Evolution of protein moleculesPp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York Google Scholar Kass, R. E., A. E. Raftery. 1995. Bayes factors and model uncertainty. J. Am. Stat. Assoc. 90: 773–795 Google Scholar Kimura, M.. 1980. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111–120 Google Scholar Korber, B., B. Hahn, B. Foley, J. W. Mellors, T. Leitner, G. Myers, F. McCutchan, C. L. Kuikeneds1997. Human retroviruses and AIDS 1997: a compilation and analysis of nucleic acid and amino acid sequencesTheoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM. (http://hiv-web.lanl.gov) Google Scholar Kuhner, M., J. Yamato, J. Felsenstein. 1995. Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 140: 1421–1430 Google Scholar ———.1998. Maximum likelihood estimation of population growth rates based on the coalescent. Genetics. 149:429–434 Google Scholar Lake, J. A.. 1988. Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature. 331: 184–186 Google Scholar Lange, K.. 1997. Mathematical and statistical methods for genetic analysisSpringer, New York Google Scholar Larget, B., D. L. Simon. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16: 750–759 Google Scholar Leitner, T., D. Escanilla, C. Franzn, M. Uhln, J. Albert. 1996. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc. Natl. Acad. Sci. USA. 93: 10864–10869 Google Scholar Li, S., D. K. Pearl, H. Doss. 2000. Phylogenetic tree construction using Markov chain Monte Carlo. J. Am. Stat. Assoc. 95: 493–508 Google Scholar Loftsgaarden, D. O., C. P. Quesenberry. 1965. A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36: 1049–1051 Google Scholar McCabe, K. M., G. Khan, Y. H. Zhang, E. O. Mason, E. R. McCabe. 1995. Amplification of bacterial DNA using highly conserved sequences: automated analysis and potential for molecular triage of sepsis. Pediatrics. 95: 165–169 Google Scholar Margulis, L.. 1981. Symbiosis in cell evolution: life and its environment on the early earthW. H. Freeman, San Francisco Google Scholar Mau, B., M. A. Newton. 1997. Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo. J. Comput. Graph. Stat. 6: 122–131 Google Scholar Mau, B., M. A. Newton, B. Larget. 1999. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics. 55: 1–12 Google Scholar Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. 21: 1087–1092 Google Scholar Navidi, W. C., G. A. Churchill, A. von Haeseler. 1991. Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol. Biol. Evol. 8: 128–143 Google Scholar ———.1993. Phylogenetic inference: linear invariants and maximum likelihood. Biometrics. 49:543–55 Google Scholar Nerurkar, V. R., H. T. Nguyen, W. M. Dashwood, P. R. Hoffmann, C. Yin, D. M. Morens, A. H. Kaplan, R. Detels, R. Yanagihara. 1996. HIV type 1 subtype E in commercial sex workers and injection drug users in southern Vietnam. AIDS Res. Hum. Retroviruses. 12: 841–843 Google Scholar Rannala, B., Z. Yang. 1996. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43: 304–311 Google Scholar Relman, D. A., T. M. Schmidt, A. Gajadhar, M. Sogin, J. Cross, K. Yoder, O. Sethabutr, P. Echeverria. 1996. Molecular phylogenetic analysis of Cyclospora, the human intestinal pathogen, suggests that it is closely related to Eimeria species. J. Infect. Dis. 173: 440–445 Google Scholar Richman, A. D., J. R. Kohn. 1999. Self-incompatibility alleles from Physalis: implications for historical inference from balanced genetic polymorphisms. Proc. Natl. Acad. Sci. USA. 96: 168–172 Google Scholar Rudolph, K. M., A. J. Parkinson, C. M. Black, L. W. Mayer. 1993. Evaluation of polymerase chain reaction for diagnosis of pneumococcal pneumonia. J. Clin. Microbiol. 31: 2661–2666 Google Scholar Rzhetsky, A., T. Sitnikova. 1996. When is it safe to use an oversimplified substitution model in tree-making?. Mol. Biol. Evol. 13: 1255–1265 Google Scholar Simon, C., L. Nigro, J. Sullivan, K. Holsinger, A. Martin, A. Grapputo, A. Franke, C. McIntosh. 1996. Large differences in substitutional pattern and evolutionary rate of 12S ribosomal RNA genes. Mol. Biol. Evol. 13: 923–932 Google Scholar Sinsheimer, J. S., J. A. Lake, R. J. Little. 1996. Bayesian hypothesis testing of four-taxon topologies using molecular sequence data. Biometrics. 52: 193–210 Google Scholar Swofford, D. L., G. J. Olsen, P. J. Waddell, D. M. Hillis. 1996. Phylogenetic inferencesPp. 407–514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass Google Scholar Tamura, K., M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512–526 Google Scholar Thorne, J. L., H. Kishino, I. S. Painter. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15: 1647–1657 Google Scholar Tierney, L.. 1994. Markov chains for exploring posterior distributions (with discussion). Ann. Stat. 22: 1701–1762 Google Scholar Verdinelli, I., L. Wasserman. 1995. Computing Bayes factors using a generalization of the Savage-Dickey density ratio. J. Am. Stat. Assoc. 90: 614–618 Google Scholar Whelan, S., N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. Mol. Biol. Evol. 16: 1292–1299 Google Scholar Yang, Z.. 1995. A space-time process model for the evolution of DNA sequences. Genetics. 139: 993–1005 Google Scholar Yang, Z., B. Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14: 717–724 Google Scholar
Molecular Biology and Evolution – Oxford University Press
Published: Jun 1, 2001
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get an introductory month for just $19.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.