List of Contributors

1 The minimum evolution distance-based approach to phylogenetic inference
  1.1 Introduction
  1.2 Tree metrics
  1.3 Edge and tree length estimation
  1.4 The agglomerative approach
  1.5 Iterative topology searching and tree building
  1.6 Statistical consistency
  1.7 Discussion
  Acknowledgements

2 Likelihood calculation in molecular phylogenetics
  2.1 Introduction
  2.2 Markov models of sequence evolution
  2.3 Likelihood calculation: the basic algorithm
  2.4 Likelihood calculation: improved models
  2.5 Optimizing parameters
  2.6 Consistency of the likelihood approach
  2.7 Likelihood ratio tests
  2.8 Concluding remarks
  Acknowledgements

3 Bayesian inference in molecular phylogenetics
  3.1 The likelihood function and maximum likelihood estimates
  3.2 The Bayesian paradigm
  3.3 Prior
  3.4 Markov chain Monte Carlo
  3.5 Simple moves and their proposal ratios
  3.6 Monitoring Markov chains and processing output
  3.7 Applications to molecular phylogenetics
  3.8 Conclusions and perspectives
  Acknowledgements

4 Statistical approach to tests involving phylogenies
  4.1 The statistical approach to phylogenetic inference
  4.2 Hypotheses testing
  4.3 Different types of tests involving phylogenies
  4.4 Non-parametric multivariate hypothesis testing
  4.5 Conclusions: there are many open problems
  Acknowledgements

5 Mixture models in phylogenetic inference
  5.1 Introduction: models of gene-sequence evolution
  5.2 Mixture models
  5.3 Defining mixture models
  5.4 Digression: Bayesian phylogenetic inference
  5.5 A mixture model combining rate and pattern-heterogeneity
  5.6 Application of the mixture model to inferring the phylogeny of the mammals
  5.7 Results
  5.8 Discussion
  Acknowledgements

6 Hadamard conjugation: an analytic tool for phylogenetics
  6.1 Introduction
  6.2 Hadamard conjugation for two sequences
  6.3 Some symmetric models of nucleotide substitution
  6.4 Hadamard conjugation: Neyman model
  6.5 Applications: using the Neyman model
  6.6 Kimura's 3-substitution types model
  6.7 Other applications and perspectives

7 Phylogenetic networks
  7.1 Introduction
  7.2 Median networks
  7.3 Visual complexity of median networks
  7.4 Consensus networks
  7.5 Treelikeness
  7.6 Deriving phylogenetic networks from distances
  7.7 Neighbour-net
  7.8 Discussion
  Acknowledgements

8 Reconstructing the duplication history of tandemly repeated sequences
  8.1 Introduction
  8.2 Repeated sequences and duplication model
  8.3 Mathematical model and properties
  8.4 Inferring duplication trees from sequence data
  8.5 Simulation comparison and prospects
  Acknowledgements

9 Conserved segment statistics and rearrangement
  9.1 Introduction
  9.2 Genetic (recombinational) distance
  9.3 Gene counts
  9.4 The inference problem
  9.5 What can we infer from conserved segments?
  9.6 Rearrangement algorithms
  9.7 Loss of signal
  9.8 From gene order to genomic sequence
  9.9 Between the blocks
  9.10 Conclusions
  Acknowledgements

10 The inversion distance problem
  10.1 Introduction and biological background
  10.2 Definitions and examples
  10.3 Anatomy of a signed permutation
  10.4 The Hannenhalli-Pevzner duality theorem
  10.5 Algorithms
  10.6 Conclusion
  Glossary

11 Genome rearrangements with gene families
  11.1 Introduction
  11.2 The formal representation of the genome
  11.3 Genome rearrangement
  11.4 Multigene families
  11.5 Algorithms and models
  11.6 Genome duplication
  11.7 Duplication of chromosomal segments
  11.8 Conclusion

12 Reconstructing phylogenies from gene-content and gene-order data
  12.1 Introduction: phylogenies and phylogenetic data
  12.2 Computing with gene-order data
  12.3 Reconstruction from gene-order data
  12.4 Experimentation in phylogeny
  12.5 Conclusion and open problems
  Acknowledgements

13 Distance-based genome rearrangement phylogeny
  13.1 Introduction
  13.2 Whole genomes and events that change gene orders
  13.3 Distance-based phylogeny reconstruction
  13.4 Empirically Derived Estimator
  13.5 IEBP: "Inverting the expected breakpoint distance"
  13.6 Simulation studies
  13.7 Summary
  Acknowledgements

14 How much can evolved characters tell us about the tree that generated them?
  14.1 Introduction
  14.2 Preliminaries
  14.3 Information-theoretic bounds: ancestral states and deep divergences
  14.4 Phase transitions in ancestral state and tree reconstruction
  14.5 Processes on an unbounded state space: the random cluster model
  14.6 Large but finite state spaces
  14.7 Concluding comments
  Acknowledgements

Index