Phylogenetics (EEB 5349)
This is a graduate-level course in phylogenetics, emphasizing maximum likelihood and Bayesian approaches to estimating phylogenies, which are genealogies at or above the species level. A primary goal is to provide an accessible introduction to the theory so that by the end of the course students should be able to understand much of the primary literature on modern phylogenetic methods and know how to apply these methods intelligently to their own problems. The laboratory provides hands-on experience with several important phylogenetic software packages (PAUP*, IQ-TREE, RevBayes, BayesTraits, and others) and introduces students to the use of remote high performance computing resources to perform phylogenetic analyses.
Semester: Spring 2022
Lecture: Tuesday/Thursday 11:00-12:15 (Paul O. Lewis)
Lab: Thursday 1:25-3:20 (Zach Muscavitch)
Room: Torrey Life Science (TLS) 181, Storrs Campus (the first two weeks are online)
Text: Lewis, P. O. 2022. Getting Rooted in Bayesian Phylogenetics (unfinished, but some chapters are ready)
Schedule
Date | Lecture topic | Lab/homework |
---|---|---|
Tuesday Jan. 18 | Introduction: The jargon of phylogenetics (edges, vertices, leaves, degree, split, polytomy, taxon, clade); types of genealogies; rooted vs. unrooted trees; Newick descriptions; monophyletic, paraphyletic, and polyphyletic groups [slides] | Homework 1: Trees From Splits |
Thursday Jan. 20 | Optimality criteria, search strategies, consensus trees: Exhaustive enumeration, branch-and-bound search, algorithmic methods (star decomposition, stepwise addition, NJ), heuristic search strategies (NNI, SPR, TBR), evolutionary algorithms; consensus trees [slides] | Lab 1: Using the Xanadu cluster, Introduction to PAUP*, and NEXUS file format |
Tuesday Jan. 25 | The parsimony criterion: Strict, semi-strict, and majority-rule consensus trees; maximum agreement subtrees; Camin-Sokal, Wagner, Fitch, Dollo, and transversion parsimony; step matrices and generalized parsimony [slides] [study questions] | Homework 2: Parsimony |
Thursday Jan. 27 | Distance methods: Least squares criterion, minimum evolution criterion, neighbor-joining [slides] [study questions] | Lab 2: Searching |
Tuesday Feb. 1 | Substitution models: Instantaneous rates, expected number of substitutions, equilibrium frequencies, JC69 model. Textbook: Ch. 2 (pp. 19-30); Ch. 3 (pp. 35-38). [slides] [study questions] | Homework 3: Least squares distances (working through the Python Primer first will make this homework much easier) |
Thursday Feb. 3 | Maximum likelihood criterion: JC distance formula; common substitution models: K2P, F81, F84, HKY85, and GTR; likelihood: the probability of data given a model, likelihood of a “tree” with just one vertex and no edges, why likelihoods are always on the log scale, likelihood ratio tests. [Transition Probability Applet] Textbook: Ch. 5 (pp. 57-75) [slides] [study questions] | Lab 3: Likelihood [slide] |
Tuesday Feb. 8 | Maximum likelihood (cont.): Likelihood of a tree with 2 vertices connected by one edge, transition probabilities, maximum likelihood estimates (MLEs) of model parameters, likelihood of a tree. [slides] Textbook: Ch. 4 (pp. 47-53) | Homework 4: Site likelihoods |
Thursday Feb. 10 | Bootstrapping, rate heterogeneity: Non-parametric bootstrapping [slides]; rate heterogeneity: invariable sites model, discrete gamma model, site-specific rates (partitioned) models, mixture models. Textbook: Ch. 6 (pp. 81-92). [slides] | Lab 4: IQ-TREE tutorial |
Tuesday Feb. 15 | Simulation: How to simulate nucleotide sequence data, and why it’s done [slides] Textbook: Ch. 6 (pp. 93-96). | Homework 5: Rate heterogeneity (Python program to modify) |
Thursday Feb. 17 | Long branch attraction, topology tests: Statistical consistency, long branch attraction, testing the molecular clock, nonparametric bootstrap topology tests (KH/SH/AU), and parametric bootstrapping tests (SOWH). [LBA slides] [Topology test slides] | Lab 5: Simulating sequences |
Tuesday Feb. 22 | Codon, secondary structure, and amino acid models: Nonsynonymous vs. synonymous rates, codon models, RNA stem/loop structure, compensatory substitutions, stem models, empirical amino acid rate matrices (PAM, JTT, WAG, LE) [slides] [Diagonalization applet] | Homework 6: Simulation |
Thursday Feb. 24 | Bayes’ Rule: Joint, conditional, and marginal probabilities, and how they interact to create Bayes’ Rule; probability vs. probability density. [slides] Textbook: Ch. 7 (Bayes’ Rule; pp. 101-116) | Lab 6: Using R to explore probability distributions and plot trees |
Tuesday Mar. 1 | Bayesian statistics, MCMC: Metropolis-Hastings algorithm; mixing, burn-in, trace plots, heated chains, topology proposals, updating parameters during MCMC. [slides] [MCMC robot applet] | Homework 7: MCMC |
Thursday Mar. 3 | Prior distributions used in phylogenetics: Discrete Uniform (topology), Gamma or Lognormal (kappa, omega), Beta (pinvar), Dirichlet (base frequencies, GTR exchangeabilities); tree length prior. [Dirichlet applet] [Density rain applet] [slides] Textbook: Ch. 8 (MCMC; pp. 121-146) | Lab 7: To concatenate or not to concatenate, that is the question |
Tuesday Mar. 8 | Prior distributions (cont.) and CIs: Running on empty, prior fences, induced priors, hierarchical models, empirical Bayes; frequentist confidence intervals vs. Bayesian credible intervals [slides] [CI applet] | Homework 8: Larget-Simon Local Move |
Thursday Mar. 10 | Dirichlet Process Prior: Bayesian non-parametric clustering; examples include BUCKy (genes clustered by topology) and PhyloBayes (amino-acid sites clustered by frequency spectra) [Stick-breaking applet] [DPP applet] [slides] | Lab 8: MrBayes |
Tuesday Mar. 15 | SPRING BREAK | |
Thursday Mar. 17 | SPRING BREAK | |
Tuesday Mar. 22 | Bayes factors and Bayesian model selection: Bayes factors, steppingstone estimation of marginal likelihood, BIC vs. AIC [slides] | Homework 9: Dirichlet Process Priors |
Thursday Mar. 24 | Discrete morphological models: Introduction to discrete morphological models; Mk model; conditioning on variability. [slides] | Lab 9: Introduction to RevBayes |
Tuesday Mar. 29 | Polytomies; Pagel’s test: Polytomies and the star tree paradox; reversible-jump MCMC; Pagel’s (1994) test for correlated evolution. [polytomy slides] [Pagel slides] | Homework 10: Mk model and conditioning on variability |
Thursday Mar. 31 | Stochastic character mapping: An alternative to Pagel’s (1994) test for assessing whether correlation among characters goes beyond what is expected from inheritance alone. [slides] [additional slides] | Lab 10: RevBayes (discrete morphology analyses) |
Tuesday Apr. 5 | Evolutionary correlation, continuous traits: Independent contrasts [slides] [Brownian Motion applet] and phylogenetic generalized least squares (PGLS) [slides] | Homework 11: Maddison and FitzJohn 2015 |
Thursday Apr. 7 | PGLS (cont.): Estimating ancestral states in PGLS; Ornstein-Uhlenbeck model vs. Brownian motion. [slides] [OU applet] | Lab 11: BayesTraits |
Tuesday Apr. 12 | Phylogenetic signal in continuous traits: Measuring the amount of phylogenetic information in continuous traits (Pagel’s lambda, Blomberg’s K) [slides] [Pagel transformation applet]. Introduction to the coalescent: Introduction to coalescent theory [slides] | Homework 12: Brownian motion model |
Thursday Apr. 14 | Multispecies coalescent and species tree estimation: The multispecies coalescent used to estimate species trees given possibly conflicting gene trees due to deep coalescence, incomplete lineage sorting, and the anomaly zone. [slides] | Lab 12: Continuous trait analyses in R |
Tuesday Apr. 19 | Fast species tree methods: The SVDQuartets and ASTRAL species tree methods [slides]. Divergence time estimation: Strict vs. relaxed clocks, correlated vs. uncorrelated relaxed clocks, calibrating the clock using fossils. [slides] | Homework 13: Heterotachy |
Thursday Apr. 21 | Diversification rate evolution: State-dependent diversification models (BiSSE and its descendants) [relaxed clocks part 2 slides] [diversification slides] | Lab 13: Divergence time estimation |
Tuesday Apr. 26 | Diversification (cont.), heterotachy, and covarion models: BAMM (estimating the number of shifts in diversification regime and where these occur on the tree); what heterotachy is and how it can be accommodated; the covarion hypothesis and model [slides] | No homework assignment |
Thursday Apr. 28 | Species delimitation and information: Bayesian species delimitation (BPP), Bayesian information content estimation [slides] | Lab 14: BAMM |
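
The JC69 model and the JC distance formula (Feb. 1 and Feb. 3 lectures) can be illustrated with a short Python sketch, since Python is the language used in the homework. This is a minimal sketch, not official course code: all function names are hypothetical, the branch length v is assumed to be measured in expected substitutions per site, and sequences are assumed to be aligned strings of A, C, G, T with no gaps or ambiguity codes. Under JC69 the probability of ending in one particular different base after a branch of length v is 1/4 - (1/4) exp(-4v/3), so the expected proportion of differing sites is p = 3/4 - (3/4) exp(-4v/3); solving for v gives the JC distance d = -(3/4) ln(1 - (4/3) p).

```python
import math


def jc69_transition_probs(v):
    """JC69 transition probabilities for a branch of length v
    (expected substitutions per site; an assumption of this sketch).

    Returns (p_same, p_diff), where p_diff is the probability of
    ending in one *particular* different base."""
    e = math.exp(-4.0 * v / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    return p_same, p_diff


def expected_p_distance(v):
    """Expected proportion of sites that differ after a branch of length v."""
    _, p_diff = jc69_transition_probs(v)
    return 3.0 * p_diff  # three possible different bases


def jc69_distance(p):
    """JC69-corrected distance from an observed proportion p of differing sites.

    The formula is undefined when p >= 3/4; this sketch returns infinity."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


if __name__ == "__main__":
    v = 0.1
    p = expected_p_distance(v)
    print("expected p-distance:", p)
    print("JC distance recovered from p:", jc69_distance(p))
    print("JC distance for two toy sequences:",
          jc69_distance(p_distance("ACGTACGTAC", "ACGTTCGTAA")))
```

Feeding the expected p-distance for a branch back into `jc69_distance` recovers the original branch length, which is the sense in which the JC distance corrects the raw proportion of differences for multiple substitutions at the same site.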
Books (and book chapters) on phylogenetics
This is a list of books that you should know about, but none are required texts for this course. Listed in reverse chronological order.
Harmon, L. 2019. Phylogenetic comparative methods. (Version 1.4, released 15 March 2019). Published online by the author.
Yang, Z. 2014. Molecular evolution: a statistical approach. Oxford University Press.
Garamszegi, L. Z. 2014. Modern phylogenetic comparative methods and their application in evolutionary biology: concepts and practice. Springer-Verlag, Berlin. (Well-written chapters by current leaders in phylogenetic comparative methods.)
Baum, D. A., and S. D. Smith. 2013. Tree thinking: an introduction to phylogenetic biology. Roberts and Company Publishers, Greenwood Village, Colorado. (This book is probably the most useful companion volume for this course, introducing the methods in a very accessible way but also providing lots of practice interpreting phylogenies correctly.)
Hall, B. G. 2011. Phylogenetic trees made easy: a how-to manual (4th edition). Sinauer Associates, Sunderland. (A guide to running some of the most important phylogenetic software packages.)
Lemey, P., Salemi, M., and Vandamme, A.-M. 2009. The phylogenetic handbook: a practical approach to phylogenetic analysis and hypothesis testing (2nd edition). Cambridge University Press, Cambridge, UK. (Chapters on theory are paired with practical chapters on software related to the theory.)
Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates, Sunderland. (Comprehensive overview of both history and methods of phylogenetics.)
Page, R., and Holmes, E. 1998. Molecular evolution: a phylogenetic approach. Blackwell Science. (Very nice and accessible pre-Bayesian-era introduction to the field.)
Hillis, D., Moritz, C., and Mable, B. 1996. Molecular systematics (2nd ed.). Sinauer Associates, Sunderland. Chapter 12: Applications of molecular systematics. (Still a very valuable compendium of pre-Bayesian-era phylogenetic methods.)
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Chapter 11: Phylogenetic inference. Pages 407-514 in Molecular Systematics (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer Associates, Sunderland, Massachusetts. (SOWH topology test)