- This tutorial outlines the basic concepts in phylogenetics, including terminology and maximum likelihood approaches
Learning outcomes
- Identify the various sections of a phylogenetic tree
- Define key phylogenetic terms
- Describe the process of parsimony tree building
- Translate a distance matrix into a phylogeny
- Discuss the drawbacks of non- and semi-parametric phylogenetic methods
- Describe the process of tree space searching
- Recognise the maximum likelihood and bootstrap approaches
- Use RAxML-NG to build a phylogenetic tree
- Visualise phylogenetic trees with either iTOL or R packages
- Calculate corrected distances using different models of evolution (optional)
- Explain a rate matrix (optional)
- Describe how site heterogeneity is modelled (optional)
- Define ascertainment bias and its effect on phylogenetics (optional)
Prerequisites
- It is recommended that you have Notepad++ (Windows) or BBEdit (Mac) for viewing FASTA files; most default Linux text editors can also open them.
- It is recommended that you have followed the Concepts in Computer Programming, UNIX tutorial (basics), and Setting up and using conda tutorials if you plan to do the RAxML-NG via terminal worksheet.
- It is recommended that you have followed the Introduction to R tutorial if you are going to do the ggtree worksheet.
Approximate time to finish tutorial
- Lecture: 2 hours
- Optional lecture on models of evolution: 30 minutes
- Tutorials: 45 minutes
- Pre/post surveys: 10 minutes
Order of tutorial
Please do the pre-learning quiz, then watch the presentation.
During the presentation there are points to stop and do exercises, which are linked below. The answers to the questions in the exercises are linked within each one.
Once you have finished the tutorial, take the post-learning quiz.
Introduction to Phylogenetics Pre-tutorial Survey
Presentation
- Download slides here
- (Optional) Models of evolution
- An optional set of slides outlining the use of evolutionary models for both the correction of distance matrices and the implementation of parametric phylogenetics
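The distance correction covered in the optional slides can be sketched in a few lines of Python. The example below applies the Jukes-Cantor formula d_JC = -(3/4) ln(1 - (4/3) D), the same formula used in the sample answers further down. The function name `jukes_cantor_distance` and the input values are illustrative assumptions, not taken from the slides.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Return the Jukes-Cantor corrected distance.

    p is the uncorrected distance: the proportion of sites that differ
    between two aligned sequences.
    """
    # The logarithm is only defined while 1 - (4/3) * p stays positive.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative (made-up) uncorrected distances.
for p in (0.05, 0.10, 0.20, 0.50):
    print(f"uncorrected {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

The corrected distance is always at least as large as the uncorrected one, and the gap widens as sequences diverge, because more of the substitutions that occurred are hidden by multiple changes at the same site.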
Tasks from slides with sample answers
Introduction to phylogenetics
Do sampled taxa sit at the end of internal or external branches of a phylogenetic tree?
Answer: External
What does monophyletic mean?
Answer: A group of taxa that contains an ancestor and all its descendants
How many offspring does a bifurcating node have?
Answer: 2
How do you write this tree in Newick format, i.e. using the (X,Y) notation (see slide 13 of the Introduction to phylogenetics slides for the image)?
Answer: (((D,C),B),A)
Is parsimony a non-parametric or semi-parametric method?
Answer: Non-parametric
What is the main drawback of parsimony methods?
Answer: Cannot account for convergent evolution
Is UPGMA a non-parametric or semi-parametric method?
Answer: Semi-parametric
Do you pick the samples with the smallest or largest distance at each step of the distance approach?
Answer: Smallest
What are the main ways to avoid getting stuck in a local maximum in tree searching?
Answer:
- Multiple starting points
- Multiple searches at once; can switch between searching chains
- Allow large and small rearrangements
- Allow some backward steps to try to improve the score
Does maximum likelihood go sequence by sequence or column by column?
Answer: Column by column
In ML, at each step do you change the alignment or the tree?
Answer: Tree
In bootstrapping, is sampling done with or without replacement?
Answer: With replacement
Models of evolution
If the uncorrected distance (D) between two sequences is 0.3, what is the Jukes-Cantor corrected distance (d_JC)?
Answer:
- d_JC = -(3/4) ln(1 - (4/3) D)
- d_JC = -(3/4) ln(1 - (4/3)(0.3))
- d_JC = 0.383
How do we convert a rate (Q) matrix into a transition (P) matrix? Why?
Answer: Take the matrix exponential of the Q matrix, P(t) = exp(Qt). This gives the probability of one nucleotide changing to another along a branch of length t.
What kind of data requires ascertainment bias correction? Why?
Answer: SNP data, because it contains no invariant (constant) sites, so the branch lengths will likely be wrong without correction.
Worksheets
Multiple sequence alignments (gene sequences)
- Creating a multiple gene sequence alignment using MAFFT in UNIX
- Creating a multiple gene sequence alignment using MAFFT in Galaxy
Building phylogenetic trees
- Maximum likelihood phylogenetic tree building with RAxML-NG (via UNIX/conda)
- Maximum likelihood phylogenetic tree building with RAxML-NG (via webserver)
- Uses a graphical interface for building phylogenetic trees
Visualising phylogenetic trees
- Visualising phylogenies using ggtree in R
- Visualising phylogenies using the iTOL web interface
- A set of video tutorials by the makers of iTOL
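Both visualisation worksheets start from a tree file, typically in Newick format, which is the nested-parentheses notation introduced in the slides. As a minimal sketch of how that notation encodes a tree, the Python below serialises a nested tuple into a Newick string. The tuple representation, the function name `to_newick` and the taxon names are assumptions made for illustration and are not part of ggtree, iTOL or RAxML-NG.

```python
def to_newick(node) -> str:
    """Serialise a nested tuple of taxon names into Newick notation.

    A string is a tip (sampled taxon); a tuple is an internal node whose
    elements are its children.
    """
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Hypothetical three-taxon tree: human and chimp are sister taxa.
tree = (("human", "chimp"), "gorilla")

# A complete Newick string ends with a semicolon.
print(to_newick(tree) + ";")
```

Each pair of parentheses corresponds to one internal node, so a fully bifurcating tree has exactly two comma-separated entries inside every pair. Real tree files usually also carry branch lengths and support values, which this sketch leaves out.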
Introduction to Phylogenetics Post-tutorial Survey
Other tutorials/tools
- Tree thinking assessments
- A series of questions to test your ability to interpret phylogenetic trees
- IQ-TREE 2, the main alternative to RAxML-NG
- Comprehensive ggtree tutorials
- Extended ggtree tutorials beyond the worksheet above, outlining both phylogenetic and phylodynamic tree annotations
- Data integration, manipulation and visualisation of phylogenetic trees
- Free online book with extensive tutorials on interacting with phylogenetic trees through R using tidytree, treeio, ggtree and ggtreeExtra
- Phylogenetic tree building for Mycobacterium tuberculosis using Galaxy
- The RAxML-NG GitHub page has an extended tutorial on running the tool and its additional features, which you can access here