Introduction to phylogenetics

  • This tutorial outlines the basic concepts in phylogenetics, including terminology and maximum likelihood approaches

Learning outcomes

  • Identify the various sections of a phylogenetic tree
  • Define various phylogenetic terminology
  • Describe the process of parsimony tree building
  • Translate a distance matrix into a phylogeny
  • Discuss the drawbacks of non- and semi-parametric phylogenetic methods
  • Describe the process of tree space searching
  • Recognise the maximum likelihood and bootstrap approaches
  • Use RAxML-NG to build a phylogenetic tree
  • Visualise phylogenetic trees with either iTOL or R packages
  • Calculate corrected distances using different models of evolution (optional)
  • Explain a rate matrix (optional)
  • Describe how site heterogeneity is modelled (optional)
  • Define ascertainment bias and its effect on phylogenetics (optional)

Prerequisites

Approximate time to finish tutorial

  • Lecture: 2 hours
    • Optional lecture on models of evolution: 30 minutes
  • Tutorials: 45 minutes
  • Pre/post surveys: 10 minutes

Order of tutorial

Please do the pre-learning quiz, then watch the presentation.
During the presentation there are points to stop and do exercises, which are linked below. The answers to the questions in the exercises are linked within each one.
Once you have finished the tutorial, take the post-learning quiz.

Introduction to Phylogenetics Pre-tutorial Survey

Presentation

  • Download slides here
  • (Optional) Models of evolution
    • An optional set of slides outlining the use of evolutionary models for both the correction of distance matrices and the implementation of parametric phylogenetics

Tasks from slides with sample answers

Introduction to phylogenetics

Do sampled taxa sit at the end of internal or external branches of a phylogenetic tree?

Click here for answer External


What does monophyletic mean?

Click here for answer A group of taxa that contains an ancestor and all its descendants


How many offspring does a bifurcating node have?

Click here for answer 2


How do you write this tree in Newick format, i.e. using the (X,Y) format (see slide 13 of the Introduction to phylogenetics slides for the image)?

Click here for answer (((D,C),B),A)
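
To make the nesting explicit, here is a minimal Python sketch that writes a tree stored as nested tuples in Newick format. The nested-tuple representation and the function name `to_newick` are assumptions made for illustration, not part of the slides. Note that Newick files normally end with a semicolon, which tree-building and visualisation tools generally expect.

```python
def to_newick(node):
    """Serialise a nested-tuple tree to a Newick string (topology only)."""
    if isinstance(node, str):
        return node  # a leaf is just its taxon label
    # an internal node is a tuple of children, written as (child1,child2,...)
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# The tree from the question: D and C are sisters, then B, then A.
tree = ((("D", "C"), "B"), "A")
newick = to_newick(tree) + ";"
print(newick)  # (((D,C),B),A);
```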


Is parsimony a non-parametric or semi-parametric method?

Click here for answer Non-parametric
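
As a rough illustration of how parsimony scores a tree without an explicit model, the sketch below counts the minimum number of changes needed for one alignment column using Fitch's algorithm. It assumes a rooted bifurcating tree stored as nested tuples; the function name `fitch` and the example column are hypothetical.

```python
def fitch(node, states):
    """Return (possible states, minimum changes) for one column on a rooted binary tree."""
    if isinstance(node, str):
        return {states[node]}, 0  # a leaf has its observed state and no changes
    left, right = node
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # the children agree on at least one state, so no extra change is needed
        return common, left_changes + right_changes
    # the children disagree: keep both options and count one change
    return left_set | right_set, left_changes + right_changes + 1


tree = ((("D", "C"), "B"), "A")
column = {"A": "G", "B": "G", "C": "T", "D": "T"}  # hypothetical alignment column
_, score = fitch(tree, column)
print(score)  # 1
```

Summing this score over every column gives the parsimony score of the tree; the tree with the lowest total is preferred.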


What is the main drawback of parsimony methods?

Click here for answer Cannot account for convergent evolution


Is UPGMA a non-parametric or semi-parametric method?

Click here for answer Semi-parametric


Do you pick the samples with the smallest or largest distance at each step of the distance approach?

Click here for answer Smallest
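
The "join the closest pair, then recompute distances" loop can be sketched in Python as below. This is a minimal UPGMA sketch that returns only the topology (no branch lengths); the function name `upgma` and the distance matrix are made up for illustration.

```python
from itertools import combinations


def upgma(dist, names):
    """Cluster taxa with UPGMA and return the topology as a Newick-style string."""
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # pick the pair of clusters with the smallest distance
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # distance to each remaining cluster is the size-weighted average
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise distances between four taxa
names = ["A", "B", "C", "D"]
dist = {
    ("A", "B"): 6, ("A", "C"): 6, ("A", "D"): 6,
    ("B", "C"): 4, ("B", "D"): 4,
    ("C", "D"): 2,
}
print(upgma(dist, names))  # (A,(B,(C,D)))
```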


What are the main ways to avoid getting stuck in a local maximum in tree searching?

Click here for answer
  • Multiple starting points
  • Multiple searches at once; can switch between searching chains
  • Allow large and small rearrangements
  • Allow some steps backwards to try to improve the score


Does maximum likelihood go sequence by sequence or column by column?

Click here for answer Column by column
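
To show what "column by column" means in practice, here is a deliberately simplified sketch: the log-likelihood of just two aligned sequences separated by a branch of length t under the Jukes-Cantor model, computed as a sum over alignment columns. A real maximum likelihood program does the same per-column summation over a whole tree; the sequences, function names, and the grid of candidate branch lengths here are assumptions for illustration.

```python
import math


def jc69_site_probs(t):
    """Return (P(same base), P(one specific different base)) after branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def log_likelihood(seq_x, seq_y, t):
    """Sum the per-column log-likelihoods for two aligned sequences."""
    p_same, p_diff = jc69_site_probs(t)
    total = 0.0
    for x, y in zip(seq_x, seq_y):  # one alignment column at a time
        # 0.25 is the equilibrium frequency of the base at one end of the branch
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total


# Hypothetical sequences differing at 3 of 10 sites
seq_x = "ACGTACGTAC"
seq_y = "ACGTTCGAAG"

# Try a grid of branch lengths and keep the one with the highest likelihood
candidates = [i / 100 for i in range(1, 101)]
best_t = max(candidates, key=lambda t: log_likelihood(seq_x, seq_y, t))
print(best_t)
```

The alignment stays fixed throughout; only the parameter being optimised (here the branch length, in a full analysis the tree as well) changes between evaluations.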


In ML, at each step do you change the alignment or the tree?

Click here for answer Tree


In bootstrapping is sampling done with or without replacement?

Click here for answer With replacement
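
A minimal sketch of how one bootstrap replicate could be built: alignment columns are drawn with replacement until the replicate is as long as the original, so some columns appear more than once and others not at all. The alignment and the function name `bootstrap_alignment` are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    # draw `length` column indices; the same column may be drawn more than once
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical alignment of four taxa
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ACTTTC", "D": "ACTTTG"}
replicate = bootstrap_alignment(alignment, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

A tree is then built from each replicate, and the support for a branch is the proportion of replicate trees that contain it.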


Models of evolution

If an uncorrected distance between two sequences is 0.3, what is the Jukes-Cantor corrected distance (djc)?

Click here for answer
  • djc = -(3/4) ln(1 - (4/3) D)
  • djc = -(3/4) ln(1 - (4/3)(0.3))
  • djc = 0.383
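
The same arithmetic can be checked with a short Python sketch. The function name `jc69_distance` is an assumption for illustration; the formula is the Jukes-Cantor correction used in the answer.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an uncorrected proportion of differences p."""
    if not 0 <= p < 0.75:
        # the logarithm is undefined once p reaches 3/4 (saturation)
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - (4 / 3) * p)


print(round(jc69_distance(0.3), 3))  # 0.383
```

The corrected distance is larger than the uncorrected one because it accounts for multiple substitutions at the same site.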


How do we convert a rate (Q) matrix into a transition (P) matrix? Why?

Click here for answer Take the matrix exponential of the Q matrix multiplied by the branch length t, i.e. P(t) = e^(Qt). The P matrix then gives the probability of one nucleotide changing to another over that branch length.
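
As a small sketch of this conversion, the code below exponentiates a Jukes-Cantor rate matrix using a truncated Taylor series written in plain Python. The helper names (`mat_mul`, `mat_exp`), the number of series terms, and the branch length are assumptions for illustration; in practice a library routine for the matrix exponential would normally be used instead.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=30):
    """Approximate exp(m) with a truncated Taylor series: I + m + m^2/2! + ..."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes m^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(res_row, term_row)]
                  for res_row, term_row in zip(result, term)]
    return result


# Jukes-Cantor rate matrix: equal rates, scaled to one expected substitution per unit time
Q = [[-1.0 if i == j else 1 / 3 for j in range(4)] for i in range(4)]
t = 0.5  # hypothetical branch length
P = mat_exp([[q * t for q in row] for row in Q])

for row in P:
    print([round(x, 4) for x in row])
```

Each row of P sums to 1, since every nucleotide must end up as one of the four bases after time t.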


What kind of data requires ascertainment bias correction? Why?

Click here for answer SNP data, because it does not contain invariant (constant) sites, so the branch lengths will likely be wrong (typically overestimated) unless a correction is applied.


Worksheets

Multiple sequence alignments (gene sequences)

Building phylogenetic trees

Visualising phylogenetic trees

Introduction to Phylogenetics Post-tutorial Survey

Other tutorials/tools