Visualise phylogenetic trees in R using ggtree

  • In this worksheet you will learn how to use R to visualise basic phylogenetic trees and some associated data

Required prerequisite(s)

Suggested prerequisite(s)

Dataset

Steps

  1. Open RStudio
  2. Install the ggtree and ggplot2 packages (if you have not already)
    • ggtree is installed through Bioconductor rather than CRAN, so the Bioconductor package manager (BiocManager) must be installed first
      if (!require("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
      BiocManager::install("ggtree")

      install.packages("ggplot2")
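
Before moving on, it can be worth confirming that the installation worked. The sketch below uses only base R: `requireNamespace()` reports whether a package can be loaded, without attaching it. treeio is included because the worksheet loads it alongside ggtree and ggplot2; it is normally installed with ggtree as a dependency, but that is worth checking on your own machine.

```r
# Packages this worksheet loads
pkgs <- c("ggtree", "ggplot2", "treeio")

# TRUE for each package that is installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need installing
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```
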
  3. Load all the required libraries (treeio provides read.newick and is installed along with ggtree)
    library(ggtree)
    library(ggplot2)
    library(treeio)
    
  4. Read in the 16S_Staph_example.raxml.support.tree file (which should be saved on your computer, in RStudio's current working directory)
    tree <- read.newick(file = "16S_Staph_example.raxml.support.tree")
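
If reading the tree fails with a "cannot open file" error, the file is most likely not in the working directory. A minimal base R check is sketched below; the commented-out `setwd()` path is a placeholder, not a real location.

```r
# File name used in this worksheet; it must be in the working directory
tree_file <- "16S_Staph_example.raxml.support.tree"

# Show where R is currently looking for files
message("Working directory: ", getwd())

if (file.exists(tree_file)) {
  message("Found ", tree_file)
} else {
  # List candidate tree files so a misnamed or misplaced file is easy to spot
  message("Not found. Files ending in .tree here: ",
          paste(list.files(pattern = "\\.tree$"), collapse = ", "))
  # setwd("path/to/your/folder")  # hypothetical path; replace with your own
}
```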
    
  5. We will start with the basic tree plot with no additional information
    ggtree(tree)
    
  6. We will add the tip labels and points to denote each of the internal and external nodes
    ggtree(tree) + geom_tiplab() + geom_point()
    
  7. Often the labels get cut off on the right-hand side, so we will set the x-axis limit of the tree so that we can see the full labels. We will also 'nudge' the labels a little away from the tree node points
    ggtree(tree) + geom_tiplab(nudge_x = 0.0005) + geom_point() + xlim_tree(0.05)
    
  8. Our tree has bootstrap support values, so we will add these to the tree and also 'nudge' them to the right so they are clear
    ggtree(tree) + geom_tiplab(nudge_x = 0.0005) + geom_point() + xlim_tree(0.05) + geom_nodelab(aes(label=label), nudge_x = 0.0009)
    
  9. Our tree is now a basic visualisation suitable for publication. We can save it to a file like so:
    ggsave("16S_Staph_example_basicTree.pdf")
    
  10. We can colour specific clades of the tree. To do this we first need to find out the numbers of the internal nodes, so that we can group based on these
    ggtree(tree) + geom_text2(aes(label=node), hjust=-.3)
    
  11. The command shows us the basic tree plot again, labelled with the node numbers (tips as well as internal nodes). We will now create two groups: Node 10 and all its descendants (top 5 strains) and Node 14 and its descendants (middle 2 strains)
    tree <- groupClade(tree, c(10,14))
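
Reading node numbers off the plot works well for a small tree. As an alternative sketch, a clade's node number can be looked up from two of its tip labels with `getMRCA()` from the ape package (installed as a dependency of ggtree). The toy tree and tip names below are made up for illustration; with the real tree you would substitute two tip labels that span the clade of interest.

```r
library(ape)

# Toy four-tip tree; the tip names are hypothetical, not from the worksheet data
toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Node number of the most recent common ancestor of tips A and B
node_ab <- getMRCA(toy, tip = c("A", "B"))
node_ab

# The same number could then be passed to groupClade(), e.g. groupClade(toy, node_ab)
```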
    
  12. We can now colour the tree by clade by mapping the group to the colour aesthetic of the tree.
    • A legend is automatically added to the plot, but since the groups don't have names we don't need it, so we remove it with theme(legend.position = "none")
      ggtree(tree, aes(color=group, linetype="solid")) + geom_tiplab(nudge_x = 0.0005) + geom_point() + xlim_tree(0.05) + geom_nodelab(aes(label=label), nudge_x = 0.0009) + theme(legend.position = "none")
      
  13. If we have metadata we wish to show beside our tree, such as presence/absence data, we can add that as well.
  14. First, load in the mock data associated with the tree. Note that the first column has exactly the same names as the tips of our tree; we will set this to be the row names of our data frame
    data <- read.delim("Mock_Staph_data.tsv", row.names=1) 
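
Because the row names are meant to match the tip labels exactly, a quick comparison can catch spelling or capitalisation differences before plotting. The sketch below uses an invented toy tree and data frame (with one deliberate mismatch) rather than the worksheet files.

```r
library(ape)

# Toy stand-ins for `tree` and `data`; names and values are invented
toy_tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
toy_data <- data.frame(geneX = c(1, 0, 1, 0),
                       row.names = c("A", "B", "C", "d"))  # "d" is a deliberate typo

# Tips with no matching row in the data
tips_without_data <- setdiff(toy_tree$tip.label, rownames(toy_data))
# Rows that match no tip in the tree
rows_without_tip <- setdiff(rownames(toy_data), toy_tree$tip.label)

tips_without_data   # "D"
rows_without_tip    # "d"
```

With the worksheet objects, and assuming `tree` is a phylo object, the same comparison would be `setdiff(tree$tip.label, rownames(data))`; an empty result means every tip has a row of data.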
    
  15. To add the data to our tree, we must first save the tree we want to output as an object.
    • When adding data beside a tree there is often not enough space to show the tip labels as well, so we will leave out geom_tiplab(nudge_x = 0.0005) to avoid issues
    • We will remove the legend from the entire plot rather than just the tree, so we leave out theme(legend.position = "none") here and add it after the heatmap instead
      p <- ggtree(tree, aes(color=group, linetype="solid")) + geom_point() + xlim_tree(0.05) + geom_nodelab(aes(label=label), nudge_x = 0.0009)
      
  16. Add the data as a heatmap beside the tree
    gheatmap(p, data) + theme(legend.position = "none")
    
  17. Save the new plot
    ggsave("16S_Staph_example_colouredTree_withData.pdf")
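
Both `ggsave()` calls in this worksheet rely on its defaults: it saves the most recently displayed plot, at the size of the current graphics device. A sketch of a more explicit call, passing the plot object and dimensions directly, is shown below. The scatter plot, file name and dimensions are placeholders chosen for illustration.

```r
library(ggplot2)

# Placeholder plot standing in for a tree; any ggplot object works the same way
p_demo <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) + geom_point()

# Write to a temporary file so the example leaves nothing behind
out_file <- file.path(tempdir(), "demo_plot.pdf")

# Name the plot explicitly and fix the page size in centimetres
ggsave(out_file, plot = p_demo, width = 18, height = 12, units = "cm")

file.exists(out_file)
```

For the heatmap, the result of `gheatmap()` could be stored in an object and passed as `plot` in the same way, which makes the saved figure independent of whatever was plotted last.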
    

Additional guides