Box plots in R

  • In this worksheet you will learn how to create a box plot using R

Required prerequisite(s)

Suggested prerequisite(s)

Dataset

  • This demonstration uses UnpairedDataset1.tsv as box plots are not always suitable for paired data.

Steps

  1. Open RStudio
  2. Read in the UnpairedDataset1.tsv file. By default read.delim treats the first row as a header, so no extra arguments need to be passed to it
    unpaired <- read.delim("UnpairedDataset1.tsv")
    
  3. The ggplot2 package is well suited to creating figures. Install the package (if needed) and load the library
    install.packages("ggplot2")
    library(ggplot2)
    
  4. To create a chart the data must be in a ‘melted’ long format. Install the reshape2 package (if needed) and then load the library
    install.packages("reshape2")
    library("reshape2")
    
  5. Melt the data frame so it is in the right format for ggplot2
    meltedUnpaired <- melt(unpaired)
    
  6. The steps below show different ways to create box plots, with more options added at each step. You don’t have to run them all in order; you can skip to step 9 for the full chart, as steps 7 and 8 are just for illustrative purposes
  7. We create the chart by telling ggplot2 we want the variable (groups) on the x-axis and the measurements (values) on the y-axis. It will calculate the summary statistics it needs by itself
    ggplot(data = meltedUnpaired, aes(x = variable, y = value)) + geom_boxplot()
    
  8. We can also add labels to each axis to better describe the data
    ggplot(data = meltedUnpaired, aes(x = variable, y = value)) + geom_boxplot() + labs(y = "Measurement", x = "Group")
    
  9. We can also colour each box by its group name and colour the outliers in red
    ggplot(data = meltedUnpaired, aes(x = variable, y = value, fill = variable)) + geom_boxplot(outlier.colour = "red") + labs(y = "Measurement", x = "Group")
    
  10. Once happy with the chart, we can save it to a file
    ggsave(filename = "unpairedBoxplot.png", plot = last_plot())
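
To make the melting step and the "summary statistics" mentioned in step 7 more concrete, here is a minimal sketch on a small invented data frame. The column names (GroupA, GroupB) and the values are hypothetical and are not taken from UnpairedDataset1.tsv. The outlier check assumes the usual 1.5 x IQR rule, which is what geom_boxplot uses by default; the helper is_outlier is written for this illustration and is not part of ggplot2.

```r
library(reshape2)

# Hypothetical wide-format data: one column per group (invented values)
wide <- data.frame(
  GroupA = c(4.1, 5.3, 4.8, 5.0, 9.9),
  GroupB = c(6.2, 6.8, 7.1, 6.5, 6.9)
)

# melt() stacks the columns into two: 'variable' (the old column name)
# and 'value' (the measurement). It prints a message because no id
# variables were given, which is expected here.
long <- melt(wide)
head(long)

# Minimum, quartiles, median and maximum for each group
tapply(long$value, long$variable, quantile)

# Illustrative helper: flag points more than 1.5 * IQR beyond the quartiles
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Outlying values per group, as a named list
outliers <- lapply(split(long$value, long$variable), function(x) x[is_outlier(x)])
outliers
```

In this invented data only the value 9.9 in GroupA falls outside the whisker range, so it is the only point that would be drawn as a separate outlier marker.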
    

Further options for ggplot2 box plots
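
As a rough sketch of a few further options, the example below overlays the raw points on the boxes, adds a title, switches to a lighter theme and hides the legend. The data frame demo is randomly generated and only stands in for meltedUnpaired; the title text and option values are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical long-format data standing in for meltedUnpaired
set.seed(1)
demo <- data.frame(
  variable = rep(c("GroupA", "GroupB"), each = 20),
  value = c(rnorm(20, mean = 5), rnorm(20, mean = 7))
)

p <- ggplot(data = demo, aes(x = variable, y = value, fill = variable)) +
  # Hide the outlier markers, since every point is drawn by geom_jitter below
  geom_boxplot(outlier.shape = NA) +
  # Overlay the individual observations with a little horizontal jitter
  geom_jitter(width = 0.15, alpha = 0.6) +
  labs(y = "Measurement", x = "Group", title = "Example box plot") +
  theme_minimal() +
  # The x-axis already names the groups, so the legend is redundant
  theme(legend.position = "none")
```

Typing p at the console draws the chart, and the same object can be passed to ggsave through its plot argument.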