- In this worksheet you will learn how to create a box plot using R
Required prerequisite(s)
- You must have R installed on your computer and ideally also RStudio. See https://rstudio-education.github.io/hopr/starting.html for a guide.
Suggested prerequisite(s)
- It is recommended that you have followed the Concepts in Computer Programming and Introduction to R tutorials before starting.
- An understanding of tidy data. See https://www.youtube.com/watch?v=KW1laBLEiw0
- An understanding of the melt format in R: https://www.statology.org/melt-in-r/
Dataset
- This demonstration uses UnpairedDataset1.tsv as box plots are not always suitable for paired data.
Steps
- Open RStudio
- Read in the UnpairedDataset1.tsv file. By default, read.delim() treats the first row as a header, so no extra arguments need to be passed to it
unpaired <- read.delim("UnpairedDataset1.tsv")
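A quick way to confirm that a file was read as expected is to inspect the result. The sketch below is self-contained: it writes a small made-up tab-separated file and reads it back with read.delim(). The column names GroupA and GroupB and all of the values are hypothetical and are not taken from UnpairedDataset1.tsv.

```r
# Write a tiny, made-up tab-separated file to a temporary location.
# GroupA/GroupB and the numbers are hypothetical stand-ins for the real dataset.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("GroupA\tGroupB", "1.2\t2.5", "1.9\t3.1", "1.4\t2.8"), tmp)

# read.delim() defaults to sep = "\t" and header = TRUE
toy <- read.delim(tmp)

str(toy)    # column names and types
head(toy)   # first few rows
```

The same two calls, str(unpaired) and head(unpaired), can be run on the real data frame to check that each group appears as its own numeric column.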
- The ggplot2 package is well suited to creating figures. Install the package (if needed) and load the library
install.packages("ggplot2")
library(ggplot2)
- To create a chart, the data must be in a 'melted' long format. Install the reshape2 package (if needed) and then load the library
install.packages("reshape2")
library("reshape2")
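One possible way to handle the "if needed" part is to install a package only when it is missing. This is a minimal sketch rather than the only approach; it assumes a CRAN mirror is already configured, which RStudio normally does for you.

```r
# Install each package only if it is not already available.
# Assumes a CRAN mirror is configured (RStudio sets one by default).
for (pkg in c("ggplot2", "reshape2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Load both libraries for the rest of the session
library(ggplot2)
library(reshape2)
```

Running this repeatedly is harmless, because the install step is skipped once the packages are present.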
- Melt the data frame so it is in the right format for ggplot2
meltedUnpaired <- melt(unpaired)
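To see what melting does, it can help to try it on a very small data frame first. The example below uses made-up data (the GroupA and GroupB columns and their values are hypothetical) and shows the wide table being turned into two columns, variable and value, which are the names used in the plotting steps.

```r
library(reshape2)

# Hypothetical wide data: one column per group
wide <- data.frame(GroupA = c(1.2, 1.9, 1.4),
                   GroupB = c(2.5, 3.1, 2.8))

# With no id variables given, melt() treats every numeric column as a measurement
# and prints a message saying so
long <- melt(wide)

long        # one row per measurement: 'variable' holds the group, 'value' the number
dim(long)   # 6 rows, 2 columns
```

Each of the three rows in the wide table contributes one row per group, so the long table has six rows.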
- The steps below are different ways to create box plots, with more options added at each step. You don't have to run them all in order; you can skip to step 9 for the full chart, as steps 7 and 8 are just for illustrative purposes
- We create the chart by telling ggplot2 we want the variable (groups) on the x-axis and the measurements (values) on the y-axis. It will calculate the summary statistics needed by itself
ggplot(data=meltedUnpaired, aes(x=variable, y=value)) +geom_boxplot()
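For a sense of the summary statistics behind each box, the sketch below works them out by hand in base R for one made-up group. It follows the convention described in the ggplot2 documentation (quartiles for the box, whiskers reaching the most extreme points within 1.5 times the interquartile range); the vector x is hypothetical, and this is an illustration rather than ggplot2's own code.

```r
# Hypothetical measurements for a single group
x <- c(2, 4, 4, 5, 7, 9, 30)

# Box: first quartile, median, third quartile
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Whiskers reach the most extreme points within 1.5 * IQR of the box
lower_whisker <- min(x[x >= q[[1]] - 1.5 * iqr])
upper_whisker <- max(x[x <= q[[3]] + 1.5 * iqr])

# Anything beyond the whiskers is drawn as an individual outlier point
outliers <- x[x < lower_whisker | x > upper_whisker]

q              # 25% = 4, 50% = 5, 75% = 8
upper_whisker  # 9
outliers       # 30
```

Here the value 30 lies above the upper limit of 8 + 1.5 * 4 = 14, so it would appear as a separate point above the whisker.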
- We can also add labels to each axis to better describe the data
ggplot(data=meltedUnpaired, aes(x=variable, y=value)) +geom_boxplot() +labs(y= "Measurement", x = "Group")
- We can also colour each box by group name and draw outliers in red
ggplot(data=meltedUnpaired, aes(x=variable, y=value, fill=variable)) +geom_boxplot(outlier.colour="red")+labs(y= "Measurement", x = "Group")
- Once happy with the chart, we can save it to a file
ggsave(filename="unpairedBoxplot.png", plot=last_plot())
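If the saved image needs a particular size or resolution, ggsave() also accepts width, height, units and dpi arguments. The sketch below is self-contained and uses made-up data in place of meltedUnpaired; the object names toy, p and out_file and the chosen dimensions are only illustrative.

```r
library(ggplot2)

# Hypothetical long-format data standing in for meltedUnpaired
toy <- data.frame(variable = rep(c("GroupA", "GroupB"), each = 3),
                  value = c(1.2, 1.9, 1.4, 2.5, 3.1, 2.8))

# Store the plot in an object instead of relying on last_plot()
p <- ggplot(data = toy, aes(x = variable, y = value, fill = variable)) +
  geom_boxplot(outlier.colour = "red") +
  labs(y = "Measurement", x = "Group")

# Save to a temporary file; width and height are in inches, dpi sets the resolution
out_file <- file.path(tempdir(), "toyBoxplot.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```

Passing the plot object explicitly makes it clear which chart is being saved, which is useful when several plots have been drawn in the same session.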