- In this worksheet you will learn how to test whether data follow a normal (Gaussian) distribution and whether two (or more) groups have homogeneity of variance (equal variances)
Important notes
- These tests must be done before any t-test or ANOVA (or equivalent non-parametric test), because their results tell you which kind of test is appropriate.
Required prerequisite(s)
- You must have R installed on your computer and ideally also RStudio. See https://rstudio-education.github.io/hopr/starting.html for a guide.
Suggested prerequisite(s)
- It is recommended that you have followed the Concepts in Computer Programming and Introduction to R tutorials before starting.
- An understanding of tidy data. See https://www.youtube.com/watch?v=KW1laBLEiw0
- An understanding of parametric tests vs non-parametric tests: https://www.youtube.com/watch?v=biXY84hDX5M
- An understanding of the melt format in R: https://www.statology.org/melt-in-r/
Dataset
- This demonstration uses UnpairedDataset1.tsv, but the steps work the same way for paired data
Steps
- Open RStudio
- Read in the UnpairedDataset1.tsv file. The read.delim function automatically treats row 1 as a header, so no extra arguments need to be passed to it
unpaired <- read.delim("UnpairedDataset1.tsv")
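To see how read.delim handles the header row, here is a minimal sketch that writes a small tab-separated file to a temporary location and reads it back. The file contents and the header names (Group 1, Group 2, Group 3) are invented for illustration and are only an assumption about how the real file is laid out. The point is that, with default settings, a header such as Group 1 becomes the column name Group.1, which is the form used in the following steps.

```r
# Hypothetical stand-in for UnpairedDataset1.tsv (values invented for illustration)
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Group 1\tGroup 2\tGroup 3",
             "5.1\t6.2\t7.3",
             "4.8\t6.0\t7.9",
             "5.5\t5.7\t8.1"), tmp)

# header = TRUE and sep = "\t" are the defaults for read.delim
demo <- read.delim(tmp)

# Spaces in header names are replaced with dots (check.names = TRUE by default)
print(names(demo))
str(demo)
```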
- Our data has three groups, and for ANOVA you would perform the tests on all three. For this example we will use only two, so we select just Group 1 and Group 2 for the test
unpairedGroups <- unpaired[,c("Group.1","Group.2")]
- The Shapiro-Wilk test checks each group for normality. If either p-value is < 0.05 (or whatever p-value cut-off you are using), there is evidence that the data are not normally distributed and you should not use a parametric test. If both are > 0.05, there is no evidence against normality and the data are suitable for parametric tests (provided they also pass the homogeneity of variance test below)
shapiro.test(unpairedGroups$Group.1)
shapiro.test(unpairedGroups$Group.2)
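If you have several groups, it can be convenient to run the Shapiro-Wilk test on every column at once and compare each p-value with your cut-off. The sketch below is one possible way to do that. It uses simulated data rather than the worksheet file, and the names simulated, alpha and pValues are made up for this example.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for the two selected groups (not the worksheet data)
simulated <- data.frame(Group.1 = rnorm(30, mean = 10, sd = 2),
                        Group.2 = rnorm(30, mean = 12, sd = 2))

alpha <- 0.05  # assumed p-value cut-off

# Run shapiro.test on each column and keep only the p-value
pValues <- sapply(simulated, function(x) shapiro.test(x)$p.value)
print(pValues)

# TRUE means the test found no evidence against normality at this cut-off
print(pValues > alpha)
```

With your own data you would replace simulated with unpairedGroups.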
- To perform a homogeneity of variance test, the data must be in a ‘melted’ long format. Install the reshape2 package (if needed) and then load the library
install.packages("reshape2")
library("reshape2")
- Melt the data frame so it is in the right format for the Bartlett test
meltedUnpairedGroups <- melt(unpairedGroups)
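To make the long format concrete, the sketch below reshapes a tiny made-up data frame. Note that it uses base R's stack() instead of reshape2's melt(), purely so that the example runs without installing a package. This is a swapped-in alternative, not the method used in this worksheet. stack() names its columns values and ind, so the last line renames them to the value and variable names used in the next step.

```r
# Small made-up wide data frame: one column per group
wide <- data.frame(Group.1 = c(5.1, 4.8, 5.5),
                   Group.2 = c(6.2, 6.0, 5.7))

# stack() puts all measurements in one column and the group label in another
long <- stack(wide)
print(long)

# Rename the columns to the names used with melt() output in this worksheet
names(long) <- c("value", "variable")
```

Each row of the long data frame now holds one measurement and the group it came from, which is the layout that a formula such as value ~ variable expects.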
- Perform the Bartlett test. If the p-value is < 0.05 (or whatever p-value cut-off you are using), there is evidence of unequal variance between the groups and you should not use a parametric test that assumes equal variances. If it is > 0.05, there is no evidence that the variances differ and the data are suitable for parametric tests (provided they also pass the normality test above)
bartlett.test(value ~ variable, data=meltedUnpairedGroups)
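The result of bartlett.test can also be stored and its p-value compared with your cut-off in code. The sketch below shows one way to do this on simulated long-format data (not the worksheet data). The names longData, alpha, result and resultList are made up for this example. It also shows that bartlett.test accepts a list of numeric vectors, which gives the same answer as the formula form.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated long-format data (not the worksheet data): two groups of 30
longData <- data.frame(
  variable = factor(rep(c("Group.1", "Group.2"), each = 30)),
  value    = c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 12, sd = 2))
)

alpha <- 0.05  # assumed p-value cut-off

# Formula form, as used in the step above
result <- bartlett.test(value ~ variable, data = longData)
print(result$p.value)

# TRUE means no evidence of unequal variances at this cut-off
print(result$p.value > alpha)

# The same test on a list of vectors, one element per group
resultList <- bartlett.test(split(longData$value, longData$variable))
print(all.equal(result$p.value, resultList$p.value))
```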