Normality and Homogeneity of Variance Testing for T-tests using R Worksheet

  • In this worksheet you will learn how to test whether data follow a normal (Gaussian) distribution and whether two (or more) groups have homogeneity of variance (equal variances)

Important notes

  • These tests must be done before any T-test or ANOVA. Their results tell you whether a parametric test is appropriate or whether you should use the equivalent non-parametric test instead.

Required prerequisite(s)

Suggested prerequisite(s)

Dataset

Steps

  1. Open RStudio
  2. Read in the UnpairedDataset1.tsv file. By default read.delim() treats row 1 as the header (header = TRUE) and uses a tab as the separator, so no extra arguments need to be passed to it
    unpaired <- read.delim("UnpairedDataset1.tsv")
    
  3. Our data has 3 groups, and before an ANOVA you would run these tests on all three. For this example we will compare only two groups, so select just Group 1 and Group 2
    unpairedGroups <- unpaired[,c("Group.1","Group.2")]
    
  4. The Shapiro-Wilk test checks each group for normality. If either p-value is < 0.05 (or whatever p-value cut-off you are using), there is evidence that the data are not normally distributed and you should not use a parametric test. If both p-values are > 0.05, there is no evidence against normality, so both groups can be treated as normally distributed and suitable for parametric tests (provided they also pass the homogeneity of variance test below)
    shapiro.test(unpairedGroups$Group.1) 
    shapiro.test(unpairedGroups$Group.2)
    
  5. To perform a homogeneity of variance test the data must be in a ‘melted’ (long) format, with one column of values and one column naming the group each value belongs to. Install the reshape2 package (if needed) and then load the library
    install.packages("reshape2")
    library("reshape2")
    
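  6. Melt the data frame so it is in the right format for the Bartlett test. With no id variables, melt() uses every column as a measure variable and returns a ‘variable’ column (the group name) and a ‘value’ column
    meltedUnpairedGroups <- melt(unpairedGroups)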
    
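  7. Perform the Bartlett test. If the p-value is < 0.05 (or whatever p-value cut-off you are using), there is evidence that the variances differ between the groups and you should not use a parametric test that assumes equal variances. If it is > 0.05, there is no evidence of unequal variances, so the groups are suitable for parametric tests (provided they also pass the normality test above). Bartlett's test is itself sensitive to non-normal data, which is another reason to run the Shapiro-Wilk test first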
    bartlett.test(value ~ variable, data=meltedUnpairedGroups)
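For step 2, here is a minimal sketch of what read.delim() does with a tab-separated file. UnpairedDataset1.tsv is not reproduced here, so the sketch writes made-up values to a temporary file. The header names "Group 1", "Group 2", "Group 3" and all the numbers are hypothetical. It shows that the first row becomes the header and that, because check.names = TRUE by default, a name such as "Group 1" is turned into Group.1. Blank cells in a shorter group are read as NA.

```r
# Made-up stand-in for UnpairedDataset1.tsv; header names and values are hypothetical
tsvPath <- tempfile(fileext = ".tsv")
writeLines(c(
  "Group 1\tGroup 2\tGroup 3",
  "5.1\t6.2\t7.0",
  "4.8\t5.9\t6.8",
  "5.3\t6.5\t"            # Group 3 has one fewer value than the others
), tsvPath)

# read.delim() defaults: header = TRUE, sep = "\t", check.names = TRUE
unpaired <- read.delim(tsvPath)

names(unpaired)           # "Group.1" "Group.2" "Group.3"
colSums(is.na(unpaired))  # the blank cell in Group 3 is read as NA
```

Checking names() straight after reading is a quick way to confirm the column names you will use in step 3.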
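For step 3, the sketch below contrasts selecting two groups, as this worksheet does, with selecting every group, as you would before an ANOVA. The small data frame is a hypothetical stand-in for the real data. The pattern "^Group\\." assumes that every group column name starts with "Group.".

```r
# Hypothetical stand-in for the data frame read in step 2
unpaired <- data.frame(Group.1 = c(5.1, 4.8, 5.3, 5.0),
                       Group.2 = c(6.2, 5.9, 6.5, 6.1),
                       Group.3 = c(7.0, 6.8, 7.4, 7.1))

# Two groups, as in this worksheet; stop early if a column name is mistyped
twoGroups <- c("Group.1", "Group.2")
stopifnot(all(twoGroups %in% names(unpaired)))
unpairedGroups <- unpaired[, twoGroups]

# Every group column, as you would check before an ANOVA
# (assumes all group columns are named "Group.<something>")
allGroups <- unpaired[, grepl("^Group\\.", names(unpaired))]
names(allGroups)  # "Group.1" "Group.2" "Group.3"
```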
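For step 4, the Shapiro-Wilk p-values can be collected and compared with the cut-off in one pass instead of being read off each printout. This is a sketch only. The random data stand in for unpairedGroups, and the name alpha for the cut-off is an assumption. shapiro.test() returns an object whose p.value element holds the p-value.

```r
set.seed(1)
# Hypothetical stand-in for unpairedGroups; replace with your own data frame
unpairedGroups <- data.frame(Group.1 = rnorm(30, mean = 10),
                             Group.2 = rnorm(30, mean = 12))

alpha <- 0.05  # p-value cut-off; change to suit your analysis

# Run shapiro.test() on every column and keep only the p-values
shapiroP <- sapply(unpairedGroups, function(x) shapiro.test(x)$p.value)
print(shapiroP)

# TRUE only if no group shows evidence against normality at this cut-off
allNormal <- all(shapiroP > alpha)
allNormal
```

Because sapply() works column by column, the same line also covers the three-group case used before an ANOVA.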
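For step 6, base R's stack() is a swapped-in alternative to reshape2's melt() if you prefer not to install a package. It is not the method this worksheet uses. stack() produces the same long layout, but its columns are called values and ind, so the sketch renames them to match the value ~ variable formula in step 7. The data are hypothetical.

```r
# Hypothetical wide data: one column per group, NA where a group is shorter
unpairedGroups <- data.frame(Group.1 = c(5.1, 4.8, 5.3),
                             Group.2 = c(6.2, 5.9, NA))

# stack() returns a 'values' column and an 'ind' factor holding the column names
longGroups <- stack(unpairedGroups)

# Rename to the column names melt() would give, so value ~ variable still works
names(longGroups) <- c("value", "variable")

longGroups
```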
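The two checks in steps 4 and 7 can be wrapped into one helper. The sketch below assumes a function name, checkParametricAssumptions, that is hypothetical and not part of base R or any package. It also uses the fact that bartlett.test() accepts a list of numeric vectors directly. With that form, the melting step is not needed, and non-finite values such as NA are dropped from each group. The random data are made up. One set has equal spreads, and in the other the standard deviations differ tenfold.

```r
# Hypothetical helper: runs the normality and equal-variance checks from this
# worksheet on the columns of a wide data frame (one column per group)
checkParametricAssumptions <- function(groups, alpha = 0.05) {
  # Shapiro-Wilk p-value for each group
  shapiroP <- sapply(groups, function(x) shapiro.test(x)$p.value)
  # bartlett.test() also takes a list of numeric vectors, so no melting is needed
  bartlettP <- bartlett.test(as.list(groups))$p.value
  list(
    shapiroP = shapiroP,
    bartlettP = bartlettP,
    # TRUE only if every group passes normality AND variances look equal
    parametricOK = all(shapiroP > alpha) && bartlettP > alpha
  )
}

set.seed(42)
equalVar   <- data.frame(Group.1 = rnorm(40, 10, 2), Group.2 = rnorm(40, 12, 2))
unequalVar <- data.frame(Group.1 = rnorm(40, 10, 1), Group.2 = rnorm(40, 12, 10))

resEqual   <- checkParametricAssumptions(equalVar)
resUnequal <- checkParametricAssumptions(unequalVar)

resUnequal$bartlettP     # very small: the true variances differ by a factor of 100
resUnequal$parametricOK  # FALSE
```

The list form and the formula form in step 7 compute the same statistic, so either can be used. The formula form is convenient when the data are already in long format.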