- In this worksheet you will learn how to create a Maximum Likelihood phylogenetic tree using RAxML-NG via the online web server
Suggested prerequisites
- A knowledge of RAxML-NG is useful. You can read the RAxML-NG paper here and the manual and other documents can be found here.
- A knowledge of the parameters of models of evolution (if you wish to change the defaults)
Dataset
- This demonstration uses 16S_Staph_example_aligned.fasta as produced by the Aligning sequences using MAFFT (via UNIX/conda) worksheet.
- RAxML-NG requires sequences to be aligned so ensure you have performed multiple sequence alignment or, if doing a whole genome alignment, you have created a SNP alignment, e.g. using Snippy-core
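Before uploading, it can be worth checking that every sequence in the file has the same length, since that is what an aligned FASTA file looks like. The following is a minimal sketch in Python using only the standard library; the function names `read_fasta` and `is_aligned` are made up for this illustration and are not part of RAxML-NG or MAFFT.

```python
def read_fasta(path):
    """Read a FASTA file into a dict of {header: sequence}."""
    records = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                name = line[1:].strip()
                records[name] = []
            elif name is not None:
                records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(path):
    """Return True if the file holds at least one sequence and all
    sequences (gaps included) have the same length."""
    lengths = {len(seq) for seq in read_fasta(path).values()}
    return len(lengths) == 1
```

For example, `is_aligned("16S_Staph_example_aligned.fasta")` should return `True` for a correctly aligned file. Equal lengths are a necessary sign of an alignment, not proof that the alignment is a good one.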
Steps
- Download the dataset by clicking on this link
- Go to the RAxML-NG webserver page
- First upload the dataset. Click the ‘choose file’ button under the box where you paste the sequence alignment and select the 16S_Staph_example_aligned.fasta file
- We do not have a constraint tree so leave this box empty
- Under ‘Evolutionary model’ we will use the default one which is:
- Unpartitioned model
- We only have a single gene so do not need to partition the model per gene
- Datatype: DNA
- Our input is a 16S rRNA gene sequence alignment so we select DNA as our data type
- Substitution matrix: GTR
- This is fine for the majority of analyses, unless you specifically wish to choose a sub-model
- Stationary base frequencies: ML estimate
- We will allow RAxML-NG to estimate our base frequencies as it does its analyses
- Proportion of invariant sites: unchecked
- We will leave this option unselected as we do not wish to include invariant sites as a separate parameter
- Among-site rate heterogeneity: GAMMA
- We wish to include among-site rate heterogeneity estimates in our model and will use the Gamma distribution for this
- Number of rate categories: 4
- We will use 4 discrete rate categories to approximate the Gamma distribution
- GAMMA category rates: mean
- The rate assigned to each category will be the mean of that category's portion of the Gamma distribution
- Ascertainment bias correction: None
- We are giving a whole gene alignment, which still contains its invariant sites, so we do not need to correct for missing invariant sites. If using a SNP alignment we would use one of these options, most likely Stamatakis, and input the invariant site counts for the four bases
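The settings above correspond to what RAxML-NG writes as a single model string when it is run on the command line. The sketch below, in Python, shows how the choices could be combined into such a string. It is an illustration only: the function `build_model_string` and its parameters are hypothetical, and how the web server translates the form internally is not stated in this worksheet. The suffixes used (`+FO`, `+I`, `+G4`, `+ASC_STAM{...}`) follow the RAxML-NG model notation, and the A/C/G/T order of the Stamatakis counts is an assumption you should check against the manual.

```python
def build_model_string(matrix="GTR", ml_base_freqs=True, invariant_sites=False,
                       gamma_categories=4, stamatakis_counts=None):
    """Combine form-style choices into a RAxML-NG style model string.

    stamatakis_counts: optional sequence of four invariant-site counts,
    assumed here to be in A, C, G, T order (only for SNP alignments).
    """
    parts = [matrix]
    if ml_base_freqs:
        parts.append("+FO")  # ML estimate of stationary base frequencies
    if invariant_sites:
        parts.append("+I")   # proportion of invariant sites
    if gamma_categories:
        parts.append(f"+G{gamma_categories}")  # discrete Gamma rate categories
    if stamatakis_counts is not None:
        counts = "/".join(str(c) for c in stamatakis_counts)
        parts.append("+ASC_STAM{" + counts + "}")  # ascertainment bias correction
    return "".join(parts)


# The settings chosen in this worksheet
print(build_model_string())
```

With the worksheet defaults this prints `GTR+FO+G4`. You can compare it with the model reported in the result.raxml.bestModel file once the run has finished.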
- Under ‘Analysis’ we will mostly use the default settings:
- ML tree search: checked boxes for optimise Topology, Branch lengths and Model
- We want the program to optimise all parameters
- Starting trees Parsimony: 10 and Random: 10
- We want to run the ML analysis 20 times and select the best tree from these independent runs so we will start 10 times from a parsimony tree and 10 times from a random tree
- Leave the ‘Paste your tree in newick format’ box blank
- This box is only needed if you have a specific tree from which you wish to start the analyses
- Bootstrapping: select
- Tick this box to add bootstrap support analysis to the ML tree
- Number of replicates: Automatic and Bootstrapping cutoff 0.3
- We wish the program to decide when to stop the bootstrapping replicates based on whether it consistently finds the same trees over and over (i.e. a distance of 0.3 between the tree topologies)
- You can add your email address to the box to have the results sent to you (this is recommended, especially for large dataset analysis)
- You can choose whether you wish to share your data anonymously with the creators of RAxML-NG
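For reference, the analysis settings above can be written as a RAxML-NG command line. The Python sketch below only builds and prints such a command, it does not run anything. It assumes a local `raxml-ng` installation would be used, and the helper `build_raxml_command` and the `result` prefix are chosen for this illustration. The commands the web server actually runs are recorded in the slurm_raxml.sbatch file included with your results, so treat this as an approximation rather than the server's own command.

```python
import shlex


def build_raxml_command(msa, model="GTR+FO+G4", pars=10, rand=10,
                        bs_cutoff=0.3, prefix="result"):
    """Return a raxml-ng argument list mirroring the worksheet settings."""
    return [
        "raxml-ng",
        "--all",                                      # ML search plus bootstrapping
        "--msa", msa,                                 # the aligned FASTA file
        "--model", model,                             # evolutionary model string
        "--tree", f"pars{{{pars}}},rand{{{rand}}}",   # parsimony and random starting trees
        "--bs-trees", "autoMRE",                      # stop bootstrapping automatically
        "--bs-cutoff", str(bs_cutoff),                # bootstopping cutoff
        "--prefix", prefix,                           # prefix for the output file names
    ]


cmd = build_raxml_command("16S_Staph_example_aligned.fasta")
print(shlex.join(cmd))  # shell-quoted command you could paste into a terminal
```

Building the command as a list keeps each option and its value together, which makes it easy to change one setting (for example the number of starting trees) without editing a long string by hand.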
- Once finished, click ‘Submit’
- A link will come up where your results will be stored. Click this link and wait for the run to finish
- The example data should only take a couple of minutes
- Once completed, you will see a ‘Download result’ button. Click this and a zip file will be downloaded. Open this zip file and you will see the following files:
- result.raxml.bestModel: The model of evolution used by RAxML-NG, which is estimated during the run
- result.raxml.bestTree: The maximum likelihood tree without the bootstrap values
- result.raxml.bootstraps: The individual trees produced at every bootstrap replicate
- result.raxml.log: The log file outlining the individual steps undertaken by RAxML-NG
- result.raxml.mlTrees: RAxML-NG ran the ML algorithm 20 times and selects the best run (to try and avoid local maxima). This file stores the trees from all 20 runs.
- result.raxml.rba: The input alignment stored in a binary format
- result.raxml.startTree: RAxML-NG ran the ML algorithm 20 times and selects the best run (to try and avoid local maxima). Each of these has a starting tree built with either parsimony or random on which the ML algorithm begins. This file stores the starting trees from all 20 runs
- result.raxml.support: The maximum likelihood tree with the bootstrap values
- This is the tree you usually want to use for further analyses
- sequenceAlignment.fasta: The alignment sequence file you input
- slurm_raxml.sbatch: The commands run on the RAxML-NG webserver back end processes queue
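The result.raxml.support file is a Newick tree in which each internal node carries its bootstrap value. If you want a quick look at those values before opening the tree in a viewer, a short Python sketch like the one below can pull them out. It assumes the support values appear as plain numbers directly after a closing bracket, as in the toy tree shown; the function names and the threshold of 70 are chosen for this illustration only and are not RAxML-NG recommendations.

```python
import re


def support_values(newick):
    """Return the numeric labels that directly follow a ')' in a Newick string."""
    return [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def weakly_supported(newick, threshold=70.0):
    """Return the support values that fall below the chosen threshold."""
    return [v for v in support_values(newick) if v < threshold]


# Toy tree invented for this example, not output from the worksheet dataset
toy_tree = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.3)42:0.02,E:0.4);"
print(support_values(toy_tree))
print(weakly_supported(toy_tree))
```

To use it on your own results, read the file contents first, for example with `open("result.raxml.support").read()`, and pass the resulting string to `support_values`.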