Typing bacteria using MLST (via UNIX)

In this worksheet you will learn how to use the MLST tool to type a bacterial genome via UNIX/Conda.

Suggested prerequisites

It is recommended that you follow the Concepts in Computer Programming and UNIX tutorial (basics) tutorials before starting.
A knowledge of the MLST tool is useful. You can access the manuals here
Installing MLST through conda is easiest, so it is suggested that you follow the Setting up and using conda tutorial first.

Dataset

This demonstration uses the output of the Assembling a genome from short reads (e.g. Illumina) using SPAdes worksheet, but it will work on any assembly, such as the one created in the Assembling a genome from long reads (e.g. ONT) using Flye worksheet. It is therefore suggested that you run at least one of these assembly methods first.
- You can download the example scaffolds output file of the SPAdes worksheet here: DRR187559_scaffolds.fasta

Steps

Create a directory for your analyses and step into it

mkdir mlst_demo
cd mlst_demo

Copy your assembled genome into this folder or download the sample data
- You can save the sample data directly to your current working directory using the wget command (wget can be installed via conda).

wget https://conmeehan.github.io/PathogenDataCourse/Datasets/DRR187559_scaffolds.fasta
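
Before typing, it can help to confirm that the FASTA file arrived intact. The following is a minimal sketch, not part of the MLST tool: `count_contigs` is a hypothetical helper name introduced here, and it simply counts header lines (those starting with `>`).

```shell
# Hypothetical helper: count the sequences in a FASTA file by counting '>' header lines.
count_contigs() {
  grep -c '^>' "$1"
}

fasta=DRR187559_scaffolds.fasta
if [ -s "$fasta" ]; then
  # -s is true only if the file exists and is not empty
  echo "$fasta contains $(count_contigs "$fasta") sequences"
else
  echo "$fasta is missing or empty; check the download or copy step" >&2
fi
```

If the file is missing or empty, repeat the copy or download before continuing.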

Install MLST using conda
- It is recommended to always install packages in their own environments, so here we will create an environment and install MLST in one step.

mamba create -n mlst -c bioconda mlst -y
mamba activate mlst

Run MLST on the scaffolds file
- MLST auto-detects the correct species scheme to use for your data
- You can specify a scheme by adding the --scheme option. You can see a list of all schemes using mlst --longlist
- --threads is the number of threads to dedicate to the process. My computer has 8 threads so I am dedicating 7
- The > redirect will write the resulting allele calls to the _mlst.tsv file

mlst --threads 7 DRR187559_scaffolds.fasta >DRR187559_mlst.tsv
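
If you are unsure how many threads your machine has, the sketch below picks one fewer than the total, mirroring the 7-of-8 choice above. It assumes `nproc` (Linux) or `sysctl -n hw.ncpu` (macOS) is available and falls back to 1 otherwise; the variable names `total` and `threads` are illustrative. It only prints the resulting command so you can inspect it before running it yourself.

```shell
# Detect the number of CPU threads; fall back to 1 if neither tool is available.
total=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Leave one thread free for the rest of the system, but never go below 1.
threads=$(( total > 1 ? total - 1 : 1 ))

# Print the command rather than running it, so it can be checked first.
echo "mlst --threads $threads DRR187559_scaffolds.fasta > DRR187559_mlst.tsv"
```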

View the resulting file
- You will see the sample name, the scheme used and the sequence type (S. aureus, ST 764 in this case), and then the list of MLST genes with the allele number found for each in your input genome
```
cat DRR187559_mlst.tsv
```
- Guidance on tweaking this output and running multiple genomes at once can be found on the GitHub page for MLST
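
Because the output is tab-separated, the first three columns (file, scheme, sequence type) can be pulled out with standard tools. This is a rough sketch: `mlst_summary` is a hypothetical helper name, and the example row is a made-up placeholder in the same column layout, not a real result for this dataset.

```shell
# Hypothetical helper: print file, scheme and sequence type (columns 1-3) from an mlst TSV.
mlst_summary() {
  awk -F'\t' '{ printf "%s\tscheme=%s\tST=%s\n", $1, $2, $3 }' "$1"
}

# Placeholder row for illustration only; the allele numbers are not real calls.
example_tsv=$(mktemp)
printf 'example.fasta\tsaureus\t1\tarcC(1)\taroE(1)\n' > "$example_tsv"
mlst_summary "$example_tsv"
rm -f "$example_tsv"
```

To use it on your own result, call `mlst_summary DRR187559_mlst.tsv`.
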
Deactivate your mamba environment when finished
```
mamba deactivate
```