- In this worksheet you will learn how to use the MLST tool to type a bacterial genome via UNIX/Conda
Suggested prerequisites
- It is recommended that you follow the Concepts in Computer Programming and UNIX tutorial (basics) tutorials before starting.
- Some knowledge of the MLST tool is useful. You can access the manuals here
- Installing MLST through conda is easiest, so it is suggested that you have followed the Setting up and using conda tutorial.
Dataset
- This demonstration uses the output of the Assembling a genome from short reads (e.g. Illumina) using SPAdes worksheet, but it will work on any assembly, such as the one created in the Assembling a genome from long reads (e.g. ONT) using Flye worksheet. It is therefore suggested that you run at least one of these assembly methods first.
- You can download the example scaffolds output file of the SPAdes worksheet here: DRR187559_scaffolds.fasta
Steps
- Create a directory for your analyses and step into it
mkdir mlst_demo
cd mlst_demo
- Copy your assembled genome into this folder or download the sample data
- You can download the sample data directly into your current working directory using the wget command (wget can be installed via conda).
wget https://conmeehan.github.io/PathogenDataCourse/Datasets/DRR187559_scaffolds.fasta
- Install MLST using conda
- It is recommended to always install packages in their own environments, so here we will create an environment and install MLST in one step.
mamba create -n mlst -c bioconda mlst -y
mamba activate mlst
- Run MLST on the scaffolds file
- MLST auto-detects the correct species scheme to use for your data
- You can specify a scheme by adding the --scheme option. You can see a list of all schemes using mlst --longlist
- --threads is the number of threads to dedicate to the process. My computer has 8 threads so I am dedicating 7
- The > redirect will write the resulting allele calls to the _mlst.tsv file
mlst --threads 7 DRR187559_scaffolds.fasta >DRR187559_mlst.tsv
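If you do not know how many threads your computer has, the thread count can be worked out rather than typed by hand. The following is a minimal sketch, assuming getconf is available on your system (it is on most Linux and macOS installs); the variable names total and threads are only illustrative. It keeps one thread free, as in the command above, and only calls mlst if the tool and the input file are both present.

```shell
# Count the logical CPUs that are currently online
total=$(getconf _NPROCESSORS_ONLN)

# Leave one thread free for the rest of the system, but never drop below 1
threads=$(( total > 1 ? total - 1 : 1 ))
echo "Using ${threads} of ${total} threads"

# Run mlst only if it is on the PATH and the scaffolds file exists
if command -v mlst >/dev/null 2>&1 && [ -f DRR187559_scaffolds.fasta ]; then
    mlst --threads "${threads}" DRR187559_scaffolds.fasta > DRR187559_mlst.tsv
fi
```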
- View the resulting file
- You will see the sample name, the scheme used and the sequence type assigned (S. aureus, ST 764 in this case), and then the list of the MLST genes with the allele number found for each in your input genome
cat DRR187559_mlst.tsv
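The output is a single tab-separated line with no header, which can be hard to read with cat alone. As a rough sketch, the helper below (summarise_mlst is a hypothetical name, not part of MLST) uses awk to print the file, scheme and sequence type on their own lines, followed by one allele call per line. It assumes the default output layout of file, scheme, ST and then one gene(allele) entry per locus.

```shell
# Print file, scheme and ST, then one allele call per line.
# Assumes the default mlst layout: tab-separated, no header,
# columns = file, scheme, ST, then one gene(allele) entry per locus.
summarise_mlst() {
    awk -F'\t' '{
        printf "file: %s\nscheme: %s\nST: %s\n", $1, $2, $3
        for (i = 4; i <= NF; i++) printf "  %s\n", $i
    }' "$1"
}

# Summarise the result from this worksheet if it has been created
if [ -f DRR187559_mlst.tsv ]; then
    summarise_mlst DRR187559_mlst.tsv
fi
```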
- Guidance on tweaking this output and running multiple genomes at once can be found on the GitHub page for MLST
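As an illustration of typing several genomes at once, MLST accepts more than one assembly on the command line and writes one result line per input file. The sketch below wraps that in a small function; run_mlst_batch, the assemblies folder and all_mlst.tsv are hypothetical names chosen for this example, and it assumes your assemblies all end in .fasta.

```shell
# Type every .fasta assembly in a folder and collect the results in one file.
# Usage: run_mlst_batch <folder> <output.tsv>
run_mlst_batch() {
    dir=$1
    out=$2
    # Expand the list of assemblies into the function's own arguments
    set -- "$dir"/*.fasta
    # Stop early if the folder holds no .fasta files
    [ -e "$1" ] || { echo "no .fasta files in $dir" >&2; return 1; }
    # Stop early if the mlst environment is not active
    command -v mlst >/dev/null 2>&1 || { echo "mlst not found" >&2; return 1; }
    # One output line is written per input assembly
    mlst "$@" > "$out"
}
```

For example, run_mlst_batch assemblies all_mlst.tsv would type every assembly in a folder called assemblies.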
- Deactivate your mamba environment when finished
mamba deactivate