- In this worksheet you will learn how predict a range of pathogenic features such as antimicrobial resistance and virulence factors (abritAMR), and plasmid presence (PlasmidFinder)
- This is two separate analyses but use similar inputs
Suggested prerequisites
- It is recommended that you have followed the Concepts in Computer Programming and UNIX tutorial (basics) tutorials before starting.
- A knowledge of the two primary tools is useful. You can access their manuals here: abritAMR, PlasmidFinder
- Installing abritAMR and PlasmidFinder through conda is easiest so its suggested you have followed the Setting up and using conda tutorial.
Dataset
- This demonstration uses the output of Assembling a genome from short reads (e.g. Illumina) using SPAdes worksheet but this will work on any assembly, such as that created in the Assembling a genome from long reads (e.g. ONT) using Flye worksheet. Thus, it is suggested you run at least one of these assembly methods first.
- You can download the example scaffolds output file of the SPAdes worksheet here: DRR187559_scaffolds.fasta
AMR and virulence factor prediction (ABRitamr steps)
- Create a directory for your analyses and step into it
mkdir pathogenesis_demo cd pathogenesis_demo
- Copy your assembled genome into this folder or download the sample data
- You can save this directly to your terminal current working directory by using the wget command (wget can be installed via conda).
wget https://conmeehan.github.io/PathogenDataCourse/Datasets/DRR187559_scaffolds.fasta
- Install ABRitamr using conda
- It is recommended to always install packages in their own environments so here will we create an enironment and install ABRitamr in one step.
mamba create -n abritamr -c bioconda abritamr -y mamba activate abritamr
- It is recommended to always install packages in their own environments so here will we create an enironment and install ABRitamr in one step.
- Run ABRitamr on the scaffolds file
- The
run
command tells ABRitamr we wish to run the analysis (as opposed to the other pipeline which is to create a specific report) -c
is the genome assembly file-px
is the name of the folder to put all the output files inabritamr run -c DRR187559_scaffolds.fasta -px DRR187559_ABRitamr
- The
-
Inside the resulting output folder you will find multiple files. Their contents are explained here
- Deactivate your mamba environment when finished
mamba deactivate
Plasmid detection (PlasmidFinder steps)
- If you did not follow the abritAMR steps above, perform steps 1 and 2 now.
- If you did, navigate into the
pathogenesis_demo
folder
- If you did, navigate into the
- Install PlasmidFinder using conda
- It is recommended to always install packages in their own environments so here will we create an enironment and install PlasmidFinder in one step.
mamba create -n plasmidfinder -c bioconda plasmidfinder -y mamba activate plasmidfinder
- It is recommended to always install packages in their own environments so here will we create an enironment and install PlasmidFinder in one step.
- The database for PlasmidFinder must be downloaded. This is done with a built in programe that should be available in your PlasmidFinder conda environment
- We will redirect some of the produced information to a log file (>plasmidfinder-db.log) for future use
download-db.sh >plasmidfinder-db.log
- We will redirect some of the produced information to a log file (>plasmidfinder-db.log) for future use
- Look in the
plasmidfinder-db.log
file for the location of the downloaded databasecat plasmidfinder-db.log
- This will be listed after ‘Downloading PlasmidFinder 2.1 database to ‘ and before ‘…’
- e.g. on my computer it is /Users/cmeehan/opt/miniconda3/envs/plasmidfinder/share/plasmidfinder-2.1.6/database
- Make an output directory for PlasmidFinder
mkdir DRR187559_plasmidfinder
- Run PlasmidFinder on the assembled genome
-x
tells PlasmidFinder to give us all (extended) output files-i
is the input genome assembly file-o
is the output directory created i step 5-p
is the database file we downloaded in step 3, using the absolute path as noted down in step 4
plasmidfinder.py -x -i DRR187559_scaffolds.fasta -o DRR187559_plasmidfinder -p /Users/cmeehan/opt/miniconda3/envs/plasmidfinder/share/plasmidfinder-2.1.6/database
- Get the information on contigs/scaffolds that may contain plasmids
- The
results_tab.tsv
file lists each plasmid the had a hit to a contig along with the length of the hit -
Be wary when interpreting these results and look carefully at the length of the plasmids (can view on NCBI website using the accessions listed at the end of each line)
- Often short hits (<1kb) can occur erroneously to plasmids on contigs with similar genes; this does not mean there is a plasmid there (or that a plasmid is not there if you get such short hits)
- Bad assemblies (i.e. large or messy assembly graphs in Bandage) can produce such results
cat DRR187559_plasmidfinder/results_tab.tsv
- Deactivate your mamba environment when finished
mamba deactivate