- In this worksheet you will learn how to align multiple gene sequences to each other using MAFFT
Suggested prerequisites
- It is recommended that you have followed the Concepts in Computer Programming and UNIX tutorial (basics) tutorials before starting.
- A knowledge of MAFFT is useful. You can read the MAFFT paper here and the manual and other documents can be found here.
- An understanding of what multiple sequence alignments are used for
- Installing MAFFT through conda is easiest, so it's suggested that you have followed the Setting up and using conda tutorial.
Dataset
- This demonstration uses 16S_Staph_example_unaligned.fasta, a subset of the conlan_et_al.refpkg.Staphylococcus 16S rRNA gene dataset from Greengenes
Steps
- Create a directory to store the analysis and then move into it
mkdir mafft_demo
cd mafft_demo
- Download the dataset from 16S_Staph_example_unaligned.fasta
- You can save this directly to your current working directory from the terminal by using the wget command (wget can be installed via conda).
wget https://conmeehan.github.io/PathogenDataCourse/Datasets/16S_Staph_example_unaligned.fasta
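Before moving on, it can be worth checking that the download worked. The sketch below counts the FASTA records in the file by counting header lines (those starting with `>`). The helper name `count_fasta_records` is made up for this illustration and is not part of wget or MAFFT.

```shell
# Hypothetical helper: count FASTA records by counting header lines (lines starting with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

fasta="16S_Staph_example_unaligned.fasta"
if [ -f "$fasta" ]; then
  echo "Sequences in $fasta: $(count_fasta_records "$fasta")"
else
  # Nothing to count yet; report the problem instead of failing silently.
  echo "$fasta not found; check that the download completed" >&2
fi
```

A count of zero, or the "not found" message, suggests the file was not saved where you expected.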
- Install MAFFT using conda
- It is recommended to always install packages in their own environments, so here we will create an environment and install MAFFT in one step.
mamba create -n mafft -c bioconda mafft -y
mamba activate mafft
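As a quick sanity check after activating the environment, you may want to confirm that the shell can actually find the `mafft` program. This is a minimal sketch using the standard `command -v` lookup. The helper name `tool_on_path` is hypothetical.

```shell
# Hypothetical helper: succeed if the named program can be found on the PATH.
tool_on_path() {
  command -v "$1" >/dev/null 2>&1
}

if tool_on_path mafft; then
  echo "mafft found at: $(command -v mafft)"
else
  echo "mafft not found; is the mafft environment activated?" >&2
fi
```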
- Run MAFFT on the downloaded sequences
- `--auto` tells MAFFT to automatically select an appropriate alignment strategy for the sequences.
- `>` redirects the output alignment to 16S_Staph_example_aligned.fasta
mafft --auto 16S_Staph_example_unaligned.fasta >16S_Staph_example_aligned.fasta
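In a finished alignment every sequence, gaps included, has the same length, so one simple check of the output is to see whether all records in the file share one length. The sketch below assumes the output is in FASTA format (MAFFT's default) and joins sequences that are wrapped over several lines before measuring them. The helper name `distinct_seq_lengths` is hypothetical.

```shell
# Hypothetical helper: print each distinct sequence length found in a FASTA file.
# Wrapped sequence lines are joined before measuring; empty records are skipped.
distinct_seq_lengths() {
  awk '
    /^>/ { if (seq != "") print length(seq); seq = ""; next }
         { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END  { if (seq != "") print length(seq) }
  ' "$1" | sort -un
}

aligned="16S_Staph_example_aligned.fasta"
if [ -f "$aligned" ]; then
  n=$(distinct_seq_lengths "$aligned" | wc -l | tr -d ' ')
  if [ "$n" -eq 1 ]; then
    echo "All sequences share one length: $(distinct_seq_lengths "$aligned")"
  else
    echo "Found $n different lengths; the file may not be a complete alignment" >&2
  fi
fi
```

This only checks that the file has the shape of an alignment. It says nothing about alignment quality, which is better judged by viewing the alignment.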
- Deactivate your mamba environment when finished
```
mamba deactivate
```
Use of multiple sequence alignments (MSA)
MSAs are the primary input to phylogenetic tree inference and other comparative genomics programs. You can use this output to build a tree using RAxML-NG.