Aligning sequences using MAFFT (via UNIX/conda)

In this worksheet you will learn how to align multiple gene sequences to each other using MAFFT.

Suggested prerequisites

It is recommended that you follow the Concepts in Computer Programming and UNIX tutorial (basics) tutorials before starting.
Some knowledge of MAFFT is useful. The MAFFT paper, the manual and the other MAFFT documents are good starting points.
An understanding of what multiple sequence alignments are used for is helpful.
Installing MAFFT through conda is easiest, so it is suggested that you have followed the Setting up and using conda tutorial.

Dataset

This demonstration uses 16S_Staph_example_unaligned.fasta, a subset of the conlan_et_al.refpkg.Staphylococcus 16S rRNA gene dataset from Greengenes.

Steps

Create a directory to store the analysis, then change into it
```
mkdir mafft_demo
cd mafft_demo
```
Download the dataset from 16S_Staph_example_unaligned.fasta
- You can save this directly to your terminal's current working directory using the wget command (wget can be installed via conda).

```
wget https://conmeehan.github.io/PathogenDataCourse/Datasets/16S_Staph_example_unaligned.fasta
```
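
After downloading, it can help to confirm that the file looks like FASTA and contains the number of sequences you expect. The following is a minimal sketch, assuming standard FASTA formatting where every record starts with a header line beginning with `>`. The file toy_unaligned.fasta and its sequences are hypothetical and are created only for illustration.

```shell
# Create a tiny example FASTA file (hypothetical data, for illustration only)
cat > toy_unaligned.fasta <<'EOF'
>seq1
ACGTACGT
>seq2
ACGTCGT
>seq3
ACGACGT
EOF

# Count sequences: each record starts with a header line beginning with ">"
n_seqs=$(grep -c '^>' toy_unaligned.fasta)
echo "Number of sequences: $n_seqs"
```

To check the real download, replace toy_unaligned.fasta with 16S_Staph_example_unaligned.fasta in the grep command.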

Install MAFFT using conda
- It is recommended to always install packages in their own environments, so here we will create an environment and install MAFFT in one step.
```
mamba create -n mafft -c bioconda mafft -y
mamba activate mafft
```
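Before running the alignment, you may want to confirm that MAFFT is available in the active environment. This is a rough sketch rather than a required step. It assumes MAFFT's `--version` option writes its version string to standard error, so that stream is redirected. The variable name mafft_status is hypothetical.

```shell
# Check whether the mafft executable is on the PATH of the active environment
if command -v mafft >/dev/null 2>&1; then
    mafft_status="found"
    # Print the installed version (assumed to be written to stderr, hence 2>&1)
    mafft --version 2>&1 </dev/null
else
    mafft_status="missing"
    echo "mafft not found: is the mafft environment activated?"
fi
echo "MAFFT status: $mafft_status"
```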
Run MAFFT on the downloaded sequences

- `--auto` tells MAFFT to automatically select an appropriate alignment algorithm for the input sequences.
- `>` redirects the output alignment to 16S_Staph_example_aligned.fasta

```
mafft --auto 16S_Staph_example_unaligned.fasta > 16S_Staph_example_aligned.fasta
```
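
In a valid multiple sequence alignment every sequence has the same length once gaps (`-`) are counted, so a quick sanity check on the output is to list the distinct sequence lengths. The following is a minimal sketch using awk. The file toy_aligned.fasta and its contents are hypothetical and are created only for illustration. To check the real output, swap in 16S_Staph_example_aligned.fasta and expect a single length to be reported.

```shell
# Create a tiny aligned FASTA file (hypothetical data, for illustration only)
cat > toy_aligned.fasta <<'EOF'
>seq1
ACGTACGT
>seq2
ACGT-CGT
>seq3
ACG-ACGT
EOF

# Sum the characters (residues and gaps) for each record, then list the distinct lengths.
# Sequences may wrap over several lines, so lengths are accumulated per header.
lengths=$(awk '
    /^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
    END  { if (seen) print len }
' toy_aligned.fasta | sort -u)

# A single value here means all sequences are the same aligned length
echo "Distinct aligned lengths: $lengths"
```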
Deactivate your mamba environment when finished
```
mamba deactivate
```

Use of multiple sequence alignments (MSA)

MSAs are the primary input to phylogenetic tree inference and to other comparative genomics programs. You can use this output to build a tree using RAxML-NG.