Aligning sequences using MAFFT (via UNIX/conda)

  • In this worksheet you will learn how to align multiple gene sequences to each other using MAFFT.

Suggested prerequisites

Dataset

Steps

  1. Create a directory to store the analysis and then change into it
    mkdir mafft_demo
    cd mafft_demo
    
  2. Download the dataset 16S_Staph_example_unaligned.fasta
    • You can save it directly to the current working directory of your terminal with the wget command (wget can be installed via conda).
      wget https://conmeehan.github.io/PathogenDataCourse/Datasets/16S_Staph_example_unaligned.fasta

  3. Install MAFFT using conda
    • It is recommended to always install packages in their own environments, so here we will create an environment and install MAFFT in one step.
      mamba create -n mafft -c bioconda mafft -y
      mamba activate mafft

  4. Run MAFFT on the downloaded sequences
    • --auto lets MAFFT select an appropriate alignment strategy based on the size of the dataset.
    • > redirects the output alignment to 16S_Staph_example_aligned.fasta
      mafft --auto 16S_Staph_example_unaligned.fasta > 16S_Staph_example_aligned.fasta

  5. Deactivate your mamba environment when finished
    mamba deactivate

Use of multiple sequence alignments (MSA)

MSAs are the primary input to phylogenetic tree inference and other comparative genomics programs. You can use this output to build a tree using RAxML-NG.
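
Before passing an alignment to a downstream program, it can be worth checking it. In a valid alignment every sequence has the same length once gaps are counted, and no sequences should have been lost relative to the input. The sketch below is one possible way to check both with standard UNIX tools. The function names count_sequences and alignment_lengths are hypothetical helpers written for this worksheet and are not part of MAFFT.

```shell
# Hypothetical helpers (not part of MAFFT) for a quick check of a FASTA file.

# Print the number of sequences, i.e. the number of header lines starting with ">".
count_sequences() {
  grep -c '^>' "$1"
}

# Print each distinct sequence length found in the file, one per line.
# Sequences split over several lines are joined before measuring.
# A valid alignment should produce exactly one line of output.
alignment_lengths() {
  awk '
    /^>/ { if (seq != "") print length(seq); seq = ""; next }  # new header: report the previous sequence
         { gsub(/[ \t\r]/, ""); seq = seq $0 }                 # sequence line: strip whitespace and append
    END  { if (seq != "") print length(seq) }                  # report the final sequence
  ' "$1" | sort -nu
}
```

For example, running count_sequences on both 16S_Staph_example_unaligned.fasta and 16S_Staph_example_aligned.fasta should give the same number, and alignment_lengths 16S_Staph_example_aligned.fasta should print a single value (the alignment length).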