Aligning sequences using MAFFT (via UNIX/conda)

  • In this worksheet you will learn how to align multiple gene sequences to each other using MAFFT.

Suggested prerequisites



  1. Create a directory to store the analysis and then change directory into that directory
    mkdir mafft_demo
    cd mafft_demo
  2. Download the dataset from 16S_Staph_example_unaligned.fasta
    • You can save this directly to your terminal's current working directory using the wget command (wget can be installed via conda).
  3. Install MAFFT using conda
    • It is recommended to always install packages in their own environments, so here we will create an environment and install MAFFT in one step.
      mamba create -n mafft -c bioconda mafft -y
      mamba activate mafft
  4. Run MAFFT on the downloaded sequences
    • --auto tells MAFFT to automatically select an appropriate alignment strategy based on the size of the dataset.
    • > redirects the output alignment to 16S_Staph_example_aligned.fasta
      mafft --auto 16S_Staph_example_unaligned.fasta > 16S_Staph_example_aligned.fasta
  5. Deactivate your mamba environment when finished
    mamba deactivate
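
Optional check: before using the alignment, it can be worth confirming that MAFFT kept every sequence and that all aligned sequences have the same length, since that is what defines an alignment. The sketch below is one possible way to do this with standard UNIX tools. The function names count_seqs and distinct_lengths are made up for this example and are not part of MAFFT.

```shell
# Count FASTA records (header lines start with ">").
count_seqs() {
    grep -c '^>' "$1"
}

# Print each distinct sequence length found in a FASTA file, one per line.
# Sequences wrapped over several lines are joined before being measured.
distinct_lengths() {
    awk '
        /^>/ { if (seen) print len; len = 0; seen = 1; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) print len }
    ' "$1" | sort -n | uniq
}

unaligned=16S_Staph_example_unaligned.fasta
aligned=16S_Staph_example_aligned.fasta

# Only report if both files from the steps above are present.
if [ -f "$unaligned" ] && [ -f "$aligned" ]; then
    echo "input sequences:  $(count_seqs "$unaligned")"
    echo "output sequences: $(count_seqs "$aligned")"
    echo "distinct aligned lengths (expect a single value):"
    distinct_lengths "$aligned"
fi
```

If the two sequence counts differ, or more than one length is printed for the aligned file, the alignment step probably did not finish cleanly and is worth re-running.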

Use of multiple sequence alignments (MSA)

MSAs are the primary input to phylogenetic tree inference and to other comparative genomics programs. You can use this output to build a tree using RAxML-NG.
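
As a rough sketch of that next step, the commands below pass the MAFFT alignment to RAxML-NG. This assumes raxml-ng has been installed separately and is on your PATH. The GTR+G substitution model is only an illustrative choice for nucleotide data, and the staph_16S prefix is a made-up name for the output files. Neither comes from this worksheet.

```shell
msa=16S_Staph_example_aligned.fasta   # alignment written by MAFFT
prefix=staph_16S                      # hypothetical prefix for output file names

if command -v raxml-ng >/dev/null 2>&1 && [ -f "$msa" ]; then
    # Check that the alignment can be read under the chosen model
    raxml-ng --check --msa "$msa" --model GTR+G --prefix "${prefix}_check"
    # Run a tree search with default settings
    raxml-ng --msa "$msa" --model GTR+G --prefix "$prefix"
else
    echo "raxml-ng or $msa not found; skipping tree search"
fi
```

Running the check first is cheap and catches problems such as duplicate sequence names before the longer tree search starts.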