This tutorial introduces de novo genome assembly, assembly quality control, genome annotation and reference-based mapping, and compares the assembly and mapping approaches.
Learning outcomes
- Describe the primary steps in de novo assembly
- Implement basic assembly quality control and metrics
- List the primary outputs of genome annotation
- Describe the primary steps in reference mapping
- Compare the pros and cons of assembly and mapping approaches
- Undertake genome assembly using Flye or SPAdes
- Undertake genome annotation using Bakta
- Undertake reference-based mapping using Snippy
Prerequisites
- We recommend Notepad++ (Windows) or BBEdit (Mac) for viewing FASTA files; most default Linux text editors can also do this.
- If you plan to do the UNIX-based worksheets, we recommend first completing the Concepts in Computer Programming, UNIX tutorial (basics), and Setting up and using conda tutorials.
Approximate time to finish tutorial
- Lecture: 1.5 hours
- Tutorials: 1.5 hours
- Pre/post surveys: 10 minutes
Order of tutorial
Please do the pre-learning quiz, then watch the presentation.
During the presentation there are points to stop and do exercises, which are linked below. The answers to the questions in the exercises are linked within each one.
Once you have finished the tutorial, take the post-learning quiz.
Genome Assembly Pre-tutorial Survey
Presentation
Tasks from slides with sample answers
What is the sequencing depth of the two positions highlighted in blue? (see slides for image)
Answer: G: 7; A: 8
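Sequencing depth at a position is the number of reads aligned over that position. The slide image is not reproduced here, so this minimal Python sketch uses made-up read coordinates (hypothetical, 0-based, end-exclusive) purely to show the counting; in practice a tool such as `samtools depth` reports this from a BAM file.

```python
def depth_per_position(read_spans, genome_length):
    """Count how many reads cover each position of a reference.

    read_spans: hypothetical (start, end) alignment coordinates,
    0-based with the end excluded (like Python slices).
    """
    depth = [0] * genome_length
    for start, end in read_spans:
        for pos in range(start, end):
            depth[pos] += 1  # this read covers this position
    return depth


# Hypothetical alignments of four reads against a 10 bp reference.
reads = [(0, 6), (2, 8), (3, 10), (5, 9)]
print(depth_per_position(reads, 10))  # [1, 1, 2, 3, 3, 4, 3, 3, 2, 1]
```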
What is the resulting sequence after de Bruijn graph-based joining of these two reads (k=3)? TTAACCA and CCAAAAT
Answer: TTAACCAAAT
Which of these is a good maximum number of contigs in an assembly? 100, 500 or 1000
Answer: 100
Are tRNAs coding or non-coding genes?
Answer: Non-coding
Worksheets
UNIX shell approaches
- SPAdes for bacterial and viral genome assembly (short reads)
- Flye for bacterial and viral genome assembly (long reads)
- BUSCO and Bandage for genome completeness and quality checking
- Bakta for genome annotation
- Snippy for reference mapping (SNP calling)
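Snippy aligns reads to a reference and calls variants where the reads consistently disagree with it. The toy Python sketch below illustrates only the calling step: it assumes reads have already been placed on the reference without insertions or deletions (a hypothetical `(offset, sequence)` input), builds a per-position pileup and reports a SNP where the majority base differs from the reference with enough depth and support. The function name and thresholds are illustrative; this is not how Snippy itself is implemented, as it relies on dedicated read aligners and a variant caller.

```python
from collections import Counter


def call_snps(reference, placed_reads, min_depth=3, min_fraction=0.9):
    """Toy SNP caller over reads already placed on the reference.

    placed_reads: hypothetical list of (offset, sequence) pairs, where offset
    is the 0-based reference position of the read's first base. Reads are
    assumed to have no indels and to lie entirely within the reference.
    """
    # Pileup: base counts observed at every reference position.
    pileup = [Counter() for _ in reference]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            pileup[offset + i][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue  # too few reads to make a call
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support / depth >= min_fraction:
            # Report 1-based position, reference base, alternative base, depth.
            snps.append((pos + 1, reference[pos], base, depth))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC"), (1, "CGTTC")]
print(call_snps(reference, reads))  # [(5, 'A', 'T', 4)]
```

Positions are reported 1-based, matching the convention used in VCF files.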
Galaxy approaches
- Bacterial genome assembly using Galaxy
- Tutorial set includes de novo assembly (short and long read) and quality control tutorials
- rnaviralSPAdes and coronaSPAdes can be used in the same way as SPAdes in these tutorials to assemble viral genomes (just search for these in the Galaxy sidebar)
- Calling SNPs compared to a reference genome using Snippy in Galaxy
- SARS-CoV-2 workflows including assembly and human read removal using Galaxy
- Bacterial genome annotation using Prokka on Galaxy
- Prokka is no longer actively supported, so Bakta is suggested instead
- Bacterial genome annotation using Bakta on Galaxy
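Alongside the quality control tutorials above and the BUSCO and Bandage worksheet, simple contiguity metrics can be computed directly from an assembly FASTA file. The minimal Python sketch below (standard library only; the function names and toy contigs are hypothetical) reports the number of contigs, the total length and N50: the length of the contig at which contigs of that size or longer first add up to at least half of the total assembly length.

```python
def read_fasta_lengths(lines):
    """Return sequence lengths from FASTA-formatted lines (headers start with '>')."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # start a new record
        elif lengths:
            lengths[-1] += len(line)  # sequences may wrap over several lines
    return lengths


def assembly_stats(lengths):
    """Contig count, total length and N50 for a list of contig lengths."""
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly length
            n50 = length
            break
    return {"contigs": len(lengths), "total_length": total, "n50": n50}


# Hypothetical toy assembly; for a real file use, e.g.:
# with open("contigs.fasta") as handle: lengths = read_fasta_lengths(handle)
example = """>contig_1
ACGTACGTAC
>contig_2
ACGTACGT
>contig_3
ACGTAC
>contig_4
AC
GT
""".splitlines()
print(assembly_stats(read_fasta_lengths(example)))
# {'contigs': 4, 'total_length': 28, 'n50': 8}
```

Fewer, longer contigs (a lower contig count and a higher N50) generally indicate a more contiguous assembly.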
Genome Assembly Post-tutorial Survey
Other tools and videos
- Trycycler tutorial for in-depth hybrid assembly
- VirAmp has a Galaxy-like interface for assembling viral genomes
- Phables for bacteriophage assembly
- Set of videos on genome sequencing and assembly
- De Bruijn graphs and Eulerian walks video
- Comparison of de novo and mapping approaches
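As a toy illustration of the de Bruijn question in the tasks and of Eulerian walks, the Python sketch below (function name hypothetical; error-free reads assumed) stores each distinct k-mer once as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, then uses Hierholzer's algorithm to walk every edge exactly once. It is a teaching sketch, not how SPAdes or other production assemblers are implemented.

```python
from collections import defaultdict


def de_bruijn_assemble(reads, k):
    """Toy de Bruijn assembly of error-free reads into a single sequence."""
    # Keep each distinct k-mer once, in the order it is first seen.
    kmers = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.setdefault(read[i:i + k], None)

    # Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree for each node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # An Eulerian walk starts where out-degree exceeds in-degree by one
    # (or at any node if every node is balanced).
    start = next((node for node in graph if balance[node] == 1), next(iter(graph)))

    # Hierholzer's algorithm: follow unused edges, backtrack when stuck.
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop(0))  # use edges in first-seen order
        else:
            walk.append(stack.pop())
    walk.reverse()

    # Spell the sequence: first node, then the last base of each later node.
    return walk[0] + "".join(node[-1] for node in walk[1:])


print(de_bruijn_assemble(["TTAACCA", "CCAAAAT"], k=3))  # TTAACCAAAT
```

Because the k-mer AAA occurs twice in CCAAAAT but is stored only once, the repeat collapses, so the result is one base shorter than a plain overlap merge of the two reads (TTAACCAAAAT). The graph also admits a second valid walk, TTAAACCAAT; this sketch returns TTAACCAAAT only because it follows edges in the order they were first seen, which shows why repeats make assembly ambiguous.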