- In this worksheet you will learn how to use the BLAST tool on Galaxy to find sequences in a custom-made database that are homologous (similar) to a set of query sequences
Required prerequisite(s)
- You must create an account on https://usegalaxy.eu/ and log in to that account
- You must have uploaded the dataset files listed below to Galaxy by following one of the tutorials on the ‘Loading data into Galaxy’ page
Suggested prerequisite(s)
- A basic understanding of how to use Galaxy. A good introduction: https://www.youtube.com/watch?v=uVNdyrVDYYU
Dataset
- This demonstration uses two FASTA files: EscherichiaDB.fasta and EcoliToxins.fasta
Steps
- In your web browser, navigate to https://usegalaxy.eu/
- Log in to your account using the ‘Login or Register’ button in the top navigation bar
- Your data files EcoliToxins.fasta and EscherichiaDB.fasta should already be in your history on the right-hand side. If not, follow one of the tutorials on the ‘Loading data into Galaxy’ page
- Using BLAST on Galaxy takes two steps: first create a local database of the sequences you wish to search against, then run the BLAST search itself
- First we will create a database of Escherichia genomes
- In the left-hand menu, type makeblastdb into the search box under ‘Tools’
- Click on ‘NCBI BLAST+ makeblastdb’
- The database is made of genome sequences, so select ‘nucleotide’ under ‘Molecule type of input’
- Select EscherichiaDB.fasta under ‘Input FASTA files’
- If you wish, you can build a single database from several separate FASTA files by holding down the Shift key and selecting multiple files at once
- Type ‘Escherichia reference genomes’ as the title
- Set ‘Parse the sequence identifiers’ to ‘Yes’, because these genomes came from NCBI and their identifiers are in NCBI format
- If you are using your own sequences, such as contigs assembled with SPAdes or a similar tool, the sequence identifiers will not be in NCBI format, so leave this option set to ‘No’
- Leave everything else as it is
- Click ‘Run tool’
- This may take some time to run, but you should then have an entry in your history called ‘BLAST database from data x’, where x is the history number of your EscherichiaDB.fasta dataset
- Once your database is created, you can search your query sequence(s) against the database
- The file EcoliToxins.fasta contains gene sequences of known E. coli toxins, which we will compare against the genome sequences in the database
- Since we are comparing nucleotide sequences against a nucleotide database, we will use BLASTn
- Other combinations of sequence types require other BLAST programs; see the ‘Biological databases and BLAST.pptx’ slides for guidance
- In the left-hand menu, type blastn into the search box under ‘Tools’
- Click on ‘NCBI BLAST+ blastn Search nucleotide database with nucleotide query sequence(s)’
- Under ‘Nucleotide query sequence(s)’ select EcoliToxins.fasta
- In the ‘Subject database/sequences’ dropdown menu, select ‘BLAST database from your history’, and make sure the database you created above is selected in the ‘BLAST database’ box
- We will use megablast as the algorithm, because we are comparing very similar sequences
- Our database is small and we expect very similar hits, so we want a more stringent e-value cut-off. A lower e-value means a match is less likely to have arisen by chance, and hits with an e-value above the cut-off are not reported. Under ‘Set expectation value cutoff’, type 1e-30 (that is, 10^-30)
- The other options can be left at their defaults, so click ‘Run tool’
- The output files will appear in your history on the right-hand side
- Once they turn green and the clock symbol has disappeared, the analysis is finished
- To download any of these files, click on the file in your history (right-hand menu) and then click the small save icon that appears at the bottom left of its box
- Most files can then be viewed in a text editor such as Notepad++ or BBEdit
- In the tabular output, each line describes one hit: the query sequence name is in column 1, the e-value is in column 11, and the full name of the database sequence it hit is the final entry on the line
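The Galaxy ‘NCBI BLAST+ makeblastdb’ tool wraps the command-line program of the same name from the NCBI BLAST+ suite. You may want to repeat the database step above on your own computer. This is optional for this worksheet and assumes a local BLAST+ install. The Python sketch below builds a roughly equivalent command. The output name escherichia_db is a hypothetical choice, and Galaxy may pass extra options of its own.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, title, parse_seqids):
    """Build a makeblastdb argument list mirroring the Galaxy form settings."""
    cmd = [
        "makeblastdb",
        "-in", fasta_path,    # 'Input FASTA files'
        "-dbtype", "nucl",    # 'Molecule type of input' = nucleotide
        "-title", title,      # database title
        "-out", db_name,      # base name for the database files (hypothetical)
    ]
    if parse_seqids:
        # Only for NCBI-style identifiers; leave off for e.g. SPAdes contigs.
        cmd.append("-parse_seqids")
    return cmd


if __name__ == "__main__":
    cmd = makeblastdb_command(
        "EscherichiaDB.fasta",
        "escherichia_db",
        "Escherichia reference genomes",
        parse_seqids=True,
    )
    # Print a shell-safe version of the command to copy into a terminal.
    print(shlex.join(cmd))
```

Running it prints `makeblastdb -in EscherichiaDB.fasta -dbtype nucl -title 'Escherichia reference genomes' -out escherichia_db -parse_seqids`. Keeping the command as a list rather than one string avoids quoting problems with the spaces in the title. The same list could be passed to `subprocess.run(cmd, check=True)`.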
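The choice of BLAST program mentioned in the steps above follows from the types of the query and database sequences. The small Python lookup below summarises the standard BLAST+ programs. The function name choose_blast_program is only an illustration, and the ‘Biological databases and BLAST.pptx’ slides remain the main guide.

```python
# Standard NCBI BLAST+ programs keyed by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # query is translated
    ("protein", "nucleotide"): "tblastn",   # database is translated
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST+ program name for a query/database combination."""
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type!r}, {db_type!r}") from None


print(choose_blast_program("nucleotide", "nucleotide"))  # blastn, as in this worksheet
```

In this worksheet both the query file and the database are nucleotide sequences, which gives blastn.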
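In the same way, the blastn settings above correspond roughly to the command built below. This sketch makes three assumptions:

- BLAST+ is installed locally and the database is named escherichia_db (a hypothetical name).
- The output file name toxin_hits.tsv is made up.
- The `-outfmt "6 std salltitles"` choice is meant to reproduce the layout described in the last step (query in column 1, e-value in column 11, full subject title last). It is not necessarily what Galaxy runs.

```python
import shlex


def blastn_command(query_fasta, db_name, out_path, evalue=1e-30):
    """Build a blastn argument list mirroring the Galaxy form settings."""
    return [
        "blastn",
        "-query", query_fasta,          # 'Nucleotide query sequence(s)'
        "-db", db_name,                 # database built by makeblastdb
        "-task", "megablast",           # algorithm for very similar sequences
        "-evalue", str(evalue),         # e-value cut-off; str(1e-30) is '1e-30'
        "-outfmt", "6 std salltitles",  # tabular: 12 standard columns + full subject titles
        "-out", out_path,
    ]


if __name__ == "__main__":
    cmd = blastn_command("EcoliToxins.fasta", "escherichia_db", "toxin_hits.tsv")
    # Print a shell-safe version of the command to copy into a terminal.
    print(shlex.join(cmd))
```

Megablast is also the default task for command-line blastn. Stating it explicitly keeps the command self-documenting.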
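Once downloaded, the tabular output can also be summarised with a short script instead of being read by eye. The Python sketch below assumes a tab-separated file. It relies only on the column positions given in the last step: query in column 1, e-value in column 11 and full subject name in the final column. The function name parse_blast_tabular and the optional evalue_cutoff filter are illustrative additions.

```python
def parse_blast_tabular(lines, evalue_cutoff=None):
    """Yield (query, evalue, subject_title) for each hit in tabular BLAST output.

    `lines` can be an open file or any iterable of text lines.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        query = fields[0]             # column 1: query sequence name
        evalue = float(fields[10])    # column 11 (index 10): e-value
        subject_title = fields[-1]    # final column: full name of the hit
        if evalue_cutoff is None or evalue <= evalue_cutoff:
            yield query, evalue, subject_title
```

For example, with a downloaded file saved under the hypothetical name blast_output.tabular:

`with open("blast_output.tabular") as f: for q, e, t in parse_blast_tabular(f): print(q, e, t)`

This prints one line per hit. Passing `evalue_cutoff` lets you tighten the threshold further without rerunning BLAST.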