How to BLAST search for known genes in a local set of genomes?
8.7 years ago
jerrybug109 ▴ 20

Hi all,

Our lab has sequenced a set of different Bacillus strains and assembled contigs for each individual genome. I wish to set up a search for the presence/absence of multiple known genes (we have FASTA files for those) in our set of genomes. I was hoping to do this via BLAST but looking at the website, it seems that you can only search for genes in genomes exclusively available on the NCBI database.

Is there any way to set up a search for genes in the genomes of the strains that I've sequenced and assembled? I was hoping to find an option to upload our own "search sets" but it doesn't seem to be available on http://blast.ncbi.nlm.nih.gov/Blast.cgi

Thanks!

ncbi blast genome • 3.8k views
8.7 years ago
mastal511 ★ 2.1k

You can install standalone BLAST (BLAST+) and make your genomes into a local database, but that requires using the command line.


Thanks, I appreciate your response. I might give this a try then - I have rudimentary UNIX experience. By any chance, do you know of anything like Galaxy that offers a BLAST-like service but lets you upload your own database?


If you have the UNIX skills to get started, then by all means find a local computing resource (or even a desktop with respectable specs). This would be a good chance to get your feet wet and polish your UNIX skills. You can use Jim Kent's BLAT (in addition to, or instead of, BLAST), which can be very fast for identifying closely related sequences. Since you are working with bacteria you may not need very beefy hardware (8 GB of RAM may be the minimum requirement).

You could also use NCBI's "BLAST 2 sequences" service, which aligns the sequences you supply against each other, to search in a pairwise fashion.
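The same pairwise idea also works with standalone BLAST+ on your own machine: the -subject option takes a FASTA file directly, so no database has to be built first. A minimal sketch, assuming the known genes are in one FASTA file and one assembly is in another (the function name and all file names are hypothetical placeholders):

```shell
# Pairwise search of known genes against one assembly, no database needed.
# The function name and file names are placeholders; substitute your own.
pairwise_search() {
    local genes="$1"     # FASTA file with the known genes (queries)
    local genome="$2"    # FASTA file with the contigs of one strain
    local out="$3"       # file that will receive the tabular hits
    # -subject searches a FASTA file directly instead of a -db database;
    # -outfmt 6 writes one tab-separated line per hit.
    blastn -query "$genes" -subject "$genome" -outfmt 6 -out "$out"
}
```

It could be called as: pairwise_search genes.fa strainA.fa strainA_hits.tsv. For more than a handful of genomes, building a database per genome is probably the more convenient route.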

8.7 years ago
natasha.sernova ★ 4.0k

You have a set of bacteria, so you don’t need to worry about introns.

Make a database and search inside the database with blastn, for example.

1) First you need to make a database of your nucleotide sequences.

To do this:

makeblastdb -in input_file -dbtype nucl -out dbname

Here input_file is the FASTA file with your contigs (e.g. *.fa), -dbtype nucl declares a nucleotide database, and dbname is the name you choose for the database.
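With several assembled genomes, the database step can be looped so that each strain gets its own database. This is only a sketch under assumptions: the function name, the directory layout (one FASTA file of contigs per strain, ending in .fa) and the output directory are all hypothetical.

```shell
# Build one nucleotide BLAST database per assembled genome.
# Assumed (hypothetical) layout: <genome_dir>/<strain>.fa holds the contigs.
build_dbs() {
    local genome_dir="$1"   # directory with one FASTA file per strain
    local db_dir="$2"       # directory that will hold the databases
    local fasta strain
    mkdir -p "$db_dir"
    for fasta in "$genome_dir"/*.fa; do
        [ -e "$fasta" ] || continue            # glob matched nothing: skip
        strain="$(basename "$fasta" .fa)"      # strainA.fa -> strainA
        makeblastdb -in "$fasta" -dbtype nucl -out "$db_dir/$strain"
    done
}
```

Called as build_dbs genomes dbs, this would leave databases named dbs/strainA, dbs/strainB and so on. Keeping one database per strain makes it easy to tell later which genome a hit came from.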

2) Run the blast-program:

tblastx -query input -db dbname -out outname

Here input is the FASTA file with your genes, dbname is the database created in step 1, and outname is the file that will receive the results.
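To run the search step against several databases in one go, something like the following could work. It is a sketch, not a definitive recipe: the function name, the database prefixes and the output directory are hypothetical, and the E-value cutoff of 1e-10 is an arbitrary assumption you should adjust. -outfmt 6 asks for tab-separated output, which is easier to post-process than the default report.

```shell
# Search the same gene set against several BLAST databases.
# Usage sketch: search_genes PROGRAM QUERY OUT_DIR DB [DB...]
search_genes() {
    local program="$1"   # e.g. blastn or tblastx
    local query="$2"     # FASTA file with the known genes
    local out_dir="$3"   # directory for the per-strain result tables
    shift 3              # remaining arguments are database names
    local db strain
    mkdir -p "$out_dir"
    for db in "$@"; do
        strain="$(basename "$db")"
        # -evalue 1e-10 is an assumed cutoff, not a recommendation.
        "$program" -query "$query" -db "$db" \
            -evalue 1e-10 -outfmt 6 -out "$out_dir/$strain.tsv"
    done
}
```

For example, search_genes blastn genes.fa hits dbs/strainA dbs/strainB would write hits/strainA.tsv and hits/strainB.tsv.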

tblastx compares the six-frame translation of a nucleotide query against the six-frame translation of a nucleotide database. If you would rather not work at the protein level, use blastn for a plain nucleotide-versus-nucleotide search. You said you know the genes in the genomes?

If you know where the genes are, you can translate them into proteins.

I would use tblastx for your task.

If you would like to search the database using a protein query, use tblastn, but in my experience tblastx usually finds more sequences.
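Since the goal is a presence/absence overview, tabular BLAST results (-outfmt 6) can be condensed into a list of strain and gene pairs. The sketch below assumes one result file per strain named <strain>.tsv in a single directory (a hypothetical layout) and calls a gene "present" if any hit reaches a minimum percent identity. The function name and the default of 90 are arbitrary assumptions, and identity alone is a crude criterion because short local hits can pass it too.

```shell
# List "strain <TAB> gene" for every gene with at least one good hit.
# Assumed (hypothetical) layout: <hits_dir>/<strain>.tsv in -outfmt 6.
presence_table() {
    local hits_dir="$1"
    local min_ident="${2:-90}"   # minimum percent identity (assumed default)
    local tsv strain
    for tsv in "$hits_dir"/*.tsv; do
        [ -e "$tsv" ] || continue              # glob matched nothing: skip
        strain="$(basename "$tsv" .tsv)"
        # In -outfmt 6, column 1 is the query (gene) ID and column 3 is
        # percent identity; each gene is printed once per strain.
        awk -F '\t' -v s="$strain" -v min="$min_ident" \
            '$3 + 0 >= min && !seen[$1]++ { print s "\t" $1 }' "$tsv"
    done
}
```

Running presence_table hits 95 would print one line per gene found in each strain. Genes that never show up for a strain are the candidates for "absent", although they deserve a closer look (for instance a gene split across two contigs) before drawing conclusions.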
