4.0 years ago
geosmin
I have a set of organisms that I want to perform BLAST searches on with the same query. Of course I want to automate this procedure instead of running BLAST individually for each organism. I've tried these three approaches:
- Biopython's NCBIWWW.qblast() method (very slow)
- blast+ program with local database (problems setting up the database)
- download genome data via FTP and using makeblastdb (didn't understand how yet)
Before I dive deeper into each topic, I wanted to ask what other people would do. It seems like such a trivial task, so I guess there must be a somewhat simple procedure.
What is the size of the query? You may be able to do a -remote blast+ search at NCBI (you will need to download compiled blast+) using command-line BLAST and limiting the searches to specific organism/taxonomy IDs.
Thank you for your answer! My query is about 500 amino acids long. It seemed to work at first, but then it took forever again when I specified an Entrez search query. I'm not sure if it's due to that or just a server connection problem (I've had that many times).
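For reference, the remote search suggested above could be scripted roughly like this. It is only a sketch: it builds one blastp command per organism and restricts each search with an Entrez query of the form txid<ID>[ORGN]. The file names and the three taxonomy IDs (human, mouse, fruit fly) are placeholders, not values from this thread, and the actual run is left commented out because it needs blast+ installed and a network connection.

```python
import shlex

def remote_blastp_command(query_fasta, taxid, out_path, db="nr"):
    """Build the argument list for a remote blastp search limited to one taxonomy ID."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-remote",                               # run the search on NCBI's servers
        "-entrez_query", f"txid{taxid}[ORGN]",   # restrict hits to this taxon
        "-outfmt", "6",                          # tabular output
        "-out", out_path,
    ]

# Placeholder taxonomy IDs (assumption): human, mouse, fruit fly.
taxids = [9606, 10090, 7227]
commands = [
    remote_blastp_command("query.faa", taxid, f"hits_txid{taxid}.tsv")
    for taxid in taxids
]

for cmd in commands:
    print(shlex.join(cmd))
    # To actually run the search (requires blast+ and network access):
    # import subprocess; subprocess.run(cmd, check=True)
```

Running one command per organism keeps the results in separate files, at the cost of one remote request per organism.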
Try using the taxID limits. That may work better than the Entrez query. Depending on how many organisms you need, you could download their genomic protein sequences (.faa files) from the respective genome directories, if you are doing a blastp. This is not that difficult to do if you are already familiar with the unix/blast+ command line.
Has anyone found code to BLAST against a defined set of organisms using Biopython's NCBIWWW.qblast() method?