Hi!
I am studying bioinformatics and am currently working on a validation project. We are using the command-line version of BLAST. We created a database from the 500,000 sequences we received (using makeblastdb) and wanted to run a BLAST search using 500,000 sequences as queries. I started it yesterday at 2 pm, and it's still running! Did we do something wrong? Is it normal for it to run this long, and is there anything I could do (apart from setting the e-value and changing the number of threads) to speed things up? I've checked, and the processor is busy, so something is being done. The problem is that I have no way of checking how much time is left.
Thanks in advance
Thank you for your quick answer. I am using a Lenovo T430 (i5-3320M, 2.60 GHz, 8 GB RAM). Currently, the NCBI process is using about 19,000 kB of memory. These are protein sequences, I am running blastp, and all of them are about 200-300 amino acids long. We've run a few tests, and based on them it looks like the full search would take about 48 days.
As I said, this is a validation project. The goal is to check whether, over the course of evolution, there were episodes of reversion (from MAGDA to ADGAM, for example). We are required to use makeblastdb, so we built a database from the original sequences and wanted to blast the reversed ones against it. As we only have two weeks, I guess we should look for another way.
Thanks again for your help, we'll definitely check out what you've sent and try a different approach.
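For reference, the 48-day estimate can be turned into a per-query cost, which makes it easier to judge how much faster the search would need to be. This is only a back-of-the-envelope sketch: the function names are made up for illustration, it assumes the run time scales linearly with the number of queries, and the 14 days stand for the two-week deadline of the project.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_query(total_days, n_queries):
    """Average wall-clock seconds per query, assuming linear scaling."""
    return total_days * SECONDS_PER_DAY / n_queries


def required_speedup(estimated_days, deadline_days):
    """Factor by which the search must get faster to meet the deadline."""
    return estimated_days / deadline_days


# 48 days for 500,000 queries (figures from the test runs above)
print(round(seconds_per_query(48, 500_000), 2))  # 8.29 seconds per query

# Speedup needed to finish within 14 days
print(round(required_speedup(48, 14), 1))  # 3.4
```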
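To make the reversal step concrete, here is a minimal sketch of how the reversed query sequences could be produced from FASTA lines in plain Python. The function name `reverse_fasta` is hypothetical and this is not necessarily how we did it; it assumes standard FASTA input where a sequence may be wrapped over several lines, and it leaves the headers unchanged.

```python
def reverse_fasta(lines):
    """Yield FASTA lines with every sequence reversed (MAGDA -> ADGAM).

    Headers are passed through unchanged; wrapped sequence lines are
    joined before reversing so the whole sequence is reversed at once.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)[::-1]
            header, chunks = line, []
        else:
            chunks.append(line)
    # emit the last record, if any
    if header is not None:
        yield header
        yield "".join(chunks)[::-1]


example = [">seq1", "MAG", "DA", ">seq2", "ACDEF"]
print("\n".join(reverse_fasta(example)))
```

With a real file, the same generator can be fed an open file handle and its output written to a new FASTA file that is then used as the query.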
This sounds like a small desktop computer. How many cores do you have, and are you using them all? I am not sure BLAST is the best tool for detecting short reversed patterns in sequences. Are you looking for matches without mismatches, by the way? If the minimum length of such a pattern is 5, as for MAGDA, and you don't allow mismatches, then you could increase the word size to 5 (I think the default is 3), which would make BLAST run faster.
I would look for a tool built specifically for this task. Otherwise, you are not strictly bound to BLAST, because you can always dump the sequences from the BLAST database into a FASTA file. Maybe DIAMOND even accepts BLAST databases.
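As a rough sketch of what I mean, the snippet below counts the logical CPUs with `os.cpu_count()` and assembles a blastp command line that uses all of them together with a larger word size. The helper `build_blastp_cmd` and the file and database names are made up for illustration. `-num_threads`, `-word_size`, `-evalue` and `-outfmt` are regular BLAST+ options, but the default word size and the range blastp accepts depend on your version, so please check `blastp -help` before relying on the value 5.

```python
import os
import shlex


def build_blastp_cmd(query, db, out, threads=None, word_size=None, evalue=None):
    """Assemble a blastp argument list (nothing is executed here)."""
    # Fall back to the number of logical CPUs reported by the OS.
    threads = threads or os.cpu_count() or 1
    cmd = [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",  # tabular output
        "-num_threads", str(threads),
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if evalue is not None:
        cmd += ["-evalue", str(evalue)]
    return cmd


print("Logical CPUs:", os.cpu_count())
# Hypothetical file and database names
print(shlex.join(build_blastp_cmd("reversed.fasta", "original_db", "hits.tsv", word_size=5)))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` once you are happy with it.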
I have 2 cores and no idea whether I am using them all. I guess not. Sorry, I don't know how to check that.
I am currently researching such tools. The problem is that, according to our instructions, we have to create a database and use BLAST. As for DIAMOND, I've checked, but it seems to be a replacement for blastx, and our problem is a protein-protein one, no?
DIAMOND can do protein-protein alignment too, and it's much faster than BLAST. You should give it a try.
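A possible workflow, sketched under a few assumptions: dump the sequences from the existing BLAST database with `blastdbcmd`, build a DIAMOND database from that FASTA file, then run `diamond blastp`. The helper `diamond_pipeline` and all file names are hypothetical, and the snippet only prints the commands, so check the options against the help output of your installed versions before running them.

```python
import shlex


def diamond_pipeline(blast_db, fasta, dmnd_db, query, out, threads=2):
    """Return the three commands as argument lists (nothing is executed)."""
    return [
        # 1. Dump all sequences of the BLAST database to a FASTA file
        ["blastdbcmd", "-db", blast_db, "-entry", "all", "-out", fasta],
        # 2. Build a DIAMOND database from that FASTA file
        ["diamond", "makedb", "--in", fasta, "-d", dmnd_db],
        # 3. Protein-protein search of the queries against it
        ["diamond", "blastp", "-d", dmnd_db, "-q", query, "-o", out,
         "--threads", str(threads)],
    ]


# Hypothetical file and database names
for cmd in diamond_pipeline("original_db", "original.fasta", "original_dmnd",
                            "reversed.fasta", "diamond_hits.tsv"):
    print(shlex.join(cmd))
```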
I am currently trying to install DIAMOND, thank you ;)