Hi Everyone,
I am new to this field and am looking for a good, stable parallel implementation of BLAST. I am trying this one: http://salsahpc.indiana.edu/tutorial/hadoopblast.html
However, I found that this program simply assigns one task per query input file, which makes me wonder what the point of using Hadoop is.
There is also http://archimedes.cheme.cmu.edu/?q=gpublast, but it only works with blastp, and there is a scathing rebuttal of the GPU approach here: https://larsjuhljensen.wordpress.com/2011/01/28/commentary-the-gpu-computing-fallacy/.
So, is there a good and stable parallel implementation of BLAST? My employer is going to set up a computer cluster, and we might want to run BLAST there, but by its nature a cluster is only beneficial if the program can actually run in parallel across its nodes.
Thank you very much!
Pedantic comment: you need to distinguish between "parallel" and "distributed". Basic NCBI BLAST is already parallel on shared-memory systems and has been for as long as I can remember. (The GPU implementation will also only work on shared memory systems.) If you want parallelization across a cluster, splitting up the input by query sequence is almost certainly going to be the most efficient method (and the most common one).
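For the shared-memory case, here is a minimal sketch of what I mean, assuming the BLAST+ tools; the query file, database name, and thread count are just placeholders:

```python
# Minimal sketch: a single BLAST+ process using several cores on one node.
# "queries.fasta" and "mydb" are hypothetical; adjust to your own data.
import subprocess

subprocess.run(
    ["blastp",
     "-query", "queries.fasta",   # hypothetical query file
     "-db", "mydb",               # hypothetical BLAST database
     "-num_threads", "8",         # threads on one shared-memory node
     "-outfmt", "6",              # tabular output
     "-out", "hits.tsv"],
    check=True,
)
```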
I would like to add that while BLAST itself has a parallelization option, I find that it rarely uses the full resources available; it often runs on only one CPU instead of the number of CPUs specified. If I remember correctly, this is due to the way the search is performed.
I strongly recommend splitting the input data, as this allows the best parallelization (either shared-memory or distributed). Incidentally, this is exactly the kind of workload the Hadoop MapReduce framework was developed for...
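As a concrete illustration of splitting the input, here is a minimal sketch in Python, without Hadoop: split the query FASTA into chunks and run one independent BLAST process per chunk. The file names, database name, chunk count, and the choice of blastp are assumptions for the example, and it assumes the BLAST+ binaries are on your PATH.

```python
# Minimal sketch: split a FASTA query file into chunks and BLAST each chunk
# in its own process, then concatenate the tabular results.
import subprocess
from multiprocessing import Pool

def split_fasta(path, n_chunks):
    """Split the records of a FASTA file across n_chunks smaller files."""
    records, current = [], []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") and current:
                records.append("".join(current))
                current = []
            current.append(line)
        if current:
            records.append("".join(current))
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    chunk_paths = []
    for i, chunk in enumerate(c for c in chunks if c):   # skip empty chunks
        chunk_path = f"chunk_{i}.fasta"
        with open(chunk_path, "w") as out:
            out.writelines(chunk)
        chunk_paths.append(chunk_path)
    return chunk_paths

def run_blast(chunk_path):
    """Run blastp on one chunk; every worker is an independent BLAST process."""
    out_path = chunk_path.replace(".fasta", ".tsv")
    subprocess.run(
        ["blastp", "-query", chunk_path, "-db", "mydb",   # "mydb" is a placeholder
         "-outfmt", "6", "-out", out_path],
        check=True,
    )
    return out_path

if __name__ == "__main__":
    paths = split_fasta("queries.fasta", n_chunks=8)      # placeholder input file
    with Pool(processes=8) as pool:
        result_paths = pool.map(run_blast, paths)
    with open("all_hits.tsv", "w") as merged:             # concatenate per-chunk hits
        for path in result_paths:
            with open(path) as part:
                merged.write(part.read())
```

The same splitting strategy carries over to a cluster: instead of a local Pool, submit one BLAST job per chunk to your scheduler (or a MapReduce framework) and merge the outputs afterwards.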
And there is this: http://www.abokia.com/Products.htm
If you have the budget to purchase a commercial product, it may be an option.