Accelerating BLAST for a million-sequence BLASTp all-by-all

Asked 9.9 years ago by Anand Rao

I need to run an all-by-all BLASTp on a large dataset of ~2 million protein sequences.

I see two routes that people have used in the past. Related posts include "Correct Method To Blast All-Vs-All With Ncbiblast & How To Speed It Up?" here, and http://seqanswers.com/forums/showthread.php?t=5752 elsewhere.

Route 1: Split input files and then run BLAST on these smaller chunks

Route 2: Use a parallel BLAST implementation such as the open-source mpiBLAST
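As an illustration of the splitting step in Route 1, here is a minimal Python sketch that divides one large protein FASTA file into a fixed number of chunk files. The function names `read_fasta` and `split_fasta` and the `chunk_NNN.fasta` naming scheme are hypothetical, not part of BLAST+. It assumes a plain-text FASTA input and distributes records round-robin, which balances the number of sequences per chunk but not the total residue count.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq_lines)
                header, seq_lines = line[1:], []
            elif line:
                seq_lines.append(line.strip())
    if header is not None:
        yield header, "".join(seq_lines)


def split_fasta(path, out_dir, n_chunks: int) -> List[Path]:
    """Split a FASTA file into n_chunks files, assigning records round-robin.

    Hypothetical helper: chunk files are named chunk_000.fasta, chunk_001.fasta, ...
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = [out_dir / f"chunk_{i:03d}.fasta" for i in range(n_chunks)]
    handles = [open(p, "w") for p in chunk_paths]
    try:
        for i, (header, seq) in enumerate(read_fasta(Path(path))):
            # Record i goes to chunk i mod n_chunks, so chunk sizes differ by at most one.
            handles[i % n_chunks].write(f">{header}\n{seq}\n")
    finally:
        for h in handles:
            h.close()
    return chunk_paths
```

For ~2 million sequences, something like `split_fasta("all.fasta", "chunks", 200)` would give roughly 10,000 sequences per chunk; the right chunk count depends on how many cores or cluster jobs are available.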

Are these the only practical routes for large BLAST runs, or are there other related or unrelated ways to go about it?

And finally, what about Route 3: both splitting the input files AND using mpiBLAST? Is that a sound idea? If not, why not?

Thanks for your answers.

I moved this from the forum since there is a clear question. In my opinion, use Route 1. I did not see any major improvement with mpiBLAST, and it is more difficult to configure and use. Splitting the input and running BLAST on the chunks in parallel should be easy to implement on any system.

Reply 9.9 years ago by SES
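The "split and run in parallel" advice can be sketched in Python roughly as follows, assuming NCBI BLAST+ (`makeblastdb`, `blastp`) is on the PATH. For an all-by-all search, the database is built once from the full FASTA file and each query chunk is searched against that same full database. The helper names, the E-value cutoff of 1e-5, one thread per `blastp` process, and 8 concurrent workers are illustrative assumptions, not recommendations from the thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def makeblastdb_command(all_fasta, db_name) -> List[str]:
    """Command to build one protein database from the full input (run once)."""
    return ["makeblastdb", "-in", str(all_fasta), "-dbtype", "prot", "-out", str(db_name)]


def blastp_command(chunk, db_name, evalue: float = 1e-5, threads: int = 1) -> List[str]:
    """Command to search one query chunk against the full database.

    Output goes next to the chunk as tabular (-outfmt 6) results; the evalue
    and threads defaults here are illustrative assumptions.
    """
    out = Path(chunk).with_suffix(".tsv")
    return ["blastp", "-query", str(chunk), "-db", str(db_name), "-out", str(out),
            "-outfmt", "6", "-evalue", str(evalue), "-num_threads", str(threads)]


def run_all_by_all(chunks, db_name, workers: int = 8) -> None:
    """Run one blastp process per chunk, at most `workers` at a time."""
    cmds = [blastp_command(c, db_name) for c in chunks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread only waits on an external blastp process, so threads suffice;
        # check=True raises if any blastp run exits with a non-zero status.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), cmds))
```

On a single multi-core machine, one would first run `subprocess.run(makeblastdb_command("all.fasta", "alldb"), check=True)` and then `run_all_by_all(sorted(Path("chunks").glob("chunk_*.fasta")), "alldb")`. On a cluster, each command from `blastp_command` could instead be submitted as its own scheduler job, which is the same idea spread over more nodes.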