Question

standalone blastp: increasing word size extremely slows down the search

0

7.9 years ago

aleksanderczeszyk • 0

Hello,

I need to run blastp for one genome (15,000 sequences) against another genome (12,000 sequences) using Biopython. I decided to use local BLAST and query the genome 1 FASTA file against a genome 2 database (made with the makeblastdb command from the second genome's FASTA file). I managed to run the search with the default parameters of standalone blastp. However, when I change the word size to a BIGGER value (the default is 3 and I set it to 6), the search becomes extremely slow. I am confused about why this happens, because increasing the word size is supposed to make things go faster. Here is how I pass arguments to the NcbiblastpCommandline function:

NcbiblastpCommandline( word_size=6, query=queryInputPath, db=subjectInputPath, out=outputPath, outfmt=5 )()

Things are much faster when the call does not include the 'word_size=6' keyword argument. Without word_size=6 it takes around 1.5 h to run the search. My Mac has 4 GB of RAM and a 1.6 GHz Intel Core i5 processor. What might be the cause?

genome blast • 3.9k views

ADD COMMENT • link updated 3.6 years ago by HenriettaHolze • 0 • written 7.9 years ago by aleksanderczeszyk • 0

2

Check that you're not running out of memory.
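One way to check this is to measure the peak memory of the blastp process and compare it with the machine's physical RAM. The following is a minimal sketch using only the Python standard library on a Unix-like system; the helper name `peak_child_rss_mb` and the file names are hypothetical, and the blastp flags are simply the command-line equivalents of the keyword arguments in the question.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(cmd):
    """Run cmd to completion and return the peak resident set size, in MB,
    recorded for the child processes this interpreter has waited for."""
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Hypothetical file names; replace with the real query, database and output.
blastp_cmd = [
    "blastp",
    "-query", "genome1.fasta",
    "-db", "genome2_db",
    "-word_size", "6",
    "-outfmt", "5",
    "-out", "result.xml",
]

# Uncomment to run the search and report its peak memory:
# print(f"peak RSS: {peak_child_rss_mb(blastp_cmd):.0f} MB")
```

If the reported peak is close to the installed RAM (4 GB in the question), the system is probably swapping, which would be consistent with a dramatic slowdown. This only measures resident memory; it does not show swap activity directly.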

ADD REPLY • link 7.9 years ago by Jean-Karim Heriche 27k

2

With 4 GB of RAM, very likely.

ADD REPLY • link 7.9 years ago by GenoMax 152k

0

You may be able to save some overhead by running BLAST directly from the command line, although probably not a meaningful amount. You could also try splitting the database into multiple parts; just make sure you set the statistical options (e.g. dbsize) manually. You'll have to do some post-BLAST work to find the best hits, but this should get you around the memory issues.
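The splitting approach could be sketched roughly as follows. This is an illustration under assumptions, not a tested pipeline: the helper names (`read_fasta`, `split_fasta`, `blastp_cmd`, `best_hits`) are hypothetical, tabular output (`-outfmt 6`) is used instead of the XML output from the question because it is easier to merge, and each part still has to be turned into a database with makeblastdb before searching. The `-dbsize` value is set to the residue count of the whole database so that E-values stay comparable across parts.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, n_parts, prefix):
    """Distribute records round-robin over n_parts FASTA files.

    Returns (part_paths, total_residues); total_residues is counted over
    the whole input so it can be passed to every search as -dbsize.
    """
    paths = [f"{prefix}.part{i}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    total = 0
    try:
        for i, (header, seq) in enumerate(read_fasta(path)):
            handles[i % n_parts].write(f">{header}\n{seq}\n")
            total += len(seq)
    finally:
        for handle in handles:
            handle.close()
    return paths, total


def blastp_cmd(query, db, out, dbsize):
    """Build a blastp argument list for one database part."""
    return ["blastp", "-query", query, "-db", db, "-out", out,
            "-outfmt", "6", "-dbsize", str(dbsize)]


def best_hits(tabular_paths):
    """Keep, per query, the row with the highest bit score.

    Assumes the default 12-column -outfmt 6 layout, where the query ID is
    column 1 and the bit score is column 12.
    """
    best = {}
    for path in tabular_paths:
        with open(path) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12:
                    continue  # skip blank or malformed lines
                query, bitscore = fields[0], float(fields[11])
                if query not in best or bitscore > float(best[query][11]):
                    best[query] = fields
    return best
```

A typical run would call `split_fasta` once, build a database from each part with makeblastdb, pass each `blastp_cmd(...)` list to `subprocess.run`, and finally merge the per-part outputs with `best_hits`.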

ADD REPLY • link 7.9 years ago by pld 5.1k

0

Hi Aleksander, long shot, but did you ever figure out why increasing the word size slows down the search? I have the same problem with blastp version 2.11.0, and it does not look like I'm reaching any memory limit. Cheers, Henrietta

ADD REPLY • link 3.6 years ago by HenriettaHolze • 0

score 0 · Answer 1 · 2017-08-22

0

7.9 years ago

Istvan Albert 102k

I would recommend using BLAST replacements like DIAMOND or PAUDA:

https://ab.inf.uni-tuebingen.de/software/diamond

https://ab.inf.uni-tuebingen.de/software/pauda

ADD COMMENT • link 7.9 years ago by Istvan Albert 102k

0

Just to be clear: there is no point in trying to use these tools on the machine described in the original post.

ADD REPLY • link 7.9 years ago by GenoMax 152k