Question

Blastn Segmentation Fault

13.7 years ago

Srihari ▴ 30

Hi,

I am running a BLASTN of about 150 sequences against a genome that is 2.2 gigabases long. A few of my queries are actually full length BAC end sequences running to around 150,000 bases. I expect to find huge, contiguous hits for some BACs in the genome. Here's the command I use -

blastn -query 80BACs.fasta -db mygenome -out 80BACsBLAST -outfmt 10 -num_threads 8 -evalue 10e-3 -index_name mygenomeMBI

Around 10 minutes after it starts running, the program halts with a segmentation fault. I ran 'ulimit -s unlimited' to set the stack size to unlimited, but to no avail. I also reduced the number of threads in subsequent trials, setting num_threads to 5 and then to 2, but that didn't help either.

I am using the binaries from rmblast-1.2-ncbi-blast-2.2.23+. I had earlier run a smaller query dataset against the same genome, which worked fine; that BLAST completed in half a day. I am convinced this issue is due to some very long query sequences. I'd highly appreciate any help in this regard!

Thanks,

Srihari

blast • 12k views

updated 23 months ago by Amirhossein Hajianpour ▴ 40 • written 13.7 years ago by Srihari ▴ 30

It works and helped me, thanks.

2.7 years ago by liuyulu90

score 2 · Answer 1 · 2011-12-08

For such a large query, you might want to try a different tool. Consider what you actually want to do. MUMmer might be a good fit if you want to find the region of the genome that your query matches.
If you are searching for homology with genes, consider breaking your query into individual genes before using blastn.

If you are still sure that you want to query 150 kb sequences against a 2.2 Gb genome using blastn, you can try increasing the word size (to 28, or even to around 50). This reduces sensitivity, but it also reduces the number of seed hits blastn has to extend. I forget which way would be better for you in terms of filtering, but switching filtering on or off might help too.
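As a rough illustration of the word-size suggestion, the sketch below only assembles a blastn command line with a larger `-word_size`. The file and database names are taken from the question. The helper name `build_blastn_command` and its defaults are hypothetical, and `-dust` is assumed to be the filtering switch meant here (blastn's low-complexity filter).

```python
import shlex

def build_blastn_command(query, db, out, word_size=28, dust=None):
    """Return a blastn argument list with a larger word size.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "10",      # comma-separated output, as in the question
        "-evalue", "10e-3",   # same threshold as in the question
        "-word_size", str(word_size),
    ]
    if dust is not None:
        # Low-complexity (DUST) filtering: "yes" or "no"
        cmd += ["-dust", dust]
    return cmd

# Print a command that could be pasted into a shell.
cmd = build_blastn_command("80BACs.fasta", "mygenome", "80BACsBLAST",
                           word_size=50, dust="no")
print(shlex.join(cmd))
```

Trying the same run with `dust="yes"` and `dust="no"` is one way to test whether filtering makes a difference.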

Ram · Answer 2 · 2011-12-23

I agree with Lee that BLAST may not be the right tool for such a long query sequence; MUMmer is probably the better choice for this purpose. Your description sounds more like a global alignment problem.

Otherwise:

The BLAST+ programs are regularly updated and bugs get fixed, so the first thing you should do is install the latest version; otherwise you may be running into a bug that has already been fixed. The latest version is 2.2.25 at the moment and is available here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Install a 64-bit binary; that is very important too.

After that, if the error persists: I have just experienced myself that there are still certain (even short) input sequences that can simply crash BLAST. If possible, try to isolate the input sequence that causes the crash by feeding in only parts of the input file, or a single sequence at a time. If that doesn't help, try switching -outfmt to a different output format.
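Feeding one sequence at a time can be automated. The following is a minimal sketch, assuming `blastn` is on the PATH and the database is called `mygenome` as in the question. The function names are made up for illustration, and only basic options are passed to blastn.

```python
import os
import signal
import subprocess
import tempfile

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def run_blastn_on_record(header, sequence, db="mygenome"):
    """Run blastn on a single record and return the process exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        query = os.path.join(tmp, "query.fasta")
        with open(query, "w") as handle:
            handle.write(f">{header}\n{sequence}\n")
        result = subprocess.run(
            ["blastn", "-query", query, "-db", db,
             "-out", os.path.join(tmp, "hits.csv"), "-outfmt", "10"],
            stderr=subprocess.DEVNULL,
        )
        return result.returncode

def find_crashing_records(records, run=run_blastn_on_record):
    """Return (header, signal name) for every record whose run died on a signal."""
    crashed = []
    for header, sequence in records:
        code = run(header, sequence)
        if code < 0:  # subprocess reports death by signal N as -N
            crashed.append((header, signal.Signals(-code).name))
    return crashed
```

Calling `find_crashing_records(read_fasta("80BACs.fasta"))` would then list the queries that end in a signal such as SIGSEGV. Each run discards its hits, so this is only a diagnostic pass.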

score 1 · Answer 3 · 2011-12-25

Hello unknown,

Note that 'ulimit -s unlimited' only raises the stack size limit. To get a "core" file when the program crashes (Segmentation fault, core dumped), also run 'ulimit -c unlimited' in the same shell before starting blastn. You can then debug what happened by running "gdb /path/to/blastn core_file" and asking for a backtrace:

(gdb) bt

This should give you a backtrace of the last functions that were called (if the function names are empty or "??", you should compile blastn yourself with debug symbols).

Alternatively, you can run "strace" before the command: "strace blastn 80BACs.fasta ..." and have a look at the last 100 system calls to figure out if something has gone wrong with memory management.
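If blastn is launched from a script rather than an interactive shell, the core-file limit can be raised from the script itself. This is a small sketch assuming a Unix system. The function names are illustrative, and whether a core file is actually written also depends on system settings that are not covered here.

```python
import resource
import signal

def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    The limit applies to this process and to children started afterwards.
    Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

def describe_exit(returncode):
    """Turn a subprocess return code into a readable message."""
    if returncode < 0:
        # A negative code -N means the child was killed by signal N
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Calling `enable_core_dumps()` before `subprocess.run([...])` and passing the resulting `returncode` to `describe_exit` shows whether the run ended in SIGSEGV or failed for some other reason.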

Hope that helps !

Roman

score 1 · Answer 4 · 2013-06-26

12.2 years ago

earonesty ▴ 250

Try without -num_threads ... that can crash blastn. Fortunately, it's pretty easy to split the input file into chunks, then run blast, then assemble the outputs. This is the only reliable way to "multithread" blast right now.
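The split, run and merge approach could look roughly like this. It is a sketch under assumptions: `blastn` is on the PATH, all records fit in memory (fine for around 150 queries), and the function names and `chunk_N.fasta` file names are made up for illustration.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

def split_fasta(path, n_chunks, out_dir):
    """Distribute FASTA records round-robin into at most n_chunks files."""
    os.makedirs(out_dir, exist_ok=True)
    records, current = [], None
    with open(path) as handle:
        for line in handle:
            if not line.endswith("\n"):
                line += "\n"  # guard against a missing final newline
            if line.startswith(">"):
                current = [line]
                records.append(current)
            elif current is not None:
                current.append(line)
    chunk_paths = []
    for i in range(min(n_chunks, len(records))):
        chunk_path = os.path.join(out_dir, f"chunk_{i}.fasta")
        with open(chunk_path, "w") as out:
            for record in records[i::n_chunks]:
                out.writelines(record)
        chunk_paths.append(chunk_path)
    return chunk_paths

def run_chunk(chunk_path, db):
    """Run single-threaded blastn on one chunk; raises if blastn fails."""
    out_path = chunk_path + ".csv"
    subprocess.run(
        ["blastn", "-query", chunk_path, "-db", db,
         "-out", out_path, "-outfmt", "10"],
        check=True,
    )
    return out_path

def blast_in_chunks(query, db, merged_out, n_chunks=8, out_dir=None, run=run_chunk):
    """Split the query, run the chunks in parallel, concatenate the outputs."""
    if out_dir is None:
        out_dir = tempfile.mkdtemp(prefix="blast_chunks_")
    chunks = split_fasta(query, n_chunks, out_dir)
    # Threads are enough here: each one just waits on its own blastn process.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        outputs = list(pool.map(lambda chunk: run(chunk, db), chunks))
    with open(merged_out, "w") as merged:
        for out_path in outputs:
            with open(out_path) as part:
                merged.write(part.read())
    return merged_out
```

For example, `blast_in_chunks("80BACs.fasta", "mygenome", "80BACsBLAST", n_chunks=8)`. Because records are dealt out round-robin, the merged hits are grouped by chunk rather than in the original query order.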