Blastn Segmentation Fault
13.1 years ago
Srihari ▴ 30

Hi,

I am running a BLASTN of about 150 sequences against a genome that is 2.2 gigabases long. A few of my queries are actually full length BAC end sequences running to around 150,000 bases. I expect to find huge, contiguous hits for some BACs in the genome. Here's the command I use -

blastn -query 80BACs.fasta -db mygenome -out 80BACsBLAST -outfmt 10 -num_threads 8 -evalue 10e-3 -index_name mygenomeMBI

Around 10 minutes after it starts running, the program halts with a segmentation fault. I ran 'ulimit -s unlimited' to lift the stack size limit, but to no avail. I also reduced the number of threads in subsequent trials, setting -num_threads to 5 and then to 2, but that didn't help either.

I am using the binaries from rmblast-1.2-ncbi-blast-2.2.23+. I had earlier run a smaller query dataset against the same genome, which worked fine; that BLAST completed in half a day. I am convinced the issue is due to a few very long query sequences. I'd highly appreciate any help in this regard!

Thanks,

Srihari

blast • 12k views

It works and helped me, thanks.

13.1 years ago
Lee Katz ★ 3.2k

For such a large query, you might want to try a different tool. Consider what you actually want to do. MUMmer might be good if you want to find the region that your query matches.
If you are searching for homology with genes, consider breaking your query into individual genes before using blastn.

If you are still sure that you want to query with 150k against a 2.2 Gb genome using blastn, then you can try certain tricks, such as increasing the word size with -word_size (put it up to 28 or even up to 50ish), which will reduce your sensitivity. I forget which way might be better for you in terms of filtering, but switching low-complexity filtering (the -dust option in blastn) on or off might help you too.

13.0 years ago
Michael 55k

I agree with Lee that blast may not be the right tool for such a long query sequence; MUMmer may be the better choice for this purpose. Your description sounds more like a global alignment problem.

Otherwise:

The blast+ programs are regularly updated and bugs get fixed, so the first thing you should do is install the latest version; otherwise you may be running into a bug that has already been fixed. The latest version is 2.2.25 at the moment and is available here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Install a 64-bit binary; that is very important too.

After that, if the error persists: I have just experienced myself that there are still certain (even short) input sequences that can simply crash blast. If possible, try to isolate the input sequence that causes the crash by feeding in only parts of the input file, or a single sequence at a time. If that doesn't help, try a different output format via the -outfmt switch.
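The "one sequence at a time" idea can be scripted. Below is a minimal Python sketch, not a tested recipe for this particular dataset: `read_fasta` and `find_crashing_queries` are hypothetical helper names, and the script assumes a POSIX system, where a negative return code from `subprocess.run` means the child process was killed by a signal (SIGSEGV for a segmentation fault).

```python
import signal
import subprocess
import tempfile
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_crashing_queries(fasta_path, base_cmd):
    """Run base_cmd once per FASTA record (appending '-query <file>').

    Returns a list of (header, signal_name) for every record whose run
    was terminated by a signal, e.g. ("BAC_17", "SIGSEGV").
    """
    crashed = []
    with tempfile.TemporaryDirectory() as tmp:
        for index, (header, seq) in enumerate(read_fasta(fasta_path)):
            single = Path(tmp) / f"query_{index}.fasta"
            single.write_text(f">{header}\n{seq}\n")
            result = subprocess.run(
                base_cmd + ["-query", str(single)],
                stdout=subprocess.DEVNULL,  # hits are discarded; we only look for crashes
                stderr=subprocess.DEVNULL,
            )
            # POSIX only: return code -N means the child died from signal N.
            if result.returncode < 0:
                crashed.append((header, signal.Signals(-result.returncode).name))
    return crashed
```

With the file and database names from the question, a call could look like `find_crashing_queries("80BACs.fasta", ["blastn", "-db", "mygenome", "-outfmt", "10", "-evalue", "10e-3"])`. Each run searches the whole genome, so this can take a while; halving the input file repeatedly would get there with fewer runs.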

13.0 years ago

Hello unknown,

If core dumps are enabled, a "core" file should have been generated (Segfault, core dumped). Note that 'ulimit -s unlimited' only lifts the stack size limit; core files are controlled by 'ulimit -c unlimited'. You can then debug what happened by running "gdb /path/to/blastn -c core_file" and asking for a backtrace:

(gdb) bt

This should give you a backtrace of the last functions that were called (if there are empty or "??" function names, you should compile blastn yourself with debugging symbols).

Alternatively, you can prefix the command with "strace" ("strace blastn -query 80BACs.fasta ...") and look at the last 100 or so system calls to see whether something has gone wrong with memory management.

Hope that helps !

Roman

11.5 years ago
earonesty ▴ 250

Try without -num_threads ... that can crash blastn. Fortunately, it's pretty easy to split the input file into chunks, then run blast, then assemble the outputs. This is the only reliable way to "multithread" blast right now.
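A minimal Python sketch of the split-and-merge approach, under a few assumptions: `split_fasta` and `merge_outputs` are hypothetical helper names, records are distributed round-robin by count (not by length, so a chunk that happens to receive several 150 kb BACs will take longer), and simple concatenation of results is only appropriate for line-based formats such as the CSV produced by -outfmt 10.

```python
from pathlib import Path


def split_fasta(fasta_path, n_chunks, out_dir):
    """Distribute FASTA records round-robin over at most n_chunks files.

    Returns the list of chunk file paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records, current = [], []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if current:
                    records.append("".join(current))
                current = [line + "\n"]
            elif current:
                current.append(line + "\n")
    if current:
        records.append("".join(current))

    paths = []
    for i in range(min(n_chunks, len(records))):
        path = out_dir / f"chunk_{i}.fasta"
        path.write_text("".join(records[i::n_chunks]))
        paths.append(path)
    return paths


def merge_outputs(result_paths, merged_path):
    """Concatenate per-chunk result files into one file, in the given order."""
    with open(merged_path, "w") as merged:
        for path in result_paths:
            merged.write(Path(path).read_text())
```

Each chunk can then be searched by its own single-threaded blastn process (for example one per core, launched from a shell loop or a job scheduler), and the per-chunk result files passed to `merge_outputs` once all runs have finished.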


Thanks. This helped me.
