Dear colleagues,
I keep getting a segmentation fault whenever I try to run a tblastn command:
tblastn -db genome.fna -query protein.fas -out protein.out -num_threads 14 -outfmt 7
I have done the following so far: I downloaded the latest Linux version of BLAST+ from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ , then compiled and installed the program. Afterwards I created a database from a rather large genome (over 15 Gbp). The database was split into two volumes, as it was larger than 4GB.
makeblastdb -in genome.fna -parse_seqids -blastdb_version 5 -title "genome" -dbtype nucl -max_file_sz 4GB
I then tried to use tblastn with the following command:
tblastn -db genome.fna -query protein.fas -out protein.out -num_threads 14 -outfmt 7
This resulted in the aforementioned segmentation fault, whereas blastn runs normally. tblastn also works fine online when searching with the same protein against the same genome as in the command above.
The computer the commands are running on has 128GB of RAM and a 14-core processor, so I doubt the hardware is to blame.
I wonder what the cause of this error could be.
Thank you in advance.
Are you sure the database you are blasting against is called 'genome.fna'? In any case, you only need to provide the prefix name of the BLAST DB.
blastn doesn't run with the prefix name. I have to provide the name of the file from which the database was created in order for it to run. That is not the issue. When given just the name of the database, it gives the following message:
BLAST Database error: No alias or index file found for nucleotide database [genome] in search path [/data/username::]
This is related to the comment of GenoMax below.
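For what it's worth, the error above says that BLAST found neither an alias (.nal) nor an index (.nin) file for the name it was given. Below is a minimal sketch of that kind of check in plain shell, assuming the database files sit in the current directory. `has_nucl_db` is a hypothetical helper, not part of BLAST+, and it only tests for `<name>.nal` or `<name>.nin`; it does not reproduce BLAST's real search-path logic.

```shell
# Hypothetical helper (not part of BLAST+): succeed if a nucleotide database
# named $1 has an alias (.nal) or index (.nin) file in directory $2 (default: .).
has_nucl_db() {
  name="$1"
  dir="${2:-.}"
  [ -f "$dir/$name.nal" ] || [ -f "$dir/$name.nin" ]
}

# Try the two names used in this thread and report what is found.
for candidate in genome genome.fna; do
  if has_nucl_db "$candidate"; then
    echo "$candidate: alias or index file found"
  else
    echo "$candidate: no alias or index file"
  fi
done
```

If neither name is reported as found, the `-db` value probably does not match the prefix makeblastdb actually wrote.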
It can very well be that your DB is called "genome.fna" (I would personally try to avoid that, but OK).
Could it be that you're asking for more resources than are available? (By the way, running it on 14 threads will not speed things up much: there is a known plateau for the number of threads in BLAST, and only parts of the whole procedure are multithreaded.)
How much memory do you have available on the machine you run this on? Keep in mind that blastn (well, megablast by default) will use far fewer resources than translated BLAST searches.
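One way to test the resource idea is to check free memory and rerun the same search with a single thread. The sketch below assumes a Linux machine with `free` available. `tblastn_cmd` is a hypothetical helper that only prints the command from this thread with a different `-num_threads` value, so nothing is launched until you choose to run it.

```shell
# Hypothetical helper: print the tblastn command from this thread with a
# chosen thread count (default 1) so it can be inspected before running.
tblastn_cmd() {
  threads="${1:-1}"
  printf 'tblastn -db genome.fna -query protein.fas -out protein.out -num_threads %s -outfmt 7\n' "$threads"
}

# Show how much memory is free before starting (free is Linux-specific).
if command -v free >/dev/null 2>&1; then
  free -h
fi

# Print the single-threaded variant; to actually run it: tblastn_cmd 1 | sh
tblastn_cmd 1
```

If the single-threaded run finishes without a segmentation fault, the thread count or memory use is a likely suspect. If it still crashes, the cause is probably elsewhere.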
What does that mean? Can you show us a listing of
ls -l genome*
Here is the list:
Genome.fna.00.nhr
Genome.fna.00.nin
Genome.fna.00.nog
Genome.fna.00.nsq
Genome.fna.01.nhr
Genome.fna.01.nin
Genome.fna.01.nog
Genome.fna.01.nsq
Genome.fna.nal
Genome.fna.ndb
Genome.fna.nos
Genome.fna.not
Genome.fna.ntf
Genome.fna.nto
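The name to pass to `-db` can be read off a listing like this by dropping the extension and, for volume files, the two-digit volume number. A small sketch of that rule in plain shell follows. `db_prefix` is a hypothetical helper, not a BLAST+ tool, and it assumes volume numbers are exactly two digits, as in the listing above.

```shell
# Hypothetical helper: derive the database name from one of its file names.
# The function body runs in a subshell so the variable f does not leak.
db_prefix() (
  f="${1##*/}"                      # drop any directory part
  f="${f%.*}"                       # drop the extension (.nhr, .nin, .nal, ...)
  case "$f" in
    *.[0-9][0-9]) f="${f%.*}" ;;    # drop a two-digit volume number such as .00
  esac
  printf '%s\n' "$f"
)

db_prefix Genome.fna.00.nhr   # prints Genome.fna
db_prefix Genome.fna.nal      # prints Genome.fna
```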
OK, so indeed your BLAST DB is called "Genome.fna" (with an upper-case G instead of the lower-case one in your command line).
It's upper case because it's the beginning of the line... Sorry for that.