I've decided to use mpiBLAST on the cluster at my local university to run similarity searches between sequences. This is practical because individual nodes on the cluster do not have enough RAM to hold all of the reference sequences. Since mpiBLAST divides the reference sequences across nodes (instead of dividing the query sequences), it is a perfect way to avoid hitting swap space.
The problem is that I can't get it to work. If you have any ideas on how to fix the error I'm getting, I would be very grateful. Here is how it is set up. First, I put this in my ~/.ncbirc file:
[NCBI]
Data=/bubo/sw/apps/bioinfo/blast/2.2.24/data/
### Data - this is where blast grabs the scoring matrices,
### any "data" dir in the blast releases on kalkyl should do fine
[BLAST]
BLASTDB=/bubo/nobackup/uppnex/blast_databases
BLASTMAT=/bubo/sw/apps/bioinfo/blast/2.2.24/data/
### BLASTDB - you can just run blastall with the -d option, but if you want to use
### a specific database, you can give a directory here.
### Note that those databases do not work with mpiBLAST, though.
### BLASTMAT - is in 99% of use cases the same as the "Data" above, where matrices are stored.
[mpiBLAST]
Shared=/bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
Local=/bubo/home/h3/lucass/glob/test/mpiblast/local
### Shared - a directory where you want to read/write database files, typically somewhere under your glob
### Local - any directory readable by the nodes
Then I generated a test database like this (I'm simply taking the start of the nt database):
$ cd /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
$ head -100000 /bubo/nobackup/uppnex/blast_databases/fasta/nt.fasta > sequences.fasta
$ mpiformatdb -i sequences.fasta --nfrags 22 -p F
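As a quick sanity check, I also count the fragment header files (.nhr) that mpiformatdb wrote to the Shared directory; count_fragments below is just a helper name of my own, not an mpiBLAST tool:

```shell
# count_fragments DIR: count the BLAST fragment headers (*.nhr) that
# mpiformatdb wrote into DIR; with --nfrags 22 it should print 22.
count_fragments() {
    ls "$1"/sequences.fasta.*.nhr 2>/dev/null | wc -l
}

# On the cluster:
# count_fragments /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb
```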
mpiformatdb completes successfully. Finally, I submitted this SLURM script with sbatch to query the database on three nodes (each node has eight processors):
#!/bin/bash -l
#SBATCH -D /bubo/home/h3/lucass/glob/test/mpiblast/query
#SBATCH -J test_mpiblast
#SBATCH -o test_mpiblast.out
#SBATCH -t 15:00
#SBATCH -p node -n 24
# Modules #
module load mpiblast
# Make test query #
head -4 /bubo/nobackup/uppnex/blast_databases/fasta/nt.fasta > query.fasta
# Run BLAST #
mpirun -np 24 mpiblast -p blastn -d sequences.fasta -i query.fasta -o query.xml -b 7
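The choice of 24 processes is deliberate: as I understand it, mpiBLAST dedicates two ranks to bookkeeping (a scheduler and a writer) on top of the fragment workers, so -np should be at least nfrags + 2:

```shell
# With one worker per database fragment, plus one scheduler rank and one
# writer rank, the minimum useful process count is nfrags + 2.
nfrags=22
np=$((nfrags + 2))
echo "$np"   # 24, matching both -n 24 and -np 24 above
```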
The error output on standard out is the following:
mod: loaded OpenMPI 1.4.5, compiled with gcc4.6 (found in /opt/openmpi/1.4.5gcc4.6/)
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nhr /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nhr
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.012.nhr
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.012.nhr
ret_value = 32512
-------SNIP---------
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.013.nsi /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.013.nsi
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.013.nsi
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.013.nsi
ret_value = 32512
[15] 1.114980 (15) unable to copy fragment!
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsd /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsd
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsd
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsd
ret_value = 32512
cp command failed!
command: cp /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsi /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsi
source = /bubo/home/h3/lucass/glob/test/mpiblast/mpiblastdb/sequences.fasta.008.nsi
dest = /bubo/home/h3/lucass/glob/test/mpiblast/local/sequences.fasta.008.nsi
ret_value = 32512
[9] 1.115341 (9) unable to copy fragment!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 14 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 14 with PID 12625 on
node q175 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
0 1.18088 Bailing out with signal 15
1 1.17914 Bailing out with signal 15
2 1.18093 Bailing out with signal 15
3 1.17923 Bailing out with signal 15
4 1.18096 Bailing out with signal 15
5 1.17917 Bailing out with signal 15
6 1.18096 Bailing out with signal 15
7 1.18104 Bailing out with signal 15
[q164.uppmax.uu.se:15654] 10 more processes have sent help message help-mpi-api.txt / mpi-abort
[q164.uppmax.uu.se:15654] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
It seems very strange that a simple copy command would fail. Running the same copy commands by hand in a shell works fine, even when logged in to one of the cluster nodes. Any ideas?
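One possible clue: ret_value looks like the raw status returned by system(), in which the child's exit code sits in the high byte. If that reading is correct, 32512 decodes to exit status 127, which is what a shell reports when a command cannot be found, so perhaps cp (or the shell itself) is not on the PATH inside the job environment:

```shell
# Decode a raw wait()/system() status: for a normally exited child,
# the exit code is in bits 8-15 (POSIX encoding).
ret_value=32512
exit_code=$((ret_value >> 8))
echo "$exit_code"   # 127, the shell's "command not found" status
```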
Well, it is not fair; I can upvote your answer, though.
I agree that voting on one's own post is not fair. But if you look at Stack Exchange, you can always accept your own answer.
Apparently, I can't accept my own answer! :(