9.1 years ago
kaylamh6
Hello,
I am trying to run ABySS on a cluster with Open MPI and am having some issues. Specifically, my assembly runs for a while, then fails with the following message:
WARNING: A process refused to die!
Host: n022.fortytwo.ibest.uidaho.edu
PID: 12638
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[n036.fortytwo.ibest.uidaho.edu:12161] 3 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
[n036.fortytwo.ibest.uidaho.edu:12161] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
--------------------------------------------------------------------------
mpirun noticed that process rank 17 with PID 14817 on node n019 exited on signal 9 (Killed).
--------------------------------------------------------------------------
[n036.fortytwo.ibest.uidaho.edu:12161] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
make: *** [Lb17-1.fa] Error 137
And here's my call to ABySS (I specify nodes and processors per node in my PBS script):
abyss-pe \
-C abyss_Lb17 \
k=21 \
name=Lb17 \
lib='pe1' \
pe1='Lb17_R1_qualtrim_renamed.fastq Lb17_R2_qualtrim_renamed.fastq'
This looks like an Open MPI error, especially since the non-MPI command works just fine with my data. Note that the same command ran on the ABySS test data set without any problems. Thanks in advance for your help!
As you say, it sounds like a problem with your MPI configuration or your job submission parameters. If you have never run MPI jobs on your cluster before, I would recommend first testing your MPI setup with a simple "Hello, World!" program. See for example: http://mpitutorial.com/tutorials/mpi-hello-world/.
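Even before compiling a "Hello, World!" program, a quicker launch check is to run an ordinary command such as hostname under mpirun from inside a job script. This is only a rough sketch: the process count np=2 is an arbitrary value chosen for illustration, and it only shows that mpirun can start processes on the allocated nodes, not that MPI communication between them works.

```shell
# Launch check: start a trivial non-MPI program under mpirun.
# np=2 is an illustrative value, not taken from the original post.
np=2
if command -v mpirun >/dev/null 2>&1; then
    # Each launched process prints the name of the node it runs on.
    if mpirun -np "$np" hostname; then
        result="ok"
    else
        result="failed"
    fi
else
    result="mpirun-not-found"
fi
echo "launch check: $result"
```

If the hostnames printed do not match the nodes requested in the PBS script, the problem is likely in the job submission rather than in ABySS.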
It is a bit of extra work, but it will make troubleshooting a lot easier.
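One more thing worth reading out of the log: "Error 137" from make is the usual shell encoding of "killed by a signal", that is, 128 plus the signal number. The small sketch below decodes it; it agrees with the "exited on signal 9 (Killed)" line from mpirun. The variable names are just for illustration.

```shell
# Decode an exit status above 128 into the signal that ended the process.
status=137                     # from "make: *** [Lb17-1.fa] Error 137"
signum=$((status - 128))       # 137 - 128 = 9
signame=$(kill -l "$signum")   # signal name without the SIG prefix
echo "exit status $status = signal $signum (SIG$signame)"
```

SIGKILL comes from outside the program. On a cluster, one common source is the scheduler or the kernel killing a process that exceeded its memory or walltime limit, so it may be worth checking the job's resource requests, though the log alone does not confirm that this is the cause here.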