langziv · 3.9 years ago
Hi.
When I run the following command I get a "Floating point exception (core dumped)":
/linux-x64/mauveAligner --output=/mauve/output.mauve --output-alignment=/mauve/output.alignment
The bash script that calls the relevant python script:
#!/bin/bash
#PBS -q name
#PBS -N mauve
#PBS -e /err_and_out_files/mauve.ER
#PBS -o /err_and_out_files/mauve.OU
#PBS -l nodes=compute-0-311:ppn=10,mem=40000000kb
python "/scripts/mauve.py"
mauve.py:
import os
import shlex
import subprocess

directory = r'/output/mauve/'
mauve_path = "/mauve/linux-x64/mauveAligner"

for path in os.listdir(directory):
    file_name_A = os.path.basename(path)
    if 'A' in file_name_A:
        A_file = directory + file_name_A
        for file_name_B in os.listdir(directory):
            if 'B' in file_name_B:
                B_file = directory + file_name_B
                command = f'{mauve_path} --output={A_file[:-3]}_{file_name_B[:-3]}.mauve ' \
                          f'--output-alignment={A_file[:-3]}_{file_name_B[:-3]}.alignment'
                subprocess.run(shlex.split(command), capture_output=True).stdout.decode('utf-8')
I read the documentation and Mauve-related questions but couldn't figure out what's wrong with the command.
Any ideas are welcome.
Assuming this is related to your previous query where you're relying on loading via module, that would suggest you're in an HPC/scheduler-type environment. Are you requesting sufficient memory for the task to complete?

I'm not sure. I'm adding the bash script, in which there are various parameters, including memory usage.
Try a smaller input dataset and request several GB of memory in the job submission script to ensure it completes. Your input dataset (without knowing anything about it) might be too large.
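As a sketch, the memory request in the PBS job header could be stated explicitly in GB (the values here are illustrative, not from the thread; adjust to your input size and cluster limits):

```shell
#PBS -l nodes=1:ppn=10,mem=48gb
```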
The data sets are of whole assemblies. I'm not sure if I can break them into smaller files.
When I run the bash script, no error file is created. The error message I mentioned in the question appears when I run the command directly on a Linux node. Actually, when I run the bash script an empty output file is created. The bash script calls a Python script, from which Mauve is called.
How many assemblies? Just 2?
How big are the genomes?
You need to give us way more information, as a segfault is not a single issue with a single cause.
If you're running bash > bash script > python script, you need to make sure STDOUT and STDERR are captured correctly, else the command may be failing 'silently' because the error isn't propagated. I assume the bash script is the job submission file; why are you then calling a command-line program from Python? Why not call it directly in the PBS script?
We need to see the content of
/scripts/mauve.py
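One way to make such failures visible is to check the return code and print stderr instead of discarding it. A minimal sketch (the helper name run_logged is my own, not from the thread):

```python
import shlex
import subprocess

def run_logged(command: str) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout/stderr so failures are not silent."""
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    if result.returncode != 0:
        # On POSIX, a negative return code means the child died from a
        # signal: e.g. -8 is SIGFPE, the "Floating point exception" above.
        print(f"command failed (exit {result.returncode}): {command}")
        print(result.stderr)
    return result
```

Logging the return code this way also distinguishes an ordinary non-zero exit from a crash-by-signal, which is exactly the information lost in the original script.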
Sorry about the delay. These are crazy days.
Yes, I'm running 2 assemblies in every run. I just thought about it again, and I realize I could align multiple genomes against another genome, but since these are very big files, maybe I should stay with 2 genomes per run.
The genomes' sizes are between 127 and 322 MB.
I wanted to call Mauve from a Python script so that I'd be able to iterate over multiple files. It can be done with bash, but I prefer doing it with Python; I find Python more convenient.
I'm adding the python script to the question.
Those are still very big sequences to align. Mauve is a Java tool if I recall, so you will also need to pass flags to Java to give it access to more memory, which it doesn't appear you're doing.
I would imagine Mauve simply doesn't have access to enough memory to complete the alignment correctly (since a segfault is generally a memory issue).
The command I gave was inadequate: I didn't provide an sml file path. Once I did, I managed to call Mauve from the Python script, but now I have a new issue. I'm getting the message "* ERROR * Clust::SetLeafCount(0)" after Mauve reads the FASTA files.
The command is
The printed output: