Question

Dorado basecall: cryptic error with pod5 subset tool

12 months ago

antoinefelden ▴ 60

I’m running into an error that I haven’t managed to debug so far. The job runs for a few seconds and then stops writing files. The errors are not always exactly the same, nor do they occur at the same point in the subsetting process. Here is a representative example; the “double free” error is the one that recurs:

Subsetting:  0%|     | 0/497 [00:00<?, ?Files/s]
Subsetting:  2%|2     | 11/497 [00:00<00:04, 100.57Files/s]tcache_thread_shutdown(): unaligned tcache chunk detected
corrupted size vs. prev_size while consolidating

Subsetting:  4%|4     | 22/497 [00:00<00:06, 72.94Files/s] 
Subsetting:  7%|6     | 33/497 [00:00<00:06, 72.09Files/s]
Subsetting:  9%|8     | 44/497 [00:00<00:06, 70.97Files/s]double free or corruption (fasttop)

Subsetting: 10%|#     | 51/497 [00:19<02:54, 2.55Files/s]

These errors seem to be memory/pointer problems triggered by the pod5 subset command. I also tried the script with the newer version of Python; it then fails with: POD5 has encountered an error: 'libffi.so.7: cannot open shared object file: No such file or directory'.
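One way to narrow down the libffi message is to check whether the library is visible on LD_LIBRARY_PATH once the modules are loaded. The sketch below is only a rough aid: the function name find_on_ld_path is hypothetical, and the dynamic linker also searches system default directories that this check does not cover, so a "not found" result here is a hint rather than proof.

```shell
#!/bin/bash
# Hypothetical helper: look for a shared library in each LD_LIBRARY_PATH entry.
# It does NOT consult the linker cache or the system default library paths.
find_on_ld_path() {
    local name=$1 dir
    local IFS=:                       # split LD_LIBRARY_PATH on colons
    for dir in $LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir/$name"         # print the first match and stop
            return 0
        fi
    done
    return 1                          # nothing found on LD_LIBRARY_PATH
}

# Run this after the `module load` lines of the job script.
if path=$(find_on_ld_path libffi.so.7); then
    echo "found: $path"
else
    echo "libffi.so.7 not found on LD_LIBRARY_PATH"
fi
```

If the library is missing, comparing the result before and after each module load line may show which module changes the search path.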

Here is the script I’m running:

#!/bin/bash
#SBATCH --partition=bigmem
#SBATCH --cpus-per-task=24
#SBATCH --mem=512G
#SBATCH --time=10-0:00:00
#SBATCH --ntasks=1
#SBATCH --job-name=1_pod5_split
#SBATCH -o /nfs/scratch/feldenan/%J.out
#SBATCH -e /nfs/scratch/feldenan/%J.err

set -o history -o histexpand

task_ID=${SLURM_JOB_NAME}_${SLURM_JOB_ID}
echo ${task_ID}

output_dir=/nfs/scratch/feldenan/$task_ID
mkdir -p $output_dir
cd $output_dir

module load GCCcore/9.3.0
module load Python/3.8.2
#module load GCCcore/11.3.0
#module load Python/3.10.4
module load GCC/10.3.0
module load OpenMPI/4.1.1
module load R/4.0.0
pip install pod5

POD5_DEBUG=1
LIB=Varroa_gDNA; POD5=/nfs/scratch/feldenan/Nanopore/01_data/Varroa_gDNA/B_clean/20231031_1641_MN45095_FAU99644_84b8b260/pod5_skip

echo $LIB

mkdir -p ./${LIB}_pod5_split

pod5 merge $POD5/*.pod5 -o ./$LIB.pod5
pod5 view ./$LIB.pod5 --threads 24 --include "read_id, channel" --output ./$LIB'_summary.tsv'
pod5 subset ./$LIB.pod5 --threads 24 --summary ./$LIB'_summary.tsv' --columns channel --output ./$LIB'_pod5_split'

Any idea on where the error could come from?

dorado Nanopore ONT sequencing pod5 • 835 views

ADD COMMENT • link 12 months ago by antoinefelden ▴ 60

Are the first two steps in your script working, i.e. do you see the summary.tsv file? Can you try running the subset command manually for a minute to see if you get the same error? You could also try reducing the number of threads to see if that fixes it. If your storage is not performant, that may be causing a bandwidth issue.

This is not directly related to Dorado, since pod5 is a separate Python package.
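A minimal sketch of that manual rerun, assuming the merged file and the summary from the first two steps already exist in the current directory. The pod5 flags and file names are copied from the original script; THREADS, DRY_RUN and build_subset_cmd are hypothetical names added for illustration. It also exports POD5_DEBUG, on the assumption (implied by the original script) that pod5 reads it from its environment: a plain assignment without export is not passed on to child processes.

```shell
#!/bin/bash
# Rerun only the failing `pod5 subset` step, with a reduced thread count.

# Export so that child processes such as pod5 can see the variable.
export POD5_DEBUG=1

LIB=Varroa_gDNA
THREADS=${THREADS:-1}     # hypothetical knob: start with 1, then increase
DRY_RUN=${DRY_RUN:-1}     # hypothetical knob: 1 = only print, 0 = run pod5

# Build the command as an array so arguments stay intact when executed.
build_subset_cmd() {
    local threads=$1
    SUBSET_CMD=(pod5 subset "./${LIB}.pod5" --threads "$threads"
                --summary "./${LIB}_summary.tsv" --columns channel
                --output "./${LIB}_pod5_split")
}

build_subset_cmd "$THREADS"
echo "${SUBSET_CMD[*]}"   # show exactly what will be run

if [ "$DRY_RUN" = "0" ]; then
    "${SUBSET_CMD[@]}"
fi
```

Running it with DRY_RUN=0 and THREADS=1, then with larger values, would show whether the crash depends on the thread count.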

ADD REPLY • link 12 months ago by GenoMax 147k

Thanks! Yes, the script produces the expected output without any issue up to the pod5 subset command. Reducing the number of threads did not change the errors. Thanks for the suggestion; I've posted an issue on the pod5 GitHub: https://github.com/nanoporetech/pod5-file-format/issues/87

ADD REPLY • link 12 months ago by antoinefelden ▴ 60