I have a Nextflow pipeline that keeps failing due to memory issues at the FILTER step.
In some cases the input BAM is extremely large (~30 GB).
Question 1: Would it help to split the FILTER process into two processes, FILTER1 and FILTER2, each running just one samtools sort? I tried this in a test and it didn't seem to make a difference, for some reason.
Question 2: Is there something I can do to limit samtools sort (the -m or -@ parameters) so that it does not go over the memory limit but also doesn't run too slowly?
process FILTER {
    ...
    time '6h'
    cpus 8
    penv 'smp'
    memory '32 GB'

    script:
    """
    #!/usr/bin/env bash
    samtools sort -n -m 5G -@ 12 "file1.bam" -o "file1_sorted.bam"
    samtools sort -n -m 5G -@ 12 "file2.bam" -o "file2_sorted.bam"
    """
}
Thank you for breaking it down for me. I really learned a lot. I was indeed making the mistake of using more memory than declared.