Samtools sort: too many open files error
3.7 years ago
rekren ▴ 40

Hello everyone! I am using a cluster to align my WGS data to an indexed reference genome with BWA. I get this error in the samtools sort step of the task:

[main] CMD: bwa mem -t 40 /save/refs/bwa_refs/Homo_sapiens.GRCh38.dna.toplevel.fa.gz /work/fastqs/557-male_$
[main] Real time: 125431.829 sec; CPU: 4892515.629 sec
[bam_sort_core] merging from 1260 files and 35 in-memory blocks...
[E::hts_open_format] Failed to open file /work/rekren/aligned/aligned_to_human_M_sortedbam.bam.tmp.1016.bam
samtools sort: fail to open "/work/aligned/aligned_to_human_M_sortedbam.bam.tmp.1016.bam": Too many open files

I have read about the -m and -@ parameters of samtools sort in the manual, but every combination I tried ended in an out-of-memory error from the cluster. Maybe I am overlooking something. Can you suggest a solution, please?

P.S.:

To submit this task to the cluster I use SLURM; my script is below:

#!/bin/bash
#SBATCH -J bwa_task #job name
#SBATCH -e error.out #error file name
#SBATCH --mem=200G #memory reservation
#SBATCH --cpus-per-task=40 #ncpu on the same node
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=name.surname@email.com
#Purge any previous modules
module purge
#Load the application
module load bioinfo/bwa-0.7.17
module load bioinfo/samtools-1.9
# My command lines I want to run on the cluster
bwa mem -t 40 /save/refs/bwa_refs/Homo_sapiens.GRCh38.dna.toplevel.fa.gz /work/fastqs/557-male_ACGCACCT-CCTTCACC-BHNJYMDSXY_L004_R1.fastq.gz /work/fastqs/557-male_ACGCACCT-CCTTCACC-BHNJYMDSXY_L004_R2.fastq.gz | samtools sort -@ 35 -o /work/aligned/aligned_to_human_M_sortedbam.bam

What is the output of ulimit -n? I always raise this limit, e.g. ulimit -n 50000 (it is -n that controls the number of open files; -u limits the number of user processes). Also try far fewer sorting threads and more memory per thread, e.g. -@ 8 -m 4G, which leads to fewer intermediate files (adjust the values to fit your available memory, as bwa will also need a lot).
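As a rough sketch of the suggestion above, the snippet below raises the soft open-file limit to the hard limit before the pipeline starts. The function name `raise_nofile_limit` and the file names in the commented pipeline (`ref.fa`, `R1.fastq.gz`, `R2.fastq.gz`, `sorted.bam`) are placeholders, not taken from the original script, and the thread and memory values are only a starting point to adapt to the job's reservation.

```shell
#!/bin/bash
# Raise the soft open-file limit to the hard limit for this shell and the
# commands it starts. This needs no elevated permissions, but it cannot go
# beyond the hard limit configured by the cluster administrators.
raise_nofile_limit() {
    hard_limit=$(ulimit -Hn)
    ulimit -Sn "$hard_limit" 2>/dev/null || echo "could not raise the limit" >&2
    ulimit -Sn   # print the soft limit now in effect
}

echo "soft limit before: $(ulimit -Sn), hard limit: $(ulimit -Hn)"
raise_nofile_limit

# Hypothetical pipeline with fewer sort threads and more memory per thread
# (placeholder file names; roughly 8 x 4G = 32G for sorting, on top of bwa):
# bwa mem -t 32 ref.fa R1.fastq.gz R2.fastq.gz \
#     | samtools sort -@ 8 -m 4G -o sorted.bam -
```

If the hard limit itself is as low as 1024, this will not help, and reducing the number of temporary files via -m is the remaining option.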

rekren@genologin2 ~/work $ ulimit -n
1024
rekren@genologin2 ~/work $ ulimit -u
4096

Then I set it to:

rekren@genologin2 ~/work $ ulimit  -u 50000

I have now retried with fewer sorting threads. I thought being greedy and assigning more threads to the task was the better thing to do; I guess not. I will update you on the outcome later.

UPDATE: Unfortunately, it didn't solve my issue, possibly because I don't have elevated permissions to change ulimit. I did something else to solve it. Still, thank you for your help, @ATPoint!

3.7 years ago
rekren ▴ 40

Keeping the bwa output as an unsorted BAM first and then sorting it with Picard, with controlled memory usage, worked in my case. It might help other people who face the same kind of issue.

#!/bin/bash
#SBATCH -J bwa_task #job name
#SBATCH -e error.out #error file name
#SBATCH --mem=200G #memory reservation
#SBATCH --cpus-per-task=50 #ncpu on the same node
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=name.surname@email.com
#Purge any previous modules
module purge
#Load the application
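module load bioinfo/bwa-0.7.17
module load bioinfo/samtools-1.9
module load bioinfo/picard-2.20.7
# Align and write an unsorted BAM, dropping unmapped reads (-F 0x4)
bwa mem -t 50 /save/refs/bwa_refs/Homo_sapiens.GRCh38.dna.toplevel.fa.gz /work/fastqs/557-male_ACGCACCT-CCTTCACC-BHNJYMDSXY_L004_R1.fastq.gz /work/fastqs/557-male_ACGCACCT-CCTTCACC-BHNJYMDSXY_L004_R2.fastq.gz | samtools view -F 0x4 -bh -o /work/aligned/aligned_to_human_M.bam
# Coordinate-sort the BAM with Picard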
java -Xmx170g -jar ${PICARD} SortSam \
    INPUT=/work/aligned/aligned_to_human_M.bam \
    OUTPUT=/work/aligned/aligned_to_human_M_sorted.bam \
    SORT_ORDER=coordinate \
    MAX_RECORDS_IN_RAM=1000000
3.6 years ago
jkbonfield ★ 1.3k

Look at the man page for samtools-sort. Increase the memory size (-m) and it will use fewer temporary files. The effect is easy to see because samtools reports how many files it is merging, so you can estimate a suitable value by comparing that number with the maximum that ulimit -n permits. (The default is 768 MiB per thread, IIRC.)

Just remember -m is per thread. I dislike that, but it's too late to change.
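To make that guess concrete, here is a back-of-the-envelope sketch using the numbers from the log in the question (1260 files plus 35 in-memory blocks, so about 1295 blocks). It assumes the failing run used the default of 768 MiB per thread (the posted command passes no -m), that the number of blocks scales inversely with -m, and that keeping the count under 1000 leaves enough headroom below the 1024 limit; all three are assumptions, and `ceil_div` is a helper invented for this example.

```shell
#!/bin/bash
# Integer ceiling division: ceil_div NUMERATOR DENOMINATOR
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_seen=1295      # 1260 temporary files + 35 in-memory blocks (from the log)
old_mem_mib=768       # assumed: samtools sort default per-thread memory
total_mib=$(( blocks_seen * old_mem_mib ))   # approximate volume of sorted blocks

# Expected number of blocks if each thread gets 4 GiB (-m 4G)
echo "blocks at -m 4096M: $(ceil_div "$total_mib" 4096)"       # prints 243

# Smallest per-thread memory that keeps the count under 1000 files
echo "smallest -m for 1000 files: $(ceil_div "$total_mib" 1000)M"   # prints 995M
```

Because -m is per thread, the total sorting memory is roughly the -m value times the number of sort threads, which is why a larger -m usually has to be paired with fewer threads to stay inside the job's memory reservation.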
