Hello,

I am running the Freebayes variant calling pipeline. I have 1440 BAM files from genotyping-by-sequencing data. The genome size is around 540 Mb. I am using the following Slurm job script to run the job:
#!/bin/bash
#SBATCH --comment=19900416
#SBATCH --qos=std
#SBATCH --time=10-00:00:00
#SBATCH --nodes=1
#SBATCH --mem=240G
#SBATCH --cpus-per-task=32
#SBATCH --output=output_%j.txt
#SBATCH --error=error_output_%j.txt
#SBATCH --job-name=Freebayes.sh
#SBATCH --mail-type=ALL
# module load
module load freebayes/1.3.6 parallel vcflib python/2.7.15 bcftools
/home/freebayes-parallel <(cat /home/regions.txt ) 32 --report-genotype-likelihood-max --genotype-qualities --strict-vcf --ploidy 2 -f /home/genome.fasta --max-complex-gap 1 --min-alternate-fraction 0.2 --min-alternate-count 0 --min-coverage 0 --min-mapping-quality 1 --use-best-n-alleles 4 --min-base-quality 10 --report-monomorphic --bam-list /fastq/BAM/BAM_RG/MarkDup/gVCF/Bam.list > /home/Freebayes_raw.vcf
After running for 1 day and producing an 80 GB VCF file, I got an Out of Memory error.

Then I increased the memory request to 320G, dropped the --genotype-qualities parameter, and added --throw-away-complex-obs. It again produced a similar-sized VCF file, in less time, and then threw the same Out of Memory error. The BAM list contains the paths of all 1440 BAM files. I also split the reference genome into 100 kb chunks using the fasta_generate_regions.py script.

Can anyone please suggest how to overcome the OOM error? Should I increase the memory to 400G? Or can I make subsets of the BAM file list and then merge all the VCFs at the end? I guess this approach is not recommended.
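
One way to subset the work without subsetting the samples might be to split the regions file rather than the BAM list, so every job still sees all 1440 BAM files. A minimal sketch, assuming the regions file has one region per line (as written by fasta_generate_regions.py); the function name split_regions, the batch size, and the toy region names are made up for illustration:

```shell
#!/bin/bash
# Split a regions file (one region per line) into batches of at most
# LINES_PER_BATCH lines each.
# Usage: split_regions REGIONS_FILE LINES_PER_BATCH OUT_PREFIX
# (split_regions is a hypothetical helper, not part of freebayes.)
split_regions() {
    local regions_file=$1 lines_per_batch=$2 out_prefix=$3
    # -l: lines per output file; -a 3: three-letter suffixes (aaa, aab, ...)
    split -l "$lines_per_batch" -a 3 "$regions_file" "$out_prefix"
}

# Toy demonstration with five made-up regions in a temporary directory.
demo_dir=$(mktemp -d)
printf '%s\n' chr1:0-100000 chr1:100000-200000 chr1:200000-300000 \
    chr2:0-100000 chr2:100000-200000 > "$demo_dir/regions.txt"

# Batches of two regions each; the last batch holds the remainder.
split_regions "$demo_dir/regions.txt" 2 "$demo_dir/batch_"
ls "$demo_dir"/batch_*
```

Each batch file could then be passed to freebayes-parallel as its regions argument in a separate Slurm job, possibly with fewer than 32 workers, and the per-batch VCFs combined afterwards (for example with bcftools concat). Whether this avoids the OOM is untested here; it only assumes that peak memory grows with the number of freebayes processes running at once on a node.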

No idea if there is an option for you to install packages on your cluster, i.e. using conda, or to run singularity images, but in general it is better to use software that is as up to date as practical. This does not guarantee that the problem will disappear, but the developers are usually more likely to respond. The current freebayes version in conda is 1.3.7 (no big difference), but it uses the latest python3, htslib 1.20, etc.

Please do not use bioinformatics as a tag unless your post is about the field of bioinformatics itself.

Alright. I will keep that in mind.