bwa mem memory allocation, threads and nodes
4.8 years ago
User000 ▴ 710

Hello,

I am running bwa mem on 200 paired-end fastq.gz files against two reference genomes, each 12 Gbp. It is extremely slow and keeps giving me a core dumped error. I have googled various solutions, and people suggest this might be a memory allocation problem. So I was wondering: how do I specify the amount of memory the program should use, and how much memory is enough for this kind of analysis? Also, how many threads and nodes would you suggest I use? The rule's shell command is:

/Tools/bwa-0.7.12/bwa mem -t 8 {wildcards.genome} {input.fastq1} {input.fastq2} | /Tools/samtools-1.3/samtools sort -o {output.bam}

I invoke the workflow like this:

snakemake -s snakefile -j 24

For the human genome, bwa (in my experience) takes roughly 5 GB for the index alone (I don't remember the exact number). Say 5 GB times 24 jobs: that is already 120 GB for index loading alone. Eight threads per job need an additional buffer to keep the reads and intermediate results in RAM, so you probably run out of memory unless you are on a big node with a lot of RAM. On our standard 192 GB nodes I typically run 4 bwa jobs in parallel with 16 threads each. bwa can be RAM-hungry at times, especially when it hits reads that are difficult to align.

By the way, the sorting step needs additional RAM on top of that: samtools sort defaults to 768 MB per job, so that is roughly another 18 GB across your 24 jobs. Keep that in mind. How much memory is available?
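If sorting memory becomes the bottleneck, samtools sort lets you cap it explicitly. A minimal sketch with placeholder file names (-m is the memory cap per sorting thread and -@ the number of sorting threads, so total sort memory is roughly -m times -@):

/Tools/bwa-0.7.12/bwa mem -t 8 genome.fasta r1.fastq.gz r2.fastq.gz | /Tools/samtools-1.3/samtools sort -m 768M -@ 4 -o sorted.bam -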


So if I set threads to 16 and -j 4, wouldn't it run only one job at a time? That is how I read this link: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html Sorry, I am a bit confused. I have around 200 GB of memory.


Sorry, I was interpreting -j as the number of parallel jobs, as in GNU parallel. I am not a snakemake user myself, so I will move my answer to a comment. Are you the only user of the node, or are other jobs running on it?


-j does mean the number of jobs to run in parallel, right? https://snakemake.readthedocs.io/en/v3.13.2/executable.html I thought your answer was good, and lowering -j could solve the problem. You could also map one or two samples manually at the same time to check whether this error comes up again, or change -j to 1 or 2 and see if you still get the error.


Hi gb, thanks for the suggestion. I have been mapping one paired-end sample since yesterday afternoon; I guess it is not normal that it is still running? ;(


Hmm, I thought bwa mem was pretty fast; someone else would need to confirm that. You can check the running processes with the command htop or top.
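For example (the -u option, supported by both tools, limits the view to one user's processes; substitute your own username):

top -u $USER     # live per-process CPU and memory usage
htop -u $USER    # the same, with a friendlier interface, if htop is installed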


Yeah, I do... the good news is that it is running; the bad news is that it is still running...


At the moment I am the only user.


Just noticed you are using an ancient samtools version; upgrading is recommended. I am not sure this syntax (auto-detecting the input file format) was already valid in version 1.3.


Ah, so you mean I probably cannot pipe directly into samtools sort with this old version? I would love to use the newer version, but the previous analyses from my colleague were run with this version, so to keep both analyses comparable I have to use this old version...


I don't think that is a good strategy with accessory programs like samtools. You should upgrade to the latest version to benefit from bug fixes and ease-of-use improvements.


Never mind, I just checked the documentation: piping into samtools sort should be fine with version 1.3.


If a previous analysis was done, why aren't you using the BAM files from that analysis?


Because I have different samples and different genomes. Anyway, with threads: 16 and -j 16, it takes 5 hours for one sample to align. I don't know whether that sounds like an acceptable time.


Another solution that may help with snakemake is to use the snakemake wrappers, which can do exactly what you want. There is a wrapper handling bwa mem, and it also lets you pass the output through samtools. Please see the docs at https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html
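A minimal rule using that wrapper might look like the sketch below; the paths, the sample pattern, and the pinned wrapper version are assumptions, so check the linked docs for the exact params your wrapper release supports:

rule bwa_mem:
    input:
        reads=["reads/{sample}_1.fastq.gz", "reads/{sample}_2.fastq.gz"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",            # prefix of the bwa index files (assumption)
        sort="samtools",           # have the wrapper sort the output via samtools
        sort_order="coordinate"
    threads: 16
    wrapper:
        "0.45.1/bio/bwa/mem"       # pin whichever wrapper version you use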


So, to summarise, I should try these two solutions: a) use threads: 16 and -j 3? b) try the wrapper? Thank you for your help!

snakemake -j

Use at most N cores in parallel. If N is omitted or ‘all’, the limit is set to the number of available cores.

So, if you write snakemake -j 3, jobs will not run in parallel, because you defined that your rule requires 16 threads.

Take a look at this: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#step-1-specifying-the-number-of-used-threads
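Concretely, for a rule that declares threads: 16, the interaction looks like this (core counts are illustrative; snakemake caps a rule's threads at the -j value, and the rule's shell command should use -t {threads} rather than a hardcoded -t 8 so bwa actually uses the declared value):

snakemake -s snakefile -j 48   # up to 3 such jobs in parallel (48 // 16)
snakemake -s snakefile -j 16   # exactly 1 job at a time, with 16 threads
snakemake -s snakefile -j 3    # 1 job at a time, scaled down to 3 threads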


So -j should be more than the number of threads? Sorry, I am totally confused now, because from the comments above it seemed to be the other way around!


So, guys, I ran one paired-end sample of 7 Gbp against the 12 Gbp reference using threads: 16 and -j 16 (since it is only one job, that should be fine). It took nearly 5 hours to complete. Does this sound right, or is it possible to accelerate it further? I don't know, maybe for files this big this is simply the time required...? Thanks.

4.8 years ago
d-cameron ★ 2.9k

Sequence alignment is an embarrassingly parallel computational task; that is, the performance of bwa scales (almost) linearly with the number of threads provided. If you have a 24-core machine, you gain no benefit from running three 8-thread bwa tasks in parallel instead of a single 24-thread bwa instance. I recommend just running a single bwa that uses all available threads.
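For example, on a 24-core node (paths and file names are placeholders):

bwa mem -t 24 combined.fasta sample_1.fastq.gz sample_2.fastq.gz | samtools sort -o sample.bam -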

"It is extremely slow"

Sequence alignment is a computationally expensive task. My rule of thumb is one CPU-hour per gigabyte of .fastq.gz input. If you're happy to run code not yet considered production-ready, bwa-mem2 has many performance improvements that make it more than twice as fast as bwa mem.
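bwa-mem2 is (almost) a drop-in replacement on the command line, but it needs its own index, which is considerably larger than bwa's, so check your disk and RAM first. A sketch, with placeholder file names:

bwa-mem2 index combined.fasta                # builds the bwa-mem2-specific index
bwa-mem2 mem -t 24 combined.fasta sample_1.fastq.gz sample_2.fastq.gz | samtools sort -o sample.bam -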

That said, if you do not have enough RAM to hold the entire index, throughput will be 10-100x slower than normal, since part of the index will be pushed out to swap.

"against two reference genomes, each 12 Gbp"

This means your bwa index will be approximately 24 gigabytes in size (if you've made a combined .fasta file containing both genomes), so you'll need more than 24 GiB of memory to run bwa. I'm not sure whether bwa memory-maps the index file, so you may or may not need 24 GiB per instance of bwa.
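You can check the actual on-disk size of the index; with bwa 0.7.x it consists of several files next to the reference (names assume the default index prefix):

ls -lh combined.fasta.amb combined.fasta.ann combined.fasta.bwt combined.fasta.pac combined.fasta.sa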

"keeps giving me a core dumped error"

Does your combined reference contain any chromosomes longer than 512 Mbp? If so, they are not compatible with the BAM specification, and this may be the cause of your core dumps. Split any reference contigs larger than 512 Mbp into multiple contigs.
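A quick way to look for over-long contigs, assuming samtools is available (the .fai index lists each contig's length in column 2; the threshold is 2^29 - 1 = 536,870,911 bases):

samtools faidx combined.fasta
awk '$2 > 536870911 {print $1, $2}' combined.fasta.fai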

"bwa-0.7.12, samtools-1.3"

I strongly recommend using the latest versions of bwa and samtools.
