Sequence alignment is an embarrassingly parallel computational task; that is, the performance of bwa scales (almost) linearly with the number of threads provided. If you have a 24-core machine, you gain no benefit from running 3 8-thread bwa tasks in parallel instead of a single 24-thread bwa instance. I recommend just running a single bwa that uses all available threads.
It is extremely slow
Sequence alignment is a computationally expensive task. My rule of thumb is one CPU-hour per gigabyte of .fastq.gz input. If you're happy to run code not considered production-ready, bwa-mem2 has many performance improvements that make it more than twice as fast as bwa mem.
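As a back-of-the-envelope sketch of that rule of thumb (the one-CPU-hour-per-gigabyte figure is the commenter's estimate, not a benchmark):

```python
# Rough wall-time estimate for bwa mem, based on the rule of thumb of
# ~1 CPU-hour per GB of .fastq.gz input (a ballpark, not a measurement).

def estimated_hours(fastq_gb, threads):
    """Estimated wall-clock hours, assuming near-linear thread scaling."""
    cpu_hours = fastq_gb * 1.0  # ~1 CPU-hour per GB of .fastq.gz
    return cpu_hours / threads

# A 7 GB paired-end sample on 16 threads:
print(round(estimated_hours(7, 16), 2))  # 0.44 h under ideal scaling
```

If a run takes many times longer than this estimate, swapping or oversubscribed cores are the usual suspects.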
That said, if you do not have sufficient memory to hold the entire index, throughput will be 10-100x lower than normal, since part of the index will sit in swap.
against 2 reference genomes, each 12 Gbp
This means your bwa index will be approximately 24 gigabytes in size (if you've made a combined .fasta file containing both genomes). You'll need more than 24GiB of memory to run bwa. I'm not sure if bwa memory maps the index file so you may or may not need 24GiB per instance of bwa.
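A quick sanity check of those numbers (the roughly one-byte-per-base index ratio is inferred from the post above, not from bwa's documentation):

```python
# Rough memory check for a combined reference, assuming (as the post does)
# that the bwa index is roughly the same size in bytes as the reference
# is in bases. bytes_per_base is an assumption, not a documented figure.

def index_gb(genome_gbp, genomes=1, bytes_per_base=1.0):
    """Approximate index size in GB for a combined reference."""
    return genome_gbp * genomes * bytes_per_base

combined = index_gb(12, genomes=2)  # two 12 Gbp genomes in one .fasta
print(combined)  # 24.0 -> need >24 GiB of RAM per bwa instance (if not memory-mapped)
```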
continues to give me core dumped error
Does your combined reference contain any chromosomes larger than 512 Mbp? If so, they are not compatible with the BAM specification, and this may be the cause of your core dumps. Split any reference contig larger than 512 Mbp into multiple contigs.
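A minimal sketch of the suggested fix, splitting any sequence longer than the BAM per-reference limit into chunks (the chunk-naming scheme here is hypothetical; a real pipeline should also record the offsets so coordinates can be lifted back to the original contig):

```python
# Split a contig into chunks no larger than the BAM per-reference limit.
# The BAM spec caps reference lengths at 2**29 - 1 bases (~512 Mbp).

BAM_MAX_LEN = 2**29 - 1

def split_contig(name, seq, max_len=BAM_MAX_LEN):
    """Yield (chunk_name, subsequence) pairs; names use a made-up scheme."""
    for i, start in enumerate(range(0, len(seq), max_len)):
        yield f"{name}_part{i + 1}", seq[start:start + max_len]

# Demo with a tiny max_len so the example runs instantly:
chunks = list(split_contig("chr_big", "ACGT" * 3, max_len=5))
print([name for name, _ in chunks])  # ['chr_big_part1', 'chr_big_part2', 'chr_big_part3']
```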
bwa-0.7.12 samtools-1.3
I strongly recommend using the latest version of bwa and samtools.
For the human genome, bwa (in my experience) takes somewhere around 5 GB for the index alone (I don't remember the exact number). Say 5 GB, times 24 jobs, is already 120 GB for index loading alone. 8 threads per job require an additional buffer to keep the reads and intermediate results in RAM, so you will probably run out of memory unless you are on a big node with a lot of memory. For our standard 192 GB nodes I typically run 4 bwa jobs in parallel with 16 threads each. It can be RAM-hungry at times, especially when hitting reads that are difficult to align. By the way, the sorting requires an additional 768 MB of RAM per job, so that is another ~20 GB for your script. Keep that in mind. How much memory is available?
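The commenter's arithmetic can be sketched like this (the 5 GB index and 768 MB sort buffer are their estimates, not measured values):

```python
# Memory budget for running several bwa + samtools sort jobs in parallel,
# using the figures from the comment above (estimates, not measurements).

def total_memory_gb(jobs, index_gb=5.0, sort_mb=768):
    """One index copy per job plus one samtools sort buffer per job."""
    return jobs * (index_gb + sort_mb / 1024)

print(total_memory_gb(24))  # 138.0 GB before read buffers -> too much for a 192 GB node
print(total_memory_gb(4))   # 23.0 GB -> 4 jobs leave plenty of headroom
```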
So if I set threads to 16 and -j 4, wouldn't it run only one job at a time? According to this link: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html Sorry, I feel a bit confused. I have around 200 GB.
Sorry, I was interpreting -j as parallel jobs, as in GNU parallel. Not a snakemake user myself; I will move my answer to a comment. Are you the only user of the node, or are other jobs running on there?
-j does mean the number of jobs to run in parallel, right? https://snakemake.readthedocs.io/en/v3.13.2/executable.html I thought your answer was good and lowering -j can solve the problem. You can also just try to map one or two samples manually at the same time to check if this error comes up again. Or change -j to 1 or 2 and check if you still get the error.
Hi gb, thanks for the suggestion. I have been mapping 1 paired-end sample since yesterday afternoon; I guess it is not normal that it is still running? ;(
Hmm, I thought bwa-mem was pretty fast; someone else will need to confirm that. You can check the processes with the command htop or top.
Yeah I do... the good news is that it is running.. the bad news is that it is still running....
At the moment I am the only user.
Just noted you are using an ancient samtools version. Upgrading is recommended. I'm not sure this syntax (auto-detecting the file format) was already valid in version 1.3.
Ah, so you mean I probably cannot pipe directly to samtools sort with this old version? I would love to use the newer version, but the previous analyses from my colleague were run with this version, so to keep both analyses compatible I have to use this old version...
I don't think that is a good strategy with accessory programs like samtools. You should upgrade to the latest version to benefit from bug fixes and ease of use. Never mind, just checked the documentation; it should be fine.
If a previous analysis was done, why aren't you using the bam files from that analysis?
Because I have different samples and different genomes.. Anyway, with threads 16 and -j 16 it takes 5 hrs for one sample to align.. I don't know if that sounds like an acceptable timing..
Another solution that can help with snakemake is to use snakemake wrappers, which can do exactly what you want. There is a wrapper handling bwa-mem; the wrapper also allows output in .sam format (using samtools). Please see the doc at https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html
So to summarise, I'll try the two following solutions: a. use threads: 16 and -j 3? b. try the wrapper? Thank you for your help!
So, if you write snakemake -j 3, the jobs will not run in parallel, because you defined that your rule requires 16 threads. Take a look at this: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html#step-1-specifying-the-number-of-used-threads
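The interaction described above can be sketched as follows. This mirrors snakemake's documented behaviour of capping a rule's threads at the --cores/-j budget; the function is illustrative, not snakemake's actual scheduler code:

```python
# How snakemake's -j / --cores (total core budget) interacts with a rule's
# `threads:` directive. Per the docs, a rule's threads are scaled down to
# the -j limit, and jobs run in parallel only while cores remain.

def schedule(cores_j, rule_threads):
    """Return (threads actually used per job, jobs runnable in parallel)."""
    threads = min(rule_threads, cores_j)  # snakemake caps threads at -j
    return threads, cores_j // threads

print(schedule(3, 16))   # (3, 1): -j 3 caps the rule at 3 threads, one job at a time
print(schedule(48, 16))  # (16, 3): three 16-thread jobs fit into 48 cores
```

So -j is the total core budget, not a job count: to run three 16-thread jobs at once you would need -j 48.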
So -j should be more than the number of threads? Sorry, I feel totally confused now, because from the comments above it seems to be the other way around!
So guys, I ran one paired-end sample of 7 Gbp against the 12 Gbp reference using threads 16 and -j 16; since it is only 1 job that should be fine. It took nearly 5 hrs to complete. Does this sound good, or is it possible to accelerate it further? I don't know, maybe for these big files this is just the time required...? Thanks..