Question

Best approach for parallelizing bwa mem over multiple CPUs

0

5.8 years ago

olikidrod • 0

I would like to parallelize bwa mem across multiple cores and multiple CPUs on our high-performance computing cluster. This was discussed seven years ago in this thread: Parallelizing Bwa On Multiple Cpus. With that in mind, I think the best way to do this is either:

1) Use Parallel BWA (pBWA).

2) Split the large input fastq files into multiple smaller fastq files, map each using its own instance of bwa mem on its own core, then merge them all together.

However, pBWA has not been updated since October 2012. Further, I'd prefer option 2, as our cluster restricts the size of input files. That said, as I'm using paired-end data (with two input fastq files), I'm not sure how best to split those up and ensure that the reads in all the resulting files are still paired.

Does anyone have any insight on this, and -- given the time that's elapsed -- might there now be a better approach?

Thanks!

bwa hpc parallel • 4.7k views

updated 5.8 years ago by GenoMax 147k • written 5.8 years ago by olikidrod • 0

score 5 · Accepted Answer · 2019-01-25

5

5.8 years ago

GenoMax 147k

1. Use the latest plain bwa. bwa can use multiple cores, so make sure you set the option -t INT (number of threads).
2. Keep each job's threads on a single physical server so as not to cause cross-talk across node interconnects.
3. You always map paired-end files together in a single job, so that is not an issue. You could split the files into smaller chunks, as demonstrated here, if you wanted to brute-force parallelize your jobs: A: Can BWA restart a calculation after a break? Make sure your files stay in sync across R1/R2 reads, since aligners don't check for that.
4. Don't uncompress your fastq files, and use pipes to save on disk space (C: How to combine two .sam files?).
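
To make the -t and pipe advice concrete, here is a minimal Python sketch that assembles a single-node command line. The helper name `bwa_mem_pipeline`, the file names, and the choice of `samtools sort` as the downstream step are assumptions for illustration, not something specified in this thread. It only builds the command string; you would hand it to your scheduler or shell.

```python
import shlex


def bwa_mem_pipeline(ref, r1, r2, out_bam, threads=8):
    """Return a shell pipeline string: bwa mem piped into samtools sort.

    Hypothetical helper. The gzipped FASTQ files are passed to bwa as-is,
    and the SAM output is piped so it is never written to disk.
    """
    # -t sets the number of threads bwa mem uses on this one node.
    align = ["bwa", "mem", "-t", str(threads), ref, r1, r2]
    # "-" makes samtools sort read from stdin; -@ adds sorting threads.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return shlex.join(align) + " | " + shlex.join(sort)


if __name__ == "__main__":
    print(bwa_mem_pipeline("ref.fa", "sample_R1.fastq.gz",
                           "sample_R2.fastq.gz", "sample.sorted.bam",
                           threads=16))
```

The thread count should match what the job actually reserves on one node; requesting more threads than cores granted by the scheduler generally does not help.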

0

Thanks for this. My understanding is that -t only refers to number of threads, whereas I need to parallelize over multiple physical servers and CPUs. Thus, I decided to go with brute-forcing as suggested in '3.'

For anyone else who stumbles across this, I chose to use bbmap to verify that the paired FASTQ files (downloaded from published datasets) are properly paired. I then used Trimmomatic to remove low-quality reads.
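
For readers who want to see what a pairing check involves, below is a rough hand-rolled sketch in Python. It is not how bbmap does it. It assumes plain 4-line FASTQ records and read names that match once an optional trailing /1 or /2 is removed; the function names are made up for this example.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_id(header):
    """Extract the read name from a header line, dropping '@' and /1 or /2."""
    name = header.split()[0][1:]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(r1_path, r2_path):
    """Return the 0-based index of the first unpaired record, or None.

    Assumes every record spans exactly four lines (no wrapped sequences).
    """
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        # Every fourth line (0, 4, 8, ...) is a header; skip blank lines.
        ids1 = (read_id(l) for i, l in enumerate(f1) if i % 4 == 0 and l.strip())
        ids2 = (read_id(l) for i, l in enumerate(f2) if i % 4 == 0 and l.strip())
        # zip_longest pads the shorter file with None, so a length
        # difference is also reported as a mismatch.
        for n, (a, b) in enumerate(zip_longest(ids1, ids2)):
            if a != b:
                return n
    return None
```

A return value of None means the two files are in sync under these assumptions; any integer points at the first record worth inspecting.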

I then split the resulting paired files (in which both reads survived the quality check), also as described in 3., before mapping.
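
As an illustration of the splitting step, here is a minimal Python sketch that cuts both files at the same record boundaries, so chunk N of R1 always pairs with chunk N of R2. It is not the method from the linked post. It assumes 4-line FASTQ records and inputs that are already in sync; `split_paired`, the output naming scheme, and the chunk size are hypothetical choices.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def split_paired(r1_path, r2_path, out_prefix, pairs_per_chunk):
    """Split R1/R2 in lockstep; return a list of (r1_chunk, r2_chunk) paths."""
    lines_per_chunk = 4 * pairs_per_chunk  # four lines per FASTQ record
    chunks = []
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        while True:
            # Pull the same number of lines from each file.
            block1 = list(islice(f1, lines_per_chunk))
            block2 = list(islice(f2, lines_per_chunk))
            if not block1 and not block2:
                break
            if len(block1) != len(block2):
                raise ValueError("R1 and R2 contain different numbers of lines")
            idx = len(chunks)
            out1 = f"{out_prefix}.{idx:04d}_R1.fastq.gz"
            out2 = f"{out_prefix}.{idx:04d}_R2.fastq.gz"
            # Chunks are written gzipped so they stay small on disk.
            with gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
                o1.writelines(block1)
                o2.writelines(block2)
            chunks.append((out1, out2))
    return chunks
```

Each returned pair can then be mapped as its own job, for example `split_paired("R1.fastq.gz", "R2.fastq.gz", "chunk", 5_000_000)`. Note that one chunk from each file is held in memory at a time, so the chunk size should be chosen with that in mind.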

5.8 years ago by olikidrod • 0

1

Just a suggestion. If you were using bbmap then you could have done everything in that suite. reformat.sh to split (not really needed but if you wish), bbduk.sh to scan/trim and bbmap.sh to align. Both programs are multi-threaded and can use all cores you can afford to throw at them.

5.8 years ago by GenoMax 147k