Bwa on multiple processors
0
0
Entering edit mode
2.9 years ago

Hi Guys,

When I try to run bwa mem on multiple processors with mpirun, I get the following error:

> mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq  R2.fastq  | samtools sort  -o aln.bam

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 81550 sequences (10000056 bp)...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 81550 sequences (10000056 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81100 sequences (10000119 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::process] read 81550 sequences (10000056 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (16, 30319, 18, 12)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (143, 297, 517)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1265)
[M::mem_pestat] mean and std.dev: (327.33, 289.63)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1639)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (98, 126, 162)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 290)
[M::mem_pestat] mean and std.dev: (132.56, 47.68)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 354)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (50, 128, 341)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 923)
[M::mem_pestat] mean and std.dev: (184.25, 219.99)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1214)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (148, 176, 1975)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5629)
[M::mem_pestat] mean and std.dev: (759.27, 1220.33)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 7456)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[... the same [M::mem_pestat] block is printed again, verbatim, by a second process ...]
[M::mem_process_seqs] Processed 81100 reads in 31.031 CPU sec, 30.964 real sec
[M::mem_process_seqs] Processed 81100 reads in 31.117 CPU sec, 31.004 real sec
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 27849
samtools sort: truncated file. Aborting

It seems that [at least] one of the processes that was started with mpirun did not invoke MPI_INIT before quitting (it is possible that more than one process did not invoke MPI_INIT -- mpirun was only notified of the first one, which was on node n0).

mpirun can only be used with MPI programs (i.e., programs that invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program to run non-MPI programs over the lambooted nodes.

When I run it as below instead, it completes successfully:

bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam

Is there any issue with running it on multiple processors? Please suggest.

Thanks and regards

np mpirun bwa alignment
ADD COMMENT
1
Entering edit mode

I don't think bwa is compatible with MPI. Just use:

 bwa mem -t 16 ....
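The question's full pipeline, rewritten to use bwa's built-in threading (`-t`) instead of mpirun; `samtools sort` can also take extra compression threads via `-@`. A minimal sketch, with thread counts illustrative and file names taken from the question; printed here as a dry run (remove the `echo` to actually execute):

```shell
# -t 16 : bwa mem alignment threads; -@ 4 : samtools sort compression threads.
# Dry run: the command is printed, not executed.
echo 'bwa mem -t 16 hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -@ 4 -o aln.bam -'
```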
ADD REPLY
0
Entering edit mode

Hi,

Thanks for your reply. When I run bwa mem -t 16, it only uses 15-20% of memory, the same as with plain bwa mem.

ADD REPLY
0
Entering edit mode

You do not have to worry about memory use. The program will use as much as it needs, and that may change over time. As long as the job is running on all 16 cores, you should simply monitor it.

ADD REPLY
1
Entering edit mode

To be clear, MPI is a very specific parallel-processing framework; software has to be written and compiled against it to run under mpirun. Nowhere does the BWA documentation indicate that mpirun can be used to execute bwa in parallel.

ADD REPLY
0
Entering edit mode

Okay. Then how do I run bwa on multiple processors?

ADD REPLY
0
Entering edit mode

As @Pierre showed above.

ADD REPLY
0
Entering edit mode

Hi. Thanks, but it is not working.

ADD REPLY
0
Entering edit mode

I believe the -t option refers to multithreading, not parallelizing across multiple processors.

ADD REPLY
0
Entering edit mode

I apologize for the confusion. I meant that it is using only 15-20% of memory, but I want to make maximum use of the machine so that it runs faster. I have 50 samples to run.

ADD REPLY
0
Entering edit mode

what GenoMax said.

I have 50 samples to run.

run your samples in parallel.

ADD REPLY
0
Entering edit mode

A job is not going to run faster by using more memory. Speed in this case comes from using multiple cores. Assuming you are working on a single server/computer (and not a compute cluster with a job scheduler), there is a maximum number of cores at your disposal. You can use up to that maximum for one job, but that is as fast as a job will go (other things, like the speed of disk storage, will come into play and will likely be the limiting factor).

You can either start more than one job at the same time (so the cores used add up to that max number) or run one job with max cores at one time until you go through all 50.
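The "one job with max cores at a time" option above can be sketched as a simple loop; the sample naming (sampleNN_R1.fastq / sampleNN_R2.fastq) is an assumed placeholder, and the loop is written as a dry run (remove the `echo` to actually align):

```shell
#!/bin/sh
# One sample at a time, each job using every available core.
NCORES=$(nproc 2>/dev/null || echo 32)   # fall back to 32 if nproc is missing
for i in $(seq -w 1 50); do
    # Dry run: commands are printed, not executed.
    echo "bwa mem -t ${NCORES} hg19-agilent.fasta sample${i}_R1.fastq sample${i}_R2.fastq | samtools sort -o sample${i}.bam -"
done
```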

ADD REPLY
0
Entering edit mode

Thanks for your reply, Genomax. Yes, you are right: I am working on a single computer. As you said, I want to run one job at a time in a for loop over all 50 samples. I am using the -t 32 option as Pierre suggested, but it still runs at the same pace as without -t.

Can you please guide me on how to run one job with max cores, or how to run all samples in parallel?

ADD REPLY
0
Entering edit mode

or run all samples in parallel?

use a workflow manager like nextflow or snakemake.
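Short of a full workflow manager, several alignments can also be run side by side so the per-job thread counts sum to the core total. A sketch using `xargs -P` (8 concurrent jobs x 4 threads = 32 cores; the sample naming is an assumed placeholder, and `xargs` calls `echo` here so commands are printed rather than executed):

```shell
#!/bin/sh
# 8 alignments at once, 4 bwa threads each (8 x 4 = 32 cores total).
# Dry run: remove 'echo' (and run the command via sh -c) to actually align.
seq -w 1 50 | xargs -P 8 -I {} \
  echo "bwa mem -t 4 hg19-agilent.fasta sample{}_R1.fastq sample{}_R2.fastq | samtools sort -o sample{}.bam -"
```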

ADD REPLY
0
Entering edit mode

But it is still running on the same pace as without -t.

That is not possible. If you look at the job via top or htop, you should see all cores being used; there may be times when not all are in use, but most of the time they will be. Alignment is processor-intensive and can take a while against a human-genome-sized target. You probably have millions of reads per sample as well, so it can take several hours (even with multiple threads) to complete one sample's alignment.

There are plenty of threads here that show you how to process a set of samples using a for loop. Here is one example: Running BWA mem in a for loop

ADD REPLY
0
Entering edit mode

Hi Genomax,

I am attaching screenshots of top and Activity Monitor. Please have a look and let me know whether multiple cores are being used; I am not able to figure it out. [screenshots attached]

ADD REPLY
0
Entering edit mode

Yes, they are. In the top display you can see 2850% CPU usage for bwa, which indicates more than 28 cores in use (that percentage will change over time), so all 32 cores are being exercised. You can also see this in the second graph you posted, which shows 100% usage on all cores.

ADD REPLY
0
Entering edit mode

Thank you so much, guys; you made my day. The job completed in less than 24 hrs.

ADD REPLY
