I have 2 paired-end fastq files, 200GB each.
Running "bwa mem" took nearly 6 hours on a pretty good machine (24 physical cores, E5-2670 v3, Hyper-Threading, 64GB memory).
There seems to be a problem with the "-t" parameter: "-t 48" and "-t 12" take nearly the same time.
I wonder if it's possible to split the fastq files into multiple parts, run "bwa mem" on each part separately and concurrently, and then combine the SAM outputs together?
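The splitting step I have in mind might look something like this (a minimal Python sketch; `split_fastq` is a hypothetical helper, and it assumes plain 4-line FASTQ records with no wrapped sequences — in practice a tool like seqkit split2 or GNU split would be much faster). The key constraint is that R1 and R2 must be split with the same chunk size so read pairs stay in sync:

```python
def split_fastq(path, reads_per_chunk, out_prefix):
    """Split a 4-line-per-record FASTQ into contiguous chunks.

    Returns the list of chunk file paths. Run with identical
    reads_per_chunk on R1 and R2 so pairs stay aligned.
    """
    chunk_paths = []
    out = None
    n_in_chunk = 0
    with open(path) as fh:
        while True:
            record = [fh.readline() for _ in range(4)]
            if not record[0]:          # EOF
                break
            if out is None or n_in_chunk == reads_per_chunk:
                if out is not None:
                    out.close()
                name = f"{out_prefix}.{len(chunk_paths):03d}.fq"
                out = open(name, "w")
                chunk_paths.append(name)
                n_in_chunk = 0
            out.writelines(record)
            n_in_chunk += 1
    if out is not None:
        out.close()
    return chunk_paths
```

Each chunk pair would then be mapped with its own "bwa mem" process, and the per-chunk SAMs merged afterwards (e.g. samtools cat piped into samtools sort).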
6 hours is pretty good in my experience for a file of that size (is that the size of the compressed fastq?). At some point, increasing -t stops being beneficial as I/O limitations kick in. You can of course split your fastq into pieces, then later merge with e.g. samtools cat piped into samtools sort, but in the end that will probably take longer than just waiting the 6 hours.
I am also curious about the bwa mem mapping rate. Rather than file size, I would measure by number of reads. I subsampled a 150bp paired fastq pair to 1 million reads and mapped it to the zebrafish genome (half the size of the human genome) on a computing cluster, using 16 cores (8 GB RAM each, though barely 8 GB was actually used). Mapping rate is also affected by read quality; I had trimmed the reads so that all bases have phred quality >28.
bwa mem mapping alone: 1 million reads in 73 sec.
mapping + samblaster + samtools fixmate + samtools sort (variant-calling workflow): 1 million reads in 325 sec.
For pure mapping, that is around 13,700 reads per second, so 200 million reads would take about 4 hours.
For the full workflow, that is around 3,000 reads per second, so 200 million reads would take about 18 hours.
(These are rough estimates; in practice it is not a simple linear extrapolation.)
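Spelled out, the projection is just a rate computed from the 1-million-read timings, scaled up to 200 million reads:

```python
reads = 1_000_000
pure_s, full_s = 73, 325            # seconds for 1M reads, from the timings above

pure_rate = reads / pure_s          # ~13,700 reads/sec
full_rate = reads / full_s          # ~3,077 reads/sec

target = 200_000_000                # 200M reads
pure_hours = target / pure_rate / 3600
full_hours = target / full_rate / 3600
print(f"pure mapping : {pure_hours:.1f} h")    # ~4.1 h
print(f"full workflow: {full_hours:.1f} h")    # ~18.1 h
```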
I wonder if there are any resources comparing mapping rates (reads mapped per second or minute) across various system specs.
If you can't increase speed by using more than 12 threads on a 24-core node, you are probably I/O-limited, in which case no alternative aligner would run faster (unless you are write-limited due to unnecessary fields in the SAM output, which you could potentially disable). You can reduce such I/O limitations by keeping your files compressed at all times (for example, gzipped via pigz) and reading/writing compressed files at every stage of your pipeline. If you run "top" while mapping, you will see your CPU utilization; it should be around 4800% when mapping with 48 threads. Hyper-Threading does not particularly speed up mapping, though (it helps more with floating-point-heavy workloads), so there is likely no point in exceeding 24 threads anyway.
If you have multiple disks or filesystems, you may be able to increase speed by reading from one disk and writing to another.
200GB × 2 is the plain fastq size, not compressed.
Thank you for your answer. I'll try splitting it and see how long it takes ^_^
Try something other than bwa? minimap2 was released recently and is supposed to be incredibly fast.