11.2 years ago
William
Sambamba is a robust, high-performance tool (and library), written in the D programming language, for working with BAM files. Its current functionality is an important subset of samtools functionality. Because it makes efficient use of modern multicore CPUs, Sambamba is usually much faster than samtools. For example, indexing an 18 GB BAM file on a fast 8-core machine utilizes all cores at 45% CPU:
Sambamba index bam:
time ~/sambamba index /scratch/HG00119.mapped.ILLUMINA.bwa.GBR.exome.20111114.bam
real 1m42.930s
user 6m19.964s
sys 0m32.362s
Samtools index bam:
time ~/samtools index /scratch/HG00119.mapped.ILLUMINA.bwa.GBR.exome.20111114.bam
real 5m37.669s
user 5m9.127s
sys 0m13.605s
How many threads were used for the sambamba time?
I would also like to know how many concurrent threads were used. Assuming only the userspace code was multithreaded, we can estimate it as
user / (real - sys)
which is approximately 5. If the ~45% utilization figure is correct, then 5 * 1.55 = 7.75, so approximately 8 threads.
Completed the quote with the thread info.
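For anyone who wants to check the arithmetic, here is a rough sketch of the estimate in shell. The ratio that gives the "approximately 5" figure is user time divided by (real - sys), with the `time` output converted to seconds. `estimate_threads` is a hypothetical helper written for this post, not part of sambamba or samtools, and the result is only a crude average of busy threads under the assumption that system time was not parallelized.

```shell
# Hypothetical helper: estimate the average number of busy threads
# from `time` output. Arguments are real, user and sys, in seconds.
# Assumes only the userspace code ran in parallel.
estimate_threads() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", u / (r - s) }'
}

# sambamba index: real 1m42.930s, user 6m19.964s, sys 0m32.362s
estimate_threads 102.930 379.964 32.362

# samtools index: real 5m37.669s, user 5m9.127s, sys 0m13.605s
estimate_threads 337.669 309.127 13.605
```

The first call gives a value a little above 5 and the second a value just under 1, which fits samtools index running essentially single-threaded in this comparison.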
Isn't disk IO the main bottleneck in this operation?
I guess that depends on the storage setup. The faster the storage, the greater the speedup (see the results for indexing): https://github.com/lomereiter/sambamba/wiki/Comparison-with-samtools
How to install and use it correctly? I have: