Hi!
I noticed something strange. I have been running a DNA-seq pipeline like this:
reads -> bwa-mem2 -> picard SortSam -> picard MergeSamFiles -> picard MarkDuplicates -> gatk LeftAlignIndels ...
gatk LeftAlignIndels has always taken around 4 hours to complete with the test reads I use here. But when I switched from picard to sambamba in the preceding steps, gatk LeftAlignIndels suddenly completed in just 2 hours, without any other changes.
How can that be? Nobody else is using the workstation I run this on at the moment, so it should not be because more resources were free when I noticed the drop in time.
Does anyone have an idea? Does sambamba do something that makes it easier to realign? I have no idea.
best/ Jonas
I'm just guessing, but could it be that sambamba uses a different compression level (-l option) than picard's MarkDuplicates? Have you compared the file sizes of the MarkDuplicated BAM files?
Ah, you might be right! That's probably it, I will check.
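For the file size comparison suggested above, here is a minimal sketch in Python. The function name `compare_bam_sizes` and the file names in the usage note are made up for illustration; substitute the actual MarkDuplicates outputs from the two runs. A clearly larger sambamba file would be consistent with lighter compression, but it does not prove that this explains the speedup.

```python
import os


def compare_bam_sizes(picard_bam, sambamba_bam):
    """Return (picard_bytes, sambamba_bytes, ratio) for two BAM files.

    ratio is sambamba size divided by picard size; a value above 1
    means the sambamba output is the larger file on disk.
    """
    picard_bytes = os.path.getsize(picard_bam)
    sambamba_bytes = os.path.getsize(sambamba_bam)
    # Guard against an empty file so we never divide by zero.
    ratio = sambamba_bytes / picard_bytes if picard_bytes else float("nan")
    return picard_bytes, sambamba_bytes, ratio
```

Usage would look something like `compare_bam_sizes("picard.markdup.bam", "sambamba.markdup.bam")`, with both paths being placeholders for your own files.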