Hi everyone,
I am trying to call structural variants (using svim) in my PacBio long-read sequencing dataset. However, I noticed that I get a vastly different number of variants (100,000 vs 1,000) depending on whether I use a BAM alignment (from ngmlr) converted directly with samtools or one that I first processed with samtools fixmate and samtools markdup (far fewer in the latter). (Workflow: https://www.htslib.org/workflow/fastq.html)
Is this normal? And are these steps necessary for this specific use case (SV calling)? (Frankly, I do not quite understand what impact samtools fixmate or samtools markdup might have.)
Thank you very much for your help.
[Edited for clarity]
Best regards,
ZH