Question

Why do my 'high-quality' variants look like artifacts?

2.8 years ago

Timotheus ▴ 40

Hello,

I have short reads from one non-model genome mapped against a very closely related genome assembly, and I want to examine variation. I did standard variant calling and filtering (for reasonable depth, QUAL>50, MQ>55). How is it possible that variants that passed all those filters look this bad (see the screenshot)? Those are clearly mismapped reads (and artifactual variants). Why does the MQ filter fail? This is driving me crazy; I'd appreciate any suggestions.

SNPs vcf alignment variants • 1.0k views

score 2 · Answer 1 · 2022-10-27

The mapping quality of the read you highlighted is 60 (bottom of the screenshot), and I don't see visual differences between the reads. So the MQ filter is "failing" because, despite the many mismatches, those are high-quality alignments: MQ reflects how confident the aligner is that a read is placed at the right locus, not how closely the read matches the reference there. My best guess is that, since you aligned to a closely related but not identical reference, the reads with multiple mismatches come from sequence that is not present in that reference. Also, to my eye, the non-reference allele occurs both on reads with no other mismatches and on reads with lots of mismatches, so that specific variant is "good".

You could perform a pre-pass eliminating reads with edit distance (NM) > 10 (or 8 or whatever). Or, with 118x coverage, you could pass your reads through an assembler and align whole contigs, identify those that seem not to be present in the reference, and include them as "decoys" to mitigate these mappings.
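
The edit-distance pre-pass could look something like the sketch below. It is only an illustration, assuming the alignments are available as SAM text (for example streamed from `samtools view -h`) and that the aligner wrote the `NM:i:` tag; the function names and the choice to keep records that lack an NM tag (such as unmapped reads) are my own assumptions, not something from the original workflow.

```python
def edit_distance(sam_line):
    """Return the NM (edit distance) tag of one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_sam(lines, max_nm=10):
    """Yield header lines and records whose NM is at most max_nm.

    Assumption: records without an NM tag are passed through unchanged.
    """
    for line in lines:
        if line.startswith("@"):  # header line, always kept
            yield line
            continue
        nm = edit_distance(line)
        if nm is None or nm <= max_nm:
            yield line


# Small in-memory demonstration with made-up records.
demo = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:2\n",
    "r2\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:5\tNM:i:12\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
kept = [l.split("\t")[0] for l in filter_sam(demo) if not l.startswith("@")]
print(kept)
```

To use it on real data you would feed it `sys.stdin` and write the result to `sys.stdout`, with samtools converting from and back to BAM on either side. Note that this filters each record independently, so one mate of a pair can be dropped while the other survives; whether that matters depends on the variant caller.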