Variant with AD 14,3 is heterozygote?
1
2
Entering edit mode
6.7 years ago
Sharon ▴ 610

Hi everyone

I would like to take a second opinion regarding a variant with DP =17. The variant is in a very high GC content area so the low coverage is reasonable.

I do hard filtering because I have few samples for VQSR, I thinks it is okay according to GATK hard filtering, the values are as follows:

   AC=1;AF=0.500;AN=2;BaseQRankSum=0.892;ClippingRankSum=0.000;DP=17;ExcessHet=3.0103;FS=10.843;
    MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=2.81;ReadPosRankSum=-1.534;
SOR=2.795       GT:AD:DP:GQ:PL  0/1:14,3:17:76:76,0,506

I am worried because AD is 14,3 which means only 3 reads only support the ALT? is it safe to accept it as a het variant. The variant is very interesting based on our data, so I don't wanna discard without a strong reason.

Any comments, how do you think? with many many thanks

Exome VCF • 1.9k views
ADD COMMENT
2
Entering edit mode
6.7 years ago
Gabriel R. ★ 2.9k

I cannot provide a detailed answer given the information but a few comments.

There are 2 most likely models: 1) The site is homozygous reference

Therefore, the 3 reads are potentially a) mismapped, b) weird duplicates or c) sequencing errors.

a) Did you use a mappability filter? i recommend Heng Li's method lh3lh3.users.sourceforge.net/snpable.shtml b) Did you use rmdup? c) if both a) and b) are satisfied, then they are sequencing errors, assuming a base quality of 38, the probability of 3 seq. errors occurring independently is:

(10^((-1*38)/10))^3 = 3.981072e-12

so one chance in 250 billion.

2) The site is heterozygous reference

It is possible that this is a genuine het site and you haven't sampled the other base, this can happen with probability:

dbinom(3,17,prob=0.5) = 0.005187988

so 1 chance in ~200 (assuming no mismappings or errors).

This is reflected in your PL field where the homo ref model has a score of 76, the het model a score of 0 (most likely) and 506 for the homo alt. While the hetero is more likely that the homo ref, it is not astronomically more likely. However, the homo alt is so unlikely that it can be safely discarded.

ADD COMMENT
1
Entering edit mode

Nice answer! I wonder if the low number of ALT reads could be because ALT reads have a mismatch with the reference. This mismatch could make some reads to map equally well to multiple locations hence their MAPQ is 0 hence they are filtered out somewhere in the analysis.

Maybe useful to inspect the unfiltered bam files to look for the reads mapped at the incriminated locus.

ADD REPLY
1
Entering edit mode

Thanks Gabriel for the nice answer. I did not use a nor b, I will give them a try. Just used GATK pipeline. Thanks dariober, that's another good point I did not think of, I will check other reads . Thanks

ADD REPLY
1
Entering edit mode

good comment. my personal experience with this is if I have to choose between filtering by mapping quality or mappability, pick mappability, it gives greater noise reduction: see figure on page 46 https://media.nature.com/original/nature-assets/nature/journal/v505/n7481/extref/nature12886-s1.pdf

ADD REPLY
0
Entering edit mode

Interesting figure. Never thought about this. So I check if the reads are mappable, if yes, that's a sign of less likely for errors, which means trust the call and trust the variant? Do I understand right?

ADD REPLY
1
Entering edit mode

The reads are not mappable, the genomic region is (or isn't). Mapping quality is defined on a per read basis but mappability is defined per genomic locus. Both aim at reducing the probability that there is a CNV or repeat or a read so badly damaged due to sequencing errors that it could end up in multiple places or a read so short that could have come from multiple genomic loci.

If the genomic region is highly mappable and you have good mapping quality, it likely means that you can discard the "read is mismapped" possibility. now, it is still possible to have sequencing errors or bizarre strand effects due to Illumina even if you are correctly mapped.

ADD REPLY
1
Entering edit mode

Great, got it now. Thanks so much for the clear explanation. Much appreciated Gabriel :)

ADD REPLY

Login before adding your answer.

Traffic: 1935 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6