Hi everyone,
I am trying to check whether this low-frequency variant is interesting, i.e. whether it could be mosaic.
The VCF record has AC=1;AF=0.5;AN=2;DP=910; with sample fields 0/1:99:910:0.095:99:549,0,17114:824,86, and QD=0.57;SF=8f.
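To put a number on "low frequency", here is a minimal sketch that computes the variant allele fraction (VAF) and a Wilson score interval from the read counts. It assumes that the last sample field, 824,86, is AD (REF and ALT read depths); the FORMAT header is not shown, so that is an assumption. 824 + 86 = 910 matches DP, and 86/910 rounds to the 0.095 in the record. The function name is hypothetical.

```python
import math

def vaf_with_wilson_ci(ref_reads: int, alt_reads: int, z: float = 1.96):
    """Return (VAF, lower, upper) using the Wilson score interval (~95% for z=1.96)."""
    n = ref_reads + alt_reads
    p = alt_reads / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half

# Assumption: the last ':'-separated sample field is AD = "ref,alt".
sample = "0/1:99:910:0.095:99:549,0,17114:824,86"
ref, alt = map(int, sample.split(":")[-1].split(","))
vaf, lo, hi = vaf_with_wilson_ci(ref, alt)
print(f"VAF={vaf:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
# prints: VAF=0.095  95% CI=(0.077, 0.115)
```

The interval is far below the ~0.5 expected for a germline heterozygote, which fits a mosaic variant, but a systematic artifact would look the same by this measure alone.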
The surrounding region has similar coverage, and some SNPs there have low alternate allele depths (AD) such as 2/910, 6/906 or 0/910; this is what I see in IGV.
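One rough way to use those neighbouring sites is as a local background error rate, then to ask how surprising 86 ALT reads out of 910 would be if the site were only noise. This is a minimal sketch under a simple binomial model. It assumes reads are independent and share one error rate, and it ignores strand bias, mapping quality and GC effects. A tiny p-value only says the ALT count exceeds this local noise, not that the variant is real. The helper name binom_sf is hypothetical.

```python
import math

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)

# ALT/total counts at nearby low-frequency sites, as read off IGV.
background = [(2, 910), (6, 906), (0, 910)]
err = sum(a for a, _ in background) / sum(n for _, n in background)

# Probability of >= 86 ALT reads out of 910 if the site only carried background noise.
p_value = binom_sf(86, 910, err)
print(f"background rate={err:.4f}  P(>=86 ALT of 910)={p_value:.2e}")
```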
The region has a GC content of 60%, which is high.
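For reference, GC content is just the share of G and C among the called bases in a window. A minimal sketch follows. The 10 bp sequence is a hypothetical stand-in, not the real reference around the variant; in practice the window could be fetched with something like pysam's FastaFile.fetch.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases; N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0

# Hypothetical window (not the real reference sequence): 6 of 10 bases are G/C.
print(gc_fraction("GCGCATGCAT"))  # prints 0.6
```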
What I cannot understand is the black bar, which seems to indicate downsampling.
Could this downsampling be the reason for the low ALT allele count?
Hovering over it says that in this interval [-] 356 reads have been removed.
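As a back-of-the-envelope check, if those 356 reads were removed uniformly at random from the same 910 reads covering the site, you would see fewer ALT reads, but the expected ALT fraction would not change. Both of those conditions are assumptions: the tooltip's interval may not match this exact site, and the removal may not be uniform. Under them, the expected number of ALT reads kept is the hypergeometric mean, alt × kept / total. The function name is hypothetical.

```python
from fractions import Fraction

def after_random_removal(total_reads: int, alt_reads: int, removed: int):
    """Expected ALT reads and ALT fraction if `removed` reads are dropped uniformly at random."""
    kept = total_reads - removed
    expected_alt = Fraction(alt_reads * kept, total_reads)  # hypergeometric mean
    return kept, expected_alt, expected_alt / kept

kept, exp_alt, frac = after_random_removal(910, 86, 356)
print(f"kept={kept}  expected ALT reads shown={float(exp_alt):.1f}  ALT fraction={float(frac):.4f}")
# prints: kept=554  expected ALT reads shown=52.4  ALT fraction=0.0945
```

If the fraction you see in IGV departs strongly from this, the removal is probably not uniform with respect to the ALT reads. That is the biased-downsampling scenario.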
My goal is to decide with confidence whether to discard this variant or accept it as possibly mosaic. I also do not know whether the orange area means anything other than marking the region targeted by downsampling. Image here:
Thanks
Hi Sharon, good to see you again. This is difficult to answer. Is mosaicism expected in this case? Looking at the VCF data, the A allele appears to be in ~10% of the reads (86 of 910), but I cannot see your other fields.
Yes: because of downsampling, you may not see many of the reads with the A allele. To view all of the reads, not just the downsampled ones, go to:
View
-->Preferences
-->Alignments
and then change the maximum coverage depth.

Thanks Kevin. What if the downsampling is biased? I am thinking of repeating the analysis without downsampling to see whether the counts change.
Hi Sharon, that's an interesting view, and I believe you are correct (i.e., the downsampling can be biased). As far as I know, when a variant caller looks at each position to determine whether a variant is present, it just takes reads sequentially and, once it reaches a certain number of reads, stops looking further.
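Here is a toy simulation of why a sequential cap can be biased while a random sample is not. It uses a hypothetical pileup of 910 reads, 86 of them ALT, in the worst-case order where the ALT reads happen to come last. It also uses a hypothetical cap of 250 reads per site. Whether a particular caller truncates in file order or samples randomly depends on the tool and version; this sketch only illustrates the two strategies and is not any specific caller's implementation.

```python
import random

def first_n(reads, cap):
    """Keep only the first `cap` reads in file order (plain truncation)."""
    return reads[:cap]

def random_n(reads, cap, rng):
    """Keep a uniform random subset of `cap` reads."""
    return rng.sample(reads, cap) if len(reads) > cap else list(reads)

def alt_fraction(reads):
    return sum(reads) / len(reads)

# Hypothetical pileup: 1 = ALT read, 0 = REF read; ALT reads placed last (worst case).
reads = [0] * 824 + [1] * 86
rng = random.Random(0)
cap = 250  # hypothetical per-site read cap

trunc = alt_fraction(first_n(reads, cap))
rand = sum(alt_fraction(random_n(reads, cap, rng)) for _ in range(1000)) / 1000
print(f"truncated={trunc:.3f}  random (mean of 1000)={rand:.3f}  full={alt_fraction(reads):.3f}")
```

Truncation here sees no ALT reads at all, while random sampling recovers roughly the full-depth fraction on average. Real BAM order is by alignment start, so truncation is only biased to the extent that ALT-carrying reads cluster in that order.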
This relates directly to a finding that my colleagues and I made at a children's hospital in the UK, where we split our BAM files into 4 different files representing 100%, 75%, 50%, and 25% 'random' reads. In many situations, a Sanger-confirmed variant was observed in one of the lower-read subsets but missed in the full (100%) set. This is possibly related to downsampling. That pipeline and methodology are on my GitHub page: https://github.com/kevinblighe/ClinicalGradeDNAseq
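One way to build 'random' read subsets like these is to hash each read name to a stable pseudo-random value in [0, 1) and keep reads below the target fraction. Below is a minimal sketch of that idea. It is not the implementation in the linked pipeline, and the read names are hypothetical. Because a pair's mates share a name, they are kept or dropped together. Because all fractions use the same hash, the 25% subset is nested in the 50%, and so on. samtools view -s offers hash-based subsampling on real BAM files.

```python
import hashlib

def keep_value(read_name: str) -> float:
    """Stable pseudo-random value in [0, 1) derived from the read name."""
    digest = hashlib.sha1(read_name.encode()).digest()
    # Use 53 bits so the result is exactly representable and always < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def subset(read_names, fraction):
    """Keep reads whose hash value falls below `fraction`."""
    return [r for r in read_names if keep_value(r) < fraction]

names = [f"read{i}" for i in range(100_000)]  # hypothetical read names
subsets = {f: subset(names, f) for f in (1.0, 0.75, 0.5, 0.25)}
for f, s in subsets.items():
    print(f"{f:.0%}: {len(s)} reads")
```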
Great, Kevin. Thanks so much, very much appreciated. Many thanks also for sharing this pipeline; I will go through it. Thanks :)