Question

Is this a possible mosaic variant or an effect of downsampling?

7.1 years ago

Sharon ▴ 610

Hi Everyone

I am trying to check whether this low-frequency variant is interesting, i.e. whether it could be mosaic.

AC=1;AF=0.5;AN=2;DP=910; 0/1:99:910:0.095:99:549,0,17114:824,86  and QD=0.57;SF=8f.

The region around it has similar coverage, and some nearby SNPs have low-frequency AD values such as 2/910, 6/906 or 0/910; this is what I see in IGV. The region has a GC content of 60%, which is high.

What I can't understand is this black bar, which seems to indicate downsampling. Could this downsampling be the reason behind the low ALT allele counts?
Hovering over it says that 356 reads have been removed in this interval.

My goal is to be sure whether to discard this variant or accept it as a possible mosaic. I also don't know whether the orange area means anything other than the region targeted by downsampling. Image here:

https://ibb.co/fraAyS

Thanks

exome sequencing VCF Downsampling • 1.5k views

ADD COMMENT • link 7.1 years ago by Sharon ▴ 610

Hi Sharon, good to see you again. It's difficult to answer. Is mosaicism expected in this case? Looking at the VCF data, it looks like the A allele is actually ~10% of the reads, but I cannot see your other fields.
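
The ~10% figure can be checked from the sample column quoted in the question. The following is a minimal sketch, assuming that the last colon-separated field is AD (ref depth, alt depth); the FORMAT column was not shown in the post, so that ordering is a guess supported only by 824 + 86 = 910 matching DP.

```python
# Sample column copied from the question.
# Assumption: the last colon-separated field is AD as "ref_depth,alt_depth".
sample = "0/1:99:910:0.095:99:549,0,17114:824,86"


def alt_fraction(sample_field: str) -> float:
    """Return alt reads / (ref + alt reads) from a trailing 'ref,alt' AD entry."""
    ad = sample_field.split(":")[-1]
    ref_depth, alt_depth = (int(x) for x in ad.split(","))
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads in AD field")
    return alt_depth / total


fraction = alt_fraction(sample)
print(f"ALT fraction: {fraction:.3f}")
```

Under that assumption the result is 86/910, which agrees with the 0.095 value already present in the sample column and is far from the 0.5 expected for a germline heterozygous call.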

Yes, due to downsampling, you may not see many of the reads with the A allele. To view all of the reads and not just the downsampled ones, go to View --> Preferences --> Alignments and change the maximum coverage depth.
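
As a rough way to see how much unbiased downsampling alone could move the ALT fraction, here is a small simulation sketch. The numbers are illustrative only: it reuses the 824 ref / 86 alt counts from the question and treats the 356 removed reads as if they were taken out of those 910, which is purely an assumption. It also assumes reads are dropped uniformly at random, which may not match what any particular viewer or caller does.

```python
import random


def downsample_alt_fraction(ref_reads, alt_reads, keep, rng):
    """Randomly keep `keep` reads and return the ALT fraction among them."""
    reads = ["ref"] * ref_reads + ["alt"] * alt_reads
    kept = rng.sample(reads, keep)  # sampling without replacement
    return kept.count("alt") / keep


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Illustrative inputs (see the assumptions stated above).
ref_reads, alt_reads, removed = 824, 86, 356
full_fraction = alt_reads / (ref_reads + alt_reads)

trials = [
    downsample_alt_fraction(ref_reads, alt_reads, ref_reads + alt_reads - removed, rng)
    for _ in range(1000)
]
mean_fraction = sum(trials) / len(trials)

print(f"full data: {full_fraction:.3f}")
print(f"mean after random downsampling: {mean_fraction:.3f}")
print(f"range across trials: {min(trials):.3f} to {max(trials):.3f}")
```

If the downsampling is truly random, the mean stays close to the full-data fraction and only the spread changes; a large systematic shift would instead point to biased read removal or to some other cause.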

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Thanks Kevin. What if the downsampling is biased? I am thinking of repeating the analysis without downsampling to see whether the counts change.

ADD REPLY • link 7.1 years ago by Sharon ▴ 610

Hi Sharon, that's an interesting view to take, and I believe that you are correct (i.e., the downsampling is biased). As far as I know, when a variant caller looks over each position to determine whether a variant is present, it just takes reads sequentially and stops looking further once it reaches a certain number of reads.

This directly relates to a finding that my colleagues and I made in a children's hospital in the UK, where we split each of our BAM files into 4 different files representing 100%, 75%, 50%, and 25% 'random' reads. In many situations, a Sanger-confirmed variant was observed in one of the lower read subsets but missed in the full (100%) set. This is possibly related to downsampling. That pipeline and methodology are on my GitHub page: https://github.com/kevinblighe/ClinicalGradeDNAseq
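
One way to produce random read subsets like these is the `-s` option of `samtools view`, which takes a value of the form SEED.FRACTION. The sketch below only builds the command lines and does not run them. The file name `sample.bam`, the seed 42 and the output naming are hypothetical, and this is not necessarily how the linked pipeline does its subsampling.

```python
import os


def subsample_commands(bam_path, fractions=(0.75, 0.50, 0.25), seed=42):
    """Build one `samtools view -s` command per requested fraction.

    The 100% set is simply the original BAM, so it needs no command.
    """
    stem, ext = os.path.splitext(bam_path)
    commands = []
    for fraction in fractions:
        if not 0 < fraction < 1:
            raise ValueError("fraction must be between 0 and 1 (exclusive)")
        percent = round(fraction * 100)
        out_path = f"{stem}.{percent}pct{ext}"
        # samtools reads -s as SEED.FRACTION, e.g. 42.75 keeps about 75% of reads.
        subsample_arg = f"{seed + fraction:.2f}"
        commands.append(
            ["samtools", "view", "-b", "-s", subsample_arg, "-o", out_path, bam_path]
        )
    return commands


for cmd in subsample_commands("sample.bam"):
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` once samtools is installed; variant calling on each subset and on the full file would then show whether the call at this position depends on depth.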

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Great, Kevin. Thanks so much, very much appreciated. Many thanks also for sharing this pipeline with me; I will go through it. Thanks :)

ADD REPLY • link 7.1 years ago by Sharon ▴ 610