Hi Everyone
I have an interesting (very rare) variant that is found in affected individuals and absent from unaffected individuals and non-carriers. The variant is in a gene strongly linked to a very rare phenotype.
ANNOVAR annotates the variant as gnomAD = 0 and ExAC = "." (missing). I looked it up in ExAC and it is not found. However, when I look it up in gnomAD, it is listed as a filtered variant flagged AC0, so there is no high-confidence genotype for it.
As follows:
Filtered: RF and AC0
Allele Count: 0
Allele Number: 239250
Allele Frequency: 0
In my samples the variant is found with DP between 15 and 21, and the other annotations such as FS and MQ seem okay. As follows:
GT:AD:DP:GQ:PL 0/1:7,14:21:99:383,0,13
MQ=58.73;MQRankSum=0.871;QD=13.15;ReadPosRankSum=-0.989;SOR=1.630;
ABHet=0.505;ABHom=1.00;AC=1;AF=0.00;AN=2;BaseQRankSum=1.875;DP=1987;Dels=0.00;FS=3.500
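As a quick sanity check on the genotype field above, here is a minimal Python sketch that pulls out the allele balance, depth and GQ and compares them against some cut-offs. The function names are hypothetical, and the default thresholds (DP >= 10, GQ >= 20, allele balance >= 0.2) are assumptions modelled on the kind of genotype criteria gnomAD describes for its AC0 flag; check the gnomAD documentation for the release you are using.

```python
def parse_sample(format_str, sample_str):
    """Pair FORMAT keys with the sample's values, e.g. GT:AD:DP:GQ:PL."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def genotype_summary(format_str, sample_str, min_dp=10, min_gq=20, min_ab=0.2):
    """Summarise one sample column; thresholds are assumed, not official."""
    fields = parse_sample(format_str, sample_str)
    # AD holds ref,alt read counts for a biallelic site
    ref_reads, alt_reads = (int(x) for x in fields["AD"].split(",")[:2])
    depth = int(fields["DP"])
    gq = int(fields["GQ"])
    total = ref_reads + alt_reads
    alt_fraction = alt_reads / total if total else 0.0
    return {
        "alt_fraction": alt_fraction,
        "depth": depth,
        "gq": gq,
        "passes": depth >= min_dp and gq >= min_gq and alt_fraction >= min_ab,
    }


# The genotype line quoted in this post
summary = genotype_summary("GT:AD:DP:GQ:PL", "0/1:7,14:21:99:383,0,13")
print(summary)
```

For this sample the alt allele fraction is 14/21, so the call itself looks like a reasonable heterozygote on these numbers; that says nothing yet about whether the reads are placed correctly.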
Should I take the variant into account, or assume it is probably an artifact? Or could it be a genuine variant in a hard-to-sequence region, which would explain why it is not found in ExAC at all and is found in gnomAD only with low confidence? Thanks
Did you look at the BAM file?
No. What should I look for to confirm it or rule it out?
In the BAM alignments (on IGV), you can gauge the general alignment depth of coverage around the variant, whether or not any of these alignments have low MAPQ or mis-matched mates, and whether it's being called in a repeat region, for example.
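If you prefer numbers to eyeballing IGV, the same checks can be roughed out from SAM text. This is only a sketch: it assumes you have already exported the reads overlapping the variant as SAM lines (for example with samtools view on an indexed BAM and a region), the function name and the MAPQ cut-off of 20 are my own choices, and the four reads below are made-up toy records.

```python
def summarise_sam(sam_lines, low_mapq=20):
    """Count mapped reads, low-MAPQ reads and improperly paired reads."""
    total = low = improper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        flag, mapq = int(cols[1]), int(cols[4])  # SAM columns 2 (FLAG) and 5 (MAPQ)
        if flag & 0x4:
            continue  # read is unmapped
        total += 1
        if mapq < low_mapq:
            low += 1
        # paired (0x1) but not flagged as a proper pair (0x2)
        if (flag & 0x1) and not (flag & 0x2):
            improper += 1
    return {"reads": total, "low_mapq": low, "improper_pairs": improper}


def toy_read(name, flag, mapq):
    """Build a fake 11-column SAM record (toy data, not from a real BAM)."""
    return "\t".join([name, str(flag), "chr1", "100", str(mapq), "10M",
                      "=", "250", "160", "ACGTACGTAC", "IIIIIIIIII"])


toy = [toy_read("r1", 99, 60), toy_read("r2", 147, 60),
       toy_read("r3", 97, 3), toy_read("r4", 163, 0)]
stats = summarise_sam(toy)
print(stats)
```

A high share of low-MAPQ or improperly paired reads around the site is the sort of thing that should make you distrust the call.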
So, if it is in a repeat region, or everything around it has high coverage, then discard? And if everything around it has low coverage, then it is probably a hard region, so take it into account? Is this right? Thanks :)
A repeat region just makes it more difficult for the aligner to correctly position reads, which should reduce their MAPQ and, ultimately, the reliability of the variant call. Also, if it's a region that exhibits high sequence similarity to another region (or regions) of the genome, then this can also bias the variant call.
There are many genomic regions that remain difficult, if not impossible, to reliably sequence with current short-read technology.
I'm guessing that your region is indeed a repeat of some sort? If you look it up on UCSC Genome Browser, you can see if it's repeat-masked or not.
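If you have the reference sequence for the region as text, a rough check is possible without the browser. The sketch below assumes the UCSC convention that soft-masked repeats are written in lower case and hard-masked bases as N; the function name and the toy sequence are made up for illustration.

```python
def mask_summary(seq):
    """Fraction of soft-masked (lower-case) and N bases in a sequence."""
    n = len(seq)
    if n == 0:
        return {"length": 0, "soft_masked": 0.0, "n_bases": 0.0}
    # lower-case a/c/g/t = soft-masked repeat; N or n = hard-masked or unknown
    soft = sum(1 for b in seq if b.islower() and b != "n")
    hard = sum(1 for b in seq if b in "Nn")
    return {"length": n, "soft_masked": soft / n, "n_bases": hard / n}


# Toy sequence: 8 unmasked bases, 10 soft-masked, 5 unmasked, 4 N
toy = "ACGTTGCA" + "acacacacac" + "GGCTA" + "NNNN"
result = mask_summary(toy)
print(result)
```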
Thanks so much, Kevin. I can't tell whether it is a repeat or not. And if it is, for some reason, a hard region with an unreliable call, should we discard it or keep it? Here it is: https://ibb.co/j2aJYS
Hmmm... all of the bases are N masked but that UCSC session is for hg38. Are your variant positions hg38 or hg19?
Oh, no. I added one for hg19: https://ibb.co/b1gLxn
If your variant is the middle base in that screenshot, i.e., C, then it's on a highly conserved base. High conservation is the best single predictor of pathogenicity.
Aside from that, the broader region appears to have a few short repeats of G and generally a high GC content, which would affect DNA melting and coverage over the region, but not necessarily affect the alignment that much. Here's my current UCSC session: https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVi...
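To put rough numbers on the GC content and the G repeats, something like the following would do. It is only a sketch: the function names are hypothetical and the sequence is a made-up example, not the actual flanking sequence of this variant.

```python
from itertools import groupby


def gc_fraction(seq):
    """Fraction of G/C among the called (A/C/G/T) bases."""
    called = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in called) / len(called) if called else 0.0


def longest_run(seq, base="G"):
    """Length of the longest uninterrupted run of `base`."""
    runs = (len(list(group)) for b, group in groupby(seq.upper()) if b == base)
    return max(runs, default=0)


# Toy GC-rich window (illustrative only)
toy = "CCGGGGCGCAGGGCCGTGGGGGCC"
gc = gc_fraction(toy)
g_run = longest_run(toy, "G")
print(round(gc, 3), g_run)
```

Very high GC content and long G runs tend to go with uneven coverage, which fits the point above that coverage can suffer even when alignment is fine.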
If you could additionally look at the alignments in IGV, that would help. If they are all flagged with a particular colour, then alarm bells should ring.
Thanks Kevin so much. You always teach me something new. Much appreciated.
Thanks Pierre so much
I think that Pierre became busy! I was twiddling my thumbs...