I am trying to filter out paralogs from true alleles. Generally, the MAPQ and/or coverage at a locus gives some indication of paralogous alignment. Other than that, one needs to rely on what is seen in the alignment itself.
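To make the MAPQ/coverage criterion concrete, here is a rough sketch of the kind of filter I mean. It is only an illustration: the function name `flag_suspect_loci`, the input layout, and both thresholds are my own assumptions, not values from any particular pipeline.

```python
from statistics import median


def flag_suspect_loci(loci, min_mapq=30.0, max_depth_ratio=2.0):
    """Return the names of loci that look like possible paralogous alignments.

    loci: dict mapping a locus name to (mean_mapq, depth).
    min_mapq and max_depth_ratio are illustrative thresholds (assumptions).
    """
    # Use the median depth across all loci as the "expected" coverage.
    typical_depth = median(depth for _, depth in loci.values())

    suspects = []
    for name, (mean_mapq, depth) in loci.items():
        low_mapq = mean_mapq < min_mapq  # reads placed ambiguously
        excess_depth = depth > max_depth_ratio * typical_depth  # collapsed copies
        if low_mapq or excess_depth:
            suspects.append(name)
    return suspects


# Made-up numbers, purely to show the call.
example = {
    "locus_a": (60, 30),
    "locus_b": (12, 31),
    "locus_c": (60, 95),
    "locus_d": (58, 28),
}
print(flag_suspect_loci(example))
```

A locus with MAPQ 60 and ordinary coverage passes this kind of filter, which is exactly why the case below puzzles me.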
In the given example (screenshot), I am a little doubtful about what I am seeing.
In the following IGV screenshot there are two biological samples: deduped_MA605.bam (plus two filtered versions of it, which are not important here) and deduped_Sp164.bam. The samples come from two different populations that diverged about 10-35 K years ago. The reads are aligned to the reference genome, and the screenshot (with the observed variants) covers an exon of a gene.
- Please open the image in a new tab to view it properly.
Note:
- MA605 has only 1 variant allele in that window, while Sp164 has far more variants (8 SNPs) within a 100 bp window, which is more than expected.
- The coverage at this locus is decent (not much higher than expected) for both samples.
- The mapping quality is 60 for both samples.
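Since MAPQ and coverage look normal here, the remaining signal is SNP density. A minimal sketch of how such clusters could be flagged genome-wide is below. The function name `dense_snp_windows`, the 100 bp window, the cutoff of 3 SNPs, and the positions are all hypothetical; I have not tuned any of them.

```python
from bisect import bisect_left


def dense_snp_windows(positions, window=100, max_snps=3):
    """Return (start, count) for each SNP that opens a window of `window` bp
    containing more than `max_snps` SNPs.

    positions: SNP coordinates on one chromosome for one sample.
    window and max_snps are illustrative defaults (assumptions).
    """
    positions = sorted(positions)
    hits = []
    for i, start in enumerate(positions):
        # Index of the first SNP at or beyond the end of the window.
        j = bisect_left(positions, start + window)
        count = j - i  # SNPs in [start, start + window)
        if count > max_snps:
            hits.append((start, count))
    return hits


# Hypothetical coordinates mimicking the pattern described above:
# 8 SNPs within 100 bp in one sample, a single SNP in the other.
sp164_like = [1005, 1012, 1020, 1033, 1047, 1060, 1075, 1090]
ma605_like = [1040]
print(dense_snp_windows(sp164_like))
print(dense_snp_windows(ma605_like))
```

Windows flagged this way are only candidates. A dense cluster can also come from a genuinely divergent haplotype, so this does not separate real variants from paralogs on its own.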
Could the observed variants be paralogous? I know that settling this will need evidence from other sequences, but I would like to hear what people think, and why they think it is a real variant vs. a paralog.
Thanks,