Hello all,
I am analyzing variant calls from my identical twins. They are confirmed to carry the same disease-causing mutation for a Mendelian disease. However, their phenotypes are very different, so I would like to identify the variants that are unique to each twin.
I got the VCF files from the MiSeq built-in tools and used SnpEff and SnpSift to annotate and identify the variants.
In the end, I got about 80 unique variants for each twin. (The only filter I applied manually was a minimum read depth of 20.)
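A minimal sketch of the "unique to each twin" step, assuming plain-text, single-sample VCFs with read depth stored as `DP` in the INFO column (MiSeq output may store depth elsewhere, e.g. in FORMAT, so adapt the lookup). The function names and `MIN_DEPTH` are hypothetical. One design choice worth noting: each twin's depth-filtered calls are compared against the other twin's *unfiltered* calls, so a variant seen at low depth in one twin is not reported as unique to the other.

```python
# Minimal sketch. Assumptions: uncompressed single-sample VCFs, depth in INFO DP,
# variants keyed by (CHROM, POS, REF, ALT) without normalization/left-alignment.

MIN_DEPTH = 20  # hypothetical name for the read-depth-20 cutoff


def parse_info(info):
    """Turn 'DP=35;AF=0.5;DB' into {'DP': '35', 'AF': '0.5', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO entry
    return fields


def variant_keys(vcf_lines, min_depth=0):
    """Set of (CHROM, POS, REF, ALT) for records with INFO DP >= min_depth."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = cols[0], int(cols[1]), cols[3], cols[4]
        depth = int(parse_info(cols[7]).get("DP", 0))  # missing DP counts as 0
        if depth >= min_depth:
            keys.add((chrom, pos, ref, alt))
    return keys


def unique_to_each(vcf_a, vcf_b, min_depth=MIN_DEPTH):
    """Depth-filtered calls of each twin that are absent from the other twin's calls."""
    vcf_a, vcf_b = list(vcf_a), list(vcf_b)  # allow file objects (read only once)
    a_all, b_all = variant_keys(vcf_a), variant_keys(vcf_b)
    a_deep = variant_keys(vcf_a, min_depth)
    b_deep = variant_keys(vcf_b, min_depth)
    return a_deep - b_all, b_deep - a_all
```

Usage would be something like `only_a, only_b = unique_to_each(open("twin_a.vcf"), open("twin_b.vcf"))` (hypothetical file names). Keep in mind that a missing call is not proof of absence: if the locus is poorly covered in the other twin, the variant will still show up as "unique".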
Interestingly, many of the identified variants are flagged as "LowVariantFreq/SB" or "SB".
In this situation, should I remove all variants carrying these filter flags, or can I keep them and pay closer attention to them? If I remove every variant flagged LowVariantFreq/SB or SB, only a very small fraction of the variants remains, and I am worried that I might miss true variants.
(Note that I used the default settings for these filters.)
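One middle ground between "remove everything flagged" and "keep everything" is to sort the calls into tiers by the FILTER column and inspect the flagged tier by hand. A minimal sketch, assuming the flags live in the VCF FILTER column; the VCF specification joins multiple filters with ";", but since the post writes "LowVariantFreq/SB", this sketch accepts either separator. `triage_by_filter` and `REVIEW_FLAGS` are hypothetical names.

```python
import re

# Hypothetical set of flags that send a call to manual review instead of discarding it.
REVIEW_FLAGS = frozenset({"LowVariantFreq", "SB"})


def triage_by_filter(vcf_lines, review_flags=REVIEW_FLAGS):
    """Return {'pass': [...], 'review': [...], 'other': [...]} of VCF record lines."""
    tiers = {"pass": [], "review": [], "other": []}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        # Accept ';' (VCF spec) or '/' (as written in the post) between filter names.
        flags = set(re.split(r"[;/]", cols[6]))
        if flags == {"PASS"}:
            tiers["pass"].append(line)
        elif flags <= review_flags:
            # Flagged only for low variant frequency and/or strand bias:
            # keep for manual inspection of the reads.
            tiers["review"].append(line)
        else:
            # Any other filter, or "." (filters not applied).
            tiers["other"].append(line)
    return tiers
```

The "review" tier is where looking at the raw reads (for example in a genome browser) pays off: strand bias and low allele frequency can be artefacts, but they do not by themselves prove a call is false.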
In addition, some of the called variants map to known SNPs (they have an rs ID). I have heard that we should not simply throw these variants away: catalogues such as the 1000 Genomes SNPs have a lot of bias, so we cannot assume that a variant with an rs ID really is a SNP...
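To set the rs-ID variants aside without losing them, the calls can be split on the VCF ID column. A minimal sketch, assuming the ID column was filled from dbSNP by an annotation step (typically something like SnpSift annotate against a dbSNP VCF); an rs ID only means the site is catalogued, not that it is benign. `split_by_rsid` is a hypothetical name.

```python
def split_by_rsid(vcf_lines):
    """Return (known, novel): record lines with and without an rs ID in the ID column."""
    known, novel = [], []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        # The ID column (3rd) may hold several IDs separated by ';', or '.' if none.
        ids = line.rstrip("\n").split("\t")[2].split(";")
        if any(i.startswith("rs") for i in ids):
            known.append(line)
        else:
            novel.append(line)
    return known, novel
```

Both lists are kept, so the "known" ones can be revisited later instead of being discarded.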
Could someone please comment?
If one identical twin has a known SNP, the other one will probably have it too, if you look harder (like Sanger sequencing the locus); that's much more likely than it arising from chance. I would ignore them, at least for a first pass.
The phenotypic differences are not necessarily due to genetic differences. Could be epigenetic, or due to gut bacteria, for example. For identical twins I would assume differences are NOT genetic unless I had a reason to believe otherwise...