Hi,
I was wondering if anyone could explain the difference between 'private' SNPs and 'singleton' SNPs in a VCF file. For example, suppose I have a multi-sample VCF file of 20 samples from different populations/families, say family A and family B with 10 individuals each. Using this example, how would I define these two terms?
I am aware of ways to get them, but due to lack of documentation, I am unable to wrap my head around these terms.
Are 'singletons' defined per individual, whereas 'private' SNPs are defined per family? If so, how are private SNPs assigned in a VCF file (that is, what is the methodology)?
If I am interpreting this correctly, private SNPs are SNPs found in only one population or, in your case, only one family; more than one member of that family may carry them. Singleton SNPs are SNPs whose alternate allele shows up only once across all samples, i.e. in a single individual. Hope this helps.
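To make the two definitions concrete, here is a minimal Python sketch on toy genotypes. It assumes diploid calls written as VCF GT strings and treats any non-reference allele as "the" alternate allele. The sample names (A1..A10, B1..B10), the FAMILY map, and the classify_site helper are all made up for illustration and do not come from any tool.

```python
# Toy illustration only: sample names, the family map and the helper
# functions are hypothetical, chosen to match the 2 x 10 example above.
FAMILY = {f"A{i}": "A" for i in range(1, 11)}
FAMILY.update({f"B{i}": "B" for i in range(1, 11)})


def alt_count(gt):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for a in alleles if a not in ("0", "."))


def classify_site(genotypes, family=FAMILY):
    """Return (is_singleton, private_family) for one site.

    genotypes: dict mapping sample name -> GT string.
    is_singleton: True if the ALT allele is seen exactly once overall.
    private_family: the only family carrying ALT, or None if ALT is
    absent or shared between families.
    """
    carriers = {}
    for sample, gt in genotypes.items():
        n = alt_count(gt)
        if n > 0:
            carriers[sample] = n
    total_alt = sum(carriers.values())
    families = {family[s] for s in carriers}
    is_singleton = total_alt == 1
    private_family = next(iter(families)) if len(families) == 1 else None
    return is_singleton, private_family


# Everyone homozygous reference, then a few carriers per site.
ref = {s: "0/0" for s in FAMILY}
site1 = {**ref, "A3": "0/1"}                             # one het carrier
site2 = {**ref, "A1": "0/1", "A2": "0/1", "A7": "1/1"}   # several carriers, all in A
site3 = {**ref, "A1": "0/1", "B4": "0/1"}                # carriers in both families

print(classify_site(site1))  # (True, 'A')
print(classify_site(site2))  # (False, 'A')
print(classify_site(site3))  # (False, None)
```

Note that under these definitions a singleton is always private to the family of its one carrier, but a private SNP does not have to be a singleton.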
Thanks Giovanni. Do you know how private variants are found, or how they are assigned to a population in a VCF file? I know that SnpSift has a tool called 'private' that does that, but do you know what methodology it follows?
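I can't speak for how SnpSift implements this internally, but conceptually a "private to a group" check can be done per record: collect the groups of all samples carrying a non-reference allele and tag the record if exactly one group is represented. The sketch below does this on a tiny in-memory VCF. The sample-to-family map, the annotate_private function, and the PrivateTo INFO key are hypothetical names for this example, not the output format of any real tool.

```python
# Hypothetical sketch: not SnpSift's implementation. The sample/family
# map, the function name and the 'PrivateTo' INFO key are made up.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA1\tA2\tB1\tB2
1\t100\t.\tA\tG\t50\tPASS\tAC=1\tGT\t0/1\t0/0\t0/0\t0/0
1\t200\t.\tC\tT\t50\tPASS\tAC=3\tGT\t0/1\t1/1\t0/0\t0/0
1\t300\t.\tG\tA\t50\tPASS\tAC=2\tGT\t0/1\t0/0\t0/1\t0/0
"""

SAMPLE_TO_FAMILY = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}


def annotate_private(vcf_text, sample_to_family):
    """Return VCF lines with 'PrivateTo=<family>' added to INFO when all
    carriers of a non-reference allele belong to a single family."""
    out = []
    samples = []
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            out.append(line)  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            out.append(line)
            continue
        gt_index = fields[8].split(":").index("GT")
        families = set()
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # Any non-reference, non-missing allele counts as a carrier.
            if any(a not in ("0", ".") for a in alleles):
                families.add(sample_to_family[sample])
        if len(families) == 1:
            tag = "PrivateTo=" + families.pop()
            fields[7] = tag if fields[7] == "." else fields[7] + ";" + tag
        out.append("\t".join(fields))
    return out


annotated = annotate_private(VCF_TEXT, SAMPLE_TO_FAMILY)
for record in annotated:
    if not record.startswith("#"):
        print(record.split("\t")[1], record.split("\t")[7])
```

In this toy file the records at positions 100 and 200 get tagged for family A, while position 300 is left alone because both families carry the allele. A real multi-allelic site would need each ALT allele checked separately, which this sketch does not do.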
I would assume the private alleles or SNPs were found during the clustering step of your analysis. I would need to know how you generated your VCF to give you a more accurate answer.
I used the GATK germline variant calling pipeline to generate my VCF file. Is any more information needed?
A list of GATK's methods and algorithms can be found here. I hope this helps.