Hi,
I am working with SNP data stored in dbSNP, but I have some doubts about SNPs whose validation status is "unknown".
As I understand it, "unknown" refers to SNPs that were observed only once, so they could be real SNPs or just the result of a sequencing error.
For most of them, even though their validation status is unknown, dbSNP lists several submitters (both research labs and consortia) as well as allele frequencies.
How should I treat these SNPs? Is it right to include them in an experiment?
Thank you.
I think the question is whether you have the capacity to include them. In my opinion many of those SNPs are probably sequencing errors, but some could be genuinely rare variants and thus interesting for your analysis. If you are able to accommodate them, you should probably include them.
It depends on your application, I guess. If you are supplying them as a known dataset to something like the GATK variant recalibration step, you could probably assign a lower confidence level to the unknown SNPs and have that reflected in your analysis.
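To give the unknown SNPs a lower confidence than the rest, you would first need them in a separate set. Here is a minimal Python sketch of that split. It assumes your dbSNP download is a VCF file and that validated records carry a bare `VLD` flag in the INFO column; that flag name is an assumption on my part, so check the header of your own file. The function name `split_by_validation` and the example records are made up for illustration.

```python
def split_by_validation(vcf_lines, flag="VLD"):
    """Split VCF text lines into (header, validated, unknown) lists.

    `flag` is the INFO key assumed to mark validated records; adjust it
    to whatever your dbSNP build actually uses.
    """
    header, validated, unknown = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        # INFO is the 8th VCF column; flags are bare keys separated by ';'.
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        if flag in info_keys:
            validated.append(line)
        else:
            unknown.append(line)
    return header, validated, unknown


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\t.\tVLD;CAF=0.9,0.1",
    "1\t200\trs2\tC\tT\t.\t.\tRS=2",
]
header, validated, unknown = split_by_validation(example)
print(len(validated), len(unknown))
```

Each list could then be written out with the header and handed to the downstream tool as its own resource, with whatever confidence you decide each tier deserves.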
If you are instead using them to calculate concordance against a different SNP set, you could likely exclude them.
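As a rough illustration of the concordance case, the sketch below compares two call sets and optionally leaves out a set of SNP IDs, such as the ones with unknown validation status. It assumes calls are held in plain dictionaries keyed by rs ID; the function name `concordance` and the genotypes are invented for the example and do not come from any particular tool.

```python
def concordance(reference_calls, test_calls, exclude=frozenset()):
    """Fraction of shared sites with identical calls, ignoring excluded IDs.

    Returns None when no shared sites remain after exclusion.
    """
    shared = (set(reference_calls) & set(test_calls)) - set(exclude)
    if not shared:
        return None
    matches = sum(1 for snp in shared if reference_calls[snp] == test_calls[snp])
    return matches / len(shared)


# Made-up genotype calls, for illustration only.
ref = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
test = {"rs1": "AG", "rs2": "CT", "rs3": "TT"}
unknown_ids = {"rs2"}  # IDs whose validation status is unknown

print(concordance(ref, test))                       # all shared sites
print(concordance(ref, test, exclude=unknown_ids))  # unknown SNPs left out
```

Running it both ways shows how much the unknown SNPs move the number, which is a reasonable check before deciding to drop them.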