Is there a painless way of compiling or estimating the number of SNPs per human gene while excluding disease-associated SNPs? Ideally, one would also compensate for observation bias (e.g., some genes have been sequenced a gazillion times, while others are covered only in genome-wide searches).
The idea is not to know every single SNP ever observed, but rather to get an estimate of the 'degree of polymorphism' of particular genes. I have tried downloading dbSNP, but it contains all kinds of disease-causing mutations, and there is a strong over-emphasis on important genes such as TP53 or ATM. What would work is, e.g., data from genome-wide SNP calling in a diverse but healthy population, or a way to filter out disease-related SNPs and somehow normalize for differences in gene coverage.
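The calculation described above (count SNPs per gene, drop disease-associated ones, normalize for how much of each gene was actually sequenced) could be sketched roughly as follows. This is a minimal sketch under assumptions: the `Variant` record, its `disease_associated` flag and the per-gene `callable_bases` counts are hypothetical inputs you would have to derive yourself, e.g. from a call set of a uniformly sequenced cohort plus some clinical annotation source. Nothing here is a dbSNP or ExAC API.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    # Hypothetical record; in practice you would parse this from a VCF.
    gene: str
    pos: int
    disease_associated: bool  # assumed flag from a clinical annotation source


def snp_density_per_kb(
    variants: Iterable[Variant], callable_bases: Dict[str, int]
) -> Dict[str, float]:
    """Non-disease SNP sites per kb of callable (adequately covered) sequence."""
    # Use a set so the same site listed twice is counted only once.
    sites = defaultdict(set)
    for v in variants:
        if not v.disease_associated:
            sites[v.gene].add(v.pos)

    density = {}
    for gene, bases in callable_bases.items():
        if bases > 0:  # skip genes with no usable coverage (avoids division by zero)
            density[gene] = len(sites[gene]) * 1000 / bases
    return density


# Toy example with made-up genes and numbers.
variants = [
    Variant("GENE_A", 100, False),
    Variant("GENE_A", 250, False),
    Variant("GENE_A", 400, True),   # excluded: disease-associated
    Variant("GENE_B", 50, False),
]
callable_bases = {"GENE_A": 2000, "GENE_B": 500, "GENE_C": 1000}
print(snp_density_per_kb(variants, callable_bases))
```

Dividing by callable bases rather than raw gene length is one simple way to account for uneven coverage; it only helps if all genes come from the same uniformly sequenced cohort, not from an aggregate like dbSNP where some genes are over-sampled.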
ExAC (http://exac.broadinstitute.org) looks interesting, but if they have such data, I can't find them.
Any help would be appreciated!!
This could be a difficult question, as the number of known disease-causing SNPs is limited: disease variant lists are not comprehensive. The following steps may be useful: