When trying to match SNPs to rs numbers based on position, I came across this problem: there are multiple SNPs at the same position with the same alleles, and they are neither synonyms nor merged into each other.
For example, I have this SNV without an rs number: 1:564886 (GRCh37) with alleles T/C. Searching https://www.ncbi.nlm.nih.gov/snp/?term=1%3A564886 returns 4 different rs numbers.
A quick scan shows around 560k instances like this in dbSNP156_GRCh37.p13 where at least 2 SNPs share the same position and alleles (50k of these involve 3 or more SNPs).
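A rough way to reproduce this kind of count is to stream a dbSNP VCF and, at each position, group the rs IDs by (REF, ALT). This is only a sketch under assumptions: the VCF is sorted by chromosome and position (as a tabix-indexed file must be), the first five columns are the standard CHROM, POS, ID, REF, ALT, multi-allelic ALT fields are split on commas, and only single-base SNVs are kept. The script and file names are placeholders, and the totals may not match the figures above exactly if those were tallied a different way.

```python
import gzip
import sys
from collections import Counter, defaultdict
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def shared_allele_groups(
    lines: Iterable[str], min_ids: int = 2
) -> Iterator[Tuple[str, str, str, str, List[str]]]:
    """Yield (chrom, pos, ref, alt, rsids) for each SNV allele carried by >= min_ids rs IDs."""
    # Skip blank lines, '##' meta lines and the '#CHROM' header; split data lines on tabs.
    records = (
        ln.rstrip("\n").split("\t")
        for ln in lines
        if ln.strip() and not ln.startswith("#")
    )
    # Relies on the file being position-sorted, so records at one position are adjacent.
    for (chrom, pos), block in groupby(records, key=lambda f: (f[0], f[1])):
        by_allele = defaultdict(set)
        for fields in block:
            rsid, ref, alts = fields[2], fields[3], fields[4]
            for alt in alts.split(","):  # multi-allelic ALT, e.g. "C,G"
                if len(ref) == 1 and len(alt) == 1:  # keep single-base SNVs only
                    by_allele[(ref, alt)].add(rsid)
        for (ref, alt), ids in by_allele.items():
            if len(ids) >= min_ids:
                yield chrom, pos, ref, alt, sorted(ids)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Placeholder usage: python shared_rsids.py dbsnp_GRCh37.vcf.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        # Tally how many alleles are shared by 2, 3, 4, ... rs IDs in a single pass.
        sizes = Counter(len(ids) for *_, ids in shared_allele_groups(handle))
    print("alleles with 2+ rs IDs:", sum(sizes.values()))
    print("alleles with 3+ rs IDs:", sum(n for k, n in sizes.items() if k >= 3))
```

Grouping one position at a time keeps memory small, which matters because a full dbSNP VCF is far too large to load into a single dictionary.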
Is there a "correct way" to choose the rs number in this situation? Should I just choose the lowest number (the oldest entry)?
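If the lowest number is used purely as a deterministic tie-breaker, compare the numeric part of the IDs: plain string comparison ranks "rs123" before "rs45". A minimal sketch, with hypothetical helper names; note that this only makes the choice reproducible, not biologically correct.

```python
import re
from typing import Iterable

_RSID = re.compile(r"^rs(\d+)$")


def rs_number(rsid: str) -> int:
    """Return the numeric part of an rs ID, e.g. 'rs45' -> 45."""
    match = _RSID.match(rsid.strip())
    if match is None:
        raise ValueError(f"not an rs ID: {rsid!r}")
    return int(match.group(1))


def pick_lowest_rsid(rsids: Iterable[str]) -> str:
    """Pick the rs ID with the smallest number (a tie-breaker, not a biological choice)."""
    return min(rsids, key=rs_number)
```

For example, `pick_lowest_rsid(["rs123", "rs45", "rs7000"])` returns `"rs45"`, whereas `min(["rs123", "rs45"])` on the raw strings returns `"rs123"`.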
Please see: Why do the same genetic variants sometimes result in different amino acids in the REVEL table? If you still have questions after reading through it, let us know.
VAL
Thanks for that, it was quite insightful about these SNPs being in different transcripts. But in GWAS summary statistics this extra information is never available; based on position and alleles alone, it could be any of them. How would I decide which rs number is the best match for my SNP?
You study the SNP, the locus, and the phenotype of interest. By looking at everything together, you try to make an educated guess as to what is happening, then annotate using that understanding.
In general, people will default to the MANE Select transcript isoform.
But if you know, for instance, that you are studying epithelial cells and that there is a specific transcript isoform there that is not the MANE Select, then you go with that instead.
As an example, FGFR2 has one isoform in epithelial cells and another in mesenchymal cells, and that is VERY tightly controlled. So if you were studying one or the other, that knowledge would supersede the consensus choice for that data.
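The default-then-override reasoning above can be written as a small precedence rule. This is a sketch only: the function name and the transcript IDs (`TX_MANE`, `TX_EPITHELIAL`, `TX_OTHER`) are hypothetical placeholders, and it assumes you already know which transcripts the variant overlaps (for example from a variant annotation tool such as Ensembl VEP) and the gene's MANE Select ID.

```python
from typing import Optional, Sequence


def choose_transcript(
    candidates: Sequence[str],
    mane_select: Optional[str],
    context_preferred: Sequence[str] = (),
) -> Optional[str]:
    """Pick the transcript to annotate against.

    Order of precedence:
      1. a transcript you know is the relevant isoform for your tissue or cell type,
      2. otherwise the MANE Select transcript, if the variant overlaps it,
      3. otherwise None, meaning the call needs manual review.
    """
    available = set(candidates)
    # Context-specific knowledge (e.g. an epithelial-only isoform) wins first.
    for transcript in context_preferred:
        if transcript in available:
            return transcript
    # Fall back to the consensus default.
    if mane_select is not None and mane_select in available:
        return mane_select
    return None
```

Returning `None` rather than guessing is a design choice: when neither rule applies, the locus and phenotype need to be read up on by hand.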
There's no right answer; you just read everything you can about the locus, the SNV, and the phenotype, and try to annotate according to that understanding.