Hello
I am most perplexed by dbSNP, and I was hoping someone familiar with its schema might be able to help me. I have 2 issues and I was hoping you would address them individually. If you are short of time, please address issue 1 as this is my priority.
Issue 1
I have been investigating the contents of dbSNP by querying the database directly, and it looks to me as though dbSNP doesn't store ALL the information about the effects of SNPs on transcript variants of a gene that it displays on its webpage and in its reports, though it does store SOME.
Take this example SNP: rs2034920. You can see clearly from the web page that this SNP affects 2 transcript variants on the GRCh37 assembly. These are NM_001007551 (versions 2 and 3) and NM_001172288.
If you look in the dbSNP database directly for the consequences of this SNP, you only get one mRNA/protein in relation to the GRCh37 assembly, which is NM_001007551.2 (version 2). You don't get any details about NM_001007551.3 or NM_001172288, but they are quite clearly displayed on the webpage. The query I ran was:
SELECT * FROM b131_SNPContigLocusId_37_1 where snp_id=2034920
You can see the results online at this webpage, which lets you query the dbSNP schema directly: http://cgsmd.isi.edu/dbsnpq/submit.php?query=SELECT+*+FROM+b131_SNPContigLocusId_37_1#dbSNPstatusMessageBox
b131_SNPContigLocusId_37_1 is the table that stores the effects of a SNP on a transcript, and you would expect to see at least one row in this table per transcript (and multiple rows per transcript if the different alleles had different consequences, e.g. a C/T SNP only has one consequence in an intron [INTRONIC] regardless of the allele, but a C and a T could produce different amino acids in an exon, so it would have one row per allele in this table).
This query only returns the effects on one transcript. So where are the other transcripts in the database? I thought perhaps they might be linked by some relationship I don't know about. However, if you query SELECT * FROM b131_SNPContigLocusId_37_1 where mrna_acc="NM_001172288" you get an empty set. So that transcript doesn't appear to be in the database at all.
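To make the expectation above concrete, here is a minimal sketch of the two checks (rows per transcript for one SNP, and whether a given accession is present at all). It runs against an in-memory SQLite toy table rather than the real dbSNP database: the table name toy_SNPContigLocusId, the `effect` column, the helper function names, and the allele/effect values in the rows are all invented for illustration. Only the rs number and the accessions come from the question.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory stand-in for b131_SNPContigLocusId_37_1.

    The column subset and the rows are assumptions for illustration only;
    they are not real dbSNP content.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE toy_SNPContigLocusId ("
        "snp_id INTEGER, mrna_acc TEXT, mrna_ver INTEGER, "
        "allele TEXT, effect TEXT)"
    )
    conn.executemany(
        "INSERT INTO toy_SNPContigLocusId VALUES (?, ?, ?, ?, ?)",
        [
            # One row per allele for the single transcript that is present.
            (2034920, "NM_001007551", 2, "C", "reference"),
            (2034920, "NM_001007551", 2, "T", "missense"),
        ],
    )
    return conn


def transcripts_for_snp(conn, snp_id):
    """Return (mrna_acc, mrna_ver, row_count) for each transcript of a SNP."""
    return conn.execute(
        "SELECT mrna_acc, mrna_ver, COUNT(*) FROM toy_SNPContigLocusId "
        "WHERE snp_id = ? GROUP BY mrna_acc, mrna_ver "
        "ORDER BY mrna_acc, mrna_ver",
        (snp_id,),
    ).fetchall()


def has_transcript(conn, mrna_acc):
    """Return True if any row mentions the given mRNA accession."""
    row = conn.execute(
        "SELECT 1 FROM toy_SNPContigLocusId WHERE mrna_acc = ? LIMIT 1",
        (mrna_acc,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = build_toy_db()
    print(transcripts_for_snp(conn, 2034920))
    print(has_transcript(conn, "NM_001172288"))
```

Against a real local copy, the same two SELECT statements would be pointed at b131_SNPContigLocusId_37_1; a transcript that the web page shows but that never appears in the grouped result is the discrepancy described above.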
Issue 2
You will also see from these results [SELECT * FROM b131_SNPContigLocusId_37_1 where snp_id=2034920] that the same SNP is linked to different genes in the different assemblies.
Assembly: contig id, mrna id, protein id
HuRef: NW_001842404, XM_002346337, XP_002346378
Celera: NW_927722, XM_001716002, XP_001716054
GRCh37: NT_011786, NM_001007551, NP_001007552
Why would a SNP be linked to different genes in different assemblies? Hopefully this is simpler than the previous problem. I think the explanation in this case is that there is a family of closely related genes on this chromosome and the SNP has been mapped to different members of this family in the different assemblies.
My most grateful thanks for any light you can shed on this matter.
For the versioning, data changes from release to release so you want to maintain a consistent release. Sorry to not be able to help with the database; your query seems correct but I don't have a local copy of 132 to compare. For question 2, I tried to edit the answer to be more clear: in summary, if you have different assemblies you also have different gene predictions.
Hi, I've seen the reports and don't want to use them. I specifically need to use the local database. The dbSNP version is not the issue, as the printout from the webpage for this record that I have is actually from build 131. Regarding question 2, that wasn't what I was asking. I was wondering how the SNP ended up being mapped to 3 different genes in the 3 different assemblies. Naturally I would only use one assembly for consistency; I was asking out of personal interest.
I can't see any schema changes for dbSNP 132 on the NCBI list of schema changes. What are the changes in the new release that you refer to, please? Are you suggesting they have added more data rather than actually changing the schema? That is a possibility, but I have been assured that the website showed the same data for this record a few months ago during build 131. http://www.ncbi.nlm.nih.gov/projects/SNP/snp_schemaChange.htm
I shall investigate your suggestion further to be sure, but as far as I know the dbSNP website doesn't let you look at archived versions like Ensembl does.
I will download the database dumps and scan them.
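A scan like that could look roughly like the sketch below. It assumes the dump is a gzipped, tab-delimited text file; the function name find_lines and the file name in the usage example are hypothetical, and the real dump layout should be checked before relying on it.

```python
import gzip


def find_lines(path, needle):
    """Yield (line_number, fields) for each line of a gzipped,
    tab-delimited dump that contains the string `needle`."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                yield number, line.rstrip("\n").split("\t")


if __name__ == "__main__":
    # Hypothetical file name; substitute the dump actually downloaded.
    for number, fields in find_lines("SNPContigLocusId_dump.bcp.gz", "NM_001172288"):
        print(number, fields)
```

Reading line by line keeps memory use flat even for very large dumps, and a plain substring match is enough for a first pass to see whether an accession appears in a given build at all.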
Thanks for your edit. You were right about the database versions! dbSNP seems to have added a lot more info for build 132. You've saved me hours of hunting. I wouldn't have double-checked the versions otherwise.