I'm writing a small piece of software to determine which variants in a set of genomic variants are already known in dbSNP137. It works correctly, except that for some records the observed allele in dbSNP reads "lengthTooLong" instead of spelling out the nucleotides. One example of these rsIDs is rs74196910 (here's a link to it on UCSC: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=345960903&c=chr1&o=2212659&t=2212660&g=snp137&i=rs74196910).
Does anyone know where I could download a version of dbSNP137 that contains the nucleotides for rsIDs like this one? Apart from being useful for the program I'm writing, it seems a bit odd to me in general that versions of the database would be released without the full variants.
I ran into exactly the same problem a few months ago and wasn't able to find a better table. My workaround for deletions was to extract the nucleotide sequence from the reference genome (hg19), since the dbSNP file gives you the coordinates of the deletion. I wasn't able to fix mixed variants and in-dels.
Also, you might notice that for those variants where there is a lengthTooLong notation in column 7, there is a nucleotide sequence at the end of the line. For example:
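In case it helps, here is a minimal sketch of that deletion workaround in plain Python. It is not the exact script I used. The function name `fetch_region` is made up for illustration, and it assumes the coordinates are 0-based and half-open (chromStart inclusive, chromEnd exclusive), which is how UCSC tables normally store them, so check that against your file. It scans the FASTA line by line, which is fine for a handful of lookups but slow for many; an indexed reader would be the better choice then.

```python
def fetch_region(fasta_lines, chrom, start, end):
    """Return reference bases for the 0-based, half-open interval [start, end).

    fasta_lines can be any iterable of FASTA lines, e.g. an open file handle.
    """
    pieces = []
    pos = 0           # bases of the target chromosome consumed so far
    in_chrom = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if in_chrom:
                break  # we have walked past the target chromosome
            # compare only the first word of the header, e.g. ">chr1 extra"
            in_chrom = line[1:].split()[:1] == [chrom]
            continue
        if not in_chrom or not line:
            continue
        line_start, line_end = pos, pos + len(line)
        pos = line_end
        if line_end <= start:
            continue   # region starts after this line
        if line_start >= end:
            break      # region ended before this line
        pieces.append(line[max(start - line_start, 0):end - line_start])
    return "".join(pieces).upper()
```

With an hg19 FASTA on disk you would call it as `fetch_region(open("hg19.fa"), "chr1", chrom_start, chrom_end)`, where the file name is a placeholder. Keep in mind that the returned sequence is on the forward strand, so records on the minus strand may need reverse-complementing before you compare alleles.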
926 chr1 44808539 44808540 rs71579081 0 - lengthTooLong genomic in-del unknown 0.5 0 intron exact 1 ObservedTooLong 1 HUMANGENOME_JCVI, 2 -,AAAAAAAAATATATATATATATATATATATATATATATATTTAT, 1.000000,1.000000, 0.500000,0.500000,
I'm not sure whether that is the nucleotide sequence or not.
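If that trailing field does turn out to hold the alleles (an assumption on my part, so check it against the table's schema description first), something like the sketch below could pull it out of each line. The name `trailing_alleles` is made up, and the pattern is only a heuristic: it picks the first whitespace-separated field that consists entirely of comma-terminated runs of A, C, G, T, N or "-".

```python
import re

# Heuristic: one or more comma-terminated alleles, e.g. "-,AAAT,"
ALLELE_LIST = re.compile(r"^(?:[ACGTNacgtn-]+,)+$")

def trailing_alleles(line):
    """Return the alleles from the first field that looks like an allele list."""
    for field in line.split():
        if ALLELE_LIST.match(field):
            # drop the empty string left by the trailing comma
            return [allele for allele in field.split(",") if allele]
    return []  # no allele-like field on this line
```

For the example line above this returns two entries, "-" and the long A/T sequence. The strand column ("-" with no comma) and the submitter field are skipped because they do not match the pattern.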