Problem Understanding dbSNP Results on the Web Page
14.0 years ago
Andrea_Bio ★ 2.8k

Hello

I don't understand the results returned by dbSNP. I have read the user manual, and while it explains some things very well, it is very sparse in other areas. I was hoping that if I took a sample record in dbSNP and pointed out the things I don't understand, the helpful people in this forum might explain them to me.

I was looking at this record: rs2034920

This maps to genome build GRCh37. It has the alleles C/T.

The reference sequence has base A at this position on the positive strand. Where in the record does it tell me that this SNP is described with respect to the negative strand? I can't see it anywhere.

There is a table listing the SNP in different assemblies, with a column called 'SNP to Chr' that has values +/-. What do this column and its values mean? There are similar columns, such as 'SNP to mRNA'.

If I get the full details for the record and look at submission ss2944057, it is described as having orient positive, but the allele on the web page is C/T (reverse strand). Also, submission ss8529251 is described as orient negative, but the allele on the web page is A/G (forward strand). I don't understand this.

I also don't understand the column 'ss to rs Orientation/Strand' for each submission. I have read the link from the column heading but don't really follow it.

There are many other things I don't understand too. For example, GeneView only lists one gene, whereas the SNP affects 3 genes.

I could list many more. I can open a second question about how to interpret the results returned from eutils!

Thanks a lot for your help.

dbsnp ncbi • 7.9k views
14.0 years ago
Neilfws 49k

There is a lot of information on a dbSNP page; it can be somewhat overwhelming.

I think the key thing to bear in mind is the allele. If it's e.g. C/T and you see references elsewhere to A/G, then you know that there is some "strand issue" for you to resolve, by examining whatever element (SNP, transcript, contig) is being described.

Your second question answers your first. "SNP to Chr" (-) tells you that the SNP is (-) with respect to chromosome, "SNP to mRNA" (+) tells you that it is (+) with respect to transcript. So, as Brad says, the transcript is on the (-) strand.
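To make the strand issue concrete, here is a minimal Python sketch showing that C/T reported on the (-) strand is the same variation as A/G on the (+) strand. The helper name `flip_alleles` is hypothetical and not part of any dbSNP tool; it only handles the four unambiguous bases.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip_alleles(alleles: str) -> str:
    """Return a slash-separated allele string as seen on the opposite strand.

    Each allele is reverse-complemented, and the result is sorted so that
    "C/T" and "T/C" give the same answer.
    """
    flipped = []
    for allele in alleles.upper().split("/"):
        flipped.append("".join(COMPLEMENT[base] for base in reversed(allele)))
    return "/".join(sorted(flipped))


print(flip_alleles("C/T"))  # A/G
```

So if the page shows C/T and "SNP to Chr" is (-), the alleles on the chromosome's forward strand are A/G.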

The "ss" records are just the individual sequences used to compile the "rs" cluster. I would not pay too much attention to "ss to rs orientation/strand". It is just a convention that Illumina have adopted to describe their sequences, in the absence of a chromosomal reference strand. Just focus on the SNP allele, chromosome and transcript.

Again, as Brad said, Geneview refers to 3 transcripts, not 3 genes.

I'd recommend spending some time with the dbSNP documentation, particularly Finding Information in a dbSNP Data Report. There is a lot to read and the organisation is not great, but you will find that it answers most common questions and provides most of what you need to interpret the reports.


I can see I caused confusion in my question: GeneView shows 3 transcripts of one gene. If you look at the flat file output (my mistake, I didn't specify), the SNP affects 3 different genes, whereas GeneView only shows information about one of them (and it shows 3 transcripts of this gene).


I wasn't clear, sorry. In the flat file report 3 genes are affected by this SNP: locus ids = 100133581, 441521, 541625. But on the web page GeneView only shows one of these genes (and yes, it shows 3 transcripts, though 2 are the same if I remember rightly).


I will read the docs you recommend. In the meantime, do you know if there is a field in the flat file output that specifies which strand the SNP is on? Is it the orient field in the CTG row?


As Larry says below, strand can be relative to different things; you just have to pick one and stick with it. If you choose relative to chromosome, then it's the "SNP to Chr" column.

14.0 years ago

Agreed that the orientation issue can be confusing. Looking at the view of the SNP relative to transcripts is the easiest way to see that the C/T allele is the variation mapped to the transcript, which happens to be on the negative strand.

The view of the annotators was that it made more sense to map this relative to the change caused in the transcripts.

The change affects one gene, but three different transcripts of that gene; this is due to alternative splicing. You'll notice that the Ensembl representation has 4 transcripts (3 coding) for the same gene, and their splice representation view is a nice way to visualize the transcripts.

Edit in response to your questions: it's not exactly clear what flat file format you are looking at. For the NCBI SNP FlatFile format, the orientation is in the CTG line relative to the assembly you are looking at:

CTG | assembly=GRCh37 | chr=X | chr-pos=134948034 | NT_011786.16 | ctg-start=19215744 | ctg-end=19215744 | loctype=2 | orient=-
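As a rough illustration of pulling the orientation out of such a line, here is a minimal Python sketch. It assumes the CTG row is pipe-delimited with `key=value` fields exactly as in the example above; the function name `parse_ctg_line` and the `contig` key are made up for this sketch, and real flat files may need more careful handling.

```python
def parse_ctg_line(line: str) -> dict:
    """Split a pipe-delimited CTG row into a dict of its key=value fields.

    The bare contig accession (a field without "=") is stored under the
    hypothetical key "contig".
    """
    fields = {}
    positional = []
    for part in line.split("|"):
        part = part.strip()
        if not part or part == "CTG":
            continue
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
        else:
            positional.append(part)
    fields["contig"] = positional[0] if positional else None
    return fields


line = ("CTG | assembly=GRCh37 | chr=X | chr-pos=134948034 | NT_011786.16 | "
        "ctg-start=19215744 | ctg-end=19215744 | loctype=2 | orient=-")
record = parse_ctg_line(line)
print(record["assembly"], record["chr"], record["chr-pos"], record["orient"])
```

Here `record["orient"]` is `-`, i.e. the rs alleles are given on the reverse strand relative to the GRCh37 contig.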

If you are using the VCF files, which seem to be emerging as a standard, then the orientation is specified by the RV flag:

##INFO=<ID=RV,Number=0,Type=Flag,Description="RS orientation is reversed">
X       134948034       rs2034920       A       G       .       .       G5;GCF;GNO;HD;PH2;REF;RV;SLO;SYN;VC=SNP;VLD;VP=050100000301050502000101;WGT=1;dbSNPBuildID=94
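A minimal Python sketch of how one might use the RV flag: it reads the REF and ALT columns (which are on the forward strand of the reference) and, if RV is present, reverse-complements them to recover the alleles in rs orientation. The function name `rs_orientation_alleles` is hypothetical, and the sketch assumes a whitespace-delimited record with simple A/C/G/T alleles like the one above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rs_orientation_alleles(vcf_line: str):
    """Return (rv_flag_present, alleles_in_rs_orientation) for one VCF record."""
    columns = vcf_line.split()
    ref, alt, info = columns[3], columns[4], columns[7]
    # INFO entries are either bare flags (RV) or key=value pairs (VC=SNP).
    flags = {entry.split("=", 1)[0] for entry in info.split(";") if entry}
    reversed_rs = "RV" in flags
    alleles = [ref] + alt.split(",")
    if reversed_rs:
        alleles = ["".join(COMPLEMENT[b] for b in reversed(a)) for a in alleles]
    return reversed_rs, "/".join(sorted(alleles))


line = ("X 134948034 rs2034920 A G . . "
        "G5;GCF;GNO;HD;PH2;REF;RV;SLO;SYN;VC=SNP;VLD;"
        "VP=050100000301050502000101;WGT=1;dbSNPBuildID=94")
print(rs_orientation_alleles(line))  # (True, 'C/T')
```

This matches the web page: A/G on the forward strand of the chromosome, C/T in the orientation of the rs record.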

For your 3 gene question, I only see 1 gene for this in the LOC attributes for the FlatFile format linked above. Perhaps you could link to the resources you are using that are showing a discrepancy; is it possible they are out of date?


OK, so you are saying that dbSNP lists SNPs according to their effects on the transcript. This makes sense. Is there a field somewhere in the record that assigns a SNP to the positive or negative strand? What happens if a SNP affects 2 genes, one on the forward strand and one on the negative strand? Which one takes priority?


Please see my comment above about the genes/transcripts.

14.0 years ago

Great answers above.

It looks to me that the SNP in question actually maps to two transcripts: NM_001007551.3 and NM_001172288.1. There is also a mapping to NM_001007551.2, but the ".2" refers to an older version of RefSeq transcript NM_001007551. All transcripts map to gene CT45A5.

I often find the SNP GeneView pages incomplete and even misleading. The individual data for each SNP is more accurate. When in doubt, take 121 bp of the SNP sequence (from the rs accession) with the polymorphic allele at position 61 changed to N, and BLASTN search it against RefSeq mRNAs to see if it maps in a gene's transcription unit, or against the reference genome to see where else it maps. Why 121 bp? This is a good size for good BLAST results (especially when comparing genomic to mRNA), and with the SNP at position 61 you see the polymorphism at position 1 of line 2 of your top HSP. Why change the SNP to an N? So that it does not match the genome and you can find it in the BLAST report quite easily.

Yes, the forward/reverse strands can be confusing. Pick a convention and use it always: either with respect to the gene's direction of transcription or with respect to the chromosome. We use the former.
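The 121 bp query described above can be assembled with a few lines of Python. This is only a sketch: the function name `build_blast_query` is hypothetical, and the flanking sequences below are made-up placeholders, not the real flanks of rs2034920 (those would come from the rs record).

```python
def build_blast_query(five_prime: str, three_prime: str, flank: int = 60) -> str:
    """Join `flank` bases on each side of the SNP with an N at the SNP position."""
    if len(five_prime) < flank or len(three_prime) < flank:
        raise ValueError("need at least %d bp of flank on each side" % flank)
    return five_prime[-flank:] + "N" + three_prime[:flank]


# Placeholder flanks for illustration only (not real dbSNP sequence).
left_flank = "ACGT" * 20    # 80 bp upstream of the SNP
right_flank = "TTGCA" * 15  # 75 bp downstream of the SNP

query = build_blast_query(left_flank, right_flank)

# Wrap at 60 columns, the same width as a BLAST alignment line.
lines = [query[i:i + 60] for i in range(0, len(query), 60)]
print(">rs_query_placeholder")
print("\n".join(lines))
```

With 60 bases per line, the N lands at the first position of the second line, which is why the SNP is easy to spot in the top HSP.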


I have a copy of the flat file report for this SNP and it contains 3 genes. Their ids are locus_id=100133581, 541465 and 441521. But the website shows just one gene (locus_id=441521)! I have commented below that GeneView shows 3 transcripts, although 2 are the same.
