Number of SNPs in dbSNP Don't Add Up
14.1 years ago
Andrea_Bio ★ 2.8k

Hi

Sorry for yet another basic question. I can't get the number of cow SNPs in dbSNP to add up. If I count all Bos taurus entries of class SNP, I get 2,210,399 SNPs. If I then count the SNPs on chromosomes 1-29, X and Y, I get 1,877,229, and the SNPs on unknown chromosomes number 150,853. These last two numbers don't add up to the total of 2,210,399. I tried ticking every chromosome in the chromosome option box, including unknown, and only got 2,028,082 SNPs (1,877,229 + 150,853), but if I make no chromosome selection and just choose cow as the organism, I get 2,210,399.

Does anyone know what might be causing this disparity?
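To put a size on the disparity, here is a minimal Perl sketch that does the bookkeeping with only the three counts quoted above. It assumes that "on chromosomes 1-29, X, Y" and "Un" are the only buckets the chromosome limits can select; the remainder is whatever falls outside both.

```perl
use strict;
use warnings;

# Counts quoted in the question (dbSNP, Bos taurus, class SNP).
my $total       = 2_210_399;   # organism + class only, no chromosome limit
my $on_chrom    = 1_877_229;   # chromosomes 1-29, X and Y
my $unknown_chr = 150_853;     # Un[CHR]

# Everything the chromosome limits can see.
my $accounted = $on_chrom + $unknown_chr;

# SNPs that fall outside every chromosome bucket offered by the limits.
my $gap = $total - $accounted;

printf "accounted: %d\nmissing:   %d\n", $accounted, $gap;
```

This prints 2028082 accounted for and 182317 missing, so roughly 182 thousand SNPs sit in some bucket the chromosome limits never offer.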

Also my eutils query to get only those SNPs on a chromosome doesn't work:

my$query = 'bos taurus[ORG] AND snp[SNP_CLASS] AND (Not Un[CHR]) ';

This returns the SNPs on unknown chromosomes, which is the opposite of what I want. For some reason it returns the same result as this:

my$query = 'bos taurus[ORG] AND snp[SNP_CLASS] AND  Un[CHR] ';
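For reference, a minimal Perl sketch of the same count through the E-utilities esearch service. It assumes Entrez's general query rules also apply to dbSNP: Boolean operators are written in uppercase, and NOT is a binary operator (A NOT B), so a mixed-case "Not" inside parentheses may not be parsed as negation at all, which may be why the first query behaves like the second. The rewritten term is an assumption, not a tested query, and the script only builds the request URL; fetching it (for example with LWP::Simple) is left out.

```perl
use strict;
use warnings;

# Base URL of the NCBI E-utilities esearch service.
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Assumption: uppercase NOT used as a binary operator excludes Un[CHR].
my $query = 'bos taurus[ORG] AND snp[SNP_CLASS] NOT Un[CHR]';

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# rettype=count asks esearch to return only the hit count.
my $url = "$base?db=snp&rettype=count&term=" . uri_encode($query);
print "$url\n";
```

Comparing the count from this URL with the Un[CHR] count is a quick way to check whether the NOT is actually being applied.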

thanks


If I search the Entrez SNP web interface with the organism set to Bos taurus and the class set to SNP, I get 2,210,399. I get the same number for this eutils query:

my $query = 'bos taurus[ORG] AND snp[SNP_CLASS]'

That is the total number of variations in Bos taurus. I'm restricting the SNP class to SNP, which gives slightly fewer.


Biological context and computational representation work in staggeringly different ways, and that often produces seemingly counterintuitive results. The same question formulated a different way will often lead to slightly (or even substantially) different results.

14.1 years ago

I think your eutils query is missing another type of mapping: chNotOn.

From ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/README_cow_9913_B130.txt:

Data files are identified as follows:

ch1-ch22, chX, chY Names appended to files whose data are located in chromosomes 1-22, and chromosomes X and Y, respectively. (ch22? Hmm, a copy/paste from the human genome README, I guess.)

chMulti Name appended to a file that contains variations mapped to multiple chromosomes.

chNotOn Name appended to a file that contains variations that did not map to any current chromosome.

chUn Name appended to a file that contains mapped variations on unplaced chromosomes.
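Read literally, the README sorts SNPs into four buckets by file suffix. The Perl sketch below encodes that scheme; the range ch1-ch29 is an assumption taken from the question (the README text says 1-22), and the bucket labels are hypothetical names chosen for illustration. According to the replies in this thread, only the numbered chromosomes, X, Y and Un are accepted as [CHR] values, so the chMulti and chNotOn buckets are plausible homes for the missing SNPs, though that is not confirmed here.

```perl
use strict;
use warnings;

# Map a README file-suffix tag to its bucket (labels are hypothetical).
# Assumption: cow chromosome files run ch1-ch29, not ch1-ch22.
sub bucket_for {
    my ($tag) = @_;
    return 'placed'   if $tag =~ /^ch(?:[1-9]|1[0-9]|2[0-9]|X|Y)$/;
    return 'multiple' if $tag eq 'chMulti';   # mapped to multiple chromosomes
    return 'not_on'   if $tag eq 'chNotOn';   # did not map to any current chromosome
    return 'unplaced' if $tag eq 'chUn';      # mapped to unplaced sequence
    return 'unknown';                         # anything the README does not describe
}

for my $tag (qw(ch1 ch29 chX chMulti chNotOn chUn ch30)) {
    printf "%-8s %s\n", $tag, bucket_for($tag);
}
```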


Not meaning to be totally stupid, but what is the difference between chUn and chNotOn? I understand that a SNP can map to a DNA sequence that has not yet been assigned to a particular chromosome, but I'm not sure what an unplaced chromosome is. Thanks


Also, I tried running queries with the [CHR] field set to chMulti, chNotOn and chUn, and these terms weren't recognised. The term Un is recognised and gives 150,853. I also dropped the 'ch' from the other two terms and tried NotOn and Multi, and they weren't recognised either.


An unplaced chromosome is a piece of DNA that hasn't been placed on the build. You can see those segments at ftp://ftp.ncbi.nih.gov/snp/organisms/cow_9913/chr_rpts/


I expect the disparity is due to some of the data types you quoted from the README.

14.1 years ago
Mary 11k

Another issue is the imprecision of the word "cow". If you look through the Index in Entrez SNP, you'll see there are more cows than Bos taurus. Here's a subset in the Bos section (there may be other names I don't know that are indexed elsewhere):

[Image: Bos entries in the Entrez SNP organism index; not preserved]

I know some of these look trivial, but they will affect your numbers. And there may be other buckets of them somewhere that I don't know offhand.

PS, edit: the index can also be good for some laughs. My favorite new species with SNPs is called "crab-eating macaque", with 114 SNPs.


Hi, where did you find that index? Could you paste the link? I like that species name!


Oh, sure--you were using Limits before so I thought you might already have seen this.

Go to http://www.ncbi.nlm.nih.gov/snp and click the Preview/Index tab. In the lower section, choose "Organism" from the pull-down menu, put your species in the text box, and click Index.

You can also use this to build up queries by highlighting something and clicking the and/or/not buttons.


I hadn't looked at that tab. That index field is useful: it confirms that the only chromosome value for unmapped SNPs is 'Un'. I still don't know where my missing SNPs are :) I would have thought a query for Bos taurus with no chromosome limit would give the same count as a query for Bos taurus with all the chromosome options combined, but apparently not, so who knows what the internal logic of the query engine is.
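One way to hunt for the missing SNPs is to ask the query engine for them directly: take the unrestricted query and subtract the union of every chromosome value the limits accept. A minimal Perl sketch that builds those query strings follows. It assumes the accepted [CHR] values are 1-29, X, Y and Un (as reported in this thread), and that dbSNP accepts a long OR list and a binary uppercase NOT, as Entrez queries generally do; neither has been tested against dbSNP here.

```perl
use strict;
use warnings;

# [CHR] values the chromosome limits accept for cow, per the thread.
my @chroms = ((1 .. 29), 'X', 'Y', 'Un');

# OR together every chromosome value the limits can select.
my $any_chrom = '(' . join(' OR ', map { $_ . '[CHR]' } @chroms) . ')';

my $all     = 'bos taurus[ORG] AND snp[SNP_CLASS]';
my $placed  = "$all AND $any_chrom";   # should match the "all boxes ticked" count
# Assumption: uppercase binary NOT, as in general Entrez query syntax.
my $missing = "$all NOT $any_chrom";   # SNPs outside every selectable bucket

print "$placed\n$missing\n";
```

If the engine behaves as assumed, the counts for `$placed` and `$missing` should sum to the unrestricted total, and the records returned by `$missing` can be inspected to see which mapping category they belong to.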
