Query Returns A Different Number Of Results When Fetching With Biopython From Dbsnp
11.8 years ago
heath ▴ 20

Hey all, I have an issue with the rsID output from dbSNP. If I use the website with specific terms for the protein, I get a list of around 13000 rsIDs: http://www.ncbi.nlm.nih.gov/snp/details?querykey=3

(pathogenic[Clinical_Significance] OR probable pathogenic[Clinical_Significance]) AND (nonsense[Function_Class] OR missense[Function_Class] OR frameshift[Function_Class]) AND "Homo sapiens"[Organism]

But if I use Biopython's Entrez module, I get a different number of rsIDs in the output list:

fh= Entrez.esearch(db='snp', retmax= '15000', term="pathogenic OR probable pathogenic AND nonsens OR missense OR frameshift AND Homo sapiens")
rec=Entrez.read(fh)
rsid_list=rec['IdList']

len(rsid_list) is 15000. Did I do something wrong?

Thanks!

biopython entrez dbsnp • 3.2k views

You do have a typo in the code you show: nonsens instead of nonsense.
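One way to avoid typos and keep the field tags and parentheses consistent is to assemble the term programmatically rather than typing it by hand. The sketch below rebuilds the web query from the question; the helper names `field_group` and `build_term` are hypothetical and not part of Biopython.

```python
def field_group(values, field):
    """OR together values restricted to one field, wrapped in parentheses."""
    return "(" + " OR ".join(f"{value}[{field}]" for value in values) + ")"


def build_term():
    """Rebuild the web query from the question, one field group at a time."""
    clinical = field_group(
        ["pathogenic", "probable pathogenic"], "Clinical_Significance"
    )
    function = field_group(
        ["nonsense", "missense", "frameshift"], "Function_Class"
    )
    organism = '"Homo sapiens"[Organism]'
    # AND the groups together; the parentheses keep each OR list intact.
    return " AND ".join([clinical, function, organism])


term = build_term()
print(term)
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`.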

11.8 years ago
Peter 6.0k

You are not using the same search term - your web version included the [field] restrictions and different AND/OR combinations, while the version you used in Biopython did not. Try:

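from Bio import Entrez

# NCBI asks for a contact address; this one is a placeholder, use your own.
Entrez.email = "your.name@example.com"
# Same term as the web query, with field tags and parentheses included.
term = '(pathogenic[Clinical_Significance] OR probable pathogenic[Clinical_Significance]) AND (nonsense[Function_Class] OR missense[Function_Class] OR frameshift[Function_Class]) AND "Homo sapiens"[Organism]'
fh = Entrez.esearch(db='snp', retmax=15000, term=term)
rec = Entrez.read(fh)
fh.close()
rsid_list = rec['IdList']
print(len(rsid_list))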

Right now that gives 13035 results.
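A list of exactly 15000 IDs, as in the question, suggests the search matched more records than `retmax` allowed and the list was cut off. A minimal sketch of how to check this: it assumes the parsed esearch record exposes `Count` and `QueryTranslation` alongside `IdList`, which it does as far as I know. The function names and the e-mail address are placeholders, not part of Biopython.

```python
def summarize_esearch(rec):
    """Compare the total hit count with the number of IDs actually returned."""
    total = int(rec["Count"])      # esearch reports the count as a string
    returned = len(rec["IdList"])  # capped by retmax
    return {
        "total": total,
        "returned": returned,
        "truncated": returned < total,
        # How Entrez interpreted the term; handy for spotting parsing surprises.
        "translation": rec.get("QueryTranslation", ""),
    }


def run_search(term, retmax=15000, email="your.name@example.com"):
    """Run the dbSNP search; needs Biopython and network access."""
    # Imported here so summarize_esearch can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    handle = Entrez.esearch(db="snp", term=term, retmax=retmax)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

Calling `summarize_esearch(run_search(term))` on the unrestricted term from the question should show `truncated` as true if the 15000 was only the `retmax` cap, and the `translation` entry shows how the untagged words were expanded.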


Thanks a lot :-)
