Issues in Parsing the XML for dbSNP via Biopython
11.8 years ago
heath ▴ 20

I am trying to parse the XML(?) from Entrez's dbSNP database:

 from Bio import Entrez
 Entrez.email = "xxxxl@gmail.com"
 handle = Entrez.efetch(db="snp", id="121434622")
 cont = handle.read()

I have seen some posts on how to parse the output from Entrez:

http://stackoverflow.com/questions/11322250/biopython-class-instance-output-from-entrez-read-i-dont-know-how-to-manipula

But strangely enough(?), the type of cont I get is str, not a class instance. I saw a post from 2009 saying this may be a bug at NCBI for dbSNP, but I am not sure whether that is still true four years later. Is there an efficient way to parse the information from dbSNP?

Thanks a lot!

biopython xml entrez parser • 4.0k views
11.8 years ago
David W 4.9k

A couple of things here:

1) The handle you create with Entrez.efetch acts just like a file handle, so reading it into cont gave you a string, not a parsed record. If you print that string you'll see it's not XML but dbSNP's native format (with many curly braces). To get XML records you need to set rettype to xml:

handle=Entrez.efetch(db="snp", id="121434622", rettype="xml")

Ordinarily, you'd parse the contents of that handle with Entrez.read():

record = Entrez.read(handle)
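
To illustrate the first point, here is a minimal sketch of why reading the handle yields plain text rather than a parsed record. It uses an in-memory stand-in instead of a real Entrez.efetch call, so it runs without network access. The helper name handle_to_text and the sample XML snippet are made up for illustration, and the bytes branch is an assumption that some Biopython versions return a binary handle.

```python
import io


def handle_to_text(handle, encoding="utf-8"):
    """Read a file-like handle to the end and return its contents as str.

    Hypothetical helper: decodes the data if the handle yields bytes
    (assumption: some Biopython versions hand back binary handles).
    """
    data = handle.read()
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return data


# Stand-in for the object returned by Entrez.efetch; the XML is made up.
fake_handle = io.BytesIO(b'<ExampleSet><Snp id="121434622"/></ExampleSet>')
text = handle_to_text(fake_handle)

# The result is raw text that still needs parsing, not a record object.
print(type(text).__name__)
```

With a real handle the same call would presumably return the raw response body, which is why type(cont) reports str in the question.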

2) As it happens, the XML records for dbSNP are a bit different from other NCBI records, and Biopython doesn't handle them. This question has some workarounds: "Find Amino Acid Change For Snp Using Eutils".
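
One possible workaround (not necessarily the one in the linked question) is to parse the XML text yourself with Python's standard xml.etree.ElementTree instead of Entrez.read. The sketch below assumes you already have the XML as a string. The element names (ExampleSet, Snp, Gene), the namespace, and the gene name are hypothetical and do not reflect the real dbSNP schema, so adapt them after inspecting an actual record.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def collect_elements(xml_text, wanted):
    """Return (attributes, text) for each element whose local name is `wanted`."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():  # walks the root and all of its descendants
        if local_name(elem.tag) == wanted:
            hits.append((dict(elem.attrib), (elem.text or "").strip()))
    return hits


# Made-up sample document; real dbSNP XML uses different element names.
SAMPLE = """<ExampleSet xmlns="http://example.org/hypothetical">
  <Snp id="121434622"><Gene>GENE1</Gene></Snp>
</ExampleSet>"""

snps = collect_elements(SAMPLE, "Snp")
genes = collect_elements(SAMPLE, "Gene")
print(snps, genes)
```

Matching on the local tag name sidesteps XML namespaces, which is convenient when you do not know in advance which namespace the records declare.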


Thanks a lot! It is extremely helpful!
