I want to parse the XML response I get from the BioProject database using the efetch function of Biopython's Bio.Entrez module.
Here is my code:
from Bio import Entrez
Entrez.email = "myemail@company.org"
handle = Entrez.efetch(db="bioproject", id="55465", rettype="gb", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record)
But this raises the following error:
Bio.Entrez.Parser.ValidationError: Failed to find tag 'RecordSet' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
If I instead read the handle directly, it works, but I just get the raw XML lines back (no parsing):
handle = Entrez.efetch(db="bioproject", id="55465", rettype="gb", retmode="xml")
readlines = handle.readlines()
for line in readlines:
    print(line)
Can anyone explain the right way to parse the XML response in this case?
Why don't you try what the Bio.Entrez.Parser error message suggests: "please call Bio.Entrez.read or Bio.Entrez.parse with validate=False."
Thanks for replying. I did try passing validate=False to parse, but nothing gets printed. I was reading up on efetch:
So it looks like when validate is set to False, every tag missing from the DTD is skipped, and since even the root tag RecordSet is missing, no output shows up at all. Maybe Biopython simply has no DTD for the BioProject XML format in this case.
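One possible workaround, offered as a sketch rather than something confirmed in this thread, is to bypass Bio.Entrez's DTD-based parser and read the raw XML with Python's standard xml.etree.ElementTree. The helper name summarize_bioproject_xml is hypothetical, and the element names it looks for (DocumentSummary, ProjectID/ArchiveID with an accession attribute, ProjectDescr/Title) are assumptions about the BioProject XML layout; check them against the actual response before relying on them.

```python
import xml.etree.ElementTree as ET


def summarize_bioproject_xml(xml_data):
    """Return a list of (accession, title) tuples from BioProject efetch XML.

    Hypothetical helper. The element names below are ASSUMPTIONS about the
    BioProject XML layout, not part of any Biopython API.
    """
    # ElementTree does not load the external DTD, so no validation error occurs.
    root = ET.fromstring(xml_data)  # accepts either str or bytes
    results = []
    # iter() also matches the root itself, in case the root is a DocumentSummary.
    for summary in root.iter("DocumentSummary"):
        archive = summary.find(".//ProjectID/ArchiveID")  # assumed path
        title = summary.find(".//ProjectDescr/Title")  # assumed path
        accession = archive.get("accession") if archive is not None else None
        results.append((accession, title.text if title is not None else None))
    return results


# Live usage (requires network access; efetch arguments as in the question):
# from Bio import Entrez
# Entrez.email = "myemail@company.org"
# handle = Entrez.efetch(db="bioproject", id="55465", retmode="xml")
# print(summarize_bioproject_xml(handle.read()))
# handle.close()
```

Depending on the Biopython version, handle.read() may return str or bytes; ET.fromstring accepts both, so the helper does not need to decode anything itself.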