Hello, I'm new to Python and especially to Biopython. I'm trying to fetch a record as XML with Entrez.efetch
and then parse it. Last week this script worked well:
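from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder address; NCBI asks for a contact email
handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
records = Entrez.read(handle)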
But now I'm getting an error:
Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
So I ran this:
records = Entrez.read(handle, validate=False)
But I'm still getting an error:
TypeError: 'str' object does not support item assignment
After some research I realized that NCBI recently made changes to RefSeq
that add new tags to the GenPept XML output: http://www.ncbi.nlm.nih.gov/mailman/pipermail/refseq-announce/2014q2/000117.html
Do I need to change something in the DTD to support these new tags?
Thank you very much for your support.
It is unfortunate that NCBI edited this DTD file in place; normally they are very good about adding new dated versions instead. In any case, the Biopython copy has already been updated (https://github.com/biopython/biopython/commit/9a301b5d1cecad1bb2fee3920f73740448f9aa4f), but that happened shortly after the Biopython 1.63 release, so 1.63 still ships the old DTD :(
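If you want to check whether your installed copy already knows the new tag, something like the sketch below might help. It is only a rough check, not an official Biopython tool: the helper names `dtd_declares` and `find_gbseq_dtd` are made up for this example, and it assumes the GBSeq definitions live in a file called `NCBI_GBSeq.mod.dtd` inside the `DTDs` folder next to the `Bio.Entrez` package, which may differ between versions.

```python
import os
import re


def dtd_declares(dtd_text, tag):
    """Return True if dtd_text contains an <!ELEMENT ...> declaration for tag."""
    # Require whitespace or "(" after the name so "GBSeq_xrefs" does not
    # match a longer element name that merely starts with it.
    pattern = r"<!ELEMENT\s+" + re.escape(tag) + r"[\s(]"
    return re.search(pattern, dtd_text) is not None


def find_gbseq_dtd():
    """Return the path of the GBSeq DTD shipped with Biopython, or None.

    Assumption: the file is named NCBI_GBSeq.mod.dtd and sits in the
    DTDs directory of the installed Bio.Entrez package.
    """
    try:
        import Bio.Entrez
    except ImportError:
        return None
    path = os.path.join(
        os.path.dirname(Bio.Entrez.__file__), "DTDs", "NCBI_GBSeq.mod.dtd"
    )
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    dtd_path = find_gbseq_dtd()
    if dtd_path is None:
        print("Could not locate the GBSeq DTD in this Biopython install.")
    else:
        with open(dtd_path) as fh:
            known = dtd_declares(fh.read(), "GBSeq_xrefs")
        print(dtd_path)
        print("GBSeq_xrefs declared:", known)
```

If the tag is not declared, replacing that file with the version from the commit linked above, or installing a Biopython release newer than 1.63, should make the original `Entrez.read(handle)` call work again.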
It works. I didn't know where to find a new version of the DTD file.
Thank you very much! :)