Hi all,
I'm scanning through all of GenBank's bacterial genomes using Biopython.
Recently I've been getting an occasional error when parsing location data. Specifically:
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: order(join(649703..649712,649751..649752),650047..650049)
My code is a simple loop through all filenames I feed in at the command line:
[...]
try:
    contig = SeqIO.parse(open(gb_file,"r"), "genbank")
except:
    sys.stderr.write("ERROR: Parsing gbk file "+gb_file+"!\n")
    sys.exit(1)
sys.stderr.write("Loading genome " + str(counter) + " of "+str(len(sys.argv)-1)+" ("+gb_file+")\n")
for gb_record in contig:
    [...]
This is in the Aeropyrum pernix K1 genome, NC_000854.gbk. I don't see anything wrong with the location data. Can anyone help?
Thanks, -Morgan
I noted this on the Biopython bug report, and emailed the NCBI.
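
A side note on the loop above: SeqIO.parse returns an iterator and reads records lazily, so a parse failure such as this one surfaces inside the "for gb_record in contig" loop (the traceback's "for r in i" frame), not at the SeqIO.parse call that the try/except wraps. Below is a minimal sketch of guarding the iteration instead. It is written for Python 3 and runs without Biopython; the helper name records_or_skip, its parameters, and the fake_parse stand-in are all hypothetical and exist only for illustration.

```python
import sys


def records_or_skip(parse_fn, path, errors=(Exception,)):
    """Yield records from parse_fn(path); report a parse error and stop.

    parse_fn is a hypothetical stand-in for a callable that returns a lazy
    record iterator for one file. errors is the tuple of exception types to
    treat as a parse failure (an assumption; narrow it as needed).
    """
    try:
        # The try block wraps the iteration itself, because a lazy parser
        # raises while records are being pulled, not when it is created.
        for record in parse_fn(path):
            yield record
    except errors as exc:
        sys.stderr.write("ERROR: Parsing gbk file %s: %r\n" % (path, exc))


def fake_parse(path):
    """Stand-in parser: yields one record, then fails partway through."""
    yield "record-1"
    raise ValueError("order(join(649703..649712,649751..649752),650047..650049)")


# Creating the iterator raises nothing; the error appears only on iteration.
lazy = fake_parse("NC_000854.gbk")

# Records read before the failure are kept, and the caller can move on
# to the next file instead of crashing.
loaded = list(records_or_skip(fake_parse, "NC_000854.gbk"))
```

With Biopython, parse_fn would presumably be a small function that opens the file and calls SeqIO.parse on it with the "genbank" format, and errors could be narrowed to the Bio.GenBank.LocationParserError class named in the traceback. Note that this only lets the scan continue past the affected file; records after the failing one in that file are not read.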