Hi there,
I am trying to parse an old GenBank file (see URL below) to extract its features, which I then want to write out in GFF format. I hit an error (see below) when parsing the file with Biopython (version 1.64), using the SeqIO.parse method to access the records.
GenBank file: ftp://ftp.ensembl.org/pub/release-22/human-22.34d/data/flatfiles/genbank/Homo_sapiens.3000.dat.gz
Biopython error: /opt/apps/python/2.7.3/lib/python2.7/site-packages/Bio/GenBank/__init__.py:1108: BiopythonParserWarning: Couldn't parse feature location: 'AL358792.24.1.166931:3274..3461'
% (location_line)))
I looked at the Bio/GenBank/__init__.py file and found many regular expressions that check the format of feature locations. These regexps seem to cover the format of the location I encounter, i.e. 'AL358792.24.1.166931:3274..3461' (see the example regexp below, for complex locations, from the __init__.py file). So I am not quite sure why the code raises the BiopythonParserWarning.
Regexp in Bio/GenBank/__init__.py: _complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s|%s|%s|%s)" % (_pair_location, _solo_location, _between_location, _within_location, _oneof_location)
Could anybody please help me solve this parsing issue?
Thank you very much for your help.
Thank you Peter for investigating this and reporting the issue on the Biopython bug listing. I very much appreciate your help with this.
You described a warning which I can reproduce (well, lots and lots of warnings about problematic locations, probably due to the period/dots in the sequence reference name), but what is the error? I don't see any exception and traceback in your question - or do you mean how can we fix the warning?
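To check the suspicion about the dots, here is a minimal sketch using only the standard `re` module. The reference prefix is copied from the `_complex_location` pattern quoted in the question. The two sub-patterns and the `^...$` anchoring are simplified assumptions for illustration, and `relaxed_re` is a hypothetical variant, not the actual Biopython fix.

```python
import re

# Simplified stand-ins for two of the location sub-patterns
# (assumptions for this sketch; the real module defines more alternatives).
_solo_location = r"[<>]?\d+"
_pair_location = r"[<>]?\d+\.\.[<>]?\d+"

# Reference prefix copied from the pattern quoted in the question:
# a name, then AT MOST ONE ".suffix" group, then a colon.
_complex_location = r"([a-zA-z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)?\:)?(%s|%s)" % (
    _pair_location, _solo_location)
complex_re = re.compile(r"^%s$" % _complex_location)  # anchoring is assumed

# Hypothetical relaxed variant: allow any number of ".suffix" groups.
_relaxed_location = r"([a-zA-Z][a-zA-Z0-9_]*(\.[a-zA-Z0-9]+)*\:)?(%s|%s)" % (
    _pair_location, _solo_location)
relaxed_re = re.compile(r"^%s$" % _relaxed_location)

ensembl_style = "AL358792.24.1.166931:3274..3461"  # three dotted parts
versioned = "AL358792.24:3274..3461"               # one dotted part

quoted_matches_ensembl = complex_re.match(ensembl_style) is not None
quoted_matches_versioned = complex_re.match(versioned) is not None
relaxed_matches_ensembl = relaxed_re.match(ensembl_style) is not None

print(quoted_matches_ensembl, quoted_matches_versioned, relaxed_matches_ensembl)
```

Under these assumptions the quoted prefix accepts a plain accession.version reference but not the Ensembl-style name with several dotted parts, which would be consistent with the warnings.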
You're right, as such there is no error triggered, but a warning is raised because the problematic feature locations cannot be parsed. So yes, I meant how can we fix the BiopythonParserWarning issue, so that we can retrieve the locations for those "problematic" features. Thank you very much for your help.
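Until the parser handles these names, one possible stopgap is to parse the raw location strings yourself (for instance, the strings reported in the warning messages). This is a rough sketch under stated assumptions: `parse_ref_location` is a hypothetical helper, not part of Biopython, and it only covers the simple `REFERENCE:start..end` form shown in the question.

```python
import re

# Hypothetical pattern: a reference name that may contain dots,
# a colon, then a plain start..end pair.
_REF_PAIR = re.compile(
    r"^(?P<ref>[A-Za-z][A-Za-z0-9_.]*):(?P<start>\d+)\.\.(?P<end>\d+)$"
)


def parse_ref_location(text):
    """Return (reference, start, end) or None if the text does not fit.

    Coordinates are returned exactly as written in the file
    (GenBank locations are 1-based and inclusive).
    """
    m = _REF_PAIR.match(text.strip())
    if m is None:
        return None
    return m.group("ref"), int(m.group("start")), int(m.group("end"))


result = parse_ref_location("AL358792.24.1.166931:3274..3461")
print(result)
```

Locations using join(...), complement(...), or fuzzy boundaries such as `<` and `>` are deliberately not handled here; the helper returns None for them so they can be dealt with separately.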