KEGG Biopython error when retrieving an enzyme record
1
0
Entering edit mode
7.5 years ago
JRCX ▴ 10

Hi!

I am having a problem retrieving enzyme data with biopython.

I import the necessary packages:

from Bio.KEGG import REST
from Bio.KEGG import Enzyme

In some cases the parser works, e.g.:

request = REST.kegg_get("ec:2.3.1.237")
open("ec_2.3.1.237.txt",'w').write(request.read())
records = Enzyme.parse(open("ec_2.3.1.237.txt"))
record = list(records)[0]
print(record.genes)

[('SEN', ['SACE_5532']), ('SAQ', ['Sare_4951', 'ACTN:', 'L083_3191'])]

But for others, unfortunately, it doesn't. Here is an example:

request = REST.kegg_get("ec:2.3.1.246")
open("ec_2.3.1.246.txt",'w').write(request.read())
records = Enzyme.parse(open("ec_2.3.1.246.txt"))
record = list(records)[0]
print(record.genes)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-0e0367f55487> in <module>()
      2 open("ec_2.3.1.246.txt",'w').write(request.read())
      3 records = Enzyme.parse(open("ec_2.3.1.246.txt"))
----> 4 record = list(records)[0]
      5 print(record.genes)

/anaconda/lib/python3.5/site-packages/Bio/KEGG/Enzyme/__init__.py in parse(handle)
    267                 record.genes.append(row)
    268             else:
--> 269                 row = record.genes[-1]
    270                 key, values = row
    271                 for value in data.split():

IndexError: list index out of range

I have been trying to look at both cases but I cannot spot a difference.

Thank you in advance.

7.5 years ago
Felix_Sim ▴ 260

This issue has been reported to the Biopython community on GitHub and the status can be monitored under issue #1275.

Update:

This issue has in fact been resolved in Biopython v1.69, so run pip install -U biopython and you should not encounter it anymore.
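
To check whether an installed version already includes the fix, a small comparison like the one below may help. This is only a sketch: has_genes_fix is a hypothetical helper name, and it assumes the version string starts with "major.minor" (as in "1.69").

```python
import re


def has_genes_fix(version):
    """Return True if a Biopython version string is 1.69 or newer.

    Hypothetical helper; assumes the string starts with "major.minor".
    """
    # Keep only the leading numeric components, e.g. "1.70.dev0" -> (1, 70).
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return (int(match.group(1)), int(match.group(2))) >= (1, 69)
```

With Biopython installed, the string to pass in is Bio.__version__, i.e. has_genes_fix(Bio.__version__) after import Bio.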


This appears to be a bug in Biopython.

Loading the data without storing it in a variable illustrates this.

>>> list(Enzyme.parse(open("ec_2.3.1.237.txt")))
[<Bio.KEGG.Enzyme.Record at 0x7f182a1e2ed0>]

>>> list(Enzyme.parse(open("ec_2.3.1.246.txt")))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-32-3263d682ee05> in <module>()
----> 1 list(Enzyme.parse(open("ec_2.3.1.246.txt")))

/home/felix/anaconda/lib/python2.7/site-packages/Bio/KEGG/Enzyme/__init__.py in parse(handle)
    267                 record.genes.append(row)
    268             else:
--> 269                 row = record.genes[-1]
    270                 key, values = row
    271                 for value in data.split():

IndexError: list index out of range

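The problem is caused by line 263 of Bio/KEGG/Enzyme/__init__.py, which assumes that every organism code in the GENES field is exactly three characters long. For ec:2.3.1.246 the first entry is SMAF (four characters), so the line is treated as a continuation while record.genes is still empty, which raises the IndexError. For ec:2.3.1.237 the four-letter code ACTN comes last, so it is silently appended to the previous entry instead: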

262         elif keyword == "GENES       ":
263             if data[3:5] == ': ':
264                 key, values = data.split(":", 1)

If you change the above to the following, you should get your desired result:

262         elif keyword == "GENES       ":
263             if data[3:5] == ': ' or data[4:6] == ': ':
264                 key, values = data.split(":", 1)

This is probably not the optimal solution but fixes the problem for now.
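
A less length-dependent approach would be to look at whether the first token ends with a colon rather than checking fixed character positions. The sketch below illustrates the idea on the data portion of GENES lines only; parse_genes_lines is a hypothetical standalone helper, not Biopython's actual fix, and it assumes that only organism codes appear as a first token ending in ":".

```python
def parse_genes_lines(data_lines):
    """Group KEGG GENES data lines into (organism, [gene ids]) tuples.

    Hypothetical helper: each item in data_lines is the text that
    follows the 12-character keyword column of a GENES line.
    """
    genes = []
    for data in data_lines:
        tokens = data.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0].endswith(":"):
            # New organism entry, whatever the length of its code.
            genes.append((tokens[0][:-1], tokens[1:]))
        elif genes:
            # Continuation line: more gene ids for the previous organism.
            genes[-1][1].extend(tokens)
        else:
            raise ValueError("continuation line before any organism code")
    return genes


print(parse_genes_lines(["SEN: SACE_5532", "SAQ: Sare_4951", "ACTN: L083_3191"]))
```

Because the organism code is taken from the token itself, three-letter and four-letter codes are handled the same way.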

With the patched parser, the test case that previously ran without an error:

>>> records = Enzyme.parse(open("ec_2.3.1.237.txt"))
>>> record = list(records)[0]
>>> print(record.genes)
[('SEN', ['SACE_5532']), ('SAQ', ['Sare_4951']), ('ACTN', ['L083_3191'])]

With the patched parser, the test case that previously raised the IndexError:

>>> records = Enzyme.parse(open("ec_2.3.1.246.txt"))
>>> record = list(records)[0]
>>> print(record.genes)
[('SMAF', ['D781_2331']), ('XAL', ['XALC_1059']), ('XTN', ['FD63_09790']), ('MNR', ['ACZ75_13535']), ('NBR', ['O3I_032820']), ('SVE', ['SVEN_0493']), ('SRC', ['M271_00240', 'M271_09145']), ('SCW', ['TU94_04905', 'TU94_32245']), ('STRC', ['AA958_31865']), ('SLE', ['sle_03950', 'sle_58620']), ('KFL', ['Kfla_4132']), ('NDA', ['Ndas_1740']), ('FRA', ['Francci3_2458']), ('AOI', ['AORI_1502', 'AORI_5330']), ('AJA', ['AJAP_32005']), ('ALU', ['BB31_20660']), ('MPRO', ['BJP34_31700'])]