Trying to parse the SwissProt keywlist.txt with biopython
5.6 years ago
schlogl ▴ 160

Hey guys, I think I found a bug in one of the code examples in the Biopython tutorial and cookbook. I tried this one and got a KeyError: 'ID'.

from Bio.SwissProt import KeyWList   
handle = open("keywlist.txt")  
records = KeyWList.parse(handle)  
for record in records:  
    print(record[’ID’])  
    print(record[’DE’])

Do you have any idea what is wrong, or has anyone else run into this problem?

Thanks

Paulo

Version of the tutorial/cookbook: last updated 17 December 2014 (Biopython 1.65)


Still getting the same error! 8(

5.6 years ago
Joe 21k

I have added formatting to your post for clarity. Please use the 101010 button to make code more readable.

The code in the cookbook works without issue for me. It looks like you are using a weird ' character: ’. That looks to be a typographic apostrophe rather than a plain single quotation mark. My suggestion would be to make sure you're using the right character and try again.
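If the code was pasted from a PDF or a web page, the quotes can also be straightened programmatically. This is only a minimal sketch, assuming the four common typographic quote characters are the only problem; straighten_quotes is a helper name made up for this illustration and is not part of Biopython.

```python
# Map typographic (curly) quotes to their plain ASCII equivalents.
# Assumption: only these four characters need replacing.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(source):
    """Return source with curly quotes replaced by plain ones (hypothetical helper)."""
    return source.translate(QUOTE_MAP)


# The line from the question, written with escapes so the curly quotes are explicit.
pasted = "print(record[\u2019ID\u2019])"
fixed = straighten_quotes(pasted)
print(fixed)  # prints: print(record['ID'])
```

Retyping the quotes by hand in the editor works just as well for a snippet this short.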

Given their example input file (stored in the current working directory as keywlist.txt):

ID   2Fe-2S.
AC   KW-0001
DE   Protein which contains at least one 2Fe-2S iron-sulfur cluster: 2 iron
DE   atoms complexed to 2 inorganic sulfides and 4 sulfur atoms of
DE   cysteines from the protein.
SY   Fe2S2; [2Fe-2S] cluster; [Fe2S2] cluster; Fe2/S2 (inorganic) cluster;
SY   Di-mu-sulfido-diiron; 2 iron, 2 sulfur cluster binding.
GO   GO:0051537; 2 iron, 2 sulfur cluster binding
HI   Ligand: Iron; Iron-sulfur; 2Fe-2S.
HI   Ligand: Metal-binding; 2Fe-2S.
CA   Ligand.
//
ID   3D-structure.
AC   KW-0002
DE   Protein, or part of a protein, whose three-dimensional structure has
DE   been resolved experimentally (for example by X-ray crystallography or
DE   NMR spectroscopy) and whose coordinates are available in the PDB
DE   database. Can also be used for theoretical models.
HI   Technical term: 3D-structure.
CA   Technical term.
//

This code snippet returns the expected result.

from Bio.SwissProt import KeyWList
handle = open("keywlist.txt")
records = KeyWList.parse(handle)
for record in records:
    print(record["ID"])

2Fe-2S.
3D-structure.

Thank you for your attention. I will try that. For me it only works with the second file ("keywlist2.txt").


Yes, you need to have a file in the directory which corresponds to the name given in the open() call. It’s not magically creating a file, you actually need one to parse, and you need to change what’s inside open() to match accordingly.
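To rule out a path problem before parsing, the file can be checked explicitly. A rough sketch using only the standard library; resolve_input and the keywlist*.txt pattern are assumptions made for this illustration, not anything Biopython provides.

```python
from pathlib import Path


def resolve_input(name, directory="."):
    """Return the path to name inside directory, or fail with a helpful message.

    Hypothetical helper: it lists similarly named files so that a mismatch
    between the name passed to open() and the file on disk is easy to spot.
    """
    folder = Path(directory)
    path = folder / name
    if not path.is_file():
        candidates = sorted(p.name for p in folder.glob("keywlist*.txt"))
        raise FileNotFoundError(
            "{} not found; similar files here: {}".format(path, candidates)
        )
    return path


# Usage (assumes keywlist.txt sits in the current working directory):
#     handle = open(resolve_input("keywlist.txt"))
```

Note that a missing file makes open() raise FileNotFoundError, not KeyError, so this check only rules out the path as the cause.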

Copy and paste my code exactly, and see if that works.


@ jrj.healey

Sir, I have all the files in the same place as my notebooks, and as I said, the code works with only one of the files I downloaded from the Biopython GitHub repository (keywlist2.txt).

It also doesn't work with the file I downloaded from the SwissProt site.

I also found that some of the code examples have trouble with the MEME files.

But maybe that is because of environment incompatibilities or something else. Or I am dumb :)

But I appreciate your help.

Thank you


I don't know that this would be a 'bug' per se. The KeyError is to be expected, since the 2nd entry in keywlist.txt in that example does not have an "ID" field.

See here for instance:

from Bio.SwissProt import KeyWList

# File 1

print("Working on file 1...")
print("====================")
handle = open("./keywlist.txt")
records = KeyWList.parse(handle)
for i, record in enumerate(records, start = 1):
    try:
        print("Record {} -> ID = {}".format(i, record["ID"]))
    except KeyError:
        print("ERROR: ID issue with File 1, Record {} -> {}...".format(i, str(record)[0:20]))
    try:
        print("Record {} -> DE = {}".format(i, record["DE"]))
    except KeyError:
        print("ERROR: DE issue with File 1, Record {} -> {}".format(i, str(record)))

handle.close()

# File 2

print("\nWorking on file 2...")
print("====================")
handle2 = open("./keywlist2.txt")
records2 = KeyWList.parse(handle2)
for i, record in enumerate(records2, start = 1):
    try:
        print("Record {} -> ID = {}".format(i, record["ID"]))
    except KeyError:
        print("ERROR: ID issue with File 2, Record {} -> {}...".format(i, str(record)[0:20]))
    try:
        print("Record {} -> DE = {}".format(i, record["DE"]))
    except KeyError:
        print("ERROR: DE issue with File 2, Record {} -> {}".format(i, str(record)[0:20]))

handle2.close()

Yields:

Working on file 1...
====================
Record 1 -> ID = 2Fe-2S.
Record 1 -> DE = Protein which contains at least one 2Fe-2S iron-sulfur cluster: 2 iron atoms complexed to 2 inorganic sulfides and 4 sulfur atoms of cysteines from the protein.
ERROR: ID issue with File 1, Record 2 -> {'SY': '', 'DE': 'Ke...
Record 2 -> DE = Keywords assigned to proteins due to their particular molecular function.
Record 3 -> ID = Zymogen.
Record 3 -> DE = The enzymatically inactive precursor of mostly proteolytic enzymes.

Working on file 2...
====================
Record 1 -> ID = 2Fe-2S.
Record 1 -> DE = Protein which contains at least one 2Fe-2S iron-sulfur cluster: 2 iron atoms complexed to 2 inorganic sulfides and 4 sulfur atoms of cysteines from the protein.
Record 2 -> ID = 3D-structure.
Record 2 -> DE = Protein, or part of a protein, whose three-dimensional structure has been resolved experimentally (for example by X-ray crystallography or NMR spectroscopy) and whose coordinates are available in the PDB database. Can also be used for theoretical models.
Record 3 -> ID = 3Fe-4S.
Record 3 -> DE = Protein which contains at least one 3Fe-4S iron-sulfur cluster: 3 iron atoms complexed to 4 inorganic sulfides and 3 sulfur atoms of cysteines from the protein. In a number of iron-sulfur proteins, the 4Fe-4S cluster can be reversibly converted by oxidation and loss of one iron ion to a 3Fe-4S cluster.

So, as expected, since there is no ID field for entry 2 (according to the file, it's an Identifier (category) entry, not an Identifier (keyword) entry), there is an 'error' with that entry in the loop (see the line that begins ERROR:...).
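One way to loop over such a file without the exception is to look the key up with dict.get and fall back to another field. The sketch below uses plain dictionaries as stand-ins for the parsed records (the truncated output above suggests they print like dictionaries). It assumes that category entries carry their identifier under an "IC" key; that key name and the "Molecular function." value are illustrative assumptions based on the SwissProt file layout, not something confirmed in this thread, and record_label is a hypothetical helper.

```python
def record_label(record):
    """Return the keyword ID, else the category identifier, else a placeholder.

    dict.get returns None for a missing key instead of raising KeyError.
    """
    return record.get("ID") or record.get("IC") or "<no identifier>"


# Stand-in data; the "IC" entry is an assumed shape for a category record.
records = [
    {"ID": "2Fe-2S.", "SY": "", "DE": "Protein which contains ..."},
    {"IC": "Molecular function.", "SY": "", "DE": "Keywords assigned to ..."},
    {"ID": "Zymogen.", "SY": "", "DE": "The enzymatically inactive ..."},
    {"SY": "", "DE": "An entry with neither field."},
]

labels = [record_label(record) for record in records]
for number, label in enumerate(labels, start=1):
    print("Record {} -> {}".format(number, label))
```

With real data the same function could be applied to each record yielded by KeyWList.parse; printing sorted(record) for the failing entry would show which keys it actually has.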

I'm not very familiar with this part of Biopython though, so one of the devs would have to weigh in as to whether this is expected behaviour for the parser or not (and whether the documentation should be modified accordingly).


Got it. With a file downloaded from the site, it worked for many records until I hit that KeyError. Thank you for your time and patience. I was expecting every file to be perfect, but they are not. :)


If this has solved your issue, please go ahead and accept the answer via the check mark next to the post to provide closure to the thread.
