I have a gbk file of a draft bacterial genome (meaning multiple contigs). I want to extract some info about specific genes, but I get an error I cannot understand.
Parse the gbk
from Bio import SeqIO
record_dict = SeqIO.to_dict(SeqIO.parse("SpeciesA.gbk", "genbank"))
My file looks like this:
for i in record_dict:
...     for f in record_dict[i].features:
...         print(f.qualifiers)
{'locus_tag': ['AA_03640'], 'gene': ['ftsH_4']}
{'locus_tag': ['AA_03640'], 'inference': ['ab initio prediction:Prodigal:2.6', 'similar to AA sequence:UniProtKB:P37476'], 'codon_start': ['1'], 'EC_number': ['3.4.24.-'], 'transl_table': ['11'], 'product': ['ATP-dependent zinc metalloprotease FtsH'], ........
Now I try to get the name of the contig that contains the gene AA_03640, as follows:
for i in record_dict:
...     for f in record_dict[i].features:
...         if f.qualifiers['locus_tag'] == 'AA_03640':
...             print(i)
But I get the following error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
KeyError: 'locus_tag'
Any help to understand where I am wrong? Thanks
The error is quite self-explanatory, I would say: it means that f.qualifiers doesn't have a key "locus_tag". You can check which keys it does have using f.qualifiers.keys()
I thought the same, but when I check the keys locus_tag is in there:
Odd. For a moment I thought this was an easy question ;-)
Maybe use a try-except block:
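A minimal sketch of that idea, assuming plain dictionaries as hypothetical stand-ins for each feature's f.qualifiers (the real objects come from Biopython, which is not imported here, and the sample values are made up for illustration). Features that lack a locus_tag are skipped instead of raising KeyError:

```python
# Hypothetical stand-ins for f.qualifiers of three features on one contig.
# The first mimics a "source" feature, which carries no locus_tag.
qualifier_sets = [
    {"organism": ["Species A"]},
    {"locus_tag": ["AA_03640"], "gene": ["ftsH_4"]},
    {"locus_tag": ["AA_03640"], "product": ["ATP-dependent zinc metalloprotease FtsH"]},
]

found = []
for qualifiers in qualifier_sets:
    try:
        tags = qualifiers["locus_tag"]
    except KeyError:
        # This feature has no locus_tag at all, so move on to the next one.
        continue
    found.append(tags)

print(found)
```

Only the two features that actually carry a locus_tag end up in found; the source-like one is silently passed over.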
An unrelated problem is that you should check for f.qualifiers['locus_tag'] == ['AA_03640'], since the value is a list and not a string.
OK, I am officially confused. If I run:
I get
But if I run
I get
I should have the same keys, shouldn't I?
To answer your question: no, you should not assume all the features have the same keys. They vary dramatically by feature type. The source feature (usually there is only one) has things like the organism, while CDS and gene features have things like a locus tag, and the CDS often has a translation. You probably want to add
if f.type == "CDS":
or
if f.type == "gene":
to your loop. See also http://www.warwick.ac.uk/go/peter_cock/python/genbank which does something similar to what I think you want to achieve.
(I have expanded on this as a full answer.)
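As a rough sketch of the feature-type check suggested in this comment, the loop could look something like the following. The function name contigs_with_locus_tag and the demo data are hypothetical, and types.SimpleNamespace objects stand in for Biopython's records and features (assuming only that each feature exposes .type and a dict-like .qualifiers, as in the question). With real data you would pass the record_dict built by SeqIO.to_dict instead.

```python
from types import SimpleNamespace


def contigs_with_locus_tag(record_dict, locus_tag, feature_types=("CDS", "gene")):
    """Return the sorted names of contigs that carry the given locus tag."""
    hits = set()
    for contig_name, record in record_dict.items():
        for feature in record.features:
            # Skip feature types (e.g. "source") that never carry a locus tag.
            if feature.type not in feature_types:
                continue
            # Qualifier values are lists, so test membership; .get() avoids
            # a KeyError when a feature has no locus_tag qualifier.
            if locus_tag in feature.qualifiers.get("locus_tag", []):
                hits.add(contig_name)
    return sorted(hits)


def _feature(ftype, **qualifiers):
    """Build a hypothetical stand-in for a Biopython feature."""
    return SimpleNamespace(type=ftype, qualifiers=qualifiers)


# Made-up demo data: two contigs, each with a source feature lacking locus_tag.
demo_records = {
    "contig_1": SimpleNamespace(features=[
        _feature("source", organism=["Species A"]),
        _feature("gene", locus_tag=["AA_00010"]),
    ]),
    "contig_2": SimpleNamespace(features=[
        _feature("source", organism=["Species A"]),
        _feature("gene", locus_tag=["AA_03640"], gene=["ftsH_4"]),
        _feature("CDS", locus_tag=["AA_03640"]),
    ]),
}

print(contigs_with_locus_tag(demo_records, "AA_03640"))
```

Collecting hits in a set means a contig is reported once even though both the gene and the CDS feature share the same locus tag.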
I'm confused too.
I don't really have experience with genbank files so I can't really nail this down. Maybe something for Peter, although I'm not sure if he visits biostars often...
Double check that your input genbank isn't malformed and that its /locus_tag fields are correct?
If you hit Peter up on Twitter, he's been really helpful to me in the past (and he's always on Twitter ;) )
I checked the gbk, it looks fine. I will try twitter..:)
Someone kindly pinged me on Twitter
They were faster than me! =)