Hello,
I am trying to extract certain information from a GenBank (.gbk) file. I can extract the locus tag and the amino acid sequence, but I am struggling to extract the gene location, because in the file it is not written in the same format as qualifiers such as /locus_tag="NCTC86_00002".
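In Biopython the location does not appear to be stored in `qualifiers` the way /locus_tag is. Instead, it is the feature's `location` attribute. The sketch below is minimal and builds one hypothetical CDS feature by hand rather than reading HS.gb. The coordinates are made up, and the helper name `describe_location` is invented for illustration. Biopython uses 0-based, end-exclusive coordinates, so the sketch adds 1 to the start to match the 1-based numbers shown in the GenBank file.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation


def describe_location(feature):
    """Return (start, end, strand) as 1-based, inclusive GenBank-style coordinates."""
    loc = feature.location
    # Biopython stores start as 0-based and end as exclusive, so only start is shifted
    return int(loc.start) + 1, int(loc.end), loc.strand


# Hypothetical CDS, equivalent to "complement(101..400)" in a GenBank file
cds = SeqFeature(
    FeatureLocation(100, 400, strand=-1),
    type="CDS",
    qualifiers={"locus_tag": ["NCTC86_00002"]},
)

start, end, strand = describe_location(cds)
print(cds.qualifiers["locus_tag"][0], start, end, strand)
```

Inside the original loop, the same call would be `describe_location(seq_feature)`. For `join(...)` locations, `start` and `end` should give the outer bounds only, and iterating over `seq_feature.location.parts` gives each segment.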
This is my script so far:
from Bio import SeqIO

gbk_filename = "HS.gb"
faa_filename = "HS_converted.faa"

# Context managers close both files even if an error occurs part-way through
with open(gbk_filename, "r") as input_handle, open(faa_filename, "w") as output_handle:
    for seq_record in SeqIO.parse(input_handle, "genbank"):
        print("Dealing with GenBank record %s" % seq_record.id)
        for seq_feature in seq_record.features:
            if seq_feature.type == "CDS":
                # Each CDS is expected to carry exactly one /translation qualifier
                assert len(seq_feature.qualifiers["translation"]) == 1
                # Write one FASTA entry: ">locus_tag from record_name" then the protein sequence
                output_handle.write(">%s from %s\n%s\n" % (
                    seq_feature.qualifiers["locus_tag"][0],
                    seq_record.name,
                    seq_feature.qualifiers["translation"][0]))

print("Done")
Also, I want to print the gene annotation, but not every CDS entry contains a /gene="" qualifier. Do I need to add an if clause for entries that have no /gene=""?
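An explicit `if "gene" in seq_feature.qualifiers:` check would work. A possibly shorter alternative is `dict.get` with a default, because `qualifiers` maps each key to a list of values. The sketch below is minimal and uses hand-built features in place of records from HS.gb. The locus tags, the gene name `dnaA`, the helper `gene_name`, and the placeholder `"unknown"` are all hypothetical.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation


def gene_name(feature, default="unknown"):
    """Return the /gene qualifier of a feature, or `default` if it is absent."""
    # .get avoids a KeyError when a CDS has no /gene qualifier
    return feature.qualifiers.get("gene", [default])[0]


# Two hypothetical CDS features: one with /gene, one without
with_gene = SeqFeature(FeatureLocation(0, 300, strand=1), type="CDS",
                       qualifiers={"locus_tag": ["NCTC86_00001"], "gene": ["dnaA"]})
without_gene = SeqFeature(FeatureLocation(400, 700, strand=1), type="CDS",
                          qualifiers={"locus_tag": ["NCTC86_00002"]})

for cds in (with_gene, without_gene):
    print(cds.qualifiers["locus_tag"][0], gene_name(cds))
```

In the original script, `gene_name(seq_feature)` could be added as another `%s` field in the `output_handle.write(...)` header line.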