Entering edit mode
4.0 years ago
MSRS
▴
590
I found the gbk extractor python code here (https://github.com/LiguanLi/ARG_MRG_Cooccurrence/blob/master/GBK_CDSextract.py). After a few modifications, the code is as below.
#! /usr/bin/python3
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
import sys
import getopt
#file_name1=sys.argv[1]
file_name1=input("Enter the full name of genebank file: ")
gb = SeqIO.parse(file_name1, "genbank")
output_handle1=open(file_name1+".ffn","w")
output_handle2=open(file_name1+".faa","w")
output_handle3=open(file_name1+".info.tab","w")
for record in gb:
print ("Processing GenBank record %s" % record.id)
for feature in record.features :
if(feature.type == "source"):
taxaid = feature.qualifiers['db_xref']
if len(taxaid) > 1 :
output_handle3.write("%s\t%s\t%s\t\n" % record.id, record.description, taxaid[1]))
else:
output_handle3.write("%s\t%s\t%s\t\n" % record.id, record.description, taxaid[0]))
for feature in record.features :
if(feature.type == "CDS"):
try:
ID = feature.qualifiers['db_xref'][0]
desc = feature.qualifiers['protein_id'][0]
product = feature.qualifiers['product'][0]
locus = feature.qualifiers['locus_tag'][0]
nt_seq = feature.extract(record.seq)
strand=feature.strand
start=location.start.position
end=location.end.position
try:
assert len(feature.qualifiers['translation'])==1
aa_seq = feature.qualifiers['translation'][0]
except AssertionError:
print (ID,'no amni acids found!')
aa_seq=''
output_handle1.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, nt_seq))
output_handle2.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, aa_seq))
except:
continue
print ('Retrieving whole genome sequences!')
output_handle1.close()
output_handle2.close()
output_handle3.close()
print ('Done!')
But when I run it, the error message looks like (Solved)
Traceback (most recent call last):
File "./GBK_CDSextract.py", line 11, in <module>
gb = SeqIO.parse(file_name1, "genbank")
File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/__init__.py", line 607, in parse
return iterator_generator(handle)
File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/InsdcIO.py", line 93, in __init__
super().__init__(source, mode="t", fmt="GenBank")
File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 47, in __init__
self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
I am unable to find out where I need to change. I need help. Thank you.
New error (empty output from the genbank file):
output_handle1.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, nt_seq))
output_handle2.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, aa_seq))
Thank you very much Dlakic. I have added file name here
Then getting the error.
Again, the error message is very clear: it can't open a file that is an empty string - that's the meaning of
No such file or directory: ''
.When you start the program and get a question
Enter the full name of genebank file:
, you actually need to type the file name before hitting theEnter
key.Thank you very much Dlakic. It works now.
But two output files are empty!