Empty in genebank extractor python code
1
0
Entering edit mode
4.0 years ago
MSRS ▴ 590

I found the gbk extractor python code here (https://github.com/LiguanLi/ARG_MRG_Cooccurrence/blob/master/GBK_CDSextract.py). After a few modifications, the code is as below.

#! /usr/bin/python3

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
import sys
import getopt

#file_name1=sys.argv[1]
file_name1=input("Enter the full name of genebank file: ")

gb = SeqIO.parse(file_name1, "genbank")
output_handle1=open(file_name1+".ffn","w")
output_handle2=open(file_name1+".faa","w")
output_handle3=open(file_name1+".info.tab","w")

for record in gb:
    print ("Processing GenBank record %s" % record.id)
    for feature in record.features :
        if(feature.type == "source"):
            taxaid = feature.qualifiers['db_xref']
            if len(taxaid) > 1 :
                output_handle3.write("%s\t%s\t%s\t\n" % record.id, record.description, taxaid[1]))
            else:
                output_handle3.write("%s\t%s\t%s\t\n" % record.id, record.description, taxaid[0]))
    for feature in record.features :
        if(feature.type == "CDS"):
            try:
                ID = feature.qualifiers['db_xref'][0]
                desc = feature.qualifiers['protein_id'][0]
                product = feature.qualifiers['product'][0]
                locus = feature.qualifiers['locus_tag'][0]
                nt_seq = feature.extract(record.seq)
                strand=feature.strand
                start=location.start.position
                end=location.end.position
                try:
                        assert len(feature.qualifiers['translation'])==1
                        aa_seq = feature.qualifiers['translation'][0]
                except AssertionError:
                        print (ID,'no amni acids found!')
                        aa_seq=''  
                output_handle1.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, nt_seq))
                output_handle2.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, aa_seq))
            except:
                continue

print ('Retrieving whole genome sequences!')
output_handle1.close()
output_handle2.close()
output_handle3.close()
print ('Done!')

But when I run it, the error message looks like (Solved)

Traceback (most recent call last):
  File "./GBK_CDSextract.py", line 11, in <module>
    gb = SeqIO.parse(file_name1, "genbank")
  File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/__init__.py", line 607, in parse
    return iterator_generator(handle)
  File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/InsdcIO.py", line 93, in __init__
    super().__init__(source, mode="t", fmt="GenBank")
  File "/home/rahman/.local/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 47, in __init__
    self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: ''

I am unable to find out where I need to change. I need help. Thank you.

New error (empty output from the genbank file):

 output_handle1.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, nt_seq))
                    output_handle2.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, aa_seq))
python gbk • 982 views
ADD COMMENT
2
Entering edit mode
4.0 years ago
Mensur Dlakic ★ 28k

Not sure what is unclear here, as the error message seems pretty obvious:

FileNotFoundError: [Errno 2] No such file or directory: ''

Did you by any chance just hit Enter when being asked for a file name? That would explain the error.

ADD COMMENT
0
Entering edit mode

Thank you very much Dlakic. I have added file name here

file_name1=input("Enter the full name of genebank file: ")

Then getting the error.

ADD REPLY
1
Entering edit mode

Again, the error message is very clear: it can't open a file that is an empty string - that's the meaning of No such file or directory: ''.

When you start the program and get a question Enter the full name of genebank file:, you actually need to type the file name before hitting the Enter key.

ADD REPLY
0
Entering edit mode

Thank you very much Dlakic. It works now.

But two output files are empty!

output_handle1.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc,
    product, locus, strand, start, end, nt_seq))
    output_handle2.write(">%s %s %s %s %s %s %s\n%s\n" % (ID, desc, product, locus, strand, start, end, aa_seq))
ADD REPLY

Login before adding your answer.

Traffic: 1606 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6