I am trying to fetch GenBank files for a list of accession IDs, which are stored in a file, using Biopython. This is how I do it so far:
    #!/usr/bin/env python
    from sys import argv, stdout, exit
    from Bio import SeqIO
    from Bio import Entrez

    Entrez.email='example@mail.com'

    def searchInDb(searchFor):
        handle = Entrez.efetch(db='nucleotide', id=searchFor, rettype='gb')
        link = searchFor + ".gb"
        local_file = open(link, 'w')
        local_file.write(handle.read())
        handle.close()
        local_file.close()

    if __name__ == '__main__':
        if len(argv) != 2:
            print '\tmissing file link'
            exit(1)
        name = argv[1]
        with open(name, "r") as ins:
            for line in ins:
                ID = line.rstrip('\n')
                print "Getting gb file for ", ID
                searchInDb(ID)
However, when I do it like this and later look at the .gb file, it is not complete: there is no information about the CDS features or anything else. I need exactly those, because later I want to parse the gene locus tags from the .gb file, as well as the positions of the CDS on the genome, and so on.
Does anyone know how I need to change my code to get the complete .gb file?
Yes, you are right. But when I manually download the .gb files for my accessions, I get the complete file, which is why I guessed my code is wrong. Take NC_021485, for example: with my code the .gb file is not complete.
Use rettype=gbwithparts:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_021485&retmode=xml&rettype=gbwithparts
However, it's returned as GenBank text; I don't know how to retrieve the XML output.
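In Biopython, the same setting can be passed straight to Entrez.efetch. Below is a minimal Python 3 sketch of the download step with that change, not a drop-in replacement for the script in the question. The helper names (read_accessions, fetch_genbank, save_genbank) are made up for illustration, retmode="text" is my addition to ask for the flat file explicitly, and the e-mail address is the placeholder from the question.

```python
import os


def read_accessions(lines):
    """Return the non-empty accession IDs, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def fetch_genbank(accession, email="example@mail.com"):
    """Download one record as GenBank flat-file text (needs network access)."""
    # Imported here so the other helpers work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    handle = Entrez.efetch(
        db="nucleotide",
        id=accession,
        rettype="gbwithparts",  # the change suggested above (was 'gb')
        retmode="text",         # assumption: request plain text explicitly
    )
    try:
        return handle.read()
    finally:
        handle.close()


def save_genbank(accession, text, directory="."):
    """Write the record to <directory>/<accession>.gb and return the path."""
    path = os.path.join(directory, accession + ".gb")
    with open(path, "w") as out:
        out.write(text)
    return path
```

Usage would be something like looping over read_accessions(open("ids.txt")) and calling save_genbank(acc, fetch_genbank(acc)) for each ID, where ids.txt is a hypothetical file name.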
Yes, I tried it, and it works so far. Thanks.
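For the parsing step mentioned in the question (locus tags and CDS positions), here is a rough Python 3 sketch of how it could look once the complete file is on disk. The function names cds_table and load_record are hypothetical, and the "unknown" fallback for a CDS without a locus_tag qualifier is my own choice. Note that Biopython reports coordinates Python-style: start is 0-based and end is exclusive.

```python
def cds_table(record):
    """Collect (locus_tag, start, end, strand) for every CDS feature."""
    rows = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists; take the first locus_tag if present.
        locus_tag = feature.qualifiers.get("locus_tag", ["unknown"])[0]
        location = feature.location
        # For joined (multi-part) locations these are the outer bounds.
        rows.append((locus_tag, int(location.start), int(location.end), location.strand))
    return rows


def load_record(path):
    """Read a GenBank file that contains exactly one record."""
    # Imported here so cds_table() can be tried without Biopython installed.
    from Bio import SeqIO

    return SeqIO.read(path, "genbank")
```

Something like cds_table(load_record("NC_021485.gb")) should then give one tuple per CDS. For files with several records, SeqIO.parse would be needed instead of SeqIO.read.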