Question

fetch -complete- genbank file using biopython

1

7.0 years ago

beginner_problem ▴ 10

I am trying to use Biopython to fetch GenBank files for a list of accession IDs stored in a file. This is how I do it so far:

#!/usr/bin/env python

from sys import argv, stdout, exit
from Bio import SeqIO
from Bio import Entrez

Entrez.email = 'example@mail.com'

def searchInDb(searchFor):
    # Fetch the GenBank record for one accession and save it as <accession>.gb
    handle = Entrez.efetch(db='nucleotide', id=searchFor, rettype='gb')
    link = searchFor + ".gb"
    local_file = open(link, 'w')
    local_file.write(handle.read())
    handle.close()
    local_file.close()

if __name__ == '__main__':
    if len(argv) != 2:
        print('\tmissing file link')
        exit(1)
    name = argv[1]

    # The input file holds one accession ID per line
    with open(name, "r") as ins:
        for line in ins:
            ID = line.rstrip('\n')
            print("Getting gb file for %s" % ID)
            searchInDb(ID)

However, when I run this and then look at the .gb file, it is not complete: there is no information about the CDS or anything else. I need exactly that information, because later I want to parse the gene locus tags, the positions of the CDS on the genome, and so on from the .gb file.

Does anyone know how I need to change my code to get the complete .gb file?

genome • 4.9k views

score 1 · Answer 1 · 2017-11-08

1

7.0 years ago

Pierre Lindenbaum 164k

"it is not complete, I dont have any information about the CDS or anything"

Please give us some example accession numbers. Also, not all sequences have that information.

0

Yes, you are right. But when I manually download the gb files for my accessions, I get the complete file, which is why I guessed that my code is wrong. Take this one, for example: NC_021485. With my code, the .gb file is not complete.

Reply • 7.0 years ago by beginner_problem ▴ 10

1

Use rettype=gbwithparts:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_021485&retmode=xml&rettype=gbwithparts

However, the result is GenBank flat-file text; I don't know how to retrieve the XML output.

Reply • 7.0 years ago by Pierre Lindenbaum 164k
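
For readers who want to apply the rettype=gbwithparts suggestion to the script in the question, here is a minimal sketch rather than a definitive implementation. The helper names (efetch_params, read_accessions, fetch_genbank) are made up for illustration, retmode="text" is an assumption added to make the expected output format explicit, and the email address is a placeholder you must replace with your own.

```python
import os


def efetch_params(accession):
    """Keyword arguments for Entrez.efetch requesting the full GenBank record."""
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "gbwithparts",  # the change suggested in the reply above
        "retmode": "text",         # assumption: ask explicitly for flat-file text
    }


def read_accessions(lines):
    """Return the non-empty, whitespace-stripped accession IDs from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_genbank(accession, email, out_dir="."):
    """Download one record and write it to <out_dir>/<accession>.gb; return the path."""
    # Imported here so the helpers above can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(**efetch_params(accession))
    try:
        text = handle.read()
    finally:
        handle.close()

    path = os.path.join(out_dir, accession + ".gb")
    with open(path, "w") as out:
        out.write(text)
    return path
```

A possible way to use it, assuming a file ids.txt with one accession per line: open the file, pass it to read_accessions, and call fetch_genbank(acc, "you@example.org") for each ID. Network access to NCBI is required for the download step.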

0

Yes, I tried it, and it works so far. Thanks.

Reply • 7.0 years ago by beginner_problem ▴ 10
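
The question also mentions parsing locus tags and CDS positions from the downloaded .gb file. The following is a rough sketch of how that step might look with Bio.SeqIO, not the asker's actual code. The function names cds_rows and cds_rows_from_file are hypothetical, and the sketch assumes each CDS feature carries a locus_tag qualifier (it reports None when one is missing).

```python
def cds_rows(record):
    """Collect (locus_tag, start, end, strand) for every CDS feature of a record.

    Biopython locations are 0-based with an exclusive end, like Python slices.
    """
    rows = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists; take the first locus_tag if present.
        locus_tag = feature.qualifiers.get("locus_tag", [None])[0]
        location = feature.location
        rows.append((locus_tag, int(location.start), int(location.end), location.strand))
    return rows


def cds_rows_from_file(path):
    """Parse a GenBank file and return the CDS rows of all records in it."""
    # Imported here so cds_rows can be exercised without Biopython installed.
    from Bio import SeqIO

    rows = []
    for record in SeqIO.parse(path, "genbank"):
        rows.extend(cds_rows(record))
    return rows
```

For example, cds_rows_from_file("NC_021485.gb") would be called on a file downloaded with rettype=gbwithparts; a file fetched with plain rettype=gb may yield an empty list if the record came back without features.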