8.6 years ago
cl10101
I'm trying to download CDS sequences for a given genome using Biopython. My script looks like this:
from Bio import Entrez
from Bio import SeqIO

Entrez.email = "c...@gmail.com"
genomeAccessions = ['NC_021353.1', 'NC_020913.1']
handle = Entrez.efetch(db="nucleotide", id=genomeAccessions, rettype="gb")
records = SeqIO.parse(handle, "gb")
for i, record in enumerate(records):
    print(len(record.features))
    for feature in record.features:
        if feature.type == "CDS":
            print(feature.location)
            print(feature.qualifiers["protein_id"])
            print(feature.location.extract(record).seq)
But with this code I get only one feature (for example, for genome NC_021353), even though the record lists many features: http://www.ncbi.nlm.nih.gov/nuccore/NC_021353
I would be grateful for any suggestions about what I'm doing wrong.
I think something similar has already been discussed here:
Download cds region coordinates
But that solution does not use Biopython. This one does:
How To Extract Just 'Cds' From Genbank File Into Another Genbank File?
There are some other suggestions here:
BioPython error parsing standard GenBank file
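As for the single feature in the question: a likely cause, as far as I know, is that NCBI returns large genome records in a condensed form when asked for rettype="gb", so the feature table is mostly missing. Requesting rettype="gbwithparts" should return the full record. Below is a minimal sketch of that idea; the function names fetch_records and iter_cds and the placeholder email are my own, not anything from Biopython or the original script.

```python
def fetch_records(accessions, email, rettype="gbwithparts"):
    """Yield SeqRecord objects for the given nucleotide accessions.

    Assumption: "gbwithparts" asks NCBI for the full GenBank record,
    including every feature, rather than a condensed one.
    """
    # Imported lazily so that iter_cds below can be used on its own.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    handle = Entrez.efetch(
        db="nucleotide",
        id=",".join(accessions),
        rettype=rettype,
        retmode="text",
    )
    try:
        yield from SeqIO.parse(handle, "genbank")
    finally:
        handle.close()


def iter_cds(record):
    """Yield (protein_id, location, nucleotide sequence) for each CDS."""
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Some CDS features (e.g. pseudogenes) carry no protein_id,
        # so fall back to a placeholder instead of raising KeyError.
        protein_id = feature.qualifiers.get("protein_id", ["unknown"])[0]
        yield protein_id, feature.location, feature.extract(record.seq)
```

To use it, loop over fetch_records(['NC_021353.1', 'NC_020913.1'], "you@example.com") and print each tuple that iter_cds(record) yields. Using .get() for protein_id avoids the KeyError that feature.qualifiers["protein_id"] would raise on a CDS without that qualifier.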