I have a GenBank file containing multiple genomes (concatenated using cat). I want to produce a subset of this file with only the 'CDS' sequences. It seems trivial, but I am having trouble with it. Here is the code, which successfully prints each CDS record but fails to produce a GenBank file with those CDS.
from Bio import SeqIO, SeqFeature
import sys

# Parse every genome (record) in the concatenated GenBank file
gbank = SeqIO.parse(open(sys.argv[1], "r"), "genbank")
for genome in gbank:
    print("looking in %s" % genome.id)
    for gene in genome.features:
        if gene.type == 'CDS':
            CDS = gene
            print(CDS)
# Try to write the CDS to a new GenBank file (this is the step that fails)
output_handle = open("all_CDS.gbk", "w")
SeqIO.write(CDS, output_handle, "genbank")
output_handle.close()
The code prints each CDS, but it produces the following error at the end:
AttributeError: 'int' object has no attribute 'name'
Thanks @Peter. It worked. I needed to "slim down" the GenBank file, which I was using to extract location information and add it to a sequence header. I am fairly new to Biopython and Python (about a month), so I realize my code is highly inefficient.
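For anyone landing here with the same error: SeqIO.write expects SeqRecord objects (or an iterable of them), whereas CDS in the script above is a single SeqFeature. Below is a minimal sketch of one way to "slim down" a GenBank file so that each record keeps only its CDS features. It is not necessarily the approach suggested in the answer; the function names keep_only_cds and slim_genbank are made up for illustration, and it assumes a reasonably recent Biopython in which SeqIO.parse and SeqIO.write accept file names.

```python
from Bio import SeqIO


def keep_only_cds(record):
    """Reduce a SeqRecord's feature list to its CDS features (hypothetical helper)."""
    record.features = [f for f in record.features if f.type == "CDS"]
    return record


def slim_genbank(in_path, out_path):
    """Copy all records from in_path to out_path, keeping only CDS features.

    Returns the number of records written (hypothetical helper).
    """
    # Generator: records are filtered one at a time as they are parsed
    slimmed = (keep_only_cds(rec) for rec in SeqIO.parse(in_path, "genbank"))
    # SeqIO.write takes SeqRecord objects, not SeqFeature objects
    return SeqIO.write(slimmed, out_path, "genbank")
```

It could be called as slim_genbank(sys.argv[1], "all_CDS.gbk"). Because whole records are written, the sequence and the feature locations stay intact, which should make it straightforward to read the locations back later.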