How To Extract Just 'Cds' From Genbank File Into Another Genbank File?
1
0
Entering edit mode
11.1 years ago
microbeatic ▴ 80

I have a genbank file with multiple genomes (concatenated using cat). I want to produce a subset of this file with only 'CDS' sequences. It seems very trivial, but i am having trouble with this. Here is the code which succesfully prints each CDS record, but fails to produce a genbank file with those CDS.

from Bio import SeqIO,SeqFeature
import sys

gbank=SeqIO.parse(open(sys.argv[1],"rU"),"genbank"

for genome in gbank:
    print "looking in %s" %genome.id)
    for gene in genome.features:
        if gene.type == 'CDS':
            CDS=gene
            print CDS

output_handle=open("all_CDS.gbk","w")
SeqIO.write(CDS,output_handle,"genbank")    
output_handle.close()

The code prints CDS, but it produces following error at the end.

AttributeError: 'int' object has no attribute 'name'
genbank biopython • 12k views
ADD COMMENT
3
Entering edit mode
11.1 years ago
Peter 6.0k

The error you are getting is because you are not giving SeqIO.write a SeqRecord object, but a single SeqFeature object (the final CDS looked at in the loop).

You need to edit the SeqRecord object to remove the unwanted SeqFeature objects, and then save that, e.g.

from Bio import SeqIO
record = SeqIO.parse("original.gbk","genbank")
record.features = [f for f in record.features if f.type == "CDS"]
SeqIO.write(record, "only_cds.gbk", "genbank")

That should give you a simpler GenBank file where the feature table only contains the CDS features. I'm not sure why you'd do that, but that is what I think you are asking for.

ADD COMMENT
0
Entering edit mode

Thanks @Peter. It worked. I needed to "slimdown" the genbank file which i was using to extract location information and add to a sequence header. I am pretty novice to biopython and python (a month), so i realize its highly inefficient.

ADD REPLY

Login before adding your answer.

Traffic: 1633 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6