Question

Multi-record GenBank to CDS

4.9 years ago

hazirliver ▴ 10

Hi! I have a file containing several GenBank records written one after the other. I need to extract the CDS features (protein sequence (/translation), /locus_tag, /inference, /product, and contig ID) from all contigs. How can I do it?
The input format looks like this: [image]
And the result looks like this:

CDS genbank biopython • 1.6k views

Since you are analyzing data, it would be helpful if you make some effort to write a small script to read a file line by line and process it.

4.9 years ago by husensofteng ▴ 410

score 0 · Answer 1 · 2020-08-26

4.9 years ago

Joe 22k

To clarify, you want all proteins/products, from all the entries in the file?

If so, take a look here: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/
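
One common way to do this with Biopython is to parse the file with SeqIO.parse in "genbank" format, which yields one record per contig, and then pick out the CDS features. The sketch below is a minimal version under a few assumptions: the helper names (first, cds_rows, write_cds_table) and the tab-separated output layout are made up for illustration, record.id is assumed to hold the contig ID, and a CDS without a /translation (for example a pseudogene) simply gets an empty string.

```python
import csv

# Output columns; this layout is an assumption, not something the question specifies.
FIELDS = ["contig", "locus_tag", "product", "inference", "translation"]


def first(qualifiers, key):
    """Return the first value of a qualifier, or "" if the qualifier is absent."""
    values = qualifiers.get(key, [])
    return values[0] if values else ""


def cds_rows(records):
    """Yield one dict per CDS feature from an iterable of SeqRecord-like objects."""
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            q = feature.qualifiers  # maps qualifier name -> list of values
            yield {
                "contig": record.id,
                "locus_tag": first(q, "locus_tag"),
                "product": first(q, "product"),
                # /inference may appear more than once on a feature
                "inference": "; ".join(q.get("inference", [])),
                "translation": first(q, "translation"),
            }


def write_cds_table(genbank_path, out_path):
    """Write a tab-separated table of all CDS features in a multi-record GenBank file."""
    # Imported here so cds_rows can be used and tested without Biopython installed.
    from Bio import SeqIO

    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(cds_rows(SeqIO.parse(genbank_path, "genbank")))
```

Calling write_cds_table("contigs.gbk", "cds.tsv") (both file names hypothetical) would write one row per CDS across all records in the file.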

Yes, thanks! The code in that article gave me an error, but the article put me on the right track. I found a working solution using SeqIO.InsdcIO.GenBankCdsFeatureIterator.
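
For reference, that iterator is also reachable through SeqIO.parse with the format name "genbank-cds", which yields one protein record per CDS feature. A rough sketch, assuming each record's id, description and seq are filled from the CDS qualifiers (worth checking on your own file, since this route may not keep the parent contig ID); the function names are hypothetical:

```python
def cds_records_to_fasta(records, out_handle, width=60):
    """Write SeqRecord-like objects (id, description, seq) as FASTA; return the count."""
    count = 0
    for rec in records:
        # Use "id description" as the header, or just the id if there is no description.
        header = f"{rec.id} {rec.description}" if rec.description else rec.id
        seq = str(rec.seq)
        out_handle.write(f">{header}\n")
        # Wrap the protein sequence at a fixed line width.
        for start in range(0, len(seq), width):
            out_handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


def genbank_cds_to_fasta(genbank_path, fasta_path):
    """Convert every CDS in a multi-record GenBank file to a protein FASTA file."""
    # Imported here so the writer above can be tested without Biopython installed.
    from Bio import SeqIO

    with open(fasta_path, "w") as out:
        return cds_records_to_fasta(SeqIO.parse(genbank_path, "genbank-cds"), out)
```

If the contig ID is needed next to each protein, parsing in plain "genbank" format and walking record.features is probably the safer route.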

4.9 years ago by hazirliver ▴ 10