3.9 years ago
kamel
Dear colleagues, I have a GenBank file annotated from a draft genome (contig_1 to contig_89). I'm interested in extracting only the region 20,500 to 21,500 from contig_2, and I would like to keep the output in GenBank (.gbk) format. Can someone help me with this?
Thanks in advance!
Check this thread: Slicing Genbank File by 'gene_id' range
Related to the above, the code that thread was derived from is here: https://github.com/jrjhealey/bioinfo-tools/blob/master/Genbank_slicer.py
The linked thread was an adaptation to slice between features, but if you want to use indices, it's actually even easier.
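Slicing by index is easier because a Biopython SeqRecord can be sliced like a string. Here is a minimal sketch of that idea, written independently of the linked script. It assumes you already have the contig as a single record, and slice_record is a hypothetical helper name. GenBank coordinates are 1-based and inclusive, so the region 20,500..21,500 becomes the Python slice [20499:21500].

```python
from Bio import SeqIO


def slice_record(record, start, end):
    """Return the part of `record` between 1-based, inclusive `start` and `end`.

    SeqRecord slicing keeps only the features that lie entirely inside the
    slice and shifts their coordinates to the new origin; features that
    cross either boundary are dropped.
    """
    if not 1 <= start <= end <= len(record):
        raise ValueError(f"{start}..{end} is outside 1..{len(record)}")
    sub = record[start - 1:end]  # 1-based inclusive -> 0-based half-open
    # Writing GenBank needs a molecule type; assume DNA if the slice lost it.
    sub.annotations.setdefault("molecule_type", "DNA")
    return sub
```

For a file that holds only that one contig (contig_2.gbk is a placeholder name), something like SeqIO.write(slice_record(SeqIO.read("contig_2.gbk", "genbank"), 20500, 21500), "region.gbk", "genbank") should do it. Note that SeqIO.read expects exactly one record, and most record-level annotations (organism, references, etc.) are not copied onto the slice.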
Thanks @Joe for your feedback. I used this script on the GenBank file of the reference strain but got the error below. Please also find the command I used.
I solved this problem by installing Biopython 1.77.
Do you have any idea how to specify the contig from which to extract the region of interest? For example, I want to target contig 2, from 20,500 to 21,500.
In that case, is your input file a multi-record GenBank file?
If you know which of these records you want (the 2nd), that makes it a little easier, but it still needs quite a lot of code refactoring. Is it safe to assume that your contigs are listed in order within the file?
I don't have time right now to write the full code, so there is some pseudo-code below. If I get the chance, I'll try to work this up properly.
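For the multi-record case, here is a minimal Biopython sketch, written independently of the linked script. It walks the records of a multi-record GenBank file and picks the contig by its LOCUS name or id instead of its position, so the contigs do not need to be in order. It then cuts out the region using 1-based, inclusive coordinates. The function name extract_region and the assumption that records are named like contig_2 are illustrative, not taken from the thread.

```python
from Bio import SeqIO


def extract_region(path, contig, start, end):
    """Return record `contig` from a multi-record GenBank file, cut to start..end.

    `contig` is matched against each record's id or LOCUS name, so the order
    of the records in the file does not matter. Coordinates are 1-based and
    inclusive, as in the GenBank file itself.
    """
    for record in SeqIO.parse(path, "genbank"):  # reads one record at a time
        if contig in (record.id, record.name):
            if not 1 <= start <= end <= len(record):
                raise ValueError(f"{start}..{end} is outside 1..{len(record)} on {contig}")
            sub = record[start - 1:end]  # keeps only features fully inside the region
            # GenBank output needs a molecule type; assume DNA if it was not carried over.
            sub.annotations.setdefault("molecule_type", "DNA")
            return sub
    raise KeyError(f"no record named {contig!r} in {path}")
```

For the case in the question, something like SeqIO.write(extract_region("draft.gbk", "contig_2", 20500, 21500), "contig_2_20500-21500.gbk", "genbank") should work, where draft.gbk is a placeholder for your file. If the contigs are guaranteed to be in order, list(SeqIO.parse("draft.gbk", "genbank"))[1] would also give the second record. Matching on the name avoids relying on that assumption, and it stops reading as soon as the contig is found.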