Duplicating features in a GenBank file with Biopython
10.0 years ago
s.vandenhurk ▴ 10

I have a lot of GenBank files with multiple genes in them. Some of these genes have a single start and stop position, e.g. 1000..1390, and some have multiple start and stop positions, e.g. join(1000..1390,1400..1790,1900..2275).

I want to duplicate the entire CDS for the genes with multiple start and stop positions and insert only 1 start and stop position for every duplicate.

So 1 CDS with 3 starts/stops should become 3 CDS features with 1 start/stop each.

Anyone got a clue on how to achieve this?

biopython genbank • 2.6k views

Are you sure those "multiple start/stop" CDS features are not in fact indicating the intron/exon boundaries?

10.0 years ago
Peter 6.0k

Unfortunately the answer is you shouldn't be doing this.

As per @Whetting's comment this is a meaningless question. Coding sequence (CDS) features like join(1000..1390,1400..1790,1900..2275) generally indicate splicing (intron/exon boundaries) or, in some cases, ribosomal slippage.

Each of these regions in itself is not a CDS. It may not be a multiple of three in length, and may not be in-frame. You therefore shouldn't replace this complex CDS feature with several simple "CDS" features.
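To make the length argument concrete, here is a small worked check on the example coordinates from the question. It assumes they are ordinary 1-based, inclusive GenBank positions. It only illustrates the arithmetic and says nothing about any real record.

```python
# Example coordinates from the question: join(1000..1390,1400..1790,1900..2275).
# GenBank positions are 1-based and inclusive, so length = end - start + 1.
parts = [(1000, 1390), (1400, 1790), (1900, 2275)]

lengths = [end - start + 1 for start, end in parts]
remainders = [length % 3 for length in lengths]
total = sum(lengths)

print("part lengths:", lengths)             # [391, 391, 376]
print("each length mod 3:", remainders)     # [1, 1, 1]
print("joined length:", total)              # 1158
print("joined length mod 3:", total % 3)    # 0
```

None of the three parts is a multiple of three on its own, but the joined length is (1158 = 3 x 386). That is what you would expect if the parts are exons of a single coding sequence.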

What you might meaningfully do is replace every CDS feature with one or more exon features (each with simple coordinates, i.e. one start/stop), but then it wouldn't be a normal GenBank file any more.


Is there a way to do this with Biopython? And if so, where can I find a guide on how to do it? I don't mind that it wouldn't be a normal GenBank file, because in my case I'm the end user of these files.


A CompoundLocation holds a list of child locations (its parts attribute), each of which is a simple FeatureLocation object. Re-use each of these as the location of a new SeqFeature, one per part. See the built-in help (docstrings) for these objects, e.g. http://biopython.org/DIST/docs/api/Bio.SeqFeature.CompoundLocation-class.html
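A minimal sketch of that approach, in case it helps. It is not an official recipe. The function name cds_to_exons and the KEPT_QUALIFIERS tuple are made up for illustration, and copying only gene/locus_tag/product (dropping e.g. translation, which describes the whole CDS rather than one part) is an assumption. Recent Biopython releases call the simple location class SimpleLocation and keep FeatureLocation as an alias.

```python
import copy

from Bio.Seq import Seq
from Bio.SeqFeature import CompoundLocation, FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical choice: qualifiers that still make sense on a single exon.
KEPT_QUALIFIERS = ("gene", "locus_tag", "product")


def cds_to_exons(record):
    """Replace each CDS feature with one 'exon' feature per location part."""
    new_features = []
    for feature in record.features:
        if feature.type != "CDS":
            new_features.append(feature)
            continue
        kept = {
            key: value
            for key, value in feature.qualifiers.items()
            if key in KEPT_QUALIFIERS
        }
        # .parts is a list: one entry for a simple location,
        # several entries for a CompoundLocation such as join(...).
        for part in feature.location.parts:
            new_features.append(
                SeqFeature(part, type="exon", qualifiers=copy.deepcopy(kept))
            )
    record.features = new_features
    return record


# In-memory demo using the coordinates from the question.
# Biopython locations are 0-based and end-exclusive, so 1000..1390 is (999, 1390).
location = CompoundLocation(
    [
        FeatureLocation(999, 1390, strand=1),
        FeatureLocation(1399, 1790, strand=1),
        FeatureLocation(1899, 2275, strand=1),
    ]
)
record = SeqRecord(Seq("N" * 2300), id="demo")
record.features.append(
    SeqFeature(location, type="CDS", qualifiers={"gene": ["abc"]})
)

cds_to_exons(record)
exon_spans = [(int(f.location.start), int(f.location.end)) for f in record.features]
for feature in record.features:
    print(feature.type, feature.location)
```

For real files you would presumably read the records with Bio.SeqIO.parse(path, "genbank"), pass each one through the function, and write them back with Bio.SeqIO.write(records, out_path, "genbank"); path and out_path stand in for your own file names.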
