I am trying to extract all the DNA sequences corresponding to the CDS features using Biopython. However, the CDS location seems to be in a different format in each EMBL file, which makes it difficult to parse, e.g.:
join{[373615:374161](+), [0:174](+)}
[118940:>119261](-)
[<13907:>13991](+)
join{[426644:426858](+), [0:617](-)}
join{[5947..6076](+), [0..399](+)}
etc.
So I am wondering if there is any tool available for this purpose. Thank you.
The INSDC member databases (EMBL-EBI EMBL-Bank, NCBI GenBank and DDBJ) all use the same feature format, which is described in The DDBJ/EMBL/GenBank Feature Table Definition. See section "3.4.3 Location examples" for a set of examples illustrating the various possibilities for the feature location.
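For the extraction itself, the strings shown in the question look like Biopython's own printed form of parsed locations, so it should not be necessary to parse them by hand. The following is a minimal sketch, assuming a reasonably recent Biopython: it relies on `SeqFeature.extract()`, which follows `join(...)` parts and reverse-complements minus-strand locations. The helper names `cds_sequences` and `cds_from_embl` and the in-memory demo record are made up for illustration; in practice the records would come from your own EMBL file.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import CompoundLocation, FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def cds_sequences(record):
    """Yield (feature, nucleotide sequence) for every CDS feature in a record."""
    for feature in record.features:
        if feature.type == "CDS":
            # extract() handles strand and multi-part (join) locations
            yield feature, feature.extract(record.seq)


def cds_from_embl(path):
    """Yield CDS sequences from all records in an EMBL file (path is supplied by the caller)."""
    for record in SeqIO.parse(path, "embl"):
        for _feature, seq in cds_sequences(record):
            yield seq


# Small in-memory record (hypothetical data) so the sketch runs without a file.
demo = SeqRecord(Seq("AAACCGTGGTTT"), id="demo")

# A join that wraps around the origin, similar to join{[373615:374161](+), [0:174](+)}
wrapped = CompoundLocation(
    [FeatureLocation(9, 12, strand=1), FeatureLocation(0, 3, strand=1)]
)
demo.features.append(SeqFeature(location=wrapped, type="CDS"))

# A simple minus-strand location, similar to [118940:>119261](-)
demo.features.append(SeqFeature(location=FeatureLocation(3, 9, strand=-1), type="CDS"))

extracted = [str(seq) for _feature, seq in cds_sequences(demo)]
print(extracted)
```

Locations with fuzzy ends such as `<13907` or `>13991` mark partial features. As far as I know, `extract()` simply uses the stated coordinates in that case, so the returned sequence may not begin with a start codon or end with a stop codon.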