Question

Extracting All CDS From An EMBL File

10.8 years ago

Pappu ★ 2.1k

I am trying to extract the DNA sequences of all CDS features using Biopython. However, the CDS locations seem to be in a different format in each EMBL file, which makes them difficult to parse, e.g.:

join{[373615:374161](+), [0:174](+)}

[118940:>119261](-)

[<13907:>13991](+)

join{[426644:426858](+), [0:617](-)}

join{[5947..6076](+), [0..399](+)}

etc.

So I am wondering if there is any tool available for this purpose. Thank you.
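The strings quoted above look like the way Biopython prints location objects it has already parsed: 0-based, Python-style `[start:end](strand)`, with `<`/`>` marking fuzzy (partial) ends and `join{...}` for multi-part locations. As a minimal sketch, assuming a Biopython release that still exports `FeatureLocation` (newer releases call it `SimpleLocation` and keep the old name as an alias), two of them can be rebuilt by hand to show that simple, compound and fuzzy locations all share one interface:

```python
from Bio.SeqFeature import (
    AfterPosition,
    BeforePosition,
    CompoundLocation,
    FeatureLocation,
)

# Coordinates copied from two of the locations quoted in the question.
wrapped = CompoundLocation([
    FeatureLocation(373615, 374161, strand=1),
    FeatureLocation(0, 174, strand=1),
])
fuzzy = FeatureLocation(BeforePosition(13907), AfterPosition(13991), strand=1)

print(wrapped)  # join{[373615:374161](+), [0:174](+)}
print(fuzzy)    # [<13907:>13991](+)

# Simple and compound locations expose the same attributes, so code that
# walks .parts does not need to care which kind it received.
for loc in (wrapped, fuzzy):
    print(type(loc).__name__, len(loc),
          [(int(p.start), int(p.end), p.strand) for p in loc.parts])
```

In practice these objects come straight out of `SeqIO.parse(...)`; you would not normally build them yourself.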

biopython • 4.5k views

The INSDC member databases (EMBL-Bank at EMBL-EBI, GenBank at NCBI, and DDBJ) all use the same feature format, which is described in the DDBJ/EMBL/GenBank Feature Table Definition. See section 3.4.3, "Location examples", for a set of examples illustrating the various forms a feature location can take.
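To see how the feature-table syntax relates to the bracketed strings in the question: the Feature Table Definition counts from 1 and includes both ends (`373616..374161`), marks partial ends with `<` and `>`, and uses `complement(...)` for the reverse strand, whereas Biopython stores 0-based, half-open Python coordinates. The helper below is hypothetical (it is not part of Biopython, whose EMBL/GenBank parser already performs this conversion) and only handles a single span, purely to illustrate the off-by-one:

```python
import re


def insdc_span_to_python(span):
    """Hypothetical helper: convert ONE INSDC span (1-based, inclusive) to
    0-based, half-open coordinates.

    Returns (start, end, strand, partial_start, partial_end).
    Handles "a..b", "<a..>b" and complement(...) around a single span only;
    join(...), order(...), single bases and remote references are not supported.
    """
    span = span.strip()
    strand = 1  # forward strand unless wrapped in complement(...)
    if span.startswith("complement(") and span.endswith(")"):
        span, strand = span[len("complement("):-1], -1
    match = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", span)
    if match is None:
        raise ValueError(f"unsupported location: {span!r}")
    lt, start, gt, end = match.groups()
    # 1-based inclusive start -> 0-based start; an inclusive end is already
    # the correct exclusive end in 0-based, half-open terms.
    return int(start) - 1, int(end), strand, lt == "<", gt == ">"


print(insdc_span_to_python("<13908..>13991"))               # (13907, 13991, 1, True, True)
print(insdc_span_to_python("complement(118941..>119261)"))  # (118940, 119261, -1, False, True)
```

Assuming the EMBL file wrote those two features as shown in the calls, the results correspond to the question's `[<13907:>13991](+)` and `[118940:>119261](-)`.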

10.8 years ago by hpmcwill ★ 1.2k

score 2 · Answer 1 · 2014-02-13

Biopython will create a SeqFeature for each feature, including the CDS features, with a location object that may be simple or compound (it has already been parsed for you!). Each SeqFeature provides an .extract(...) method precisely for this task: getting the sequence that the feature describes. For examples, see:

"Sequence described by a feature or location" in the tutorial http://biopython.org/DIST/docs/tutorial/Tutorial.html
"Working with Sequence Features" here https://github.com/peterjc/biopython_workshop/blob/master/using_seqfeatures/README.rst
"Dealing with GenBank files in Biopython" (almost the same as EMBL files) here http://www.warwick.ac.uk/go/peter_cock/python/genbank/

Or see the built-in help for the SeqFeature object.
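A minimal sketch of `.extract()` in action, assuming a recent Biopython; the toy sequence, the locus tags and the helper name `cds_sequences` are made up for illustration. `extract()` concatenates the parts of a `join` and reverse-complements minus-strand parts, so the same call works whatever the location looks like:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import CompoundLocation, FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def cds_sequences(record):
    """Yield (label, nucleotide sequence) for every CDS feature in a SeqRecord."""
    for feature in record.features:
        if feature.type == "CDS":
            # Qualifier values are lists of strings; fall back to the record id.
            label = feature.qualifiers.get("locus_tag", [record.id])[0]
            yield label, feature.extract(record.seq)


# Toy 20 bp record with a made-up sequence.
record = SeqRecord(Seq("ATGAAACCCGGGTTTTAGCC"), id="toy")
# Two-part CDS on the forward strand: [0:6] + [9:18], skipping "CCC".
record.features.append(SeqFeature(
    CompoundLocation([FeatureLocation(0, 6, strand=1),
                      FeatureLocation(9, 18, strand=1)]),
    type="CDS", qualifiers={"locus_tag": ["toy_0001"]}))
# Single-part CDS on the reverse strand.
record.features.append(SeqFeature(
    FeatureLocation(0, 6, strand=-1),
    type="CDS", qualifiers={"locus_tag": ["toy_0002"]}))

for label, seq in cds_sequences(record):
    print(label, seq, seq.translate())
# toy_0001 ATGAAAGGGTTTTAG MKGF*
# toy_0002 TTTCAT FH
```

For a real file, add `from Bio import SeqIO`, loop over `SeqIO.parse("example.embl", "embl")` (the file name is a placeholder) and pass each record to `cds_sequences`.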

score 1 · Answer 2 · 2014-02-13

10.8 years ago

hpmcwill ★ 1.2k

While not part of Biopython, it may provide a useful alternative: EMBOSS provides the extractfeat program, which extracts sequence data from a database entry based on a specific feature type (e.g. CDS).

Biopython has EMBOSS bindings.

10.8 years ago by Pappu ★ 2.1k

True, but not for extractfeat itself: https://github.com/biopython/biopython/blob/master/Bio/Emboss/Applications.py

10.8 years ago by Peter 6.0k
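Since Biopython has no wrapper for extractfeat, one option is to call it directly with Python's `subprocess` module. This is a minimal sketch, assuming EMBOSS is installed and on `PATH`; the helper names and file names are hypothetical, and the qualifiers used (`-sequence`, `-type`, `-join`, `-outseq`, `-auto`) should be double-checked against `extractfeat -help` for your EMBOSS version:

```python
import shutil
import subprocess


def extractfeat_command(embl_path, out_path, feature_type="CDS"):
    """Build an EMBOSS extractfeat command line (hypothetical helper)."""
    return [
        "extractfeat",
        "-sequence", embl_path,   # input entry or file
        "-type", feature_type,    # only extract features of this type
        "-join",                  # output the parts of joined features as one sequence
        "-outseq", out_path,      # output sequence file
        "-auto",                  # do not prompt interactively
    ]


def run_extractfeat(embl_path, out_path, feature_type="CDS"):
    """Run extractfeat, failing early if EMBOSS is not installed."""
    if shutil.which("extractfeat") is None:
        raise RuntimeError("EMBOSS extractfeat not found on PATH")
    subprocess.run(extractfeat_command(embl_path, out_path, feature_type), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names; `check=True` raises `CalledProcessError` if extractfeat exits with an error.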