Question

Update Biopython's SeqRecords?

12.2 years ago

biocyberman ▴ 870

I looked at the documentation here: http://biopython.org/wiki/SeqRecord , but I did not find any information or methods for updating the properties of SeqRecords. I want to do the following things:

Load EMBL records from a file.
Update source information for the records like organism, project, mol_type
Update accession numbers
Update IDs
Update descriptions
Add references
Remove db_xref qualifiers in source and other features.

I can decompose each record and reconstruct it with the updated information using the __init__ method, as described here: http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html However, I believe there must be a better way to do this. Would it be necessary to extend the SeqRecord class for my purpose?

biopython python • 4.3k views

ADD COMMENT • link updated 12.2 years ago by Peter 6.0k • written 12.2 years ago by biocyberman ▴ 870

score 6 · Answer 1 · 2013-02-20

Seven questions in one - please ask more detailed specific questions if you need more advice, and/or sign up to the Biopython mailing list.

(1) Load EMBL records from a file.

Use Bio.SeqIO with format name "embl" to load EMBL sequence files, which will give you one SeqRecord object per record. As Istvan said, just edit those objects in memory by updating their attributes/properties, and then save them to disk using the Bio.SeqIO.write function.
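As a rough sketch of that load, edit, save cycle (assuming a recent Biopython, 1.78 or later, where writing EMBL needs a "molecule_type" annotation): the helper name edit_records, the accession AB000001 and the descriptions are made up for illustration, and the input is built in memory so no file is needed.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def edit_records(in_handle, out_handle, edit, fmt="embl"):
    """Parse records, apply edit(record) to each, and write them back out.

    Returns the number of records written. (Hypothetical helper.)
    """
    def edited():
        for record in SeqIO.parse(in_handle, fmt):
            edit(record)  # mutate the SeqRecord in place
            yield record

    return SeqIO.write(edited(), out_handle, fmt)


def set_description(record):
    record.description = "Edited description"


# Build a tiny EMBL "file" in memory so the sketch needs no input file.
original = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",  # made-up accession.version
    name="AB000001",
    description="Original description",
    annotations={"molecule_type": "DNA", "topology": "linear"},
)
source_text = StringIO()
SeqIO.write(original, source_text, "embl")
source_text.seek(0)

# Load, edit in memory, and save again.
result = StringIO()
count = edit_records(source_text, result, set_description)
embl_text = result.getvalue()
```

With real data you would pass open file handles (or use SeqIO.parse and SeqIO.write with filenames) instead of the StringIO objects.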

(2) Update source information for the records like organism, project, mol_type

Update the qualifiers dictionary of the source feature, a SeqFeature object that is typically the first entry in the features list of the SeqRecord.
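Something along these lines should work for the source qualifiers. This is only a sketch: update_source is a hypothetical helper, and the record, organism and project values are invented for the example. It searches for the feature by type rather than assuming it is first.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def update_source(record, **qualifiers):
    """Overwrite qualifiers on the record's first 'source' feature."""
    for feature in record.features:
        if feature.type == "source":
            for key, value in qualifiers.items():
                # Parsed qualifier values are lists of strings, so match that.
                feature.qualifiers[key] = [value]
            return feature
    raise ValueError(f"{record.id} has no source feature")


# Stand-in for a record loaded with Bio.SeqIO (values are made up).
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
record.features.append(
    SeqFeature(
        FeatureLocation(0, len(record)),
        type="source",
        qualifiers={"organism": ["Old organism"], "mol_type": ["genomic DNA"]},
    )
)

source = update_source(
    record,
    organism="Escherichia coli",
    project="PRJ_EXAMPLE",  # placeholder project identifier
)
```

Note that the record-level record.annotations["organism"] is a separate value from the source qualifier; as far as I can tell the EMBL writer takes its OS line from the annotation, so you may want to set both and check the output.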

(3) Update accession numbers

Probably just update the annotations dictionary and/or the id of the record, depending on exactly which values you are interested in.

(4) Update IDs

Probably just update the id attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by ID.

(5) Update descriptions

Probably just update the description attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by descriptions.
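Points (3) to (5) are all plain attribute or dictionary assignments, so they can be sketched together. The helper name relabel and the accession XY123456 are made up; the "accessions" and "sequence_version" annotation keys are the ones the EMBL/GenBank parser fills in, but which of these values ends up in the written file is something to verify on your own data.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def relabel(record, accession, version, description):
    """Give a record a new accession-based identity, editing it in place."""
    record.id = f"{accession}.{version}"  # accession.version style identifier
    record.name = accession
    record.description = description
    # Parsed EMBL/GenBank records keep their accession list and version here.
    record.annotations["accessions"] = [accession]
    record.annotations["sequence_version"] = version
    return record


# Stand-in for a parsed record (all values are invented).
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    name="AB000001",
    description="Original description",
)
relabel(record, "XY123456", 1, "Updated description")
```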

(6) Add references

Update the list of reference objects in the annotations dictionary of the SeqRecord, i.e. record.annotations["references"]
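A minimal sketch of adding one, using the Reference class from Bio.SeqFeature. The helper add_reference is hypothetical and the author, title and journal strings are placeholders, not a real citation.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, Reference
from Bio.SeqRecord import SeqRecord


def add_reference(record, authors, title, journal):
    """Append a new Reference covering the whole sequence to the record."""
    ref = Reference()
    ref.authors = authors
    ref.title = title
    ref.journal = journal
    # Reference.location is a list of locations the reference applies to.
    ref.location = [FeatureLocation(0, len(record))]
    # Create the list if the record has no references yet.
    record.annotations.setdefault("references", []).append(ref)
    return ref


record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
ref = add_reference(
    record,
    authors="Surname A.B.",       # placeholder
    title="Placeholder title",    # placeholder
    journal="Unpublished",        # placeholder
)
```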

(7) Remove db_xref qualifiers in source and other features.

Each feature's db_xref entry (held in the feature's qualifiers dictionary) is a list which you can edit to remove an entry, or simply replace with an empty list.
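For example, to drop every db_xref qualifier from all features of a record, a loop like the following should do. It is a sketch: strip_db_xref is a hypothetical helper and the features and cross-references are invented. It removes the key entirely rather than leaving an empty list behind.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def strip_db_xref(record):
    """Remove db_xref qualifiers from every feature; return how many values went."""
    removed = 0
    for feature in record.features:
        # pop() with a default is a no-op for features without db_xref.
        removed += len(feature.qualifiers.pop("db_xref", []))
    return removed


# Stand-in for a parsed record (all values are made up).
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
record.features = [
    SeqFeature(
        FeatureLocation(0, 12),
        type="source",
        qualifiers={"organism": ["Example organism"], "db_xref": ["taxon:0000"]},
    ),
    SeqFeature(
        FeatureLocation(0, 9),
        type="CDS",
        qualifiers={"gene": ["exA"], "db_xref": ["DB1:0001", "DB2:0002"]},
    ),
]
removed = strip_db_xref(record)
```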

score 2 · Answer 2 · 2013-02-19

12.2 years ago

Istvan Albert 102k

I did a Genbank file transformation a little while ago and that involved mutating SeqRecords.

There is no update method, but you can replace the attributes with objects of the correct classes.

There are some subtle dependencies, however. For example, sub_features take precedence over the content of the SeqFeature, meaning that if you update the feature but not its sub_features, the latter will overwrite the former when the record is serialized back.

ADD COMMENT • link 12.2 years ago by Istvan Albert 102k

The nasty sub_feature stuff only applied to features using join locations, and will be going away in Biopython 1.62 (next release).

ADD REPLY • link 12.2 years ago by Peter 6.0k

good to know - it was quite a head scratcher until I figured it out

ADD REPLY • link 12.2 years ago by Istvan Albert 102k

This doesn't sound straightforward. But anyway, thanks for your answers.

ADD REPLY • link 12.2 years ago by biocyberman ▴ 870

Some of these only require updating a dictionary, but you are right that the implementation does not lend itself to changing the attributes.

Another possibility would be to create a GenBank text or XML file with the desired data and parse that with Biopython. The file formats are more rigorously defined than the internal workings of the classes.

ADD REPLY • link 12.2 years ago by Istvan Albert 102k