Add A New Feature In Biopython
2
3
Entering edit mode
12.0 years ago
DF ▴ 110

Hi, I would like to overwrite a feature in a genbank file using BioPython, using ambiguous locations.

For example, I can set an ambiguous location:

from Bio import SeqFeature
start_pos = SeqFeature.AfterPosition(5)
end_pos = SeqFeature.BeforePosition(10)
my_location = SeqFeature.FeatureLocation(start_pos, end_pos)
print my_location

Result:

[>5:<10]

Which is correct.

What I can't work out how to do is to use this ambiguous location to add a new feature to a sequence.

I can read in a genbank file, assign the records to a SeqRecord, and assign the genbank features to SeqFeatures :

for record in SeqIO.parse(myinfile, "genbank"):
   new_gb = SeqRecord(record.seq)
   new_gb.features=record.features

Then I can overwrite a feature:

new_gb.features[0]=SeqFeature(FeatureLocation(5,10), type="random_feature_type")

which, when I print the SeqRecord out in genbank format (to stdout), correctly overwrites the first feature:

out_handle = StringIO()
SeqIO.write(new_gb, out_handle, "genbank")
new_gb_output = out_handle.getvalue()
print new_gb_output

What I am stuck on is :

1) How to I add a completely new feature to a SeqRecord?

2) How can I use ambiguous locations in the location of a feature in a SeqRecord?

I'm sure this is simple but I can't seem to work it out.

Thanks and I'm sorry if this is stupid, I'm learning.

biopython genbank feature • 11k views
ADD COMMENT
2
Entering edit mode

*features starts out as an empty list, you should be able to add to is with *features.append(x)?

Also: if you want to save some time on formatting the record to genbank you can use print(record.format("gb"))

ADD REPLY
0
Entering edit mode

Thanks, yes, basically silly by me, staring at a screen too long and using a SeqRecord when I wanted a SeqFeature. Worked out the use of ambiguous locations too.

ADD REPLY
0
Entering edit mode

Cool -you should post your solutions as an answer so future googlers can see how do it

ADD REPLY
1
Entering edit mode

I'm working on an idiot's guide to creating and adding a new feature to a sequence, will post it as an answer to this post once it's done.

ADD REPLY
7
Entering edit mode
12.0 years ago
DF ▴ 110

Here's a small snippet of code for people (like myself) struggling through BioPython. I'm writing code to fix some annotations that are improperly annotated as being exact when they should really be open-ended (i.e. it's not certain where the feature actually starts). This bit was easy, but I got totally stuck for hours on actually adding the fixed annotation to a new or existing genbank file.

So here's a step-by-step guide (in runnable python code) to creating a new SeqRecord, annotating it with a SeqFeature, and then overwriting the SeqFeature in Biopython (I'm using python 2.7). Obviously this can be re-jigged to import & mess with existing genbank files. I wrote it for ease of understanding, not concise code, so sorry if it's a bit verbose. Simple for some people I know, but hopefully someone finds this useful.

################ A: Make a SeqRecord ################

# 1. Create a sequence

from Bio.Seq import Seq
my_sequence = Seq("GATCGATCGATCGATCGATCGATCGATCGATC")

# 2. Create a SeqRecord and assign the sequence to it

from Bio.SeqRecord import SeqRecord
my_sequence_record = SeqRecord(my_sequence)

# 3. Assign an alphabet to the sequence (in this case DNA)

from Bio.Alphabet import generic_dna
my_sequence_record.seq.alphabet = generic_dna

# This is the minimum required info for BioPython to be able to output 
# the SeqRecord in Genbank format.
# You probably would want to add other info (e.g. locus, organism, date etc)

#optional: print the SeqRecord to STDOUT in genbank format.. note there are no features on it yet.
print "\nThis bit is the SeqRecord, printed out in genbank format, with no features added.\n"
print(my_sequence_record.format("gb"))

################ B: Make a SeqFeature ################

# 1. Create a start location and end location for the feature
#    Obviously this can be AfterPosition, BeforePosition etc.,
#    to handle ambiguous or unknown positions

from Bio import SeqFeature
my_start_pos = SeqFeature.ExactPosition(2)
my_end_pos = SeqFeature.ExactPosition(6)

# 2. Use the locations do define a FeatureLocation
from Bio.SeqFeature import FeatureLocation
my_feature_location = FeatureLocation(my_start_pos,my_end_pos)

# 3. Define a feature type as a text string 
#     (you can also just add the type when creating the SeqFeature)
my_feature_type = "CDS"

# 4. Create a SeqFeature
from Bio.SeqFeature import SeqFeature
my_feature = SeqFeature(my_feature_location,type=my_feature_type)

# 5. Append your newly created SeqFeature to your SeqRecord

my_sequence_record.features.append(my_feature)

#optional: print the SeqRecord to STDOUT in genbank format, with your new feature added.
print "\nThis bit is the SeqRecord, printed out in genbank format, with a feature added.\n"
print(my_sequence_record.format("gb"))

################ C: Overwrite an existing SeqFeature ################

# 1. Create a start location and end location for the feature.. 
#    This bit is obviously a repeat of "B: Make a SeqFeature" above, 
#    normally I'd pull it out to a function, but I'm trying to be explicit here

from Bio import SeqFeature
my_start_pos = SeqFeature.ExactPosition(3)
my_end_pos = SeqFeature.ExactPosition(7)

# 2. Use the locations do define a FeatureLocation
from Bio.SeqFeature import FeatureLocation
my_feature_location2 = FeatureLocation(my_start_pos,my_end_pos)

# 3. Define a feature type as a text string 
#    (or you can also just add the type when creating the SeqFeature)
my_feature_type2 = "ABC"

# 4. Create a SeqFeature
from Bio.SeqFeature import SeqFeature
my_feature2 = SeqFeature(my_feature_location2,type=my_feature_type2)

my_sequence_record.features[0]=my_feature2

#optional: print the SeqRecord to STDOUT in genbank format, with your new feature changed.
print "\nThis bit is the SeqRecord, printed out in genbank format, with a feature changed.\n"
print(my_sequence_record.format("gb"))
ADD COMMENT
0
Entering edit mode

Thank you! You just saved me a ton of time today.

ADD REPLY
0
Entering edit mode

The code snippet provided uses multiple imports and overwrites the meaning of SeqFeature, that confused me to no end. That is, don't do:

from Bio import SeqFeature
...
from Bio.SeqFeature import SeqFeature
...

That's really, really bad.

Here's a solution suggested on the BioPython mailing list (with some more code around):

from Bio import SeqFeature as sf

spos = sf.BeforePosition(20);
epos = sf.ExactPosition(50);
l1 = sf.FeatureLocation(spos, epos, strand=+1)
l2 = sf.FeatureLocation(epos+100, epos+200, strand=+1)
cl= sf.CompoundLocation([l1, l2]);
f1 = sf.SeqFeature(cl, strand=1, type="CDS", qualifiers={});
ADD REPLY
0
Entering edit mode

Thanks for posting the final summary. Seven years later, and it's still relevant and helpful. thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2055 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6