I am trying to modify the location of features within a GenBank file. I know feature.type will give gene/CDS and feature.qualifiers will then give "db_xref"/"locus_tag"/"inference" etc. Is there a feature attribute or object which will allow me to access the location (e.g. [5240:7267](+)) directly?
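For reference, Biopython's SeqFeature does carry the location as feature.location, which exposes start, end and strand. Below is a minimal sketch, assuming Biopython 1.7x or later. It builds an in-memory stand-in for the Rv0005 gene instead of parsing mtbtomod.gb. Note that Biopython stores positions 0-based and half-open, so GenBank 5240..7267 prints as [5239:7267](+). To move the start, the sketch replaces the whole location object rather than editing the old one in place.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

# In-memory stand-in for the Rv0005 gene feature (GenBank "5240..7267").
# Biopython positions are 0-based and half-open, so 5240 is stored as 5239.
feature = SeqFeature(FeatureLocation(5239, 7267, strand=1), type="gene",
                     qualifiers={"locus_tag": ["Rv0005"]})

loc = feature.location
print(loc)                                       # [5239:7267](+)
print(int(loc.start), int(loc.end), loc.strand)  # 5239 7267 1

# Swap in a new location object rather than editing the old one;
# GenBank position 5357 becomes 0-based 5356.
feature.location = FeatureLocation(5356, loc.end, strand=loc.strand)
print(feature.location)                          # [5356:7267](+)
```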
This URL gives a bit more info, though I can't figure out how to use it for my purpose: http://biopython.org/DIST/docs/api/Bio.SeqFeature.SeqFeature-class.html#location_operator
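The location_operator in that page only matters for multi-part locations such as join(...) or order(...). A single span like gyrB's 5240..7267 is a plain FeatureLocation with no operator. The sketch below assumes Biopython 1.7x or later and uses made-up join coordinates. It shows where the operator lives on a CompoundLocation.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation

# Single span, like gyrB's 5240..7267: a plain FeatureLocation.
simple = FeatureLocation(5239, 7267, strand=1)

# A two-part location such as join(100..200,300..400) is a CompoundLocation;
# its operator ("join" or "order") is what location_operator describes.
joined = CompoundLocation([FeatureLocation(99, 200, strand=1),
                           FeatureLocation(299, 400, strand=1)], operator="join")
feature = SeqFeature(joined, type="CDS")

print(isinstance(simple, CompoundLocation))  # False
print(feature.location.operator)             # join
print([(int(p.start), int(p.end)) for p in feature.location.parts])
```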
Essentially, I want to modify the following bit of a GenBank file:
     gene            5240..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5240..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
                     ...
to
     gene            5357..7267
                     /db_xref="GeneID:887081"
                     /locus_tag="Rv0005"
                     /gene="gyrB"
     CDS             5357..7267
                     /locus_tag="Rv0005"
                     /inference="protein motif:PROSITE:PS00177"
                     ...
Note the change of the start position from 5240 to 5357 in both the gene and the CDS feature.
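The coordinate bookkeeping is easy to get wrong. GenBank positions are 1-based and inclusive, while Biopython's are 0-based and half-open. On the complementary strand the start codon sits at the high coordinate. The pure-Python sketch below does only the arithmetic. The helper name amended_bounds is hypothetical, and the minus-strand example coordinates are made up for illustration.

```python
def amended_bounds(start, end, strand, new_start):
    """Return new 0-based, half-open (start, end) bounds for a feature whose
    biological start moves to GenBank position new_start (1-based).

    start/end are the feature's current Biopython (0-based) bounds.
    """
    if strand == 1:
        # Plus strand: the start codon is at the low coordinate.
        return new_start - 1, end
    # Minus strand: the start codon is at the high coordinate, and a
    # 1-based inclusive position p equals the 0-based exclusive end p.
    return start, new_start


# gyrB example: GenBank 5240..7267 -> 5357..7267
print(amended_bounds(5239, 7267, 1, 5357))   # (5356, 7267)
# complement(1000..2000) with a new start at 1900 -> complement(1000..1900)
print(amended_bounds(999, 2000, -1, 1900))   # (999, 1900)
```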
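So far I have the following Python script:

    from Bio import SeqIO
    from Bio.SeqFeature import FeatureLocation

    gb_file = "mtbtomod.gb"
    rvnumber = 'Rv0005'
    newstart = 5357  # new start as a 1-based GenBank coordinate

    records = []
    for record in SeqIO.parse(gb_file, "genbank"):
        # Edit features in place so that all other features are kept
        for feature in record.features:
            if feature.type in ("gene", "CDS") and \
                    feature.qualifiers.get("locus_tag", [""])[0] == rvnumber:
                loc = feature.location
                if loc.strand == 1:
                    # Amend feature location from current to 'newstart'
                    # (Biopython is 0-based, so GenBank 5357 is stored as 5356)
                    feature.location = FeatureLocation(newstart - 1, loc.end, strand=1)
                else:
                    # Complementary strand: the start codon is the high coordinate
                    feature.location = FeatureLocation(loc.start, newstart, strand=-1)
        records.append(record)

    with open("test.gb", "w") as test:
        SeqIO.write(records, test, "genbank")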
Rv0005 is just an example of a locus_tag I need to update. I have about 600 more locations to update per GenBank file, and about 10-20 GenBank files to process (with more to come).
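For roughly 600 locus tags per file, one option is to keep the new starts in a dict keyed by locus_tag and make a single pass over each file's features. The sketch below assumes Biopython 1.7x or later and simple single-span locations. A join(...) location would be flattened by it. The function names update_record and update_files and the ".mod.gb" output suffix are hypothetical. How the dict is filled (for example from a table of new starts) is left open.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation


def update_record(record, new_starts):
    """Move the start of every gene/CDS whose locus_tag is in new_starts.

    new_starts maps locus_tag -> new 1-based GenBank start position.
    Returns the number of features changed.
    """
    changed = 0
    for feature in record.features:
        if feature.type not in ("gene", "CDS"):
            continue
        tag = feature.qualifiers.get("locus_tag", [None])[0]
        if tag not in new_starts:
            continue
        loc, new_start = feature.location, new_starts[tag]
        if loc.strand == 1:
            # Plus strand: move the low end (convert 1-based to 0-based)
            feature.location = FeatureLocation(new_start - 1, loc.end, strand=1)
        elif loc.strand == -1:
            # Minus strand: move the high (exclusive) end
            feature.location = FeatureLocation(loc.start, new_start, strand=-1)
        else:
            continue  # no strand information: leave untouched
        changed += 1
    return changed


def update_files(paths, new_starts, suffix=".mod.gb"):
    """Write an amended copy of each GenBank file to <path><suffix>."""
    for path in paths:
        records = list(SeqIO.parse(path, "genbank"))
        for record in records:
            update_record(record, new_starts)
        SeqIO.write(records, path + suffix, "genbank")


# In-memory demo with the gyrB gene and CDS from the question
record = SeqRecord(Seq("N" * 8000), id="demo")
record.features = [
    SeqFeature(FeatureLocation(5239, 7267, strand=1), type="gene",
               qualifiers={"locus_tag": ["Rv0005"]}),
    SeqFeature(FeatureLocation(5239, 7267, strand=1), type="CDS",
               qualifiers={"locus_tag": ["Rv0005"]}),
]
n = update_record(record, {"Rv0005": 5357})
print(n, [str(f.location) for f in record.features])
```

Note that this only moves coordinates. Any /translation qualifier on a CDS is not recomputed and would need to be regenerated or dropped separately.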
Cross-post from http://stackoverflow.com/questions/24636588/modify-location-of-a-genbank-feature
I guess you didn't get an answer there, but the link and explanations over there may still be useful.
Yes indeed, although the RE.type way is not totally appropriate, and I thought there might be more GenBank-centric people over here...