Hi guys:
I have this GenBank file:
LOCUS sctg_0006_0001 172997 bp DNA UNK 01-JAN-1980
DEFINITION sctg_0006_0001 length=172997
ACCESSION sctg_0006_0001
VERSION sctg_0006_0001
KEYWORDS .
SOURCE .
ORGANISM .
FEATURES Location/Qualifiers
CDS <3..182
/note="ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.722;conf=99.97;score=35.07;cscore=31.85;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;"
CDS 372..1145
/note="ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.755;conf=100.00;score=149.21;cscore=143.89;sscore=5.32;rscore=-0.60;uscore=1.69;tscore=4.88;"
CDS[Many Many More]...
As you can see, it has the features with their respective locations and a note qualifier. What I'm trying to do is add a new qualifier called locus_tag to each CDS in this big file.
I have written this code, but I'm getting some problems:
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

annotation_handle = open("/Users/jcastrof/Desktop/prueba/prueba_str.gbk","rU")
for record in SeqIO.parse(annotation_handle,"genbank"):
    a = len(record.features)
    for_rast = open("/Users/k/Desktop/prueba/contig_for_rast.gbk","w")
    for x in range(0, a):
        locus_tag = {"locus_tag":"%s_%s" % (record.id,x+1)}
        new_record = (SeqFeature(qualifiers = locus_tag))
        record.features.append(new_record)
    SeqIO.write(record, for_rast, "genbank")
    for_rast.close()
And I've got this error:
Traceback (most recent call last):
  File "/Users/k/Desktop/add_tag_locus.py", line 32, in <module>
    SeqIO.write(record, for_rast, "genbank")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 426, in write
    count = writer_class(fp).write_file(sequences)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 254, in write_file
    count = self.write_records(records)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 239, in write_records
    self.write_record(record)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 775, in write_record
    self._write_feature(feature, rec_length)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/InsdcIO.py", line 305, in _write_feature
    assert feature.type, feature
AssertionError: type:
location: None
qualifiers:
Key: locus_tag, Value: sctg_0006_0001_1
What would you suggest? (please try to help me out :D ). Thanks!
I remember that with Artemis (http://www.sanger.ac.uk/resources/software/artemis/) you can load your GenBank file and add qualifiers (such as locus_tag) to all features or to a filtered subset (for example, all CDS features). Maybe this could save you some work.
Thanks, I'll try it. But what I need is to do this for many files.
I think it will be fine for "many" as in 5 to 10 files, but if it's closer to 200 you will probably need a scripted solution instead.
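For the scripted route: the assertion in the traceback is raised because the loop appends brand-new SeqFeature objects that have no type and no location, and the GenBank writer refuses to write those. An alternative is to add the qualifier to the CDS features that are already in the record. Below is a minimal sketch of that idea, not a tested pipeline. The helper names (add_locus_tags, tag_genbank_file, tag_directory), the "*.gbk" pattern, and the choice to number only CDS features (rather than all features, as in the original loop) are my own assumptions.

```python
import glob
import os


def add_locus_tags(record, prefix=None):
    """Add a locus_tag qualifier to every CDS feature of `record`, in place.

    Returns the number of CDS features that were tagged.
    """
    prefix = prefix or record.id
    count = 0
    for feature in record.features:
        if feature.type != "CDS":
            continue
        count += 1
        # Biopython stores qualifier values parsed from GenBank as lists of strings.
        feature.qualifiers["locus_tag"] = ["%s_%d" % (prefix, count)]
    return count


def tag_genbank_file(in_path, out_path):
    """Tag every record read from in_path and write the result to out_path."""
    # Imported here so add_locus_tags() can be tried without Biopython installed.
    from Bio import SeqIO

    def tagged_records():
        for record in SeqIO.parse(in_path, "genbank"):
            add_locus_tags(record)
            yield record

    # SeqIO.write returns the number of records written.
    return SeqIO.write(tagged_records(), out_path, "genbank")


def tag_directory(in_dir, out_dir):
    """Run tag_genbank_file on every *.gbk file in in_dir (assumed extension)."""
    # Use a different out_dir than in_dir, otherwise inputs get overwritten.
    os.makedirs(out_dir, exist_ok=True)
    for in_path in sorted(glob.glob(os.path.join(in_dir, "*.gbk"))):
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        tag_genbank_file(in_path, out_path)
```

Calling tag_directory with an input folder and a separate output folder should handle a few hundred files in one go. If the tags need a different prefix or numbering, add_locus_tags is the only place to change.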