4.7 years ago
rjqmantaring
I'm pretty new to Biopython and I'm trying to use it to extract all of the CDS features from an .embl file. This is my code:
#!/usr/bin/python3.7
from Bio import SeqIO

for rec in SeqIO.parse("file.embl", "embl"):
    if rec.features:
        for feature in rec.features:
            if feature.type == "CDS":
                print(feature.location)
                print(feature.qualifiers["protein_id"])
                # Extract the CDS nucleotide sequence from the parent record
                print(feature.extract(rec.seq))
When I run my code I get the following error:
Traceback (most recent call last):
  File "extractor.py", line 5, in <module>
    record = SeqIO.read("file.embl", "embl")
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 720, in read
    first = next(iterator)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 655, in parse
    for r in i:
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 489, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 473, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 440, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 661, in _feed_first_line
    raise ValueError('Did not recognise the ID line layout:\n' + line)
ValueError: Did not recognise the ID line layout:
ID file ; ; ; ; ; 29902 BP.
I can't seem to find any relevant documentation or forum post on that specific error message. Can anyone help me figure out what's going on?
Thanks in advance.
Is this the first line of your file?
Extracting more features from EMBL files with Biopython
Problem With Parsing Genome File - Embl Format - With Biopython
Yes. It's an EMBL file that was generated by transferring annotations from a GenBank file to an unannotated FASTA.
There should be 2, 3, or 6 semicolons in the ID line (there are 5 in your header).
Here is the part of the script that generates the error:
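Roughly, the check comes down to counting the semicolons that follow the ID tag. What follows is only a minimal sketch of that idea, not the verbatim Biopython source: the helper name `check_id_line` and the constant `ACCEPTED_SEMICOLON_COUNTS` are made up for illustration, and the accepted counts are taken from the reply above.

```python
# Hypothetical helper, not part of Biopython: reports how many semicolons
# an EMBL ID line contains and whether that count is one of the accepted ones.
ACCEPTED_SEMICOLON_COUNTS = {2, 3, 6}  # assumption: counts quoted in this thread


def check_id_line(line):
    """Return (semicolon_count, is_accepted) for an EMBL ID line."""
    if not line.startswith("ID"):
        raise ValueError("Not an ID line: " + line)
    # Count semicolons in everything after the leading "ID" tag.
    count = line[2:].count(";")
    return count, count in ACCEPTED_SEMICOLON_COUNTS


# The header from the traceback has only 5 semicolons, so it is rejected.
print(check_id_line("ID file ; ; ; ; ; 29902 BP."))

# An ID line with 6 semicolons (7 fields) passes the count check.
print(check_id_line("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))
```

Running this on the first line of your own file should show whether the tool that wrote the EMBL output dropped one of the fields.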
Thanks, I'll try looking into this.