I am trying to parse genome records in EMBL format. Everything seems to run without any exception, but the parser only reads the first record. Here is my code to parse the EMBL file:
from Bio import SeqIO
# Iterate over every record in the file and print its identifier
for record in SeqIO.parse("AE000657.1.embl", "embl"):
    print(record.id)
Not sure what the issue is. Your file contains one sequence record and the code prints its ID, as expected. Maybe you want FT lines as suggested in Peter's answer?
In EMBL format, each record starts with an "ID" line and ends with a "//" line, and your EMBL file as shown here really does contain only one record. The Biopython parser is therefore working as designed.
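If you want to check the record count independently of Biopython, a rough sketch like the following tallies the "ID" lines and the "//" terminator lines. The function name count_embl_records and the sample text are made up for illustration, and the sample is heavily abbreviated, so it is not meant to be fed to a real parser.

```python
import io


def count_embl_records(handle):
    """Return (number of ID lines, number of '//' terminator lines)."""
    id_lines = 0
    terminators = 0
    for line in handle:
        # EMBL line codes are two letters followed by three spaces
        if line.startswith("ID   "):
            id_lines += 1
        elif line.rstrip() == "//":
            terminators += 1
    return id_lines, terminators


# Hypothetical, abbreviated two-record text used only to exercise the function
sample_text = (
    "ID   X00001; SV 1; linear; genomic DNA; STD; PRO; 10 BP.\n"
    "FT   source          1..10\n"
    "SQ   Sequence 10 BP;\n"
    "     acgtacgtac        10\n"
    "//\n"
    "ID   X00002; SV 1; linear; genomic DNA; STD; PRO; 4 BP.\n"
    "SQ   Sequence 4 BP;\n"
    "     acgt        4\n"
    "//\n"
)

counts = count_embl_records(io.StringIO(sample_text))
print("ID lines: %i, terminators: %i" % counts)
```

On a real file you would pass an open file handle instead of the StringIO object. If both numbers come back as 1, the file holds a single record and SeqIO.parse yielding one record is the expected behaviour.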
My guess is that you are looking for the features, i.e. the information on the FT (Feature Table) lines. Biopython parses these into SeqFeature objects, held as a list in the features attribute of the SeqRecord object. Note that for single-sequence files, you may find it simpler to use the read function:
from Bio import SeqIO
# read() expects exactly one record and raises ValueError otherwise
record = SeqIO.read("AE000657.1.embl", "embl")
print("Record %s has %i features" % (record.id, len(record.features)))
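Once you have the record, going through the features themselves could look roughly like this. The helper names count_feature_types and describe_features are my own, not part of Biopython; they only rely on the record's features list and on each feature's type and location attributes.

```python
from collections import Counter


def count_feature_types(record):
    """Map each feature type (e.g. "CDS", "gene") to how often it occurs.

    `record` is assumed to be a Biopython SeqRecord; only its `features`
    list and each feature's `type` attribute are used.
    """
    return Counter(feature.type for feature in record.features)


def describe_features(record, limit=5):
    """Return one 'type<TAB>location' string for each of the first `limit` features."""
    return ["%s\t%s" % (f.type, f.location) for f in record.features[:limit]]
```

With the record loaded as above, count_feature_types(record) gives a quick overview of what the FT lines contain, and printing each line from describe_features(record) shows the first few features with their locations.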
P.S. On BioStars (like StackExchange) you are expected to mark an answer as accepted if it solves your problem - this is used for the user profile ratings etc.
updated 3.2 years ago by Ram • written 11.6 years ago by Peter
Could you edit your question to include a URL to the test file? Without that, it is going to be hard to help you.
Yep I edited my question.