Question

Biopython Embl Parser Only Reads One Entry

11.6 years ago

sinanugur ▴ 10

Hello,

I am trying to parse genome records in EMBL format. The code runs without raising an exception, but the parser only reads the first record. Here is my code to parse the EMBL file:

from Bio import SeqIO

for record in SeqIO.parse("AE000657.1.embl", "embl"):
    print(record.id)

This script only prints:

AE000657.1.

Nothing else is printed; the other genomic regions are missing. The file is available here: http://www.ebi.ac.uk/ena/data/view/AE000657&display=text

The EMBL file itself is fine: it opens in Artemis, so it is not corrupted. So what is the problem here? Thanks

python biopython • 3.6k views

written 11.6 years ago by sinanugur • updated 3.2 years ago by Ram

Could you edit your question to include a URL to the test file? Without that, it will be hard to help you.

Peter, 11.6 years ago

Yep, I have edited my question.

sinanugur, 11.6 years ago

Not sure what the issue is. Your file contains one sequence record, and the code prints its ID, as expected. Maybe you want the features (the FT lines), as suggested in Peter's answer?

Neilfws, 11.6 years ago

Ram · Accepted Answer · 2013-05-13

11.6 years ago

Peter 6.0k

In EMBL, each record starts with an "ID" line and ends with a // line, and the EMBL file you linked really does contain only one record. The Biopython parser is therefore working as designed.
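Because a record is delimited by its ID and // lines, you can check how many records a file holds without Biopython at all. The sketch below is a plain-Python sanity check, not part of Biopython; the helper names count_embl_records and count_ft_features are hypothetical, and it assumes the standard EMBL layout where a feature key (gene, CDS, ...) starts in column 6 of an FT line while qualifier lines are indented further. For the file in the question it should report one record but many features.

```python
def count_embl_records(lines):
    """Count EMBL records: each one opens with an 'ID' line and closes with '//'."""
    starts = 0
    ends = 0
    for line in lines:
        if line.startswith("ID   "):
            starts += 1
        elif line.rstrip("\r\n") == "//":
            ends += 1
    if starts != ends:
        raise ValueError("unbalanced file: %d ID lines but %d '//' lines" % (starts, ends))
    return starts


def count_ft_features(lines):
    """Count feature-table entries: FT lines whose key (e.g. 'gene') starts in column 6.

    Qualifier continuation lines (FT, spaces, then '/gene=...') are not counted.
    """
    return sum(1 for line in lines if line.startswith("FT   ") and line[5:6] not in ("", " "))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_embl.py AE000657.1.embl
    for path in sys.argv[1:]:
        with open(path) as handle:
            lines = handle.readlines()
        print("%s: %d record(s), %d feature(s)"
              % (path, count_embl_records(lines), count_ft_features(lines)))
```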

I would guess that what you are looking for is the features, i.e. the information on the FT (Feature Table) lines. Biopython parses these into SeqFeature objects, held as a list in the features attribute of the SeqRecord object. Note that for single-sequence files, you may find it simpler to use the read function:

from Bio import SeqIO
record = SeqIO.read("AE000657.1.embl", "embl")
print("Record %s has %i features" % (record.id, len(record.features)))
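Once you have the record, the features are an ordinary Python list that you loop over, rather than something SeqIO.parse iterates for you. Below is a minimal sketch assuming Python 3 and a reasonably recent Biopython; the helper names feature_type_counts, feature_labels and summarise_embl are hypothetical, while record.features, feature.type, feature.location and feature.qualifiers are the standard SeqRecord/SeqFeature attributes. Using /gene, then /locus_tag, as the label is just one illustrative choice.

```python
from collections import Counter

from Bio import SeqIO


def feature_type_counts(record):
    """Count the features of each type (source, gene, CDS, ...) in one SeqRecord."""
    return Counter(feature.type for feature in record.features)


def feature_labels(record):
    """Return (type, label) pairs, labelling by /gene, else /locus_tag, else '?'."""
    pairs = []
    for feature in record.features:
        quals = feature.qualifiers
        # Each qualifier value is a list of strings, hence the [0].
        label = quals.get("gene", quals.get("locus_tag", ["?"]))[0]
        pairs.append((feature.type, label))
    return pairs


def summarise_embl(filename):
    """Read a single-record EMBL file and print a per-feature summary."""
    record = SeqIO.read(filename, "embl")  # expects exactly one ID ... // block
    print("Record %s has %i features" % (record.id, len(record.features)))
    for feature_type, count in feature_type_counts(record).most_common():
        print("  %s: %i" % (feature_type, count))
    for feature, (feature_type, label) in zip(record.features, feature_labels(record)):
        print(feature_type, feature.location, label)
```

For example, summarise_embl("AE000657.1.embl") would print the record ID, a count per feature type, and then one line per feature with its location. SeqIO.read raises a ValueError if the file holds no records or more than one, so it also doubles as a check that the file really contains a single record.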

written 11.6 years ago by Peter • updated 3.2 years ago by Ram

Thanks, I wanted to parse the features. I thought I could iterate through them with SeqIO.parse, but now I understand. Cheers.

written 11.6 years ago by sinanugur • updated 3.2 years ago by Ram

Great.

P.S. On BioStars (like StackExchange) you are expected to mark an answer as accepted if it solves your problem - this is used for the user profile ratings etc.

written 11.6 years ago by Peter • updated 3.2 years ago by Ram