How Do I Loop Over Sequences With Biopython?
12.7 years ago
T ▴ 20

Hi,

I'm new to Biopython. I can't seem to get nested for loops to iterate properly. Here's a simple example:

from Bio import SeqIO

infile = file('testseq.fna')

midfile = file('mids.fna')

c = 0


for midseq in SeqIO.parse(midfile,"fasta"):
    print midseq.id
    print midseq.seq
    for line in SeqIO.parse(infile,"fasta"):
        print line.seq

I have 12 simple FASTA records in testseq.fna and 96 mid identifiers in mids.fna. I expected a list of 96 mid ids and seqs, each followed by the 12 testseq sequences. What I actually get is the first mid followed by the sequences, then the other mids with no sequences after them. Run it and you will see what I mean. I'm pulling my hair out: why doesn't Python run the 'line' loop for each 'mid' loop like it should?

Thanks for any help. I know it's surprising, but I couldn't find an answer to this anywhere (on the Python forum they were just rude!).

Theo

biopython • 5.8k views
12.7 years ago

Your stream reaches the end of the file by the end of the first iteration of the outer loop, so the inner loop has nothing left to read on later iterations.

Move the line:

infile = file('testseq.fna')

Inside the loop like so:

for midseq in SeqIO.parse(midfile,"fasta"):
    infile = file('testseq.fna')
    for line in SeqIO.parse(infile,"fasta"):
        print line.seq
12.7 years ago
T ▴ 20

Brilliant! thanks. You have no idea how long that has taken to find out! I still don't know why it works though.

Is it that opening the file again for each primary loop resets the iteration? I can't find anywhere in a Python manual where it says you have to do that! If it was a list and not a file would it have to be defined in the loop like that too?

Thanks again,

Theo


When you open a file you open a stream to it. Once that stream runs out, you need to either go back to the beginning with a seek operation or open the file again as a new stream. Each time you open the file you get an entirely new stream over the same content, so you can be at different locations in the same file if you open it in different streams.
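To make the stream behaviour concrete, here is a minimal sketch in modern Python 3 (the thread itself uses Python 2's file() and print statement). It uses only the standard library, and the temporary file is a hypothetical stand-in for testseq.fna, not the asker's real data.

```python
import os
import tempfile

# Hypothetical stand-in for testseq.fna: a small temporary text file.
with tempfile.NamedTemporaryFile("w", suffix=".fna", delete=False) as tmp:
    tmp.write(">seq1\nACGT\n>seq2\nGGCC\n")
    path = tmp.name

with open(path) as handle:
    first_pass = [line.strip() for line in handle]   # reads to the end of the file
    second_pass = [line.strip() for line in handle]  # stream is exhausted, nothing left
    handle.seek(0)                                   # rewind to the beginning
    third_pass = [line.strip() for line in handle]   # full contents again

# Two separate handles on the same file keep independent positions.
with open(path) as a, open(path) as b:
    a.readline()                         # advance only handle a past the first line
    line_from_a = a.readline().strip()   # second line of the file
    line_from_b = b.readline().strip()   # b is still at the first line

os.remove(path)

print(first_pass)
print(second_pass)
print(third_pass)
print(line_from_a, line_from_b)
```

The second pass comes back empty for the same reason the inner loop in the question printed nothing after the first mid: the handle was already at the end of the file.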


Essentially, what's going on here is that a file acts more like an iterator than like a list. Try running through an iterator (made by something like iterator = iter([1,2,3])) in a loop multiple times, like for i in range(3): for x in iterator: print x, and you'll see that it only runs through the items 1, 2, 3 in the first inner for loop and acts as empty after that. A list, by contrast, would act the same in every inner for loop. But you're right, the Python manual isn't very explicit about that.
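The experiment described above can be written out as a short runnable sketch. This is Python 3 rather than the Python 2 used in the thread, and the variable names are just illustrative.

```python
numbers = iter([1, 2, 3])    # an iterator: it can be consumed only once
from_iterator = []
for _ in range(3):
    # Only the first pass sees any items; later passes find it empty.
    from_iterator.append([x for x in numbers])

items = [1, 2, 3]            # a list: every for loop starts from the beginning
from_list = []
for _ in range(3):
    from_list.append([x for x in items])

print(from_iterator)
print(from_list)
```

This also answers the question about lists: a list defined once outside the outer loop can be looped over as many times as you like. SeqIO.parse returns an iterator, so one alternative to reopening the file on every pass is to build a list once before the outer loop, for example with list(SeqIO.parse(infile, "fasta")), at the cost of holding all the records in memory.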
