Hi,
I'm new to Biopython. I can't seem to get nested for loops to iterate properly. Here's a simple example:
from Bio import SeqIO
infile = file('testseq.fna')
midfile = file('mids.fna')
c = 0
for midseq,line in SeqIO.parse(midfile,"fasta"):
    print midseq.id
    print midseq.seq
    for line in SeqIO.parse(infile,"fasta"):
        print line.seq
I have 12 simple FASTA records in testseq.fna, and 96 mid identifiers in mids.fna. I should get a list of 96 mid ids and seqs, each followed by the 12 testseq sequences, but what I get is the first mid followed by the sequences, and then just the other mids with no sequences after them... run it and you will see what I mean. I'm pulling my hair out - why doesn't Python run the 'line' loop for each 'mid' loop like it should??
Thanks for any help - I know it's surprising but I couldn't find an answer to this anywhere (on the Python forum they were just rude!).
Theo
When you open a file, you open a stream to it. Once that stream runs out, you need to either go back to the beginning with a seek operation or open the file again in a new stream. Each time you open the file you get an entirely new stream over the same content, so you can be at different positions in the same file if you open it in separate streams.
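Here is a minimal sketch of the rewind approach, written in Python 3 syntax and using only the standard library. The io.StringIO objects are stand-ins for the two open files, and the function names are made up for illustration. In the original script the analogous change would presumably be calling infile.seek(0) before the inner SeqIO.parse call, or reading the records once with list(SeqIO.parse(infile, "fasta")) and looping over that list.

```python
import io

# Stand-ins for the two open files; io.StringIO behaves like a text file handle.
mids = io.StringIO("mid1\nmid2\n")
seqs = io.StringIO("ACGT\nTTGA\n")


def pair_without_rewind(outer, inner):
    """Nested loops where the inner handle is never rewound."""
    pairs = []
    for mid in outer:
        for seq in inner:  # exhausted after the first outer pass
            pairs.append((mid.strip(), seq.strip()))
    return pairs


def pair_with_rewind(outer, inner):
    """Same loops, but the inner handle is rewound before every pass."""
    pairs = []
    for mid in outer:
        inner.seek(0)  # go back to the start of the stream
        for seq in inner:
            pairs.append((mid.strip(), seq.strip()))
    return pairs


broken = pair_without_rewind(mids, seqs)

# Rewind both handles so the second experiment starts from the beginning.
mids.seek(0)
seqs.seek(0)
fixed = pair_with_rewind(mids, seqs)

print(broken)  # only the first mid gets paired with sequences
print(fixed)   # every mid gets paired with every sequence
```

Rewinding keeps memory use low for large files. Loading everything into a list is simpler when the inner file is small, as with 12 records.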
Essentially what's going on here is that a file acts more like an iterator than like a list. Try running through an iterator (made by something like
iterator = iter([1,2,3])
) in a loop multiple times (like
for i in range(3):
    for x in iterator:
        print x
), and you'll see that it only runs through the 1, 2, 3 items in the first inner for loop and acts as empty after that, unlike a list, which would act the same in every inner for loop. But you're right, the Python manual isn't very explicit about that.
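The list-versus-iterator difference can be checked with a small self-contained sketch. It is written in Python 3 syntax, and the helper name count_inner_passes is made up for illustration. It counts how many items each inner loop sees.

```python
def count_inner_passes(items, outer_rounds=3):
    """Return how many items the inner loop sees on each outer round."""
    counts = []
    for _ in range(outer_rounds):
        seen = 0
        for _item in items:
            seen += 1
        counts.append(seen)
    return counts


# A list can be looped over again and again.
list_counts = count_inner_passes([1, 2, 3])

# An iterator is used up by the first inner loop and looks empty afterwards.
iter_counts = count_inner_passes(iter([1, 2, 3]))

print(list_counts)  # [3, 3, 3]
print(iter_counts)  # [3, 0, 0]
```

An open file handle behaves like the second case, which matches the output described in the question: full results for the first mid and nothing for the rest.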