Here is the relevant part of the function:
flanks = open("flanks" + str(organism) + ".fasta", "w")
#[...]
for record in flanks:
if len(record.seq)<200:
print record.id, str(len(record.seq))+"bp", "unflanked element"
This returns: NT_086568.2127 132bp unflanked element
However, if I find the relevant ID in the file called by hits I find an undeniably 200bp FASTA sequence in its usual format amongst hundreds of other 200bp FASTA sequences that are never mentioned.
If I alter the script to also return record.seq:
for record in hits:
if len(record.seq)<200:
print record.id, record.seq str(len(record.seq))+"bp", "unflanked element"
And then ctrl+f this sequence in the relevant file I see this: http://i.imgur.com/eIE1Yqb.png
I think this shows the sequence isn't 200bp.
The first bit of code you posted can't be correct. You're iterating over lines in a file that's empty (you've just opened it for writing). I assume it should have been something like
I cut out the rest. I said the file is full of 200bp sequences when I open it in a text editor so why would I be lying? I just don't want anyone to have to read through the other stuff. I had also edited some of the variable names.
Here's the entire function. The relevant part is "for record in sltrs:" onwards.
I'm now also getting the issue of the function not correctly reading the number of results in my fullflanks(organism).fasta file. If I use len(list(SeqIO.parse([...] outside of the function, I get the correct number (16). If I use it inside the function, I get 0 even though I can open the fasta file and see that it has 16 results in it.
At the very least, try closing
flanks
before opening it again, since it's likely that the contents haven't been flushed to disk yet.Okay, I used flanks.close() and fullflanks.close() after the entire flank retrieval part and they ended up being read correctly by SeqIO.parse afterwards. I'm not getting any 200bp sequences being read incorrectly and I get a length of 16 for the fullflanks list.
Thanks a lot!