I currently have a script that reads a FASTA file and prints the lengths of its constituent sequences. However, I'm having difficulty adapting it to both iterate over a large number of files using the os module (it seems to stop after one file for some reason) and write the output to corresponding text files. Could anyone kindly assist with this issue?
header = None
length = 0
with open('x.ffn') as input_file:
    for line in input_file:
        line = line.rstrip()
        if line.startswith('>'):
            if header is not None:
                print(header, length)
                length = 0
            header = line[1:]
        else:
            length += len(line)
if length:
    print(header, length)
As Corentin mentioned, samtools faidx is the simplest and fastest way to get this information. You could also use bioawk like so:
prints:
As for your program, you should mention what the error is, and explain what you mean when you say it "stops for some reason".
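Without seeing the loop you wrote, it is hard to say why it stops after one file, but here is a minimal sketch of how the same parsing logic could be wrapped in a function and applied to every file in a directory. The function names, the '.ffn' extension filter and the '_lengths.txt' output naming are assumptions made for illustration, not something taken from your script. Because header and length are local to the function, they start fresh for each file.

```python
import os


def sequence_lengths(fasta_path):
    """Yield (header, length) for each record in a FASTA file."""
    header = None
    length = 0
    with open(fasta_path) as input_file:
        for line in input_file:
            line = line.rstrip()
            if line.startswith('>'):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, length
                header = line[1:]
                length = 0
            else:
                length += len(line)
    # Emit the last record (even if its sequence is empty).
    if header is not None:
        yield header, length


def write_lengths_for_directory(input_dir, extension='.ffn'):
    """Write <name>_lengths.txt next to each FASTA file in input_dir.

    The extension filter and output naming are illustrative assumptions.
    Returns the list of output paths that were written.
    """
    written = []
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(extension):
            continue  # skip anything that is not a FASTA file
        fasta_path = os.path.join(input_dir, name)
        output_name = os.path.splitext(name)[0] + '_lengths.txt'
        output_path = os.path.join(input_dir, output_name)
        with open(output_path, 'w') as output_file:
            for header, length in sequence_lengths(fasta_path):
                output_file.write(f'{header}\t{length}\n')
        written.append(output_path)
    return written
```

Calling write_lengths_for_directory('.') would then process every .ffn file in the current directory and write one tab-separated text file per input.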