Hi all,
I want to determine the length of each individual sequence in a multi-FASTA file. I got this Biopython code from the manual:
from Bio import SeqIO
import sys

# Print the ID and length of every record in the FASTA file given as the first argument
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    output_line = '%s\t%i' % (seq_record.id, len(seq_record))
    print(output_line)
My input file is like:
>Protein1
MNT
>Protein2
TSMN
>Protein3
TTQRT
And the code yields:
Protein1 3
Protein2 4
Protein3 5
But what I want is the range each sequence covers once the lengths of the previous sequences are added. It would look like:
Protein1 1-3
Protein2 4-7
Protein3 8-12
I don't know which line of the code above I need to change to get that output. I'd appreciate any help with this, thanks!
I think he is adding the previous lengths (3+1, 7+1), but it is not clear to me where this 1 comes from.
It could also be that he is creating a begin and end 'position' for each sequence, but since it's unclear I prefer to ask rather than assume something :p
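If the second reading is the right one (a running begin and end position for each sequence, 1-based and inclusive), here is a minimal sketch of how it could be done. That interpretation is an assumption based on the example output in the question, and the helper names `cumulative_ranges` and `fasta_lengths` are made up for illustration. The idea is to remember where the previous sequence ended: the next one starts at that end plus 1, which would explain the extra 1.

```python
import sys


def cumulative_ranges(lengths):
    """Yield (name, start, end) with 1-based, inclusive running positions.

    `lengths` is any iterable of (name, length) pairs.
    """
    end = 0  # end position of the previous sequence (0 before the first one)
    for name, length in lengths:
        start = end + 1     # the next sequence begins right after the previous end
        end = end + length  # and ends `length` residues later, inclusive
        yield name, start, end


def fasta_lengths(path):
    """Yield (record id, sequence length) for each record in a FASTA file."""
    from Bio import SeqIO  # imported here so cumulative_ranges works without Biopython
    for seq_record in SeqIO.parse(path, "fasta"):
        yield seq_record.id, len(seq_record)


# Only runs when a FASTA path is passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, start, end in cumulative_ranges(fasta_lengths(sys.argv[1])):
        print("%s\t%i-%i" % (name, start, end))
```

With the three proteins from the question (lengths 3, 4 and 5) this gives 1-3, 4-7 and 8-12. Compared with the original script, the only real change is carrying the running `end` from one record to the next instead of printing `len(seq_record)` directly.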