9.5 years ago
allyson1115ar
The code below extracts a short subsequence from every sequence using a window of size 100. The window shifts by a step of one base between extractions. I would like to extract the subsequences with a step size of 50 instead. Can anyone help me?
from Bio import SeqIO

with open("B.fasta", "w") as f:
    for seq_record in SeqIO.parse("A.fasta", "fasta"):
        # slide a 100-base window along the sequence, one base at a time
        for i in range(len(seq_record.seq) - 99):
            f.write(">" + seq_record.id + "\n")
            f.write(str(seq_record.seq[i:i+100]) + "\n")
Example of fasta file:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG
Example output:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
>hg17_ct_ER_ER_142
TAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGG
>hg17_ct_ER_ER_142
AAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG
Expected output:
>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACA
>hg17_ct_ER_ER_142
AGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
You can specify the step size directly in the range function. In its full form, range takes three arguments: start, stop, and step.
For example:
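A minimal sketch of the step argument, assuming the window stays at 100 and only the step changes to 50. The helper name `window_starts` is hypothetical, and plain integer lengths stand in for `len(seq_record.seq)` so the snippet runs without Biopython.

```python
def window_starts(seq_len, window=100, step=50):
    """Return the start index of every full window of length `window`."""
    # range(start, stop, step): stop is exclusive, so the +1 keeps
    # the last position where a full window still fits.
    return list(range(0, seq_len - window + 1, step))


print(window_starts(102))  # [0]
print(window_starts(250))  # [0, 50, 100, 150]
```

In the loop from the question, this would correspond to writing `range(0, len(seq_record.seq) - 99, 50)`. Note that a 102-base sequence, like the example record, yields only one 100-base window at this step.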
Hi!
Have you considered using a "while" loop instead of your second "for" loop? Something like:
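One way such a "while" loop could look, as a sketch only: it assumes pieces of 50 bases that advance by a full piece each time, which matches the expected output in the question. The function name `chunks_while` and the test sequence are made up, and a plain string stands in for `str(seq_record.seq)`.

```python
def chunks_while(seq, size=50):
    """Collect consecutive non-overlapping pieces of length `size`."""
    pieces = []
    i = 0
    while i + size <= len(seq):  # stop once a full piece no longer fits
        pieces.append(seq[i:i + size])
        i += size                # jump by a whole piece, so nothing overlaps
    return pieces


seq = "ACGT" * 30                # made-up 120-base sequence
for piece in chunks_while(seq):
    print(len(piece))            # prints 50 twice; the last 20 bases are skipped
```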
That solved it perfectly, thanks for the reply. One issue remains, though: a few base pairs at the end are left out in this case.
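If the leftover bases should be kept, one option is to let the final piece be shorter than the others. This is a sketch under that assumption, not necessarily what the edited answer did; `chunks_with_tail` is a hypothetical name and the sequence is made up.

```python
def chunks_with_tail(seq, size=50):
    """Split `seq` into pieces of `size`; the final piece may be shorter."""
    # Slicing past the end of a string is safe in Python: it just truncates,
    # so the last slice picks up whatever bases remain.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


pieces = chunks_with_tail("ACGT" * 30)  # made-up 120-base sequence
print([len(p) for p in pieces])         # [50, 50, 20]
```

Whether a short trailing piece is acceptable depends on the downstream analysis; some tools expect all windows to have the same length.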
My fault, I've edited my last post.
The two pieces of code are not equivalent: the original generates overlapping windows shifted by one base, whereas this code creates non-overlapping windows shifted by a full window size.
I agree. However, I tried to fit the expected output.
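The difference between overlapping and non-overlapping windows is easier to see on a toy string. In this sketch, `windows` is a hypothetical helper and the 8-letter string is made up; a window of 4 stands in for the 100-base window.

```python
def windows(seq, size, step):
    """Return every full window of length `size`, advancing by `step`."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


seq = "ABCDEFGH"
print(windows(seq, 4, 1))  # ['ABCD', 'BCDE', 'CDEF', 'DEFG', 'EFGH']
print(windows(seq, 4, 4))  # ['ABCD', 'EFGH']
print(windows(seq, 4, 2))  # ['ABCD', 'CDEF', 'EFGH']
```

A step of 1 reproduces the behaviour of the original code, a step equal to the window size gives non-overlapping pieces, and a step of half the window size (the analogue of window 100, step 50) gives windows that overlap by half.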