I'm trying to write a Python script that uses a sliding window. Here is the code:
    v = open("ex.fasta", "r")
    def sliding_window(sequence, winSize, step):
        numOfChunks = ((len(sequence)-winSize)/step)+1
        for i in range(0,numOfChunks*step,step):
            yield sequence[i:i+winSize]
    size = int(14786)
    w = 500
    while size > w:
        for line in v:
            if not line.startswith(">"):
                myseq = line.rstrip()
                myvect = sliding_window(myseq, 500, 500)
                for r in myvect:
                    print(r)
I want it to produce chunks of the sequence with a window size of 500 and a step size of 500, i.e. no overlap. The trouble is that every line of the FASTA file is 76 bp long, except the last, which is ~40 bp. Any window size <= 76 produces the desired outcome; anything > 76 does not work. I've also tried creating a single string from the whole sequence rather than a list, but it still does not work. Any help is appreciated.
I didn't fully understand your question, but you can concatenate the multiple lines to make one complete sequence. See here: A: multiline fasta to single line fasta
Thanks Ashutosh, but I'm in the middle of learning Python. I'd like to stay away from one-liners and Perl, and keep it all in a single script.
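For what it's worth, the concatenation idea can stay inside a single Python script. The likely reason windows larger than 76 produce nothing is that the window is applied to each 76 bp line separately, so no line is ever long enough to hold one full window. Below is a minimal sketch, assuming the file holds a single FASTA record; `read_fasta_sequence` is a hypothetical helper name, and the tiny in-memory input only stands in for `ex.fasta`.

```python
import io

def read_fasta_sequence(handle):
    """Join all non-header lines of a single-record FASTA into one string."""
    return "".join(line.strip() for line in handle if not line.startswith(">"))

def sliding_window(sequence, win_size, step):
    """Yield full windows of win_size, advancing by step; a short tail is dropped."""
    for i in range(0, len(sequence) - win_size + 1, step):
        yield sequence[i:i + win_size]

# Tiny in-memory stand-in for ex.fasta (made-up data with 4 bp lines)
demo = io.StringIO(">seq1\nACGT\nACGT\nAC\n")
seq = read_fasta_sequence(demo)

# Window and step of 4 here play the role of 500 and 500 on the real file
chunks = list(sliding_window(seq, 4, 4))
print(chunks)
```

For the real file, the same two functions could be used with `with open("ex.fasta") as handle: seq = read_fasta_sequence(handle)` followed by `sliding_window(seq, 500, 500)`. The `while size > w` loop should not be needed, since the generator already stops once no full window remains. Note that, like the original function, this drops any trailing piece shorter than the window.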