Hi there! :)
I have a question for you all. I am trying to study the local base composition of a DNA sequence using Python. If you don't know what I am talking about, don't worry, I'll explain:
Imagine that you have this DNA sequence:
g a g t t t t a t c g c g c t t c c a t g
You want to know how many a, c, g and t there are in one part of it (a window), and then repeat this over the whole sequence, moving the window by a fixed offset each time.
So, at the end, you have the base composition of each window taken from the original sequence.
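To make the idea concrete, here is a minimal sketch of the windowed count on the sequence above. The function name `window_counts` and the window length of 6 with an offset of 3 are assumptions chosen only for illustration; they are not part of my real code.

```python
def window_counts(seq, window_len, offset):
    """Count a, c, g and t in each full window of seq (illustrative helper)."""
    counts = []
    # Start positions 0, offset, 2*offset, ... while a full window still fits.
    for start in range(0, len(seq) - window_len + 1, offset):
        window = seq[start:start + window_len]
        counts.append({base: window.count(base) for base in "acgt"})
    return counts


seq = "gagttttatcgcgcttccatg"
for i, c in enumerate(window_counts(seq, window_len=6, offset=3)):
    print(i, c)
```

Each dictionary is the base composition of one window; dividing the counts by the window length would give frequencies instead.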
This is what I am trying to do in Python. Here is my code:
def composicionBasesLocal(seq, window_len = 200, offset = 100, circular = False):
    lowest = 0
    highest = window_len
    res = []
    while highest<=len(seq)-1:
        window = seq[lowest:highest+1]
        if lowest<= len(seq):
            mm = ModeloMultinomial(window)
            res.append(mm)
        else:
            break
            lowest = lowest + offset
            highest = highest + offset
    return(res)
ModeloMultinomial(seq) code:
def ModeloMultinomial(seq):
    ModMul = []
    pa = seq.count('A')/len(seq)
    pc = seq.count('C')/len(seq)
    pg = seq.count('G')/len(seq)
    pt = seq.count('T')/len(seq)
    ModMul.append([pa,pc,pg,pt])
    return(['pa','pc','pg', 'pt'], ModMul)
This code (composicionBasesLocal) doesn't give me any error message, but when I run it, it loops forever and I have to stop it. I did it with a for loop and it works without any problem.
What have I done wrong? Thank you!! :D
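Check the indentation of these two lines:
lowest = lowest + offset
highest = highest + offset
If they are not executed on every pass of the while loop, highest never grows, the condition stays true, and you are in an infinite loop.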
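As a rough sketch of what the fix could look like, here is a version with the two updates at the level of the loop body. A few things here are my own assumptions and not from your post: the slice is `seq[lowest:highest]` so each window has exactly `window_len` bases, the redundant `if`/`else` is dropped, the unused `circular` parameter is left out, and the window is upper-cased before counting so a lowercase sequence like your example is not counted as all zeros.

```python
def ModeloMultinomial(seq):
    # Assumption: count case-insensitively, since the example sequence is lowercase.
    seq = seq.upper()
    n = len(seq)
    probs = [seq.count(base) / n for base in "ACGT"]
    return (['pa', 'pc', 'pg', 'pt'], [probs])


def composicionBasesLocal(seq, window_len=200, offset=100):
    lowest = 0
    highest = window_len
    res = []
    while highest <= len(seq):
        # Assumption: half-open slice, so the window has window_len bases.
        window = seq[lowest:highest]
        res.append(ModeloMultinomial(window))
        # Both updates sit directly in the while body, so they run on
        # every pass and the loop condition eventually becomes false.
        lowest = lowest + offset
        highest = highest + offset
    return res


result = composicionBasesLocal("gagttttatcgcgcttccatg", window_len=6, offset=3)
print(len(result))
```

The same walk can be written as `for lowest in range(0, len(seq) - window_len + 1, offset)`, which is probably why your for loop version worked: the loop variable advances on its own, so there is no update to misplace.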