Hi!
For example, if I have a file with 3 sequences:
>1
gtccgatgat
>2
gatggatcggacgttagcaa
>3
acatttgaga
I want to generate sequences with 10% of the nucleotides randomly mutated.
>1
gtcTgatgat
>2
gaAggatcggacgtGagcaa
>3
acatCtgaga
Does anybody know a program or code to do that?
And if you know a way to do this in Python, that would be great!
Thanks!
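A minimal sketch of one way to approach this in Python, not necessarily the code discussed in the comments below. It assumes one sequence per line (no wrapped FASTA records), treats the 10% as a per-base probability so the number of changes is 10% only on average, and writes mutated bases in upper case as in the example above. The function names `mutate` and `mutate_fasta` are made up for illustration.

```python
import random

def mutate(seq, rate=0.10, rng=random):
    """Mutate each base independently with probability `rate`.

    A mutated base is written in upper case and always differs
    from the original base (compared case-insensitively).
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick among the three other nucleotides.
            out.append(rng.choice([b for b in "ACGT" if b != base.upper()]))
        else:
            out.append(base)
    return "".join(out)

def mutate_fasta(lines, rate=0.10, rng=random):
    """Yield FASTA lines: headers unchanged, sequence lines mutated."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield mutate(line, rate, rng)

# The three sequences from the question (assumed to be one per line).
fasta = """>1
gtccgatgat
>2
gatggatcggacgttagcaa
>3
acatttgaga""".splitlines()

for line in mutate_fasta(fasta):
    print(line)
```

To read from a real file, pass an open file handle instead of the `fasta` list.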
On 1), I don't disagree, but I favor simplicity over performance unless it proves to be a problem. For 2), which is more efficient probably depends on the mutation rate. And for 3), I guess that depends on what "mutation rate" means.
actually, you were right. I changed it to s.upper() :) ... thanks.
Nice! Just a few suggestions: 1) choice is really slower than random; as we only want to get one element, the corresponding line could be changed to use 'ATGC'[int(random()*4)]. 2) seq can be left as a string: seq = seq[:i] + 'ATGC'[int(random()*4)] + seq[i+1:] ... it is perhaps more efficient. 3) Why should the mutation result in a different nucleotide?
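For reference, the slice-and-concatenate idea from point 2 could look roughly like this, with the prefix `seq[:i]` first and the suffix `seq[i+1:]` last. This is only a sketch: `point_mutation` is a made-up name, and as in point 3 the new base is allowed to equal the old one.

```python
from random import random, randrange

def point_mutation(seq):
    """Replace one randomly chosen position of a string sequence."""
    i = randrange(len(seq))               # position to overwrite
    new_base = "ATGC"[int(random() * 4)]  # index 0..3, since random() < 1
    # Strings are immutable, so build a new one around position i.
    return seq[:i] + new_base + seq[i + 1:]

print(point_mutation("gatggatcggacgttagcaa"))
```

Each call copies the whole string, so for many mutations per sequence a list of characters may still be the cheaper choice.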
Nice code brentp! thanks !!!
I've learned a lot from this code... you really help me brentp :)
Am I mistaken to believe this will only give a mutation rate equivalent to 3/4 of the one you specify? Namely, 1 out of 4 of your mutations will be mutations to the same nucleotide, thus not being a mutation? Maybe a while loop would correct for that easily. (while new == old: pick new) or something.
@Eric, yes, you are mistaken. See @fransua's comment. When the threshold is exceeded, it changes the base. That's the part about: choice([x for x in "ACTG" if not x == s.lower()])
I was not sure I understood your list comprehension at first either. Now I see. Thanks
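To make the 3/4 question concrete, here is a small simulation sketch (my own illustration, with a made-up function name `observed_rate`). It compares drawing the replacement from all four bases against drawing it only from the three other bases, for a nominal rate of 10%.

```python
import random

def observed_rate(exclude_original, rate=0.10, n=200_000, seed=0):
    """Fraction of simulated bases that actually end up changed."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(n):
        base = rng.choice("ACGT")
        if rng.random() < rate:
            if exclude_original:
                # Same idea as the list comprehension quoted above.
                options = [b for b in "ACGT" if b != base]
            else:
                options = "ACGT"
            changed += rng.choice(options) != base
    return changed / n

print(observed_rate(exclude_original=True))   # close to the nominal rate
print(observed_rate(exclude_original=False))  # close to 3/4 of the nominal rate
```

So the rate only drops to 3/4 of the nominal value when the original base is left in the pool of candidates.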
Oops... I didn't see the error when I analyzed the script. Thanks for the correction!!
This is great. Sorry, I'm new to Python... how would you go about modifying the script to directly introduce, say, 2 mutations? (I mean, not depending on a mutation frequency, but telling the script to introduce 2 mutations in each given sequence regardless of its size.) Thanks!
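One possible way to get a fixed number of mutations, sketched independently of the script discussed above: pick the positions with `random.sample`, which never returns the same position twice, then replace each chosen base with a different one. The name `mutate_n` is hypothetical.

```python
import random

def mutate_n(seq, n=2, rng=random):
    """Introduce exactly n substitutions at distinct random positions."""
    if n > len(seq):
        raise ValueError("cannot mutate more positions than the sequence has")
    bases = list(seq)
    # sample() picks n distinct indices, so no position is hit twice.
    for i in rng.sample(range(len(bases)), n):
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i].upper()])
    return "".join(bases)

print(mutate_n("gatggatcggacgttagcaa", 2))
```

Mutated bases come out in upper case, which makes them easy to spot when the input is lower case.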