I'm trying to do a program that's kinda complicated to explain, so I'll try my best to elaborate. What I am trying to do is twofold: first, I want to get all the possible mutated strands from an original strand that I gave a mutation frequency of 0.66% (this is for all bases). Then, I want to make a dictionary of dictionaries representing these mutations. I have two bits of code so far, both getting some aspect of my objective done, but I'd prefer if they are combined somehow. The first yields a list of all possible mutated strands from the original one, but it doesn't have a mutation frequency - it only changes on base at a time. Also note that the only bases I'm changing here is A-->T, but that's only for simplicity's sake - I'd like to be able to mutate any base:
import random
#Yields all possible strands with a given mutation
def print_all_possibilities(dna, mutations, index = 0):
if index < 0: return #invalid value for index
while index < len(dna):
if chr(dna[index]) in mutations: #chr converts ASCII value to corresponding character; If the function finds a place where a mutation can happen
print_all_possibilities(dna, mutations, index + 1)
dnaCopy = bytearray(dna) #bytearray object is mutable, while a string isn't. So, I decided to convert the string to a byte array object
dnaCopy[index] = ord(mutations[chr(dna[index])])#...then it copys the dna and lets the second one mutate.
print_all_possibilities(dnaCopy, mutations, index + 1)
return
index += 1
print(dna.decode("ascii")) #decodes to ascii - the character version of the dna (instead of numeric)
# for testing
print_all_possibilities(bytearray(b"AAAATTTT"), {"A": "T"})
My second code does mutate a strand at a fixed mutation frequency, but it only yields a single strand, rather than all possibilities:
#yields a strand with an A-T mutation frequency of 0.6%
def mutate(string, mutation, threshold):
dna = list(string)
for index, char in enumerate(dna):
if char in mutation:
if random.random() < threshold:
dna[index] = mutation[char]
return ''.join(dna)
dna = "ATGTCGTACGTTTGACGTAGAG"
print("DNA first:", dna)
newDNA = mutate(dna, {"A": "T"}, 0.006)
print("DNA now:", newDNA)
The reason why I'm trying to combine the two bits of code is because I want all the possible strands yielded from that fixed mutation frequency, and the separate pieces of code doesn't take care of both those parameters.
Furthermore, after I get a list of all the possible mutated strands from that mutation frequency, I'd like to make a dictionary of dictionaries: each strand will have a dictionary bearing a value-key pair of "original base:new mutated base". I'm not sure how to go about these things. Could someone help me out? Thanks.
It's hard to see exactly what you're aiming for here. "All possible strands" with a given mutation rate would be all possible strands full stop (even if some are very unlikely indeed).
Also not sure I can help but not sure I understand why you would want this as well.
We are finding that rare variants and private SNPs can be very important. But finding several very rare private SNPs together would be a vanishingly unlikely event.
What I mean is, in a recent paper on ITGAM, Vyse et al. showed two rare coding variants that produce functional effects on ITGAM and may contribute to Autoimmune disease.
They then looked for these causal variants in 8700 other people (that number is from memory but should not be too far from accurate), finding them in no one else.
So, if we generate all possible permutations of rare variants, we might generate some causal ones, but then this has already been done through CADD for one SNP at a time ... so your contribution would be to have these in triplets or pairs, etc. (if I understand correctly) but if we recognize that some causal variants for both complex and penetrant Mendelian disease will be very rare, we might generate a lot of permutated haplotypes that in fact only ever occur in one or two people...
Anyway I'd be interested to know how you feel this would help - not saying it wouldn't just saying I do not understand the aim.