Question

Making a Dictionary for all Possible Mutations

0

Entering edit mode

10.5 years ago

abenabba • 0

I'm trying to do a program that's kinda complicated to explain, so I'll try my best to elaborate. What I am trying to do is twofold: first, I want to get all the possible mutated strands from an original strand that I gave a mutation frequency of 0.66% (this is for all bases). Then, I want to make a dictionary of dictionaries representing these mutations. I have two bits of code so far, both getting some aspect of my objective done, but I'd prefer if they are combined somehow. The first yields a list of all possible mutated strands from the original one, but it doesn't have a mutation frequency - it only changes on base at a time. Also note that the only bases I'm changing here is A-->T, but that's only for simplicity's sake - I'd like to be able to mutate any base:

import random

    #Yields all possible strands with a given mutation 
    def print_all_possibilities(dna, mutations, index = 0):
        if index < 0: return #invalid value for index

        while index < len(dna):
            if chr(dna[index]) in mutations: #chr converts ASCII value to corresponding character; If the function finds a place where a mutation can happen
                print_all_possibilities(dna, mutations, index + 1)
                dnaCopy = bytearray(dna) #bytearray object is mutable, while a string isn't. So, I decided to convert the string to a byte array object
                dnaCopy[index] = ord(mutations[chr(dna[index])])#...then it copys the dna and lets the second one mutate. 
                print_all_possibilities(dnaCopy, mutations, index + 1)

                return
            index += 1
        print(dna.decode("ascii")) #decodes to ascii - the character version of the dna (instead of numeric)

    # for testing
    print_all_possibilities(bytearray(b"AAAATTTT"), {"A": "T"}) 

My second code does mutate a strand at a fixed mutation frequency, but it only yields a single strand, rather than all possibilities:

    #yields a strand with an A-T mutation frequency of 0.6%
    def mutate(string, mutation, threshold):
        dna = list(string)
        for index, char in enumerate(dna):
            if char in mutation:
                if random.random() < threshold:
                    dna[index] = mutation[char]

        return ''.join(dna)

    dna = "ATGTCGTACGTTTGACGTAGAG"
    print("DNA first:", dna)
    newDNA = mutate(dna, {"A": "T"}, 0.006)
    print("DNA now:", newDNA)

The reason why I'm trying to combine the two bits of code is because I want all the possible strands yielded from that fixed mutation frequency, and the separate pieces of code doesn't take care of both those parameters.

Furthermore, after I get a list of all the possible mutated strands from that mutation frequency, I'd like to make a dictionary of dictionaries: each strand will have a dictionary bearing a value-key pair of "original base:new mutated base". I'm not sure how to go about these things. Could someone help me out? Thanks.

dna sequencing • 3.5k views

ADD COMMENT • link updated 3.1 years ago by Ram 44k • written 10.5 years ago by abenabba • 0

0

Entering edit mode

It's hard to see exactly what you're aiming for here. "All possible strands" with a given mutation rate would be all possible strands full stop (even if some are very unlikely indeed).

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by David W 4.9k

0

Entering edit mode

Also not sure I can help but not sure I understand why you would want this as well.

We are finding that rare variants and private SNPs can be very important. But finding several very rare private SNPs together would be a vanishingly unlikely event.

What I mean is, in a recent paper on ITGAM, Vyse et al. showed two rare coding variants that produce functional effects on ITGAM and may contribute to Autoimmune disease.

They then looked for these causal variants in 8700 other people (that number is from memory but should not be too far from accurate), finding them in no one else.

So, if we generate all possible permutations of rare variants, we might generate some causal ones, but then this has already been done through CADD for one SNP at a time ... so your contribution would be to have these in triplets or pairs, etc. (if I understand correctly) but if we recognize that some causal variants for both complex and penetrant Mendelian disease will be very rare, we might generate a lot of permutated haplotypes that in fact only ever occur in one or two people...

Anyway I'd be interested to know how you feel this would help - not saying it wouldn't just saying I do not understand the aim.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by vlaufer ▴ 290

Ram · Answer 1 · 2014-06-04

I guess what you are trying to do is to first generate all possible strands by flipping the bases and then mutating individual strands based on the mutation frequency because you cannot generate all possible mutated strands when using a mutation frequency of 0.66%. So either you generate all the strands by just flipping all the bases or use the mutation frequency to determine which base will come up at a position and then generate a fixed number (decided by you) of strands.

So you can write one simple function, where a loop generate new strands by flipping the bases and there in you mutate that strand and append to same/separate list as your final result. Also an iterative function could be a lot easier and simpler to understand.