10.4 years ago
Ali
I want to design a probe, a short DNA sequence, that has no homology to a target genome. By homology I mean that no region of the target genome matches the designed sequence to within a few mutations. The probe length and the number of allowed mutations are parameters.
One solution would be to generate a pool of random DNA sequences, align them to the reference genome allowing mismatches, and look for one that has no alignment.
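For concreteness, here is a minimal sketch of the random-pool idea on a small in-memory sequence. It assumes "mutations" means substitutions only (Hamming distance), scans the forward strand only, and replaces a real aligner with a brute-force window scan, so it is only practical for toy inputs. The function names `min_hamming_to_target` and `random_probe_search` are hypothetical.

```python
import random

def min_hamming_to_target(probe, target):
    """Smallest number of mismatches between probe and any
    same-length window of target (forward strand only)."""
    k = len(probe)
    best = k  # upper bound; also returned if target is shorter than probe
    for i in range(len(target) - k + 1):
        d = sum(a != b for a, b in zip(probe, target[i:i + k]))
        if d < best:
            best = d
            if best == 0:
                break  # exact match found, cannot do better
    return best

def random_probe_search(target, k, max_mismatches, tries=10000, seed=0):
    """Draw random k-mers until one has more than max_mismatches
    mismatches against every window of target; None if none is found."""
    rng = random.Random(seed)
    for _ in range(tries):
        probe = "".join(rng.choice("ACGT") for _ in range(k))
        if min_hamming_to_target(probe, target) > max_mismatches:
            return probe
    return None
```

For a real genome the window scan would be replaced by an aligner run with a mismatch allowance, and the reverse complement would need checking as well.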
Does anybody have a better solution?
Could you instead just use one of the standard epitopes (e.g., an HA-tag)?
Great solution with biological insight! Thanks.
You can compute the statistics of n-gram (k-mer) occurrence in the target sequence and design a probe from those k-mers that are never seen, even within the allowed edit distance (nullomers?). This way no randomness or alignment is involved, i.e. it is an exact solution.
Thanks Pavel. I thought about this idea earlier, but even given the k-mers with no occurrence, finding the farthest (or even a far enough) sequence seems to be an NP-hard problem. More specifically: given a set of k-mers, we want to find another k-mer whose edit distance to every k-mer in the set is at least x. Am I right?
I guess the statistics computation has linear complexity (in the length of the whole sequence); moreover, you'll have position-specific statistics after that. Then you can use dynamic programming, which will yield an optimal solution in time pseudo-polynomial in the probe length. I might be wrong.
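A rough sketch of the k-mer counting step, assuming the target fits in memory as a plain string over the alphabet ACGT. It only finds k-mers with zero exact occurrences and does not yet account for mismatches. The names `observed_kmers` and `nullomers` are hypothetical helpers, not part of any library.

```python
from itertools import product

def observed_kmers(target, k):
    """Set of all k-mers that occur at least once in target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}

def nullomers(target, k, alphabet="ACGT"):
    """All k-mers over alphabet that never occur exactly in target.
    Enumerates len(alphabet)**k candidates, so only feasible for small k."""
    seen = observed_kmers(target, k)
    absent = []
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        if kmer not in seen:
            absent.append(kmer)
    return absent
```

Collecting the observed k-mers takes a single pass over the target; the cost that grows quickly is the enumeration of all candidate k-mers.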
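To make the problem statement concrete, here is a brute-force sketch: given a set of k-mers, list every k-mer whose edit (Levenshtein) distance to each member of the set is at least x. It enumerates all 4^k candidates, so it says nothing about the complexity question and is only usable for small k. The function names are hypothetical.

```python
from itertools import product

def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def far_kmers(kmers, k, x, alphabet="ACGT"):
    """All k-mers at edit distance >= x from every k-mer in kmers."""
    result = []
    for letters in product(alphabet, repeat=k):
        cand = "".join(letters)
        if all(edit_distance(cand, s) >= x for s in kmers):
            result.append(cand)
    return result
```

For example, `far_kmers({"AAAA"}, 4, 4)` keeps exactly the 4-mers that contain no A, since any candidate with an A can be turned into AAAA with at most three substitutions.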