Finding a genetic sequence with no homology in a target genome

10.9 years ago

Ali ▴ 140

I want to design a probe, i.e. a short DNA sequence, that has no homology to a target genome. By "no homology" I mean that the probe does not match anywhere in the target genome, even when a few mutations are allowed. The probe length and the number of allowed mutations are parameters.

One solution would be to generate a pool of random DNA sequences, align them to the reference genome allowing mismatches, and look for one that has no alignment.
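
A minimal Python sketch of this random-pool approach, for small inputs only. Assumptions that are not in the question: the genome is an uppercase ACGT string, a "hit" means a window on either strand with at most `max_mismatches` substitutions (indels are ignored), and the brute-force scan stands in for a real mismatch-tolerant aligner. The names `has_hit` and `find_probe` are hypothetical.

```python
import random

# Assumption: uppercase ACGT input only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hit(probe: str, genome: str, max_mismatches: int) -> bool:
    """True if the probe matches some window of the genome on either strand
    with at most max_mismatches substitutions (no indels considered)."""
    k = len(probe)
    for query in (probe, revcomp(probe)):
        for start in range(len(genome) - k + 1):
            mismatches = 0
            for a, b in zip(query, genome[start:start + k]):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # this window is already too different
            if mismatches <= max_mismatches:
                return True
    return False


def find_probe(genome: str, probe_len: int, max_mismatches: int,
               n_candidates: int = 1000, seed: int = 0):
    """Draw random candidates; return the first one with no hit, else None."""
    rng = random.Random(seed)
    for _ in range(n_candidates):
        probe = "".join(rng.choice("ACGT") for _ in range(probe_len))
        if not has_hit(probe, genome, max_mismatches):
            return probe
    return None
```

The scan costs roughly genome length × probe length per candidate, so for a real genome you would instead pass the whole candidate pool to an aligner and keep the candidates that come back unaligned.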

Does anybody have a better solution?

alignment sequence • 2.7k views

written 10.9 years ago by Ali ▴ 140

Could you instead just use one of the standard epitopes (e.g., an HA-tag)?

reply written 10.9 years ago by Devon Ryan 105k

Great solution with biological insight! Thanks.

reply written 10.9 years ago by Ali ▴ 140

You could compute occurrence statistics for n-grams (k-mers) in the target sequence and design the probe from k-mers that are never seen, even when edit distance is taken into account (nullomers?). This way no randomness or alignment is involved, i.e., it is an exact solution.
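
A small Python sketch of the exact-match part of this idea, assuming an uppercase ACGT genome and counting k-mers on both strands. `kmers_present` and `nullomers` are hypothetical helpers. Enumerating all 4**k k-mers is only practical for small k, and this sketch does not yet apply the edit-distance condition; it only finds k-mers with zero exact occurrences.

```python
from itertools import product


def kmers_present(genome: str, k: int) -> set[str]:
    """All k-mers that occur (exact match) on either strand of the genome."""
    # Assumption: uppercase ACGT input; reverse complement via translate.
    rc = genome.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    present = set()
    for seq in (genome, rc):
        for i in range(len(seq) - k + 1):
            present.add(seq[i:i + k])
    return present


def nullomers(genome: str, k: int) -> list[str]:
    """All k-mers over ACGT that never occur on either strand (exact match)."""
    present = kmers_present(genome, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [km for km in all_kmers if km not in present]
```

For instance, in the genome `AAAA` only `AA` (and `TT` on the reverse strand) occur as 2-mers, so the remaining 14 of the 16 possible 2-mers are nullomers.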

reply written 10.9 years ago by Pavel Senin ★ 1.9k

Thanks, Pavel. I had thought about this idea earlier, but I suspect that, even with the list of k-mers that never occur, finding the farthest (or even a far-enough) sequence is an NP-hard problem. More specifically: given a set of k-mers, we want to find another k-mer whose edit distance to every k-mer in the set is at least x. Am I right?
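
To make this formulation concrete, here is a brute-force Python sketch: a standard Levenshtein dynamic program plus an exhaustive search over all 4**k candidates. It only illustrates the problem as stated (k-mer against k-mer, threshold x); it is exponential in k and says nothing about whether a polynomial algorithm exists. `edit_distance` and `far_kmer` are hypothetical names.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard O(len(a) * len(b)) DP,
    keeping only the previous row of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def far_kmer(kmer_set, k: int, x: int):
    """Exhaustively try all 4**k candidates (exponential in k) and return the
    first whose edit distance to every k-mer in kmer_set is at least x."""
    for p in product("ACGT", repeat=k):
        cand = "".join(p)
        if all(edit_distance(cand, s) >= x for s in kmer_set):
            return cand
    return None
```

For example, `far_kmer({"AAA"}, 3, 3)` returns `"CCC"`, the first candidate in A, C, G, T enumeration order at edit distance 3 from `AAA`.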

reply written 10.9 years ago by Ali ▴ 140

I believe the statistics can be computed in linear time (in the length of the whole sequence), and you'll also get position-specific statistics out of it. Then you can use dynamic programming, which should yield an optimal solution in time pseudo-polynomial in the probe length. I might be wrong, though.
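
The linear-time part of this claim can be illustrated with a rolling 2-bit encoding: one pass over the genome records every k-mer together with its start positions (the "position-specific" statistics). This Python sketch assumes uppercase input, treats any non-ACGT character such as `N` as a break, and covers the forward strand only. The dynamic-programming step is not spelled out in the thread, so it is not implemented here. `kmer_positions` and `decode` are hypothetical names.

```python
from collections import defaultdict

# Assumption: uppercase input; 2 bits per base.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_positions(genome: str, k: int) -> dict[int, list[int]]:
    """Map each k-mer (packed into an integer, 2 bits per base) to its start
    positions on the forward strand, in a single pass over the genome."""
    mask = (1 << (2 * k)) - 1  # keep only the last 2k bits = last k bases
    positions = defaultdict(list)
    code = 0
    valid = 0  # length of the current run of ACGT bases
    for i, base in enumerate(genome):
        if base not in ENCODE:  # e.g. 'N': restart the window
            code = 0
            valid = 0
            continue
        code = ((code << 2) | ENCODE[base]) & mask
        valid += 1
        if valid >= k:
            positions[code].append(i - k + 1)
    return dict(positions)


def decode(code: int, k: int) -> str:
    """Turn a 2-bit packed integer back into its k-mer string."""
    bases = "ACGT"
    return "".join(bases[(code >> (2 * (k - 1 - j))) & 3] for j in range(k))
```

Each base costs a constant amount of work for fixed k, which is where the linear complexity comes from; counting the reverse strand as well would just mean running the same pass over the reverse complement.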

reply written 10.9 years ago by Pavel Senin ★ 1.9k