What is the fastest way to degenerate exact DNA sequences (motifs of length L) using the IUPAC ambiguity codes? In other words, how can I obtain all possible IUPAC motifs from a set of exact DNA motifs? I am looking for an efficient algorithm.
For example, I have these motifs of length 3:
- AAA
- ATA
- TTA
- TAA
- GAA
- CAA
And I have to obtain:
- AWA (AAA + ATA)
- WWA (AAA + ATA + TAA + TTA)
- RAA (AAA + GAA)
- WAA (AAA + TAA)
- YAA (CAA + TAA)
- NAA (AAA + TAA + CAA + GAA)
- ...
Thank you. Could you give an example, please?
It seems that the table markdown doesn't render nicely. Anyway, there are many similar approaches, depending on your exact needs.
But that means each position is handled separately (a lookup table for each position), doesn't it?
There's only one lookup table, shared by every position. Yes, each position is set separately, but that's not a problem, and you can then generate the degenerate motifs for arbitrary lengths efficiently.
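To make the single-lookup-table idea concrete, here is a minimal Python sketch. It is one possible approach, not necessarily the one meant above. It assumes that a degenerate motif is wanted only when every exact sequence it expands to is present in the input (as in WWA = AAA + ATA + TAA + TTA), and that all motifs have the same length. The names `IUPAC`, `EXPAND`, `expand` and `degenerate_motifs` are made up for this illustration. The table maps a set of bases to its standard IUPAC code, and motifs that differ at exactly one position are merged repeatedly until nothing new appears.

```python
from itertools import product

# One lookup table for all positions: set of bases -> standard IUPAC code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Reverse table: IUPAC code -> set of bases it stands for.
EXPAND = {code: bases for bases, code in IUPAC.items()}


def expand(motif):
    """Return the set of exact sequences matched by an IUPAC motif."""
    columns = (sorted(EXPAND[c]) for c in motif)
    return {"".join(p) for p in product(*columns)}


def degenerate_motifs(exact):
    """Return every IUPAC motif whose full expansion lies in `exact`.

    Assumption (hypothetical helper): all motifs share the same length.
    Two motifs that differ at exactly one position can be merged by
    taking the union of their base sets at that position; the merged
    motif matches exactly the union of what the two matched before.
    """
    found = set(exact)      # everything discovered so far (exact motifs included)
    frontier = set(exact)   # motifs added in the previous round
    while frontier:
        new = set()
        for a in frontier:
            for b in found:
                diff = [i for i in range(len(a)) if a[i] != b[i]]
                if len(diff) != 1:
                    continue
                i = diff[0]
                code = IUPAC[EXPAND[a[i]] | EXPAND[b[i]]]
                merged = a[:i] + code + a[i + 1:]
                if merged not in found:
                    new.add(merged)
        found |= new
        frontier = new
    return found


if __name__ == "__main__":
    motifs = {"AAA", "ATA", "TTA", "TAA", "GAA", "CAA"}
    for m in sorted(degenerate_motifs(motifs) - motifs):
        print(m, "+".join(sorted(expand(m))))
```

For the six motifs in the question, the result includes AWA, WWA, RAA, WAA, YAA and NAA, and leaves out something like KWA because GTA is not in the input. The pairwise comparison is quadratic in the number of motifs found per round, so for large inputs you would probably want to group motifs by their "all but one position" key instead of comparing every pair.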