Are there any programs available (Python, Perl, whatever), that can take a degenerate nucleotide sequence and translate it into its multiple possible oligos?
Any help would be appreciated.
Are there any programs available (Python, Perl, whatever), that can take a degenerate nucleotide sequence and translate it into its multiple possible oligos?
Any help would be appreciated.
This program should do the job:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define RUN(base) copy[index]=base; recurs(seq,copy,index+1,len)
static void recurs(const char *seq,char* copy,int index,const int len)
{
if(index==len)
{
fwrite(copy,sizeof(char),len,stdout);
fputc('\n',stdout);
}
else
{
switch(toupper(seq[index]))
{
case 'A':case 'T': case 'G': case 'C': RUN(seq[index]); break;
case 'N':RUN('A');RUN('T');RUN('G');RUN('C');break;
case 'W':RUN('A');RUN('T');break;
case 'S':RUN('G');RUN('C');break;
case 'B':RUN('T');RUN('G');RUN('C');break;
case 'D':RUN('A');RUN('T');RUN('G');break;
case 'H':RUN('A');RUN('T');RUN('C');break;
case 'V':RUN('A');RUN('G');RUN('C');break;
case 'K':RUN('G');RUN('T');break;
case 'M':RUN('A');RUN('C');break;
case 'R':RUN('A');RUN('G');break;
case 'Y':RUN('C');RUN('T');break;
default: fprintf(stderr,"Bad base in %s (%c)\n",seq,seq[index]); exit(EXIT_FAILURE);break;
}
}
}
int main(int argc,char** argv)
{
char* seq;
int len,i;
if(argc!=2)
{
fprintf(stderr,"Usage : %s <dna>",argv[0]);
return EXIT_FAILURE;
}
seq=argv[1];
len=strlen(seq);
char* copy=malloc((len+1)*sizeof(char));
if(copy==NULL)
{
fprintf(stderr,"Out of memory\n");
exit(EXIT_FAILURE);
}
copy[len]='\0';
recurs(seq,copy,0,len);
free(copy);
return 0;
}
Compilation:
gcc -O3 -o prg prg.c
test
> ./prg ATGCTGATCGAGCTANATCGATCGGACTACY
ATGCTGATCGAGCTAAATCGATCGGACTACC
ATGCTGATCGAGCTAAATCGATCGGACTACT
ATGCTGATCGAGCTATATCGATCGGACTACC
ATGCTGATCGAGCTATATCGATCGGACTACT
ATGCTGATCGAGCTAGATCGATCGGACTACC
ATGCTGATCGAGCTAGATCGATCGGACTACT
ATGCTGATCGAGCTACATCGATCGGACTACC
ATGCTGATCGAGCTACATCGATCGGACTACT
Like Pierre said, this is a combinatorial problem that can quickly blow up in your face. However, that aside, it easy to generate all the combinations using a recursion. Here is a compact implementation in Perl:
#!/usr/bin/perl -w
# Lookup table of degenerate IUPAC nucleotide codes.
my %deg2nuc = (
"R" => ["A", "G"],
"Y" => ["C", "T"],
"S" => ["G", "C"],
"W" => ["A", "T"],
"K" => ["G", "T"],
"M" => ["A", "C"],
"B" => ["C", "G", "T"],
"D" => ["A", "G", "T"],
"H" => ["A", "C", "T"],
"V" => ["A", "C", "G"],
"N" => ["A", "C", "G", "T"]
);
# Recursive function that replaces degenerate nucleotides with all combinations.
sub generate
{
if ($_[0] =~ /(.*)([RYSWKBDHVN])(.*)/) {
my $head = $1;
my $tail = $3;
my @seqs;
foreach my $nuc (@{$deg2nuc{$2}}) {
push @seqs, generate($head.$nuc.$tail);
}
return @seqs;
}
else {
return $_[0];
}
}
# Demo: print all sequences generated from ANCRG.
print join("\n", generate("ANCRG")), "\n";
EDIT:
On second thought, that was not at all compact by Perl standards. Here is the really compact version that ventures into code golf territory:
#!/usr/bin/perl -w
my %A = ("R"=>1, "W"=>1, "M"=>1, "D"=>1, "H"=>1, "V"=>1, "N"=>1);
my %C = ("Y"=>1, "S"=>1, "M"=>1, "B"=>1, "H"=>1, "V"=>1, "N"=>1);
my %G = ("R"=>1, "S"=>1, "K"=>1, "B"=>1, "D"=>1, "V"=>1, "N"=>1);
my %T = ("Y"=>1, "W"=>1, "K"=>1, "B"=>1, "D"=>1, "H"=>1, "N"=>1);
$_ = "ANCRG\n";
while (s/(.*)([RYSWKBDHVN])(.*)\n/(exists $A{$2} ? "$1A$3\n" : "").(exists $C{$2} ? "$1C$3\n" : "").(exists $G{$2} ? "$1G$3\n" : "").(exists $T{$2} ? "$1T$3\n" : "")/e) {}
print;
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
can you give use a typical example ? because the simple sequence ATGCNNNNATGC would generate 256 oligos....