Expand a DNA sequence with IUPAC codes into multiple sequence in R
2
Hi!
I could not come with a code that can expand a sequence like seq= 'ARCS'. What would be the algorithm or (pseudo)code for the expansion in R?
Thanks in advance.
p.s.
Many useful python codes have been generously supplied, but I dont know python!
R
sequence
• 3.7k views
This is super easy with Python, see e.g. here .
Copy pasted one option below:
from Bio import Seq
from itertools import product
def extend_ambiguous_dna(seq):
"""return list of all possible sequences given an ambiguous DNA input"""
d = Seq.IUPAC.IUPACData.ambiguous_dna_values
return [ list(map("".join, product(*map(d.get, seq)))) ]
what a coincidence!
I've the code for this question,also there's a brief description for it if you can understand our Chinese.
http://www.biotrainee.com/thread-668-1-1.html
while(<DATA>){
chomp;
@F=split/:/;
$hash{$F[0]}=uc $F[1];
} ##这里记录简并碱基的对应关系
## %hash stored the tables;
sub primer2multiple{
$primer=$_[0];
$prod=1;
$primer_len=length $primer ;
foreach $i (0..$primer_len-1){
$char=substr($primer,$i,1);
#$prod*=length $hash{$char} if ($char !~/[ATCG]/) ;
if ($char !~/[ATCG]/) {
push @pos_list,$i;
push @char_list,$hash{$char};
##首先找出所有的不是ATCG的碱基位置以及它对应的碱基
## record all of the positions which are not ATCG;
}
}
@out_list=($primer);
##循环处理每个不是ATCG的碱基位置,让它们根据对应关系扩展
foreach my $i (0..scalar(@pos_list)-1){
@out_list=&new_out_list(\@out_list,$pos_list[$i],$char_list[$i]);
} ##&new_out_list 这个函数非常重要,会把数组不停的扩展,最终达到应该有的个数!
print join"\n",@out_list;
print "\n";
}
sub new_out_list{
my @array = @{$_[0]};
my $pos = $_[1];
my $char = $_[2];
my @new_array=();
foreach my $i (@array){
foreach my $j (0..length($char)-1){
substr($i,$pos,1,substr($char,$j,1));
push @new_array,$i;
}
}
return(@new_array);
}
primer2multiple('ATGCVCGCDCTNCCTGAB');
__DATA__
R:ag
Y:CT
M:AC
K:GT
S:gc
W:AT
H:atc
B:gtc
V:gac
D:GAT
N:ATgc
Login before adding your answer.
Traffic: 1344 users visited in the last hour
It's really easy by using python, how do you explain the functions product and seq
All this stuff is well documented, seq & itertools.
great, but both you and I made a mistake, he need the R solution