I have a multi-FASTA nucleotide file, and I would like to replace the ambiguous nucleotides (R, Y, S, K, etc.) with the possible combinations of nucleotides and save both options. For example, the following nucleotide sequence (pseudo example):
>Seq1
ATGGKCRCCGCSGT
contains the ambiguous nucleotides K, R, and S, where K = G or T, R = A or G, and S = G or C,
so the Seq1_a variant will look like this:
>Seq1_a
ATGGGCACCGCGGT
and the Seq1_b variant will look like this:
>Seq1_b
ATGGTCGCCGCCGT
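One option is to use sed, restricting the substitutions to sequence lines (so that letters in the header, such as the S in Seq1, are left alone) and appending _a to each header:
sed '/^>/s/$/_a/; /^>/!{s/K/G/g; s/R/A/g; s/S/G/g;}' input.txt > output_1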
to generate:
>Seq1_a
ATGGGCACCGCGGT
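and then run sed again with the other substitutions, again skipping the header lines and appending _b to each header:
sed '/^>/s/$/_b/; /^>/!{s/K/T/g; s/R/G/g; s/S/C/g;}' input.txt > output_2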
to generate:
>Seq1_b
ATGGTCGCCGCCGT
and then combine the two output files to get output like this:
>Seq1_a
ATGGGCACCGCGGT
>Seq1_b
ATGGTCGCCGCCGT
There must be a more elegant way to do this. Any help will be highly appreciated.
Note that you have 3 ambiguous nucleotides, and each of them can be one of 2 nucleotides (out of A, T, C, G), so you'll have 2^3 = 8 possible sequences, not 2.
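To make that count concrete, here is a minimal Python sketch that enumerates every combination with itertools.product instead of running sed once per variant. It is only one possible approach, not a definitive implementation. The names expand, read_fasta and expand_fasta are made up for illustration, the numeric suffixes (_1, _2, ...) are an assumption about how you might want to label the variants, and characters outside the standard IUPAC nucleotide codes (gaps, for instance) are assumed to be passed through unchanged.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(seq):
    """Yield every unambiguous sequence the ambiguous sequence can encode."""
    # Assumption: characters not in the table (e.g. '-') are kept as-is.
    choices = [IUPAC.get(base, base) for base in seq.upper()]
    for combo in product(*choices):
        yield "".join(combo)

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (hypothetical helper)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def expand_fasta(in_path, out_path):
    """Write all variants of every record, labelled header_1, header_2, ..."""
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            for i, variant in enumerate(expand(seq), start=1):
                out.write(f">{header}_{i}\n{variant}\n")
```

For the example sequence, list(expand("ATGGKCRCCGCSGT")) gives 8 sequences; the first is the Seq1_a variant from the question and the last is Seq1_b. Calling expand_fasta("input.txt", "output.fasta") (file names are placeholders) would write all of them to one file. Keep in mind that the number of variants grows exponentially with the number of ambiguous positions, and an N contributes a factor of 4, so this is only practical for sequences with a handful of ambiguity codes.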