Question

Converting back a consensus sequence

2

9.9 years ago

Lars ★ 1.1k

Is there a tool that gives me back all possible sequences of a consensus sequence?

A simple example:

Input:

ACTCAYT

Output:

ACTCACT
ACTCATT

consensus converter • 3.2k views

ADD COMMENT • link updated 9.9 years ago by thackl ★ 3.0k • written 9.9 years ago by Lars ★ 1.1k

0

No tool that I am aware of, but a simple Perl or Python script could do this for you.

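Edit: however, how do you phase the sequence when it has more than one ambiguous base? With n two-base ambiguity codes there are 2^n possibilities (more if three- or four-base codes such as B or N are present), e.g.: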

ACWCAYT

There are four possibilities:

ACACACT
ACTCATT
ACACATT
ACTCACT

Which ones to choose?
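
If the answer is "all of them", the enumeration itself is short in Python. The following is only a minimal sketch, not an existing tool: the `IUPAC` table and the function name `expand` are illustrative assumptions, and it handles DNA codes only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq):
    """Return every unambiguous sequence encoded by an IUPAC consensus string."""
    # One string of candidate bases per position, e.g. "W" -> "AT".
    choices = [IUPAC[base] for base in seq.upper()]
    # The Cartesian product over all positions enumerates every combination.
    return ["".join(combo) for combo in product(*choices)]


for s in expand("ACWCAYT"):
    print(s)
```

For `ACWCAYT` this prints the same four sequences listed above, in the order produced by `itertools.product` (leftmost ambiguous position varies slowest).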

ADD REPLY • link 9.9 years ago by h.mon 35k

1

Well, I would like to create a FASTA file with all possible sequences. I do not want to choose one; I need all of them. And writing a script that generates all of them does not seem easy to me.

ADD REPLY • link 9.9 years ago by Lars ★ 1.1k

1

How many sequences do you have, how long are they, and how many ambiguous bases does each contain? File sizes and the number of sequences will grow very quickly and can easily blow up in your face.
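
One way to check this before writing anything to disk is to count the expansions first: the total is the product of the number of bases each position can stand for. A rough sketch, where the `DEGENERACY` table and the function name `count_expansions` are illustrative assumptions and `math.prod` needs Python 3.8 or later:

```python
from math import prod

# How many concrete bases each IUPAC nucleotide code can stand for.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def count_expansions(seq):
    """Number of unambiguous sequences a consensus string expands to."""
    return prod(DEGENERACY[base] for base in seq.upper())


print(count_expansions("ACWCAYT"))     # 2 * 2
print(count_expansions("CCrGAGbTCC"))  # 2 * 3
print(count_expansions("N" * 10))      # 4 ** 10
```

Ten `N`s alone already give more than a million sequences, so a few degenerate positions per primer are fine, while long stretches of ambiguity are not.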

ADD REPLY • link 9.9 years ago by h.mon 35k

score 2 · Answer 1 · 2015-06-18

2

9.9 years ago

thackl ★ 3.0k

deiupac: a quick and dirty script, but it will do what you want. That said, I have to agree with h.mon: I don't really see why you would want to recreate every possible option.

# it requires
git clone https://github.com/BioInf-Wuerzburg/perl5lib-Fasta.git
# just clone it and make it available, e.g. by putting it into PERL5LIB
export PERL5LIB=/path/to/perl5lib-Fasta/lib:$PERL5LIB

# example input
>s1
CCTGAGGTCC
>s2
CCrGAGGTCC
>s3
CCrGAGbTCC

# converted back to

>s1.1
CCTGAGGTCC
>s2.1
CCaGAGGTCC
>s2.2
CCgGAGGTCC
>s3.1
CCaGAGcTCC
>s3.2
CCaGAGgTCC
>s3.3
CCaGAGtTCC
>s3.4
CCgGAGcTCC
>s3.5
CCgGAGgTCC
>s3.6
CCgGAGtTCC
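
For readers who prefer Python, the same kind of expansion can be sketched as follows. This is not the deiupac script and makes no claim about how it works internally; it only aims to reproduce the output shown above. The function names `read_fasta` and `expand_record` are hypothetical, the `<id>.<n>` naming and the case preservation are inferred from that output, and passing unknown symbols through unchanged is an assumption.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def expand_record(name, seq):
    """Yield (new_id, sequence) for every resolution of the ambiguity codes."""
    choices = []
    for ch in seq:
        # Assumption: symbols outside the table (e.g. gaps) pass through unchanged.
        bases = IUPAC.get(ch.upper(), ch)
        # Keep the case of the input symbol, as in the output above.
        choices.append(bases.lower() if ch.islower() else bases)
    for i, combo in enumerate(product(*choices), start=1):
        yield f"{name}.{i}", "".join(combo)


example = """>s1
CCTGAGGTCC
>s2
CCrGAGGTCC
>s3
CCrGAGbTCC
"""

for header, seq in read_fasta(example.splitlines()):
    for new_id, new_seq in expand_record(header, seq):
        print(f">{new_id}\n{new_seq}")
```

To run it on a real file, replace `example.splitlines()` with an open file handle. Because `expand_record` is a generator, sequences are written one at a time rather than held in memory.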

ADD COMMENT • link 9.9 years ago by thackl ★ 3.0k

1

I have a sample containing three amplified genes. For each gene, two different degenerate primers (fwd and rev) were used. Now I want to use BLAST to separate the reads of the three different genes.

The degenerate primer sequences are written as consensus sequences, and BLAST does not take consensus sequences.