Question

Is there a way to get RNAfold output for a multi-FASTA file in tabular format?

1

5.6 years ago

adhirajnath14 ▴ 40

I am trying to calculate the MFE along with the secondary structure for a multi-FASTA file using RNAfold. The output has the following format:

>abc
GGCGGAGGUAGGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAG
(((((.....(((((((.(..((.......)).).)))))))........((((((.((............)).)))))).((((....))))))))).. (-35.80)
>lmn
GGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCA
(((((((.(..((.......)).).))))))).............((((..((((.........((((.(.((((....)))).).)))))))))))).. (-29.30)
>xyz
CGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCACCCCGGGACA
....(((........)))(((((............((((.(((((.((......((((.(.((((....)))).).)))))).))))).))))))))).. (-28.40)

Is there a way to get the output in tabular format, with 1. identifier, 2. sequence, 3. secondary structure and 4. MFE as columns? I have written regular expression scripts to capture each of the four and write them to a file, but I don't think that's an efficient way of doing it. Is there a more convenient way?

RNAFold • 2.6k views

ADD COMMENT • link updated 13 months ago by jakobjung • 0 • written 5.6 years ago by adhirajnath14 ▴ 40

0

Regular expression capture groups are absolutely a valid way to do it, and probably the least hacky.

Otherwise, you would need to transliterate the \n characters to tabs, but since there are line wrappings, that will be much harder.

ADD REPLY • link 5.6 years ago by Joe 22k

score 3 · Accepted Answer · 2019-09-25

3

5.6 years ago

JC 13k

In Perl:

#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    chomp;
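    if (/^>/) {
        s/^>//;
        print "$_\t"; # just the seq id
    }
    elsif (/\s+\(\s*(-?\d+\.\d+)\)$/) {
        my $mfe = $1; # energy without parentheses or padding
        s/\s+\(\s*-?\d+\.\d+\)$//; # strip the trailing "(MFE)", also "( -8.80)"
        print "$_\t$mfe\n"; # fold + MFE
    }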
    else {
        print "$_\t"; # the seq
    }
}

Validation:

$ perl fold2tab.pl < fold.fa
abc     GGCGGAGGUAGGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAG    (((((.....(((((((.(..((.......)).).)))))))........((((((.((............)).)))))).((((....)))))))))..     -35.80
lmn     GGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCA    (((((((.(..((.......)).).))))))).............((((..((((.........((((.(.((((....)))).).))))))))))))..     -29.30
xyz     CGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCACCCCGGGACA    ....(((........)))(((((............((((.(((((.((......((((.(.((((....)))).).)))))).))))).)))))))))..     -28.40

ADD COMMENT • link 5.6 years ago by JC 13k

2

One-liner:

$ perl -pe 's/\n/\t/g; s/>//; s/\s+/\t/; s/\(-/-/; s/\)\t$/\n/' < fold.fa
abc     GGCGGAGGUAGGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAG    (((((.....(((((((.(..((.......)).).)))))))........((((((.((............)).)))))).((((....)))))))))..    -35.80
lmn     GGGAGGCACGCGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCA    (((((((.(..((.......)).).))))))).............((((..((((.........((((.(.((((....)))).).))))))))))))..    -29.30
xyz     CGAUGGUAUUUCAGAGCCUCCCGAAUACAACUCCAGGGUAGGGUGUUGAAAGCGUUGGAGAUGUCUAAAGACACCGCCAGUACCACCCCACCCCGGGACA    ....(((........)))(((((............((((.(((((.((......((((.(.((((....)))).).)))))).))))).)))))))))..    -28.40

ADD REPLY • link 5.6 years ago by JC 13k

0

Thank you JC!! The code above does not delete the opening parenthesis for single-digit MFEs, because RNAfold adds a whitespace between ( and - in that case:

))))... ( -8.80)
.)))). (-22.50)

Therefore I modified your code as follows:

$ perl -pe 's/\n/\t/g; s/>//; s/\s+/\t/; s/\(-|\( -/-/; s/\)\t$/\n/' < fold.fa

ADD REPLY • link 13 months ago by jakobjung • 0