Calculating Shannon entropy for RNA sequences in multi-fasta using rnaentropy
2
2
Entering edit mode
5.1 years ago

I have to calculate the Shannon entropy for a given list of sequences in a fasta file. Recently I came across a program called rnaentropy which solves my problem but the issue is I cannot run sequences in batch. In the documentation the input is given as a sequence of strings. The parameters in the documentation is as follows:

./RNAentropy "sequence" -s sequence -t temperature -e energyModel(default 2004) -d delta_Temp (compute structural entropy using  <E> = RT² * d/dT ln(Z(T)) ) -c (centered) -z energy_is_zero [0|1] (default 0)  -v (verbose, extended output)
       ./RNAentropy -h (detailed help)  

Input paramters are:
   <sequence>          :  If FIRST argument is a valid nucleotide sequence it will be used input
   -s  <sequence>      : Alternative flag for input sequence
   -t  <temperature>   : Temperature in ºC (default is 17ºC)
   -e  <energy_model>  : Thermodynamic energy model used. Valid values for energy model are 1999, 2004 and 2007 (resp. Turner'99, Turner'04 and Andronescu '07) (default is 2004)
   -d  <delta_T>       : Temperature variation used for estimating d/dT ln(Z(T))
                        NOTE: If this parameter is provided, H is computed estimating  <E> = RT² * d/dT ln(Z(T))
   -c                 : (Use only in combination of -d) Use the centered version for estimating  <E> = RT² * d/dT ln(Z(T)) 
   -v                 : Output includes method for computing H and the name of the ouput parameters
   -z  <energy_is_zero>: [0|1] If value is 1, energies are set to 0, ouput is structural entropy for the uniform case 

Can anyone please suggest me a way to do it for a multi-fasta? Is there a way to iterate in a shell script?

rna rnaentropy sequence • 2.3k views
ADD COMMENT
3
Entering edit mode

If it accepts only single entry fasta sequences, it would be easiest to split the fasta first, then run a bash loop or GNU parallel over all the split files.

Its not clear from that help whether the tool accepts STDIN as input, I suspect not. If it does, you can split the fasta on the fly potentially without having to create intermediate files.

ADD REPLY
0
Entering edit mode

The program is not taking any file as input. I ran a pseudo code and got this result:

./RNAentropy -s AAGCGCGCGTGACTGCAT
2.562930        0.699157        18      -0.880430

Sorry I couldn't follow how to split fasta as you suggested, which would make the input a series of string for each sequence. Sir, can you please provide any link for better understanding the problem. Thank you.

ADD REPLY
0
Entering edit mode

Does your fasta file have multiline sequences?

It is going to be quite complicated to split, parse and extract sequences from your fasta and then run them through the tool all in one go.

I would suggest splitting as I mentioned (please search the forum, there are no end of ways to do this).

You can then loop over all the files something like so:

for file in *.fasta; do
  ./rnaentropy $(grep -v '>' $file | tr -d '\n')
done
ADD REPLY
7
Entering edit mode
5.1 years ago
ole.tange ★ 4.5k

With GNU Parallel (Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them) you would do:

cat in.fasta |
  parallel --pipe -N1 --recstart '>' --rrs 'read a; echo Doing "$a"; ./RNAentropy -s $(tr -d "\n")'

From GNU Parallel Tutorial: Informational:

--recend str Record end separator for --pipe.
--recstart str Record start separator for --pipe.

ADD COMMENT
0
Entering edit mode

I always forget about recstart!

ADD REPLY
3
Entering edit mode
5.1 years ago
JC 13k

I wouuld use a Perl (or Python script) to parse the multifasta and run your tool, like:

#!/usr/bin/perl

use strict;
use warnings;

$/="\n>"; # read fasta sequences in blocks
while (<>) {
     s/>//g;
     my ($id, @seq) = split (/\n/, $_);
     my $seq = join "", @seq; # rebuild the sequence, one single line
     print "$id\n"; # to indicate which sequence is being analyzed
     system("./RNAentropy -s $seq"); # run the command, add params if needed
}

then you can run as perl analyzeSeq.pl < multifasta_in > output_file

ADD COMMENT
0
Entering edit mode

Thank you so much, Sir. You are a real life saver.

ADD REPLY
0
Entering edit mode

glad to help, good luck with your research

ADD REPLY

Login before adding your answer.

Traffic: 1543 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6