I have to calculate the Shannon entropy for a given list of sequences in a fasta file. Recently I came across a program called rnaentropy which solves my problem but the issue is I cannot run sequences in batch. In the documentation the input is given as a sequence of strings. The parameters in the documentation is as follows:
./RNAentropy "sequence" -s sequence -t temperature -e energyModel(default 2004) -d delta_Temp (compute structural entropy using <E> = RT² * d/dT ln(Z(T)) ) -c (centered) -z energy_is_zero [0|1] (default 0) -v (verbose, extended output) ./RNAentropy -h (detailed help) Input paramters are: <sequence> : If FIRST argument is a valid nucleotide sequence it will be used input -s <sequence> : Alternative flag for input sequence -t <temperature> : Temperature in ºC (default is 17ºC) -e <energy_model> : Thermodynamic energy model used. Valid values for energy model are 1999, 2004 and 2007 (resp. Turner'99, Turner'04 and Andronescu '07) (default is 2004) -d <delta_T> : Temperature variation used for estimating d/dT ln(Z(T)) NOTE: If this parameter is provided, H is computed estimating <E> = RT² * d/dT ln(Z(T)) -c : (Use only in combination of -d) Use the centered version for estimating <E> = RT² * d/dT ln(Z(T)) -v : Output includes method for computing H and the name of the ouput parameters -z <energy_is_zero>: [0|1] If value is 1, energies are set to 0, ouput is structural entropy for the uniform case
Can anyone please suggest me a way to do it for a multi-fasta? Is there a way to iterate in a shell script?
If it accepts only single entry fasta sequences, it would be easiest to split the fasta first, then run a bash loop or GNU parallel over all the split files.
Its not clear from that help whether the tool accepts STDIN as input, I suspect not. If it does, you can split the fasta on the fly potentially without having to create intermediate files.
The program is not taking any file as input. I ran a pseudo code and got this result:
Sorry I couldn't follow how to split fasta as you suggested, which would make the input a series of string for each sequence. Sir, can you please provide any link for better understanding the problem. Thank you.
Does your fasta file have multiline sequences?
It is going to be quite complicated to split, parse and extract sequences from your fasta and then run them through the tool all in one go.
I would suggest splitting as I mentioned (please search the forum, there are no end of ways to do this).
You can then loop over all the files something like so: