Hi everyone,
I have a file with many predicted miRNAs, and I need to run a randomization test to identify which of them are likely to be genuine. The test shuffles each predicted miRNA 1000 times and calculates the MFE (minimum free energy) of each shuffled sequence, which is easy to do with RNAfold. My problem is how to generate the 1000 randomized sequences for each miRNA. Here is an example:
Predicted miRNAs file
>miR1
augcgugaccguaugcuac
>miR2
uuuggugcguagucguacg
............
>miR100
auaugagucguacguacgu
Randomized sequences file
>miR1_1
ugcggaccguaugcuacau
>miR1_2
ugcggaccuugcuacauga
...........
>miR1_1000
ugcggaccguaugcuacua
.............
............
>miR100_1000
auaugagucgacguacu
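Here is a minimal sketch in Python (not the Perl asked for) that writes output in the format above. It assumes a plain mononucleotide shuffle is acceptable, so every copy keeps the length and base composition of its miRNA. The script name shuffle_mirnas.py and the fixed random seed are illustrative choices, not part of the original method.

```python
import random
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def shuffled_copies(name, seq, n, rng):
    """Yield n (header, sequence) pairs; each copy is a random permutation of seq's bases."""
    for i in range(1, n + 1):
        bases = list(seq)
        rng.shuffle(bases)
        yield f"{name}_{i}", "".join(bases)


def main(argv):
    path = argv[1]
    n = int(argv[2]) if len(argv) > 2 else 1000
    rng = random.Random(12345)  # arbitrary fixed seed so reruns produce the same file
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            for header, shuffled in shuffled_copies(name, seq, n, rng):
                print(f">{header}\n{shuffled}")


# Only run when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Usage: python shuffle_mirnas.py predicted.fa 1000 > randomized.fa. With 100 miRNAs this writes 100,000 records named miR1_1 ... miR100_1000, which can go straight into RNAfold.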
Could anyone familiar with Perl help me solve this? I can handle the next steps, including calculating the MFE, myself, but someone who knows RNAfold and miRNA prediction well could probably build a pipeline for the whole workflow. Here is a Nucleic Acids Research reference: http://nar.oxfordjournals.org/content/37/suppl_1/D111.full . The authors' method is described in the Methods section; search for RNAfold. Thank you in advance!
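For the follow-up step, one common way to score the randomization test is an empirical p-value: the fraction of shuffles whose MFE is at least as low as the native miRNA's MFE. This is an assumption for illustration, not necessarily the statistic used in the NAR paper. The sketch assumes RNAfold was run on FASTA input, so each record is a header, the sequence, and a structure line ending in the MFE in parentheses, and that shuffled names follow the miR1_1 pattern above. The function names are hypothetical.

```python
import re
from collections import defaultdict

# RNAfold ends the structure line with the MFE in parentheses, e.g. "((....)) ( -3.40)";
# the padding inside the parentheses varies, so allow optional whitespace.
MFE_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Return {name: mfe} from RNAfold output produced from FASTA input."""
    mfe, name = {}, None
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            name = line[1:].split()[0]
        elif name is not None and name not in mfe:
            match = MFE_RE.search(line)
            if match:  # the sequence line has no parentheses, so only the structure line matches
                mfe[name] = float(match.group(1))
    return mfe


def empirical_p_values(native, shuffled):
    """Fraction of each miRNA's shuffles with MFE <= the native MFE.

    Uses (k + 1) / (n + 1) so that no p-value is exactly zero.
    """
    groups = defaultdict(list)
    for name, energy in shuffled.items():
        groups[name.rsplit("_", 1)[0]].append(energy)  # "miR1_17" -> "miR1"
    result = {}
    for name, energy in native.items():
        rand = groups.get(name, [])
        k = sum(1 for e in rand if e <= energy)
        result[name] = (k + 1) / (len(rand) + 1)
    return result
```

Usage: run RNAfold --noPS < predicted.fa > native.out and RNAfold --noPS < randomized.fa > random.out, then pass the parsed dictionaries to empirical_p_values. A small p-value means few shuffles fold as stably as the real sequence.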
Instead of looking for a script to generate random sequences, maybe you can use an "off the shelf" tool to shuffle your input sequences. Take a look at the biosquid package, especially its shuffle program. (These are Debian/Ubuntu packages; I am not sure whether there is an equivalent for other distributions or Windows, and I have not tried it myself.)
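A plain shuffle preserves only base composition. A stricter null model, used in some MFE randomization studies, also preserves dinucleotide composition, since nearest-neighbour energy models score adjacent base pairs. The following is a sketch of the Altschul-Erickson approach (treat the sequence as an Eulerian path through a dinucleotide graph and draw a random path with the same edges); it is an illustration, not the exact code of any of the tools mentioned here.

```python
import random
from collections import defaultdict


def _reaches(u, target, last_edge, edges):
    """Follow the chosen last edges from u; True if target is reached without a cycle."""
    seen = set()
    while u != target:
        if u in seen:
            return False
        seen.add(u)
        u = edges[u][last_edge[u]]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a random sequence with the same dinucleotide (and base) counts as seq."""
    if len(seq) <= 2:
        return seq
    edges = defaultdict(list)  # base -> list of bases that follow it in seq
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last_char = seq[0], seq[-1]

    # Pick a random final outgoing edge for every base except the last one and
    # retry until these edges form a tree leading into last_char.
    while True:
        last_edge = {u: rng.randrange(len(out)) for u, out in edges.items() if u != last_char}
        if all(_reaches(u, last_char, last_edge, edges) for u in last_edge):
            break

    # Shuffle each base's remaining edges and keep the chosen edge for last.
    order = {}
    for u, out in edges.items():
        if u in last_edge:
            i = last_edge[u]
            rest = out[:i] + out[i + 1:]
            rng.shuffle(rest)
            order[u] = rest + [out[i]]
        else:
            rest = out[:]
            rng.shuffle(rest)
            order[u] = rest

    # Walk the path from the first base, always taking the next unused edge.
    result, cur = [first], first
    used = defaultdict(int)
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        result.append(nxt)
        cur = nxt
    return "".join(result)
```

The tree condition on the final edges is what guarantees that the walk uses every dinucleotide exactly once, so each output keeps the same first and last base and the same dinucleotide counts as the input. It can replace rng.shuffle in a per-miRNA loop if you prefer this null model.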
Yes; I'd use EMBOSS shuffleseq, which can easily be run from a (Bio)Perl (or other) script if required.
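Calling shuffleseq from a script could look like the sketch below, written in Python rather than (Bio)Perl. It assumes shuffleseq is on your PATH and that its -sequence, -shuffle (number of shuffles) and -outseq qualifiers and the general -auto qualifier behave as described in the EMBOSS documentation; check shuffleseq -help for your installed version. The output names may also need renaming to the miR1_1 style.

```python
import shutil
import subprocess


def shuffleseq_command(infile, outfile, copies=1000):
    """Build the shuffleseq command line (qualifier names assumed from the EMBOSS docs)."""
    return ["shuffleseq", "-sequence", infile, "-shuffle", str(copies),
            "-outseq", outfile, "-auto"]  # -auto suppresses interactive prompts


def run_shuffleseq(infile, outfile, copies=1000):
    """Run shuffleseq if it is installed; raise if it is missing or exits with an error."""
    if shutil.which("shuffleseq") is None:
        raise FileNotFoundError("EMBOSS shuffleseq not found on PATH")
    subprocess.run(shuffleseq_command(infile, outfile, copies), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names, and check=True makes a failed run raise instead of silently producing an empty file.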