Hi, I'm working on de novo transcriptome assembly from RNA-seq data. Right now I'm using Rfam cmscan to annotate the 58808 contigs I got (N50 = 800). I'm working on a Debian virtual machine in a cluster, with 16 cores and 32 GB of RAM.
When I set cmscan to use all the available cores to annotate the contigs, the usage of each CPU is rather low: about 32% each, never 100%. But when I split the big .fasta file into 16 subfiles and ran cmscan in parallel, one subprocess per file, every core was used at 100% and I got the results faster. (The -Z parameter for each subprocess was set to the size of the whole file in Mb, times 2.)
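For reference, the split approach described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the exact script that was run: the function names, the chunk file names, and the round-robin splitting are hypothetical, and -Z is computed as total nucleotides * 2 / 10^6 over the whole (unsplit) input, which is how I understand "whole file Mb*2". The only cmscan options used are --cpu, -Z and --tblout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, seq = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def database_size_mb(records):
    """-Z value for the WHOLE input: total residues * 2 (both strands) / 1e6.

    Assumption: this matches the 'whole file Mb*2' value described above.
    """
    return sum(len(seq) for _, seq in records) * 2 / 1_000_000


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return [chunk for chunk in chunks if chunk]


def cmscan_command(cm_db, chunk_fasta, tblout, z_mb):
    """Build one cmscan call; every chunk gets the same whole-input -Z."""
    return ["cmscan", "--cpu", "1", "-Z", f"{z_mb:.6f}",
            "--tblout", str(tblout), str(cm_db), str(chunk_fasta)]


def run_split_cmscan(fasta, cm_db, out_dir, n_chunks=16):
    """Write the chunks to out_dir and run one cmscan process per chunk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(fasta)
    z_mb = database_size_mb(records)  # computed once, before splitting
    commands = []
    for i, chunk in enumerate(split_records(records, n_chunks)):
        chunk_fasta = out_dir / f"chunk_{i:02d}.fa"
        chunk_fasta.write_text("".join(f">{h}\n{s}\n" for h, s in chunk))
        tblout = out_dir / f"chunk_{i:02d}.tblout"
        commands.append(cmscan_command(cm_db, chunk_fasta, tblout, z_mb))

    def run(cmd):
        # Discard the human-readable output; hits are kept in the --tblout file.
        return subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)

    # Threads only wait on the external cmscan processes, one per chunk.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        list(pool.map(run, commands))
```

A call such as `run_split_cmscan("contigs.fasta", "Rfam.cm", "cmscan_chunks")` (placeholder paths) would leave one .tblout file per chunk, which can then be concatenated. Keeping -Z fixed at the whole-input value is meant to keep the E-values of the split runs on the same scale as a single run over the full file.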
However, I am not sure whether this alternative is correct (I used the results from the run on the unsplit FASTA file). What should I do next time with a similar task?
Thanks in advance for reading!