I'm curious whether anybody can suggest command-line programs that are designed to be truly parallel.
I found a great post on running serial programs in parallel with GNU parallel: Tool: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them
However, instead of using tricks to run serial programs in a parallel fashion, is there anything out there that can be run in parallel directly?
I'm mainly interested in programs used for ChIP-seq and RNA-seq workflows. For example, I think the STAR aligner can handle parallel processes, but I believe common tools like TopHat, MACS, and Cufflinks are all serial at some point.
Thanks in advance for any input.
I guess you are referring to true threads vs. forks (i.e., separate processes). Threading has to be implemented in the program itself, which may mean modifying its source code and possibly pulling in additional libraries. What do you think this would gain? You could use threads from a script or program to run your pipeline, but I'm not sure that would be worth the effort compared with using something already available.
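To make the threads-vs-processes distinction concrete, here is a minimal Python sketch (not taken from any of the tools discussed). The function names and the counter are made up for illustration. Threads share one process's memory, while separate processes each get their own memory and have to hand results back explicitly (here, through stdout).

```python
import subprocess
import sys
import threading

# Shared state: every thread in this process sees the same dict.
counter = {"value": 0}
lock = threading.Lock()


def add_in_thread(n):
    """Increment the shared counter n times from inside a thread."""
    for _ in range(n):
        with lock:  # the lock prevents lost updates on the shared counter
            counter["value"] += 1


def run_threads(num_threads=4, n=1000):
    """Threads: one process, shared memory, results written in place."""
    counter["value"] = 0
    threads = [
        threading.Thread(target=add_in_thread, args=(n,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


def run_processes(num_procs=4, n=1000):
    """Separate processes: no shared memory, results come back via stdout."""
    child_code = "import sys; print(sum(1 for _ in range(int(sys.argv[1]))))"
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", child_code, str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(num_procs)
    ]
    # Each child counted on its own; the parent has to combine the pieces.
    return sum(int(p.communicate()[0]) for p in procs)
```

This only illustrates the memory model, not performance. In standard CPython the global interpreter lock keeps threads from running Python bytecode in parallel, so compiled tools written in C or C++ get far more out of threads than this toy does.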
Since you mentioned them, both TopHat and Cufflinks are already multithreaded. In fact, it'd be hard to find a single-threaded aligner or assembler, since no one would use one.
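As a rough sketch of what "multithreaded already" looks like in practice, the Python helper below assembles command lines with each tool's thread-count option: `-p` for TopHat and Cufflinks, `--runThreadN` for STAR, and `-t` for bwa mem. The helper name and the input file names are hypothetical placeholders, and real runs need more arguments than shown (STAR in particular), so check each tool's manual for your version.

```python
import shlex

# Thread-count option for each tool; the keys are how the tool is invoked.
THREAD_FLAGS = {
    "tophat": "-p",
    "cufflinks": "-p",
    "STAR": "--runThreadN",
    "bwa mem": "-t",
}


def with_threads(tool, n_threads, args):
    """Return an argv list for `tool` that requests n_threads threads.

    `args` holds the remaining arguments (index, reads, ...), which are
    placeholders in this sketch.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    program = tool.split()  # "bwa mem" becomes ["bwa", "mem"]
    return program + [THREAD_FLAGS[tool], str(n_threads)] + list(args)


# Hypothetical inputs: a Bowtie index prefix and a FASTQ file.
cmd = with_threads("tophat", 8, ["genome_index", "reads.fastq"])
print(shlex.join(cmd))
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)` once the placeholder arguments are replaced with real paths.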
bwa aln and TopHat are both parallel, but they have single-threaded components that can bottleneck them on many-core systems. For bwa aln that's the sampe/samse stage; for TopHat, it's some Perl code (IIRC). I could be wrong about TopHat; I'm just basing that on what I've seen in top.
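The bottleneck effect described here is what Amdahl's law models: if a fraction of the runtime is stuck on one thread, adding cores only speeds up the rest. Here is a small Python sketch of that formula. The 10% serial fraction in the usage note is a made-up number for illustration, not a measurement of bwa or TopHat.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Theoretical speedup on n_cores when serial_fraction of the
    single-core runtime cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical pipeline in which 10% of the runtime is single-threaded.
for cores in (1, 4, 16, 64):
    print(cores, round(amdahl_speedup(0.10, cores), 2))
```

With that hypothetical 10% serial stage, the speedup can never exceed 10x no matter how many cores are available, which is why a single-threaded step matters more as core counts grow.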
TopHat does have an optional single-threaded step, that's correct. I'm not sure how many people actually use bwa aln anymore; bwa mem works better in most cases.