I am running cuffdiff on about 20 samples, and I simply started a single job for the whole genome. It is still running (and still producing output, so it isn't "stuck" or anything) 5 days later. I think I could parallelize it by splitting my input GTF file into many small sets of genes, but I am worried that it would produce incorrect FDR calculations and other "aggregate" statistics that I would not easily be able to correct when I merge the output.
Is there an easy way to parallelize cuffdiff, or must I wait for my single job to finish?
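On the FDR worry: the Benjamini-Hochberg step can be redone after merging, because it only needs the raw p-values from every test. Below is a minimal sketch. It assumes each chunk's `gene_exp.diff` is tab-separated with `test_id`, `status` and `p_value` columns, and that only rows with status `OK` should enter the correction. Check both assumptions against the cuffdiff manual for your version. The helper names (`read_diff`, `bh_qvalues`, `merge_and_recompute`) are hypothetical. This does not fix everything. Cuffdiff probably estimates library normalization and its dispersion model from all loci at once, so per-chunk p-values can still differ from a genome-wide run.

```python
import csv
from typing import Dict, List, Sequence


def read_diff(path: str) -> List[dict]:
    """Read one tab-separated cuffdiff output file into a list of row dicts (assumed layout)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    if m == 0:
        return []
    # Indices of the p-values sorted ascending.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def merge_and_recompute(chunks: Sequence[Sequence[dict]]) -> Dict[str, float]:
    """Pool rows from all chunks and recompute q-values over the whole set of tests."""
    rows = [row for chunk in chunks for row in chunk]
    # Assumption: untested rows (NOTEST, LOWDATA, ...) are excluded from the correction.
    tested = [r for r in rows if r["status"] == "OK"]
    q = bh_qvalues([float(r["p_value"]) for r in tested])
    return {r["test_id"]: qv for r, qv in zip(tested, q)}
```

For example, `merge_and_recompute([read_diff(p) for p in paths])` would return genome-wide q-values keyed by `test_id`. Those values are only as trustworthy as the per-chunk p-values they were computed from.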
Are you using the "-p" option to use multiple threads?
Hmm. I am running it on a cluster, and I have not experimented with multi-threading on the cluster yet (instead of one job with N threads, I run N jobs with one thread each). I know my cluster allows multi-threaded jobs, but only up to the number of CPU cores available on a single node. Still, it's better than nothing.
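To get a single multi-threaded cuffdiff job on one node, set `-p` to the number of cores the scheduler actually gave you. The sketch below is one way to do that. Reading `SLURM_CPUS_PER_TASK` assumes a SLURM cluster, and other schedulers use different variables. The `-p`, `-o` and `-L` options and the argument layout are from the cuffdiff usage: the GTF comes first, then one argument per condition, with replicates joined by commas. The function names and example file names are hypothetical.

```python
import os
import subprocess
from typing import List, Sequence


def available_cores() -> int:
    """Best guess at cores allocated to this job (SLURM variable is an assumption)."""
    slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm and slurm.isdigit():
        return int(slurm)
    try:
        # Linux: respects the process CPU affinity mask set by the scheduler.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


def build_cuffdiff_cmd(gtf: str, conditions: Sequence[Sequence[str]],
                       labels: Sequence[str], out_dir: str, threads: int) -> List[str]:
    """Build a cuffdiff command line; each condition is a list of replicate BAM/SAM files."""
    if len(conditions) != len(labels):
        raise ValueError("need exactly one label per condition")
    cmd = ["cuffdiff", "-p", str(threads), "-o", out_dir, "-L", ",".join(labels), gtf]
    # Replicates within a condition are comma-separated; conditions are separate arguments.
    cmd += [",".join(reps) for reps in conditions]
    return cmd


if __name__ == "__main__":
    cmd = build_cuffdiff_cmd("merged.gtf",
                             [["a1.bam", "a2.bam"], ["b1.bam", "b2.bam"]],
                             ["A", "B"], "cuffdiff_out", available_cores())
    subprocess.run(cmd, check=True)
```

Whether every stage of cuffdiff actually uses all `-p` threads is a separate question, and the wrapper cannot change that.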
In the end, were you able to run it successfully? If so, how long did it take, and how much memory and how many processors did it use? I'm currently running 45 individuals on 24 cores, which hasn't helped much because after the mapping stage cuffdiff seems to drop back to 1 core. Any thoughts appreciated!
No, my internship ended. I'm now using Cufflinks again on a completely unrelated project, this time on a single 8-core workstation. We'll see how things go this time.