I’m running Cuffdiff 2.2.1 with 6 BAM files generated by TopHat (each FASTQ has ~100 million reads): 2 conditions with 3 replicates each. I am using the GENCODE GTF and genome.
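For reference, the command is roughly of this form (the file and directory names below are placeholders, not my actual paths):

    cuffdiff -p 28 -o cuffdiff_out -L control,treated gencode.annotation.gtf \
        control_rep1.bam,control_rep2.bam,control_rep3.bam \
        treated_rep1.bam,treated_rep2.bam,treated_rep3.bam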
The run has now spent 8 hours at the “Inspecting maps and determining fragment length distributions” step and still hasn’t finished it; it looks like it hangs there. Memory usage is only 2.7%, and although I specified “-p 28”, only 2 cores are active during this step.
Previously, I used Cuffdiff 2.1.1 (with the same GTF and genome) and it passed the “Inspecting maps and determining fragment length distributions” step in 13 minutes. Although my current dataset is about 7x larger than the previous one, it still doesn’t seem reasonable for this single step to run 8+ hours without finishing.
I looked around on the internet, and some people suggested reverting to Cuffdiff 2.1.1. Does Cuffdiff 2.2.1 have any known problems with this step? Should I go back to Cuffdiff 2.1.1?
Is the --max-bundle-frags value used in the “Inspecting maps and determining fragment length distributions” step? I’m currently using the default value. Would reducing --max-bundle-frags help?
If I run Cuffquant on each BAM file first instead of running Cuffdiff directly, would this “Inspecting maps and determining fragment length distributions” step be included in each individual Cuffquant run? Or would this step still be performed when running Cuffdiff after Cuffquant?
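For clarity, the two-stage workflow I have in mind would look roughly like this (directory and file names are placeholders):

    # Step 1: quantify each BAM separately (one Cuffquant run per file)
    cuffquant -p 28 -o cq_control_rep1 gencode.annotation.gtf control_rep1.bam
    # ... repeat for the other five BAM files ...

    # Step 2: run Cuffdiff on the resulting abundances.cxb files
    cuffdiff -p 28 -o cuffdiff_out -L control,treated gencode.annotation.gtf \
        cq_control_rep1/abundances.cxb,cq_control_rep2/abundances.cxb,cq_control_rep3/abundances.cxb \
        cq_treated_rep1/abundances.cxb,cq_treated_rep2/abundances.cxb,cq_treated_rep3/abundances.cxb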
Any ideas and advice would be greatly appreciated.
Thank you very much!
Two observations that might help: 1) Even if we configure -p for a high number of cores, I have seldom seen that reflected in actual core usage. 2) Please give it a try without the --max-bundle-frags option at all. Jf