We are having a speed problem when running Cuffdiff with the --frag-bias-correct option.
We use the GENCODE GTF annotation and genome for human. The bottleneck is the “Testing for differential expression and regulation in locus” step. With the --frag-bias-correct option (-b), this step progresses only 2-3% per day when running on 28 cores. It is not stuck, just progressing very slowly, even though I already reduced --max-bundle-frags to 500,000. More than 4 days have passed so far, but this step has only reached 12% (so I estimate it would take 33+ days to finish this step). I checked the memory usage, and there is plenty of memory available (only 4.7% is in use).
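For reference, the 33+ day figure is just a linear extrapolation from the progress so far. A minimal sketch of that arithmetic, assuming the step keeps advancing at a constant rate (the function name `estimate_total_days` is my own, not anything from Cuffdiff):

```python
def estimate_total_days(elapsed_days: float, percent_done: float) -> float:
    """Linearly extrapolate the total runtime of a step from its progress so far.

    Assumes a constant progress rate, which may not hold if some loci
    are much more expensive to test than others.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_days * 100.0 / percent_done


# Numbers from the run described above: 12% done after about 4 days.
total = estimate_total_days(4, 12)
remaining = total - 4
print(f"total ~{total:.1f} days, remaining ~{remaining:.1f} days")
# prints: total ~33.3 days, remaining ~29.3 days
```

If the rate really is closer to 2% per day, the same extrapolation would give an even longer total, so 33 days is on the optimistic side.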
So, following the suggestions, I reran Cuffdiff without the --frag-bias-correct option (-b). This time Cuffdiff completed its run within one day! I got the gene_exp.diff file with a list of genes marked as significant.
However, since the manual says this option will “run our bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates”, I am wondering how important this bias correction is. When Cuffdiff is run without this option, how much of an impact, and what kind of impact, might that have on the accuracy of our gene_exp.diff results?
I'm also wondering why the bias correction algorithm makes the “Testing for differential expression and regulation in locus” step so slow.
I’d greatly appreciate your advice.
Thank you very much in advance!