Based on the reported speed and accuracy, it looks like a contender: comparable to GSNAP on accuracy whilst being considerably faster.
I'm wondering if anyone has trialled it yet? I ran a simulated data set through it just to see how it behaved, and it ran fast and smoothly. I am unsure how the read counts are calculated for closely related isoforms (i.e. how it distinguishes between them). I contacted the author about it but got no response.
Has anyone else looked into this? Can you offer any thoughts?
Hi, I'm the RUM developer. I also used our simulated data to benchmark Cufflinks and Scripture: Cufflinks had a false-positive rate around 99%, while Scripture's was around 99.9%. Given that, I would not use either algorithm; I would look for differential expression at the exon level and then drill down on those genes to figure out what is happening.

The isoform expression problem is a holy grail. The popular solutions do not work; in my opinion they were simply the ones the authors were willing to hype. I think the problem is solvable, but it's not there yet. Scripture is particularly bad because it is based on peak calling and TopHat, instead of a decent aligner and junctions.

I'll check back here if anybody has any further questions. Sorry if I didn't reply to all the emails; I got a bit swamped after the paper came out. A new version should be out soon. Thanks for your interest. -Greg
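To make the exon-level suggestion concrete, here is a toy sketch of the idea (my own illustration, not RUM's code; the exon names, counts, and library sizes are made up, and a real analysis would want replicates and a proper count model rather than a per-exon Fisher test):

```python
# Toy sketch: compare per-exon read counts between two conditions and
# flag exons whose proportions differ, then inspect the parent genes
# by hand. All numbers below are hypothetical.
from scipy.stats import fisher_exact

exon_counts = {                      # exon -> (reads in A, reads in B)
    "geneX:exon1": (150, 160),
    "geneX:exon2": (140, 30),        # candidate: dropped out in B
    "geneX:exon3": (155, 150),
}
lib_a, lib_b = 1_000_000, 1_000_000  # total mapped reads per library

for exon, (a, b) in exon_counts.items():
    # 2x2 table: reads in this exon vs. reads elsewhere, per condition.
    table = [[a, b], [lib_a - a, lib_b - b]]
    _, p = fisher_exact(table)
    if p < 0.01:
        print(exon, "differs:", a, "vs", b, f"(p={p:.2g})")
```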
I have, and it was pretty easy to install and use. The results looked good as well - the mapping rate was high compared to other tools I tried. I don't have an answer to your question about distinguishing between isoforms, though. Given that I did not have a "gold standard" data set to compare with (which is usually the case!), it's hard to say how good the expression values were, but at least they correlated well with values obtained from TopHat/Cufflinks and CLC Bio.
Hm, saying "no good solution to the transcript isoform problem" is a bit debatable; it is certainly possible to try to address it e. g. using expectation maximization such as those used by e. g. Cufflinks and Avadis, or setting up equation systems such as e. g. rQuant. All of those methods have their drawbacks (e. g. for Cufflinks, that you may get different results from run to run because of the Monte Carlo sampling approach used in the EM calculation) but that doesn't mean you should just give up.
Hm, saying "there is no good solution to the transcript isoform problem" is a bit debatable; it is certainly possible to try to address it e. g. using expectation maximization approaches such as those used by e. g. Cufflinks and Avadis, or trying to solve equation systems such as e. g. rQuant does. All of those methods have their drawbacks (e. g. for Cufflinks you may get different results from run to run because of the Monte Carlo sampling approach used in the EM calculation) but that doesn't mean you should just give up.
Good to know, though, that RUM uses the "exon union" method, in the terminology of this review paper: http://www.nature.com/nmeth/journal/v8/n6/abs/nmeth.1613.html. By the way, that paper notes it can be shown that this way of quantifying will underestimate the expression of alternatively spliced genes.
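For anyone unfamiliar with the term, here is a toy sketch of what "exon union" quantification means (my own illustration with made-up intervals and counts, not RUM's code): merge all annotated exons of a gene into a union of intervals, count reads overlapping that union, and normalize by the union length. Because the union of overlapping isoforms is longer than any single isoform, the normalized value comes out low for alternatively spliced genes, which is where the underestimate comes from:

```python
def union_length(exons):
    """Total length covered by a set of (start, end) exon intervals."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return sum(end - start for start, end in merged)

# Hypothetical gene: two overlapping exons plus one distal exon.
exons = [(100, 200), (150, 300), (400, 500)]
reads_in_union = 250              # hypothetical read count
total_mapped_reads = 10_000_000   # hypothetical library size

L = union_length(exons)           # 300 bp: (100,300) + (400,500)
rpkm = reads_in_union / (L / 1e3) / (total_mapped_reads / 1e6)
print(L, rpkm)
```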
RUM requires that the read1 and read2 files have exactly the same file size, so after quality control with Prinseq I have to pad the trimmed reads back out with Ns, which is a pointless extra step. I don't think the restriction is necessary; the RUM author should consider removing it.
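For anyone hitting the same issue, here is roughly what the padding workaround looks like as a sketch (file names and the target length are placeholders for your own data; "#" is the lowest usable Phred+33 quality):

```python
TARGET_LEN = 100  # original, pre-trimming read length (assumption)

def pad_fastq(in_path, out_path, length=TARGET_LEN):
    """Pad trimmed reads (and qualities) back to a fixed length so the
    read1/read2 FASTQ files end up the same size."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            fout.write(header)
            fout.write(seq.ljust(length, "N") + "\n")   # pad bases with N
            fout.write(plus)
            fout.write(qual.ljust(length, "#") + "\n")  # pad quals low

pad_fastq("reads_1.trimmed.fastq", "reads_1.padded.fastq")
pad_fastq("reads_2.trimmed.fastq", "reads_2.padded.fastq")
```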
I would also be interested in reading people's results from their trials.