I have logCPM values I'm trying to convert to RPKM. Doing a search, I found you can convert it by doing the following:
RPKM = 2^(logCPM-log2(geneLength))
However, this is giving me negative RPKM values, as most of the gene lengths in log2 form are larger than the logCPM values (unless my math is incorrect.)
It's not possible to get negative RPKMs from that formula, since 2 raised to any real number is always greater than 0. Subtracting the logs is the same as dividing CPM by gene length and then taking the log of that. Yes, that difference can be negative, but you're then reversing the log with 2^, which gives a positive value (below 1 when the exponent is negative).
BTW, the gene length should be in kilobases, in case you didn't already know that.
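To make the arithmetic above concrete, here is a minimal sketch in Python (not edgeR code). The function name logcpm_to_rpkm and the example numbers are made up for illustration. It assumes the logCPM value is a plain log2 of CPM and that the gene length is supplied in base pairs, so it is divided by 1000 first.

```python
import math

def logcpm_to_rpkm(log_cpm, gene_length_bp):
    """Back-transform a log2-CPM value to RPKM (hypothetical helper).

    Assumes log_cpm is a plain log2(CPM) and gene_length_bp is in base pairs.
    """
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    length_kb = gene_length_bp / 1000.0  # RPKM is per kilobase, not per base
    # 2^(logCPM - log2(length_kb)) is the same as CPM / length_kb
    return 2.0 ** (log_cpm - math.log2(length_kb))

# logCPM = 3 (CPM = 8), gene length 2000 bp (2 kb): exponent is 3 - 1 = 2
print(logcpm_to_rpkm(3.0, 2000))  # 4.0

# logCPM = 1 (CPM = 2), gene length 8000 bp (8 kb): exponent is 1 - 3 = -2
# The exponent is negative, but the RPKM is still positive.
print(logcpm_to_rpkm(1.0, 8000))  # 0.25
```

The second call shows the point made above: a negative exponent gives an RPKM between 0 and 1, never a negative one.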
Thanks so much. I realized the data that was labeled as logCPM wasn't accurate. I went ahead and pulled the raw count tables, and calculated everything manually. Thanks for your help!
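For anyone else computing this by hand from a raw count table, a rough sketch of the standard definition follows (Python, toy numbers, hypothetical function name rpkm_from_counts). It assumes the library size is simply the sum of the counts for that sample, with no additional normalisation factors, and that gene lengths are in base pairs.

```python
def rpkm_from_counts(counts, lengths_bp):
    """RPKM for one sample from raw counts (hypothetical helper).

    Assumes library size = sum of counts and lengths are in base pairs.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("library size is zero")
    rpkm = []
    for count, length_bp in zip(counts, lengths_bp):
        cpm = count * 1e6 / library_size      # counts per million mapped reads
        rpkm.append(cpm / (length_bp / 1000.0))  # then per kilobase of gene
    return rpkm

# Toy sample: three genes, 10,000 reads in total
counts = [500, 1500, 8000]
lengths_bp = [1000, 3000, 2000]
print(rpkm_from_counts(counts, lengths_bp))  # [50000.0, 50000.0, 400000.0]
```

The first two genes have the same RPKM even though their counts differ threefold, because the second gene is three times longer.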
Why would you want to go from one reasonably good data distribution (logCPM) to one pretty bad one (RPKM)? RPKM was the first normalisation method for single-end RNA-seq reads but it is not suitable for cross-sample differential expression.
Kevin, I completely agree with you. However, a collaborator I am working with has specifically asked for RPKM values. I do believe they also understand the limits of using it compared to logCPM, etc.
I understand - I've been in those situations. Did you obtain your formula from here: http://seqanswers.com/forums/showthread.php?t=59202 ?
I will 'nudge' Devon to see what he says.
I did! Thanks, I appreciate it. I should also note that these logCPM values are coming directly from edgeR!