9.9 years ago
Pei
Hi,
I am trying to calculate RPKM from the mean per-base read coverage provided by Brawand et al. (2011).
However, my resulting 'RPKM' values seem strangely small: each sample has a median value below 1.
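For reference, here is the conversion the code attempts, written out as a sketch. It assumes that every read has the same length r and that the library size N is approximated by summing the estimated counts over all genes, which is what the code does. Here c_g is the mean per-base coverage of gene g and L_g is its exonic length in bp:

$$
\hat{n}_g = \frac{c_g L_g}{r}, \qquad
N = \sum_h \hat{n}_h, \qquad
\mathrm{RPKM}_g = \frac{\hat{n}_g \times 10^9}{L_g N} = \frac{c_g \times 10^9}{\sum_h c_h L_h}
$$

Under these assumptions the read length r cancels, so the choice of 75 does not change the resulting values. Only the relative coverage across genes matters.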
Below is my code:
    rz2rpkm <- function(expr) {
      # expr: mean per-base read coverage, genes in rows, samples in columns;
      # sample columns start at column 6, "ExonicLength" holds exonic length (bp)
      readcount <- expr
      readcount <- readcount[, 6:ncol(readcount)]
      read.length <- 75
      exoniclength <- expr[, "ExonicLength"]
      # approximate read counts: coverage * exonic length / read length
      for (j in colnames(expr)[6:ncol(expr)]) {
        readcount[, j] <- (expr[, j] * exoniclength) / read.length
      }
      # total (approximate) reads per sample
      M <- apply(readcount, 2, sum)
      RPKM <- readcount
      # reads per kilobase of exon per million reads
      for (j in colnames(readcount)) {
        RPKM[, j] <- (readcount[, j] / (exoniclength * M[j])) * 10^9
      }
      return(RPKM)
    }
I would sincerely appreciate any help with this problem.
I have no prior experience with RNA-seq data.
Many thanks!
Best,
Can you post a few example values that seem off, along with the exonic lengths used to derive them? RPKM values can be quite low.
BTW, you can get rid of the for loops and improve performance. The first loop is simply:
The second one can be handled in a similar way.
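A vectorized version along those lines might look like the sketch below. It assumes the same column layout as the original code: sample columns from column 6 onward and an "ExonicLength" column. The function name rz2rpkm_vec and the first.sample.col argument are hypothetical.

```r
# Hypothetical loop-free variant of rz2rpkm(); assumes sample columns start at
# first.sample.col and that expr has an "ExonicLength" column (bp).
rz2rpkm_vec <- function(expr, first.sample.col = 6, read.length = 75) {
  coverage <- as.matrix(expr[, first.sample.col:ncol(expr)])
  exoniclength <- expr[, "ExonicLength"]
  # Approximate read counts. A vector of length nrow is recycled down each
  # column, so every gene (row) is scaled by its own exonic length.
  readcount <- coverage * exoniclength / read.length
  # Total (approximate) reads per sample.
  M <- colSums(readcount)
  # Divide each column by its sample total, then each row by gene length.
  sweep(readcount, 2, M, "/") / exoniclength * 1e9
}
```

R's recycling rule handles the per-gene lengths and sweep() handles the per-sample totals, so no explicit loop over samples is needed.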
Hello lin.pei26!
It appears that your post has been cross-posted to another site: SEQanswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
Thank you, Devon.
Below are the top lines from the mouse data [1]. I use this matrix as the variable 'expr'.
The results look like [2].
Link to the Brawand data: http://www.nature.com/nature/journal/v478/n7369/extref/nature10532-s5.zip
[1]
[2]
Those values look about right. Most genes aren't highly expressed, which skews the RPKM values, and therefore their median, low.
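One way to confirm that low values come from the data rather than from a normalization bug: with the formula in the code above, RPKM_g × L_g = n_g × 10^9 / N, so summing over genes gives exactly 10^9 for every sample. Below is a minimal R sketch of that check. The helper name check_rpkm is hypothetical.

```r
# Hypothetical helper: returns TRUE per sample if sum_g RPKM_g * L_g == 1e9,
# which holds by construction when N is the sum of counts over genes.
check_rpkm <- function(rpkm, exoniclength, tol = 1e-6) {
  # Multiply each gene (row) by its exonic length, then total per sample.
  totals <- colSums(as.matrix(rpkm) * exoniclength)
  abs(totals / 1e9 - 1) < tol
}
```

For example, check_rpkm(rz2rpkm(expr), expr[, "ExonicLength"]) should return TRUE for each sample. Passing only shows the scaling is internally consistent. It does not show whether the coverage-to-count approximation is accurate.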
Careful: I don't think your read counts are raw counts. Otherwise, why are there fractional numbers in [1]?
Those are mean per-base coverage values, not raw counts (the OP multiplies by exonic length and divides by read length to approximate read counts). That said, the raw data is available, so it would be easy enough for Pei to reprocess a single sample and check how accurate these calculations are.
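If a sample is reprocessed from the raw reads and per-gene counts are obtained, the comparison could be sketched as below. Both function names are hypothetical. rpkm_from_counts() normalizes by the sum of counts over genes, matching the OP's code. Normalizing by total mapped reads instead would rescale every gene by the same factor.

```r
# Hypothetical: RPKM from raw per-gene read counts and exonic lengths (bp).
rpkm_from_counts <- function(counts, exoniclength) {
  counts * 1e9 / (exoniclength * sum(counts))
}

# Hypothetical: compare two RPKM vectors for the same genes and sample.
compare_rpkm <- function(rpkm_a, rpkm_b) {
  # Only genes with nonzero values in both can give a finite log ratio.
  keep <- rpkm_a > 0 & rpkm_b > 0
  list(
    # Rank correlation ignores any global scale difference.
    spearman = cor(rpkm_a, rpkm_b, method = "spearman"),
    # Shows whether one estimate is systematically higher than the other.
    median_log2_ratio = median(log2(rpkm_a[keep] / rpkm_b[keep]))
  )
}
```

A Spearman correlation near 1 with a median log2 ratio near 0 would suggest the coverage-based approximation is reasonable for that sample.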