The length of a processed transcript is just the sum of the lengths of
its exons. This should not be confused with the length of the
stretch of DNA transcribed into RNA (a.k.a. transcription unit), which
can be obtained with width(transcripts(txdb)).
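To make the distinction concrete, here is a minimal base-R sketch on a made-up exon table; the data frame, its column names and all coordinates are hypothetical and only illustrate the arithmetic. With a real TxDb, I believe the analogous call for the processed length would be sum(width(exonsBy(txdb, by = "tx"))).

```r
# Toy exon table (hypothetical values): one row per exon,
# 1-based inclusive coordinates.
exons <- data.frame(
  tx_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  start = c(100, 500, 900, 100, 900),
  end   = c(199, 599, 999, 199, 1099)
)

# With inclusive coordinates, exon width is end - start + 1.
exons$width <- exons$end - exons$start + 1

# Processed transcript length: sum of exon widths per transcript.
processed_len <- sapply(split(exons$width, exons$tx_id), sum)

# Transcription unit length: first exon start to last exon end,
# so introns are included.
unit_len <- sapply(split(exons, exons$tx_id),
                   function(e) max(e$end) - min(e$start) + 1)

print(processed_len)
print(unit_len)
```

The two numbers differ by the total intron length, which is why width(transcripts(txdb)) overestimates the length that matters for length normalization.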
When I apply that method, I get duplicated gene IDs because each gene has several transcripts. How should I solve that issue?
Error in h(simpleError(msg, call)) : error in evaluating the
argument 'x' in selecting a method for function 'as.data.frame': error
in evaluating the argument 'x' in selecting a method for function
'width': argument ".f" is missing, with no default
Standard FPKM, RPKM, and TPM have the problem that they account only for sequencing depth (and gene length), not for compositional bias. I personally prefer more sophisticated methods that actually correct for composition, e.g. cpm from edgeR:
library(edgeR)
cts <- sapply(seq(1,4), function(x) rnorm(10000,100,1)) # simulated data: 10000 genes x 4 samples
y <- DGEList(counts = cts)
y <- calcNormFactors(y, method = "TMM") # ?calcNormFactors for other methods
edgeR.cpm <- cpm(y, log = FALSE)
If I want to correct for gene length, I either divide edgeR.cpm by the gene length in kb or use edgeR::rpkm(), which does pretty much the same as cpm() (correcting for sequencing depth and composition) and additionally divides by the gene length, which you have to provide.
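As a rough illustration of the length correction described above, here is a base-R sketch on a toy count matrix. The counts and gene lengths are made up, and no TMM normalization factors are applied here, so this shows only the arithmetic and is not a substitute for edgeR::cpm() or edgeR::rpkm(). The TPM part uses the usual definition: divide by length first, then scale each sample to one million.

```r
# Toy count matrix: 3 genes x 2 samples (hypothetical values).
counts <- matrix(c(100, 200, 700,
                   300, 300, 400),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Hypothetical gene lengths in bp, in the same order as the rows.
gene_length_bp <- c(g1 = 1000, g2 = 2000, g3 = 7000)

# Plain CPM: scale each column to one million reads
# (library size only, no composition correction).
cpm_plain <- t(t(counts) / colSums(counts)) * 1e6

# RPKM/FPKM-style value: CPM divided by gene length in kb.
# The length vector is recycled down the rows of each column.
rpkm_plain <- cpm_plain / (gene_length_bp / 1000)

# TPM: length-normalize the counts first, then scale each
# column so that it sums to one million.
rate <- counts / gene_length_bp
tpm <- t(t(rate) / colSums(rate)) * 1e6

print(rpkm_plain)
print(colSums(tpm))
```

By construction, every TPM column sums to one million, whereas the RPKM-style columns generally do not share a common total.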
May I ask upfront what you plan to use the TPM for? It was actually developed to compare transcript expression within the same sample.
Sure, some deconvolution methods require non-log-transformed data, and they suggest TPM for that.
Best
Please have a look at this post
Update: the error was resolved after starting a new R session.
Hi,
Thank you for the prompt reply, it seems that this could solve the issue. The thing is I'm getting an error in this line:
Did you encounter this?
Best
Do the raw counts correspond to gene expression estimates or transcript expression estimates?
Hi h.mon,
The raw counts are reads mapped to the genes; they are integers because they are not normalized.