Dear Community members,
I found several informative posts on the difference between scaledTPM, lengthScaledTPM, and no scaling, but I am unsure how these options apply to kallisto, given that kallisto already reports abundances in TPM. So far, I have been using scaledTPM for my analysis in edgeR. This is how I import my counts for edgeR:
data_ <- tximport(files, type = "kallisto", tx2gene = tx2gene, txOut = FALSE, ignoreAfterBar = TRUE, countsFromAbundance = "scaledTPM")
Any recommendations on the correct method for differential analysis, with an explanation, would be greatly appreciated. Thank you.
Hello Gordon Smyth,
Thank you for your reply. I am doing a gene-level analysis in edgeR.
I think complete instructions are in the tximport vignette: https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html
Hi professor,
Does that mean catchKallisto cannot be used for gene-level analyses?
I only intended catchKallisto for transcript-level analyses, although it is easy enough to convert its output to gene level by aggregating the expected counts over genes, and we did in fact do that to obtain gene-level overdispersions for the Baldoni et al. (2024) paper.
For gene level analyses, I personally prefer not to rely on transcript annotation at all because my own (unpublished) research shows that transcript quantification tools are highly sensitive to incomplete or inaccurate annotation. I have plans to publish something on this but it's not at the top of my queue yet.
That makes sense. Thanks!
Thank you for your reply, Gordon. A follow-up question: since lengthScaledTPM is calculated by multiplying TPM by feature length and then scaling the result up to the library size, the downstream TMM normalization is not required, right?
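For reference, the calculation described in the question can be sketched in base R. This is a minimal illustration, not tximport's own code: the function name `length_scaled_counts` and the toy matrices are hypothetical, and it assumes the feature length used is the average length across samples. The point it shows is that the rescaling only restores each sample's column sum to its original library size.

```r
# Hypothetical helper: TPM x average feature length, rescaled per sample
# so that each column sums to the original library size.
length_scaled_counts <- function(abundance, counts, lengths) {
  new_counts <- abundance * rowMeans(lengths)        # TPM times mean length per feature
  scale <- colSums(counts) / colSums(new_counts)     # per-sample rescaling factor
  sweep(new_counts, 2, scale, "*")
}

# Toy data (made-up numbers): 3 features x 2 samples
abundance <- matrix(c(600000, 300000, 100000,
                      200000, 300000, 500000), nrow = 3)  # TPM
counts    <- matrix(c(1200, 900, 100,
                      500, 1000, 2500), nrow = 3)         # estimated counts
lengths   <- matrix(c(2000, 1500, 1000,
                      1800, 1500, 1200), nrow = 3)        # effective lengths

scaled <- length_scaled_counts(abundance, counts, lengths)
colSums(scaled)   # same totals as colSums(counts)
```

Because the column totals are unchanged, the scaled counts still carry the same library sizes as the original counts.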
If you use tximport to read kallisto output into edgeR, then you must follow the instructions in the tximport vignette. The appropriate section of the vignette shows you explicitly how to combine edgeR TMM normalization factors with the tximport normalization.
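As a rough sketch of what that section of the vignette describes, the steps below combine TMM factors with a length-based offset when the counts were imported with countsFromAbundance = "no". The `txi` object here is a simulated stand-in (an assumption made so the example is self-contained); in practice it would be the list returned by tximport. Please check the current vignette rather than relying on this from memory.

```r
library(edgeR)

# Simulated stand-in for tximport output (hypothetical data):
# `counts` = estimated gene-level counts, `length` = average transcript lengths.
set.seed(1)
n_genes <- 200
n_samples <- 4
txi <- list(
  counts = matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
                  n_genes, n_samples,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("s", seq_len(n_samples)))),
  length = matrix(runif(n_genes * n_samples, 500, 3000), n_genes, n_samples)
)

cts <- txi$counts
normMat <- txi$length

# Scale lengths so that each gene's geometric mean across samples is 1
normMat <- normMat / exp(rowMeans(log(normMat)))
normCts <- cts / normMat

# Effective library sizes: TMM factors computed on length-corrected counts
eff.lib <- calcNormFactors(normCts) * colSums(normCts)

# Combine effective library sizes with the length factors into a log offset
normMat <- sweep(normMat, 2, eff.lib, "*")
normMat <- log(normMat)

# Store the original counts together with the offset matrix
y <- DGEList(cts)
y <- scaleOffset(y, normMat)
```

If the counts were instead imported with countsFromAbundance = "scaledTPM" or "lengthScaledTPM", my understanding of the vignette is that txi$counts is passed to DGEList() as an ordinary count matrix with no offset, and calcNormFactors() is then applied as usual.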