Kallisto: to scaledTPM or not to scaledTPM

Question by IM, 10 weeks ago

Dear Community members,

I found several informative posts on the difference between scaledTPM, lengthScaledTPM, and no scaling, but I am unsure how these options apply to kallisto, given that kallisto reports abundances in TPM. So far, I have been using scaledTPM for my analysis in edgeR. This is how I import my counts into edgeR:

library(tximport)
data_ <- tximport(files, type = "kallisto", tx2gene = tx2gene, txOut = FALSE, ignoreAfterBar = TRUE, countsFromAbundance = "scaledTPM")

Any recommendation on the correct method for differential expression analysis, with an explanation, would be greatly appreciated. Thank you.
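For readers weighing the options in this question, the following base-R sketch shows what the two countsFromAbundance settings compute, following the descriptions in the tximport documentation: scaledTPM is TPM scaled up to library size, and lengthScaledTPM is TPM scaled by the average transcript length over samples and then by library size. The numbers and the helper rescale_to_libsize() are hypothetical; this is an illustration, not tximport's own code.

```r
# Hypothetical gene-level inputs for two samples, in the shape tximport holds
# them after summarising transcripts to genes: estimated counts, TPM
# abundances and average transcript lengths (genes in rows, samples in columns).
counts <- matrix(c(500, 300, 200,
                   400, 500, 100), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
abundance <- matrix(c(5e5, 3e5, 2e5,
                      4e5, 4e5, 2e5), nrow = 3, dimnames = dimnames(counts))
len <- matrix(c(1000, 1000, 1000,
                1000, 1250, 500), nrow = 3, dimnames = dimnames(counts))

# Hypothetical helper: rescale every column so it sums to that sample's
# original library size (the column total of `counts`).
rescale_to_libsize <- function(new_counts, counts) {
  sweep(new_counts, 2, colSums(counts) / colSums(new_counts), "*")
}

# scaledTPM: the TPM values themselves, rescaled to library size.
scaled_tpm <- rescale_to_libsize(abundance, counts)

# lengthScaledTPM: TPM multiplied by each gene's length averaged over the
# samples, then rescaled to library size.
length_scaled_tpm <- rescale_to_libsize(abundance * rowMeans(len), counts)

scaled_tpm
length_scaled_tpm
```

In this toy case, geneB's estimated count in s2 (500) is inflated by its longer average transcript length in that sample. The scaledTPM value (400) and the lengthScaledTPM value (450) both remove that sample-specific length effect, but only lengthScaledTPM keeps a gene-specific length factor, so longer genes still receive proportionally more counts.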

Tags: tximport, kallisto, edgeR, RNAseq


Answer by Gordon Smyth, 2024-09-10

Do you want to do a gene-level or a transcript-level analysis in edgeR? To read transcript-level kallisto output into edgeR, simply use:

library(edgeR)
y <- catchKallisto(paths)

where paths is a character vector specifying the directories containing the kallisto output. See

Baldoni et al (2024). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13. https://doi.org/10.1093/nar/gkad1167

or Sections 2.18 and 4.6 of the edgeR User's Guide: https://doi.org/doi:10.18129/B9.bioc.edgeR
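The approach in Baldoni et al. (2024) divides each transcript's estimated counts by an overdispersion estimate that catchKallisto() derives from kallisto's bootstrap samples. A minimal base-R sketch of that division step, assuming a hypothetical list `catch` shaped like the catchKallisto() result (a counts matrix plus an annotation data frame with an Overdispersion column):

```r
# Hypothetical stand-in for the list returned by edgeR::catchKallisto(paths):
# transcript-by-sample estimated counts and one overdispersion per transcript.
catch <- list(
  counts = matrix(c(120, 60, 20,
                    100, 80, 25), nrow = 3,
                  dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2"))),
  annotation = data.frame(Overdispersion = c(4, 2, 1),
                          row.names = c("tx1", "tx2", "tx3"))
)

# Dividing a matrix by a vector of length nrow() recycles down the columns,
# so row i is divided by the i-th overdispersion value in every sample.
divided <- catch$counts / catch$annotation$Overdispersion
divided
```

In a real analysis the divided counts would then go into edgeR, along the lines of DGEList(counts = divided, genes = catch$annotation); the User's Guide sections cited above describe the full workflow.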


Reply by IM, 10 weeks ago

Hello Gordon Smyth,

Thank you for your reply. I am doing a gene-level analysis in edgeR.

Reply by Gordon Smyth, 10 weeks ago

I think complete instructions are in the tximport vignette: https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html

Reply by dsull, 10 weeks ago

Hi professor,

Does that mean catchKallisto cannot be used for gene-level analyses?

Reply by Gordon Smyth, 10 weeks ago

I only intended catchKallisto for transcript-level analyses, although it is easy enough to convert its output to gene level by aggregating the expected counts over genes; in fact, we did that to obtain gene-level overdispersions for the Baldoni et al. (2024) paper.

For gene-level analyses, I personally prefer not to rely on transcript annotation at all, because my own (unpublished) research shows that transcript quantification tools are highly sensitive to incomplete or inaccurate annotation. I plan to publish something on this, but it is not at the top of my queue yet.
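The gene-level aggregation mentioned above amounts to summing transcript-level expected counts within each gene. A base-R sketch with rowsum(), assuming hypothetical transcript counts and a hypothetical transcript-to-gene lookup (in practice the counts would come from the catchKallisto() result and the lookup from your annotation):

```r
# Hypothetical transcript-level expected counts (transcripts x samples).
tx_counts <- matrix(c(120, 60, 20, 5,
                      100, 80, 25, 9), nrow = 4,
                    dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2")))

# Hypothetical transcript-to-gene lookup, as a named character vector.
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB", tx4 = "geneC")

# rowsum() adds up the rows that share a group label; by default the
# resulting gene rows are sorted by gene name.
gene_counts <- rowsum(tx_counts, group = tx2gene[rownames(tx_counts)])
gene_counts
```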

Reply by dsull, 10 weeks ago

That makes sense. Thanks!

Reply by IM, 10 weeks ago

Thank you for your reply, Gordon. A follow-up question: since lengthScaledTPM is calculated by multiplying TPM by feature length and then scaling up to library size, TMM normalization downstream is not required, right?
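Rescaling to library size, as lengthScaledTPM does, only makes the column totals match the sequencing depth; it does not correct for composition differences between samples, which is the problem TMM addresses. A toy base-R illustration with hypothetical numbers:

```r
# Hypothetical "true" expression: g1 goes up 7-fold in s2, g2-g4 unchanged.
true_expr <- matrix(c(100, 100, 100, 100,
                      700, 100, 100, 100), ncol = 2,
                    dimnames = list(paste0("g", 1:4), c("s1", "s2")))

# Both samples sequenced to the same depth (400 reads): each gene receives
# reads in proportion to its share of that sample's total expression.
depth <- 400
observed <- sweep(true_expr, 2, depth / colSums(true_expr), "*")

# With library-size scaling alone, the unchanged genes look 2.5-fold down.
fold_change <- observed[, "s2"] / observed[, "s1"]
fold_change
```

Genes g2 to g4 appear down-regulated in s2 purely because g1 takes a larger share of that sample's reads. Correcting this kind of bias is the job of TMM; the limma-voom section of the tximport vignette, for example, still calls calcNormFactors() on a DGEList built from lengthScaledTPM counts.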

Reply by Gordon Smyth, 9 weeks ago

If you use tximport to read kallisto output into edgeR, then you must follow the instructions in the tximport vignette. The relevant section of the vignette shows explicitly how to combine edgeR TMM normalization factors with the tximport normalization.
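For the other route in the tximport vignette (original counts plus an offset), the edgeR section builds a per-gene, per-sample offset that combines average transcript lengths with TMM-adjusted library sizes. A base-R sketch of that arithmetic, assuming TMM factors are supplied from outside: norm_factors stands in for the vignette's calcNormFactors() call and defaults to 1 only so the code runs without edgeR. The function name tximport_offset() is hypothetical.

```r
# Hypothetical helper. cts: gene-level counts (txi$counts);
# len: average transcript lengths (txi$length); norm_factors: placeholder
# for TMM factors, which edgeR::calcNormFactors(norm_cts) would provide.
tximport_offset <- function(cts, len, norm_factors = rep(1, ncol(cts))) {
  # Divide each gene's lengths by their geometric mean across samples so the
  # length correction does not change the overall magnitude of the counts.
  len_norm <- len / exp(rowMeans(log(len)))
  # Length-corrected counts, used only to compute effective library sizes.
  norm_cts <- cts / len_norm
  eff_lib <- norm_factors * colSums(norm_cts)
  # Combine per-gene length factors with per-sample effective library sizes
  # and return log-scale offsets for a log-link GLM.
  log(sweep(len_norm, 2, eff_lib, "*"))
}

# Usage with hypothetical numbers; lengths are identical across samples here,
# so each column of the offset reduces to log(library size).
cts <- matrix(c(100, 50, 30,
                120, 40, 35), nrow = 3)
len <- matrix(c(1000, 2000, 500,
                1000, 2000, 500), nrow = 3)
off <- tximport_offset(cts, len)
off
```

As we read the vignette, with edgeR loaded such a matrix is then attached with scaleOffset() to a DGEList built from the unscaled counts.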