Question

Kallisto and downstream analysis with tximport and DESeq2

0

5.7 years ago

Mozart ▴ 330

Hello there, I am trying to analyse a dataset using kallisto and the abundances it generates. I am importing them with tximport and would then like to use TPM counts. When I open the txi.kallisto.tsv files, I essentially have three different columns (i.e. 'abundances', 'counts' and 'length'). I am not sure whether the counts that tximport imports are normalized or not (i.e. are they TPM counts or not?).

thanks in advance

RNA-Seq kallisto tximport deseq2 • 8.9k views

ADD COMMENT • link 4.5 years ago by Mozart ▴ 330

1

As an aside, you should not use normalized counts with DESeq2. It expects unnormalized, raw counts.
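To make the distinction concrete, here is a minimal sketch of how TPM relates to estimated counts. The numbers are made up, and `counts_to_tpm` is a hypothetical helper written for illustration, not part of kallisto, tximport or DESeq2. It only shows why TPM is already a normalized quantity, whereas estimated counts are not.

```python
# Toy illustration with made-up numbers: TPM is derived from estimated
# counts by dividing by effective length and rescaling to one million.

def counts_to_tpm(est_counts, eff_lengths):
    """Convert one sample's estimated counts to TPM (hypothetical helper)."""
    # Reads per base of effective length for each transcript
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)
    # Rescale so that the values sum to one million
    return [r / total * 1e6 for r in rates]

est_counts = [100.0, 300.0, 600.0]     # hypothetical estimated counts
eff_lengths = [500.0, 1500.0, 1000.0]  # hypothetical effective lengths

tpm = counts_to_tpm(est_counts, eff_lengths)
print(tpm)
```

The first two transcripts have different counts but the same counts-per-length rate, so they end up with the same TPM. That length and library-size scaling is what DESeq2 does not want applied to its input.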

ADD REPLY • link 5.7 years ago by jared.andrews07 ★ 18k

0

Thanks! So, when I am using tximport, does the DESeqDataSetFromTximport function automatically correct for the length bias? I guess so; in fact, from the tximport vignette:

Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. The code examples below accomplish these steps for you, keeping track of appropriate matrices and calculating these offsets. For edgeR you need to assign a matrix to y$offset, but the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”.

but if someone could confirm this, that would be great.

ADD REPLY • link 5.7 years ago by Mozart ▴ 330

score 2 · Answer 1 · 2019-04-26

For use with DESeq2 just follow the tximport manual section for kallisto, but set txOut=FALSE to aggregate transcript abundances to the gene level. The countsFromAbundance="scaledTPM" option, from what I understand, is only necessary to output a count matrix in case you want to use it for something other than DESeq2, so it is not necessary in this case.
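As a rough sketch of what a "scaledTPM" count matrix means, assuming it amounts to rescaling each sample's TPM so that it sums to that sample's total estimated counts: the code below is toy arithmetic on made-up numbers, and `scaled_tpm` is a hypothetical helper, not tximport's implementation.

```python
# Toy illustration with made-up numbers: "scaled TPM" counts for one sample,
# assumed here to be TPM rescaled to the sample's total estimated counts.

def scaled_tpm(tpm, est_counts):
    """Rescale TPM so it sums to the library size (hypothetical helper)."""
    factor = sum(est_counts) / sum(tpm)
    return [t * factor for t in tpm]

tpm = [200000.0, 200000.0, 600000.0]  # hypothetical TPM values
est_counts = [100.0, 300.0, 600.0]    # hypothetical estimated counts

scaled = scaled_tpm(tpm, est_counts)
print(scaled)
```

The result is on the scale of counts but no longer carries the length dependence of the original counts, which is why it suits tools that take a plain count matrix without an offset.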

"DESeqDataSetFromTximport function automatically correct for the length bias"

Yes, that is the whole point of this method. The length bias of interest here is the one caused by different transcript/isoform usage between the conditions, and it is corrected by passing an offset to DESeq2 for the linear model.
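A small sketch of why such an offset is needed, assuming the gene-level length is an abundance-weighted average of the transcript lengths in each sample. All numbers are made up, and `avg_tx_length` is a hypothetical helper rather than the actual tximport or DESeq2 code.

```python
# Toy illustration with made-up numbers: one gene with a short and a long
# isoform, equally abundant overall, but with isoform usage switched.

def avg_tx_length(tpm, eff_lengths):
    """Abundance-weighted mean transcript length of a gene in one sample."""
    return sum(t * l for t, l in zip(tpm, eff_lengths)) / sum(tpm)

eff_lengths = [1000.0, 3000.0]  # hypothetical short and long isoform
tpm_control = [90.0, 10.0]      # mostly the short isoform
tpm_treated = [10.0, 90.0]      # mostly the long isoform

len_control = avg_tx_length(tpm_control, eff_lengths)
len_treated = avg_tx_length(tpm_treated, eff_lengths)

# Hypothetical gene-level counts, proportional to abundance times length
counts_control = 1200.0
counts_treated = 2800.0

print(len_control, len_treated)
print(counts_control / len_control, counts_treated / len_treated)
```

Here the gene has the same total abundance in both samples, yet the raw counts differ by more than twofold purely because the longer isoform yields more fragments. Dividing by the per-sample average length (1200 vs 2800) removes that difference, which is the role the offset plays in the model.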