How should effective lengths returned by Salmon be collapsed for Differential Expression?
6.7 years ago
arf1389 ▴ 10

Hi All,

I used Salmon to quantify a set of technical-replicate fasta files against my reference transcriptome, with the sequence-specific and GC bias corrections enabled (--seqBias and --gcBias).

I know the tximport documentation says the effective length matrix should be processed in the following way for use in edgeR:

cts <- txi$counts
normMat <- txi$length
normMat <- normMat/exp(rowMeans(log(normMat)))
library(edgeR)
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the dispersion estimation functions; see the edgeR User's Guide
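To make the scaling step in the snippet above concrete, here is a small base-R check on a made-up effective-length matrix (the values and the names `len`, `geo_means`, and `row_log_sums` are purely illustrative, not tximport output). It only shows that dividing each row by its geometric mean leaves every transcript with a geometric mean of 1 across samples, so the log lengths carry between-sample differences only.

```r
# Toy effective-length matrix: 3 transcripts (rows) x 4 samples (columns).
# All numbers are invented for illustration.
len <- matrix(c(1000, 1100,  900, 1000,
                 500,  500,  500,  500,
                2000, 1800, 2200, 2000),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("tx", 1:3), paste0("s", 1:4)))

# Divide each row by its geometric mean, as in the snippet above.
# The length-3 vector is recycled down the columns, so row i is
# divided by the geometric mean of row i.
normMat <- len / exp(rowMeans(log(len)))

# After scaling, every row has geometric mean 1 ...
geo_means <- exp(rowMeans(log(normMat)))

# ... which is the same as saying the log values sum to zero per row.
row_log_sums <- rowSums(log(normMat))

print(round(normMat, 3))
```

A transcript whose effective length is identical in every sample (like `tx2` here) ends up with a row of ones, so it contributes nothing to the offset beyond the library-size term.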

What I am unsure of is...

If I collapse my technical replicates by the sum or the mean, how should I collapse the effective length vector returned from tximport? Should I take the mean of the effective lengths? The sum?

RNA-Seq salmon alignment • 2.4k views
6.7 years ago
arf1389 ▴ 10

The answer to this question is a feature that was added in Salmon v0.9.0:

Added the quantmerge command. This allows producing a multi-sample TSV file with aggregated abundance metrics over samples from many different quantification runs.

This can be used to merge technical replicate count estimates and produce a new data set with the merged counts and effective lengths.
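For anyone who would rather collapse on the R side, here is a rough sketch of one possible approach: sum the counts over the technical replicates of each library, and take a count-weighted mean of the effective lengths. The weighting is my own assumption, not something Salmon or tximport prescribes, and the matrices, the `library_id` grouping, and all values below are hypothetical.

```r
# Made-up data: 2 transcripts, 2 libraries (A, B), each sequenced in 2 runs.
counts <- matrix(c( 10,  30,  5, 15,
                   100, 100, 40, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("tx1", "tx2"),
                                 c("A_run1", "A_run2", "B_run1", "B_run2")))
eff_len <- matrix(c(900, 1000, 950, 1050,
                    500,  520, 480,  500),
                  nrow = 2, byrow = TRUE, dimnames = dimnames(counts))

# Hypothetical grouping: which library each run (column) belongs to.
library_id <- c("A", "A", "B", "B")

# Sum counts over runs of the same library.
# rowsum() groups rows, so transpose in and back out.
counts_sum <- t(rowsum(t(counts), group = library_id))

# ASSUMPTION: count-weighted mean effective length per library.
len_collapsed <- t(rowsum(t(counts * eff_len), group = library_id)) / counts_sum

# Where a transcript has zero counts in a library the weighted mean is 0/0,
# so fall back to the plain mean of the effective lengths.
plain_mean <- t(rowsum(t(eff_len), group = library_id)) /
  rep(as.vector(table(library_id)), each = nrow(eff_len))
zero <- counts_sum == 0
len_collapsed[zero] <- plain_mean[zero]

print(counts_sum)
print(len_collapsed)
```

Under this assumption, `counts_sum` and `len_collapsed` would stand in for `txi$counts` and `txi$length` in the edgeR offset code from the question. Whether a weighted or plain mean is more appropriate probably depends on how different the runs are, so treat this as a starting point rather than a recommendation.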
