How do I run DE for TPM values (not CPM)?
17 months ago
John • 0

Can someone explain to me how to run DE using TPM instead of CPM, please? All the DE guides I'm seeing only use CPM values, but I need to work with TPM values. Either code I can use or a manual/tutorial that walks through it would be most appreciated.

Also, note that I'm working with a .gz file that I loaded into RStudio with data.table::fread. I suspect this affects what kinds of objects I need to use when coding the DE analysis.
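For limma, the contents of the object matter more than its class. lmFit() accepts a plain numeric matrix with genes in rows and samples in columns. Below is a minimal R sketch that converts an fread() result into that shape. It assumes, hypothetically, that the first two columns hold gene names and gene IDs (as described later in this thread) and that every other column is a sample. The helper name tpm_table_to_matrix and the toy table are illustrative and do not come from the original post.

```r
# Hypothetical helper: turn a data.table/data.frame of TPMs into a numeric
# matrix (genes x samples). Assumes columns 1-2 are annotation (name, ID).
tpm_table_to_matrix <- function(tab, annot_cols = c(1, 2), id_col = 2) {
  tab  <- as.data.frame(tab)                        # data.table -> data.frame
  expr <- as.matrix(tab[, -annot_cols, drop = FALSE])
  if (!is.numeric(expr)) {
    stop("non-numeric sample columns remain; check annot_cols")
  }
  rownames(expr) <- as.character(tab[[id_col]])     # gene IDs as row names
  expr
}

# Toy table standing in for the fread() result (made-up values)
toy <- data.frame(
  gene_name = c("GeneA", "GeneB"),
  gene_id   = c("ID0001", "ID0002"),
  S1 = c(10.5, 0), S2 = c(3.2, 7.1)
)
expr <- tpm_table_to_matrix(toy)
expr
```

The resulting matrix, not the original table, is what would be passed on to limma, e.g. limma::lmFit(expr, design).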

RStudio differential-expression TPM
17 months ago
LChart 4.6k

TPM and CPM are highly similar scales. If you're following (say) a tutorial for DE with CPMs using limma (which I would recommend if you're stuck with this pre-transformed data), you can use TPM values wherever you see CPMs, and you should be good to go.
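The sketch below makes "highly similar scales" concrete. CPM and TPM both rescale each sample so that it sums to one million. TPM additionally divides by gene (or transcript) length first. The counts and lengths are made up for illustration; real quantifiers typically use effective lengths.

```r
# Toy counts (made-up numbers): 3 genes x 2 samples
counts <- matrix(c(100, 300, 600,
                   50, 150, 800),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))
gene_length_kb <- c(geneA = 1, geneB = 2, geneC = 4)   # hypothetical lengths

# CPM: divide each column by its total (library size), scale to one million
to_cpm <- function(x) t(t(x) / colSums(x)) * 1e6

# TPM: divide by gene length first (reads per kb), then rescale each column
# to sum to one million, i.e. CPM computed on length-scaled counts
to_tpm <- function(x, len) to_cpm(x / len)

cpm_vals <- to_cpm(counts)
tpm_vals <- to_tpm(counts, gene_length_kb)

colSums(cpm_vals)   # each column sums to 1e6
colSums(tpm_vals)   # so does each TPM column

# With equal gene lengths the two scales coincide
all.equal(to_tpm(counts, rep(1, 3)), cpm_vals)
```

Because the library sizes are divided out in both cases, TPMs can stand in for CPMs in a log-scale limma workflow. They cannot stand in for raw counts, however.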

Reply:

Okay, so even if the tutorial gives code like

logCPM <- cpm(dge, log=TRUE, prior.count=3)

It should still be fine to just put my TPM table where the CPM values go? (Note that they use a cpm() function, but to my knowledge there isn't a tpm() function.)

And as for the second part of what I asked, here is what I have so far:

dge <- DGEList(matchedgeneTPM)
dge <- calcNormFactors(dge)
dge$samples
fnames <- colnames(females)
mnames <- colnames(males)
group <- interaction(fnames, mnames)
plotMDS(dge, col = as.numeric(group))
mm <- model.matrix(~0 + group)
fit <- lmFit(dge, mm)

"matchedgeneTPM" is my table of TPMs (I already converted it to log2(TPM+1)), and "females" and "males" are the lists of female and male sample IDs that I also constructed beforehand. Everything works fine until the last line, which gives the error "Error in getEAWP(object) : data object isn't of a recognized data class". Could you explain how to fix this error, please?

Reply:

The cpm() function takes a matrix of counts, and so does calcNormFactors().

Just take your TPM, log it, and pass it through limma.

Reply:

That's what I'm trying to do, but again, I'm getting an error on the last line, and I'm asking how to fix it.

Reply:

You're mixing two packages, edgeR (DGEList) and limma (lmFit). Just pass your log(1+TPM) matrix as the first argument to lmFit.
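Here is a minimal limma-trend sketch of that advice, assuming limma is installed. The simulated matrix, the sex factor, and the coefficient name sexM are illustrative stand-ins, not the poster's objects. The design includes an intercept, so the sexM coefficient is directly the male-vs-female log2 fold change.

```r
library(limma)

# Simulated stand-in for a TPM matrix (genes x samples); not real data
set.seed(42)
n_genes <- 500
samples <- paste0("S", 1:6)
tpm <- matrix(rexp(n_genes * 6, rate = 1 / 20), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), samples))

logtpm <- log2(tpm + 1)           # skip if values are already log2(TPM + 1)

# Hypothetical sample annotation: 3 females then 3 males
sex <- factor(c("F", "F", "F", "M", "M", "M"), levels = c("F", "M"))
design <- model.matrix(~ sex)     # columns: (Intercept), sexM

fit <- lmFit(logtpm, design)      # plain numeric matrix, not a DGEList
fit <- eBayes(fit, trend = TRUE)  # prior variance depends on average expression
res <- topTable(fit, coef = "sexM", number = Inf)
head(res)
```

Using eBayes(trend = TRUE) instead of voom() is the usual limma choice when only log-scale values, not counts, are available.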

Reply:

Thanks, but now it's giving me the error that the expression object should be numeric and that there are 2 non-numeric columns (because the first two columns hold the gene names and gene IDs I'm working with). Here is my code:

fit <- lmFit(matchedgeneTPM)[,-c(1,2)]

How should I rewrite the code? And apologies, I know this should be simple, but I'm new to this kind of thing.

Also, do I still need the group and mm parts?
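A design matrix with one row per sample is still needed. However, interaction(fnames, mnames) builds combinations of female and male names rather than giving one sex label per sample column. In addition, lmFit(matchedgeneTPM)[,-c(1,2)] subsets the result of lmFit() instead of its input. Below is a base-R sketch that fixes both issues. It assumes, hypothetically, that females and males are character vectors of sample IDs matching the column names, and that the ID column is called gene_id. If females and males are tables, colnames(females) and colnames(males) would supply those vectors.

```r
# Hypothetical stand-ins for the poster's objects: annotation in columns 1-2,
# log2(TPM + 1) values in the remaining (sample) columns.
matchedgeneTPM <- data.frame(
  gene_name = c("GeneA", "GeneB", "GeneC"),
  gene_id   = c("ID1", "ID2", "ID3"),
  F1 = c(2.1, 0.0, 5.3), F2 = c(1.9, 0.4, 5.0),
  M1 = c(3.0, 1.2, 4.8), M2 = c(2.8, 1.1, 4.9)
)
females <- c("F1", "F2")   # hypothetical female sample IDs
males   <- c("M1", "M2")   # hypothetical male sample IDs

# Drop the annotation columns from the *input* before fitting
tab  <- as.data.frame(matchedgeneTPM)       # also handles a data.table from fread()
expr <- as.matrix(tab[, -c(1, 2)])
rownames(expr) <- as.character(tab$gene_id)

# One sex label per sample column, in the same order as colnames(expr)
stopifnot(all(colnames(expr) %in% c(females, males)))
group <- factor(ifelse(colnames(expr) %in% females, "female", "male"),
                levels = c("female", "male"))
mm <- model.matrix(~ 0 + group)   # columns: groupfemale, groupmale

# Next step (requires limma): fit <- lmFit(expr, mm), then compare groups with
# makeContrasts(groupmale - groupfemale, levels = mm) and contrasts.fit().
```

With a ~0 + group design each column estimates a group mean, so a contrast is needed to test the difference. A design such as ~group would give the difference directly as a coefficient.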

17 months ago
Gordon Smyth ★ 7.7k

See "Differential expression analysis starting from TPM data" https://support.bioconductor.org/p/98820/

The short answer is that you can analyse log(TPM+1) data passably well using limma with arrayWeights and trend, but you will nevertheless pay a heavy price in terms of statistical power compared to an analysis with actual read counts and library sizes. The use of arrayWeights() in limma is a way to try to estimate the library sizes that the TPMs were originally computed from.
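The sketch below illustrates the approach described above, assuming limma is installed. It uses log2(TPM+1) values, one weight per sample from arrayWeights(), and eBayes(trend = TRUE). The simulated data and the sexM coefficient are placeholders. This is an illustration, not a reproduction of the code in the linked thread.

```r
library(limma)

# Simulated stand-in data (not real TPMs): 1000 genes x 6 samples
set.seed(1)
tpm <- matrix(rexp(1000 * 6, rate = 1 / 20), nrow = 1000,
              dimnames = list(paste0("gene", 1:1000), paste0("S", 1:6)))
y <- log2(tpm + 1)

# Hypothetical design: 3 females then 3 males
sex <- factor(rep(c("F", "M"), each = 3))
design <- model.matrix(~ sex)

# One quality weight per sample, standing in for the unknown library sizes
w <- arrayWeights(y, design)

fit <- lmFit(y, design, weights = w)
fit <- eBayes(fit, trend = TRUE)   # intensity-dependent prior variance
res <- topTable(fit, coef = "sexM", number = Inf)
head(res)
```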
