Question

limma - voom with TPM data

1

Entering edit mode

2.8 years ago

JACKY ▴ 160

Hi, I'm doing a differential expression analysis to RNA-seq data with limma - voom. Unfortunately, I do not have acceess to the raw counts, just normalized TPM data. I know that all libraries, including DESeq2 and limma, expect raw counts and they don't perform very good when receiving nromalized data. This is my first time using limma ( I usually prefare DESeq2) so please bear with me.

Nontheless, I did the analysis according to the bioconductor guide, but without the CPM or logCPM part cause that wont work with TPM.

Despite all this, my results still make little sense, I'm lost here and cant actually point at what is wrong. The TPM data has 28000 genes, the analysis result says that 26000 of these are differentialy expressed !! which is way too much I guess, this never happened with me before and I suspect it's due to the normallized data. Or am I using the data in a false way?

Here is my code, hope it helps:

## Remove all zero rows and noise genes

TPM <- TPM[rowSums(TPM[])>0,]
thresh <- TPM > 0.5
keep <- rowSums(thresh) >= 2
table(keep)
TPM <- TPM[keep,]


## Design and Contrast

design <- model.matrix(~Response, Metadata)
contrast <- matrix(c(1 ,0), ncol = 1)
dimnames(contrast) <- list(c('Response', 'No Response'), 'Diff')

## Voom - Make RNA-seq follow normal distribution

Voom <- voom(TPM, design, plot = TRUE) 
vfit <- lmFit(Voom, design) 
vfit  <- contrasts.fit(vfit, contrasts = contrast) 
efit <- eBayes(vfit) 

plotSA(efit, main = 'final model: Mean-Variance trend')

    summary(decideTests(efit))

deg <- topTable(efit, coef = 'Diff', p.value = 0.05, adjust.method = 'fdr',
                  number = Inf)

And checkout this volcano plot, it is not showing the down regulated genes ( there are 9581 of those), only upregulated LFC

EnhancedVolcano(deg,
                lab = rownames(deg),
                x = 'logFC',
                y = 'adj.P.Val',
                labSize=4,
                FCcutoff=2 )

enter image description here

r limma • 5.0k views

ADD COMMENT • link 2.8 years ago by JACKY ▴ 160

score 1 · Answer 1 · 2022-02-20

1

Entering edit mode

2.8 years ago

ATpoint 85k

Yeah, this is complete nonsensish results. voom does not work with TPM. If you only have TPMs then model the logTPMs with limma (no voom), see https://support.bioconductor.org/p/125144/#125156

ADD COMMENT • link 2.8 years ago by ATpoint 85k

1

Entering edit mode

They didn't talk much about TPM, however I see that the person who asked the question used voom on normalized data, same mistake as I did.

So what should I do ? maybe something like this ? (without voom)

TPM <- log2(TPM+ 1)
vfit <- lmFit(TPM, design) 
vfit  <- contrasts.fit(vfit, contrasts = contrast) 
efit <- eBayes(vfit)

ADD REPLY • link 2.8 years ago by JACKY ▴ 160

0

Entering edit mode

Yes, TPM/FPKM is the same in this context.

ADD REPLY • link 2.8 years ago by ATpoint 85k

0

Entering edit mode

I tried it, it's still the same. Now the summary tells me there are zero significant downregulated genes , and 28000 significant up regulated genes (almost all dataset), yeah it's still rubbish results.

Maybe I should filter more genes? cause when I use filterByExpr() to filter the TPM, only 6000 genes are left.

ADD REPLY • link 2.8 years ago by JACKY ▴ 160

0

Entering edit mode

I do not have the data at hand, so I cannot tell. That filter works on raw counts as well, so it makes limited sense here. Are these data even properly normalized, can you make an MA-plot to explore? One of the many reasons FPKM/TPM is bad is that it only cares about sequencing depth and not about library composition, see e.g. TMM-Normalization

MA-plots is AveExpr vs logFC. Normalization is probably way off here.

ADD REPLY • link 2.8 years ago by ATpoint 85k

0

Entering edit mode

I see.. well this is how I normalized the data:

rpkm <- apply(X = subset(raw),
                MARGIN = 2,
                FUN = function(x) {
                  10^9 * x / geneLengths / sum(as.numeric(x))
                })

TPM <- apply(rpkm, 2, function(x) x / sum(as.numeric(x)) * 10^6)

So yeah I have the raw data I just can't use it for the analysis for some stupid reasons. Anyway, this is an SA Plot, after applying eBayes(vfit, trend = TRUE)

enter image description here

using trend = TRUE or not, same result.

ADD REPLY • link 2.8 years ago by JACKY ▴ 160

0

Entering edit mode

So you do have access to the raw counts?! This is not an MAplot.

ADD REPLY • link 2.8 years ago by ATpoint 85k

0

Entering edit mode

I do but I wasn't allowed to use it for some research purposes. I did a little rebellion and now I'm allowed to use it finally. Thank you so much for the help and sorry for asking lots of questions.

ADD REPLY • link 2.8 years ago by JACKY ▴ 160