limma - voom with TPM data
1
1
Entering edit mode
2.8 years ago
JACKY ▴ 160

Hi, I'm doing a differential expression analysis to RNA-seq data with limma - voom. Unfortunately, I do not have acceess to the raw counts, just normalized TPM data. I know that all libraries, including DESeq2 and limma, expect raw counts and they don't perform very good when receiving nromalized data. This is my first time using limma ( I usually prefare DESeq2) so please bear with me.

Nontheless, I did the analysis according to the bioconductor guide, but without the CPM or logCPM part cause that wont work with TPM.

Despite all this, my results still make little sense, I'm lost here and cant actually point at what is wrong. The TPM data has 28000 genes, the analysis result says that 26000 of these are differentialy expressed !! which is way too much I guess, this never happened with me before and I suspect it's due to the normallized data. Or am I using the data in a false way?

Here is my code, hope it helps:

## Remove all zero rows and noise genes

TPM <- TPM[rowSums(TPM[])>0,]
thresh <- TPM > 0.5
keep <- rowSums(thresh) >= 2
table(keep)
TPM <- TPM[keep,]


## Design and Contrast

design <- model.matrix(~Response, Metadata)
contrast <- matrix(c(1 ,0), ncol = 1)
dimnames(contrast) <- list(c('Response', 'No Response'), 'Diff')

## Voom - Make RNA-seq follow normal distribution

Voom <- voom(TPM, design, plot = TRUE) 
vfit <- lmFit(Voom, design) 
vfit  <- contrasts.fit(vfit, contrasts = contrast) 
efit <- eBayes(vfit) 

plotSA(efit, main = 'final model: Mean-Variance trend')

    summary(decideTests(efit))

deg <- topTable(efit, coef = 'Diff', p.value = 0.05, adjust.method = 'fdr',
                  number = Inf)

And checkout this volcano plot, it is not showing the down regulated genes ( there are 9581 of those), only upregulated LFC

EnhancedVolcano(deg,
                lab = rownames(deg),
                x = 'logFC',
                y = 'adj.P.Val',
                labSize=4,
                FCcutoff=2 )

enter image description here

r limma • 5.0k views
ADD COMMENT
1
Entering edit mode
2.8 years ago
ATpoint 85k

Yeah, this is complete nonsensish results. voom does not work with TPM. If you only have TPMs then model the logTPMs with limma (no voom), see https://support.bioconductor.org/p/125144/#125156

ADD COMMENT
1
Entering edit mode

They didn't talk much about TPM, however I see that the person who asked the question used voom on normalized data, same mistake as I did.

So what should I do ? maybe something like this ? (without voom)

TPM <- log2(TPM+ 1)
vfit <- lmFit(TPM, design) 
vfit  <- contrasts.fit(vfit, contrasts = contrast) 
efit <- eBayes(vfit) 
ADD REPLY
0
Entering edit mode

Yes, TPM/FPKM is the same in this context.

ADD REPLY
0
Entering edit mode

I tried it, it's still the same. Now the summary tells me there are zero significant downregulated genes , and 28000 significant up regulated genes (almost all dataset), yeah it's still rubbish results.

Maybe I should filter more genes? cause when I use filterByExpr() to filter the TPM, only 6000 genes are left.

ADD REPLY
0
Entering edit mode

I do not have the data at hand, so I cannot tell. That filter works on raw counts as well, so it makes limited sense here. Are these data even properly normalized, can you make an MA-plot to explore? One of the many reasons FPKM/TPM is bad is that it only cares about sequencing depth and not about library composition, see e.g. TMM-Normalization

MA-plots is AveExpr vs logFC. Normalization is probably way off here.

ADD REPLY
0
Entering edit mode

I see.. well this is how I normalized the data:

rpkm <- apply(X = subset(raw),
                MARGIN = 2,
                FUN = function(x) {
                  10^9 * x / geneLengths / sum(as.numeric(x))
                })

TPM <- apply(rpkm, 2, function(x) x / sum(as.numeric(x)) * 10^6)

So yeah I have the raw data I just can't use it for the analysis for some stupid reasons. Anyway, this is an SA Plot, after applying eBayes(vfit, trend = TRUE)

enter image description here

using trend = TRUE or not, same result.

ADD REPLY
0
Entering edit mode

So you do have access to the raw counts?! This is not an MAplot.

ADD REPLY
0
Entering edit mode

I do but I wasn't allowed to use it for some research purposes. I did a little rebellion and now I'm allowed to use it finally. Thank you so much for the help and sorry for asking lots of questions.

ADD REPLY

Login before adding your answer.

Traffic: 1984 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6