A check over my code using Limma DE, starting from normalized data
0
0
Entering edit mode
2.5 years ago
JACKY ▴ 160

I'm conducting a DE analysis over normalized TPM, RPKM, or FPKM data. Raw counts are not available. The data is about the clinical outcome of a certain drug. I know this is not the ideal way to do this kind of analysis, but I have no choice. This complicates things for me, I'm not getting results and I just want to make sure my code is right. Here is my code, from one of the datasets, that has TPM data:

TPM = TPM[rowSums(TPM[])>0,]
thresh = TPM > 10
keep = rowSums(thresh) >= 12
table(keep)
TPM = TPM[keep,]

# Design and Contrast
 design = model.matrix(~ 0 + outcome, metadata)
 colnames(design)[c(1,2)] = c('NoResponse','Response')
 contrast = makeContrasts(NoResp_vs_Resp = NoResponse - Response, levels = colnames(design))

 log_TPM = scale(log2(TPM + 0.1))
 vfit = lmFit(log_TPM, design) 
 vfit = contrasts.fit(vfit, contrasts = contrast) 
 efit = eBayes(vfit, robust = TRUE, trend = TRUE) 
 plotSA(efit, main = 'final model with TPM ~ Mean-Variance trend')

 summary(decideTests(efit))
 DEG = topTable(efit, coef = 'NoResp_vs_Resp', p.value = 0.05, adjust.method = 'fdr', number = Inf)

I'm trying to get figures that are as close to the Bioconductors guide for limma. Can anybody please just confirm wheather my code is alright, or is there any other way I could do this? regarding filtering and stuff like that, cause I cant use edgeR::filterByExpr. Thanks!

limma r • 1.5k views
ADD COMMENT
0
Entering edit mode

can you run normality test on log_TPM object?

ADD REPLY
0
Entering edit mode

I don't think any data of this type (RNA-seq) is normally distributed.. or is it ?

ADD REPLY
0
Entering edit mode

RNAseq data is mostly modeled after NB distribution, for biological replicates. Since data is normalized (TPM) and subsequently log transformed, it is no more original data. As TPM data is log transformed, my understanding is that values in log_TPM object are going to be normally distributed. However, I was not sure of your data. Hence, I asked you to test for normality of the data. However, as several times mentioned on biostars, raw counts are better in analysing the RNA seq data.

ADD REPLY
0
Entering edit mode

The data is not normally distributed. I tried ggqqplot(log_TPM$sample), on many samples that I chose randomley, and got something like this:

enter image description here

You think you could help me or is it a gone cause ?

ADD REPLY
0
Entering edit mode

The plot you show here is irrelevant for a limma analysis. limma does make a normality assumption, but in a much more subtle sense than you think. limma makes no assumptions at all about qqplots or histograms of expression values.

The SA plot that your plotSA code above makes is far more relevant for checking that everything is ok.

ADD REPLY
0
Entering edit mode

Isn't this duplicated operation?

TPM = TPM[rowSums(TPM[])>0,]
thresh = TPM > 10
keep = rowSums(thresh) >= 12
table(keep)
TPM = TPM[keep,]
ADD REPLY
0
Entering edit mode

Basically you're right. The first line is supposed to delete all zero rows. And the next lines are supposed to delete genes based on the threshold. I do both no reason at all behind it.

ADD REPLY

Login before adding your answer.

Traffic: 1986 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6