Question

Microarray analysis in limma

1

Entering edit mode

7.0 years ago

Leite ★ 1.3k

Hello everyone!

I'm trying to reanalyze data from this test GSE48080 but using the same lfc = 1.7, I find not even close to what was found in the original paper, I simply find 10 or 0 almost always and in original paper 300+ genes.

I have already tried several methods and the data found are the same, I have tried to change the shape of my target and still find DEGS always below this lfc = 1.7, what am I doing wrong? Is there something I'm forgetting to do?

Please I really need reanalyze this data.

Article method

Labeling of probes with cyanine 3 dye (Cy3), hybridization and washing procedures followed, strictly, the manufacturer's protocol (One Color Quick Amp Labeling Kit, Agilent Technologies). The arrays were scanned using a GenePix 4000B Microarray Scanner (Molecular Devices) using default parameters for Agilent 44K microarrays. Initial data analysis was performed using the Agilent Feature Extraction software (version 9.5). This software places the microarray grids, determines feature intensities and flags outlier pixels. The quality of the microarray data was assessed using the positive controls and RNA spike-ins. The gProcessedSignal (i.e. end result of standard Agilent Feature Extraction normalization and background correction procedures) from each array was loaded into the Partek Genomics Suite (v6.6), normalized between arrays using quantile normalization, and log transformed for further analyses. For subsequent statistical analysis we used the ANOVA implementation of Partek. The ANOVA model was defined by the experimental design and included variations due to volunteer group (sepsis, control), day of sample collection (D0, D7) and survival status (survivor, non-survivor).

Code I'm using:

library(limma)

#Set-up
targets <- readTargets("targets.txt")

num.arrays = length(targets$FileName)

# Only reads in the G values, fake the R values as G as well
maData = read.maimages(targets,names=targets$Cy3, columns = list(G = "gMeanSignal", Gb = "gBGUsed", R = 
"gProcessedSignal", Rb = "gBGMedianSignal"), annotation= c("Row", "Col", "FeatureNum", "ProbeUID", "ControlType", 
"ProbeName", "GeneName", "SystematicName"))

# Do not background subtract
maData.nobg = backgroundCorrect(maData, method="none")

#G values only. Use Quantile Normalization for gProcessedSignal
MAq = normalizeBetweenArrays(maData.nobg, method="Rquantile")

TS <- paste(targets$Patient, targets$Day, sep=".")
TS <- factor(TS, levels=c("SepsisAlive.D0","SepsisAlive.D7","SepsisDead.D0","SepsisDead.D7","Control.Control"))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(MAq, design)

cont.matrix <- makeContrasts(SepsisDead.D0-SepsisDead.D7, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

nrow(topTable(fit2, coef=1, number=99999, lfc=1.7))
probeset.list <- topTable(fit2, coef=1, number=99999, lfc=1.7)
write.table(probeset.list, "results.txt", sep="\t", quote=FALSE)

Best regards,

Leite

limma r • 5.7k views

ADD COMMENT • link updated 3.9 years ago by Gordon Smyth ★ 7.7k • written 7.0 years ago by Leite ★ 1.3k

2

Entering edit mode

Well, you're analyzing it on a completely different platform than them, so I wouldn't expect the exact same parameters to work. limma is not the Partek Suite, and their methods for finding DEGs are likely not exactly the same.

What I'd recommend is removing your lfc restriction and seeing what the lfc looks like for their DEGs in your analysis. This will allow you to determine if you need to tweak the lfc for your reanalysis if those genes must absolutely be included.

ADD REPLY • link 7.0 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Dear by jared.andrews07,

I agree with you that two different platforms will generate different results, however it is a very big difference in 0 ~ 10 genes that I am finding for 300 found in the paper.

I'll think about what you said.

Thanks,

Leite

ADD REPLY • link 7.0 years ago by Leite ★ 1.3k

2

Entering edit mode

The statistical methodologies that you and the others employed are very different. The original authors used ANOVA, whilst you fitted a least squares linear regression line (via limma) to the data and derived statistics from this. ANOVA has different assumptions than regression. Thus, the results are likely going to be very different.

I disagree with the choice of ANOVA by the original authors. I also would not recommended trying to build a complex model with limma due to the low sample numbers.

ADD REPLY • link 3.9 years ago by Kevin Blighe 88k

1

Entering edit mode

Hey Kevin,

Thank you so much!

Leite

ADD REPLY • link 7.0 years ago by Leite ★ 1.3k

score 1 · Answer 1 · 2021-01-12

1

Entering edit mode

3.9 years ago

Gordon Smyth ★ 7.7k

The original paper used a fold-change cutoff of 1.7, whereas in limma you are using a log2-fold-change cutoff of 1.7. Not the same thing. Setting lfc=1.7 corresponds to a fold-change of 2^1.7 = 3.25, so naturally you will find fewer genes satisfying this much higher threshold.

ADD COMMENT • link 3.9 years ago by Gordon Smyth ★ 7.7k

0

Entering edit mode

Dear Gordon Smyth,

I appreciate your answer.

ADD REPLY • link 3.9 years ago by Leite ★ 1.3k