Hello everyone!
I'm trying to reanalyze data from this test GSE48080 but using the same lfc = 1.7, I find not even close to what was found in the original paper, I simply find 10 or 0 almost always and in original paper 300+ genes.
I have already tried several methods and the data found are the same, I have tried to change the shape of my target and still find DEGS always below this lfc = 1.7, what am I doing wrong? Is there something I'm forgetting to do?
Please I really need reanalyze this data.
Article method
Labeling of probes with cyanine 3 dye (Cy3), hybridization and washing procedures followed, strictly, the manufacturer's protocol (One Color Quick Amp Labeling Kit, Agilent Technologies). The arrays were scanned using a GenePix 4000B Microarray Scanner (Molecular Devices) using default parameters for Agilent 44K microarrays. Initial data analysis was performed using the Agilent Feature Extraction software (version 9.5). This software places the microarray grids, determines feature intensities and flags outlier pixels. The quality of the microarray data was assessed using the positive controls and RNA spike-ins. The gProcessedSignal (i.e. end result of standard Agilent Feature Extraction normalization and background correction procedures) from each array was loaded into the Partek Genomics Suite (v6.6), normalized between arrays using quantile normalization, and log transformed for further analyses. For subsequent statistical analysis we used the ANOVA implementation of Partek. The ANOVA model was defined by the experimental design and included variations due to volunteer group (sepsis, control), day of sample collection (D0, D7) and survival status (survivor, non-survivor).
Code I'm using:
library(limma)
#Set-up
targets <- readTargets("targets.txt")
num.arrays = length(targets$FileName)
# Only reads in the G values, fake the R values as G as well
maData = read.maimages(targets,names=targets$Cy3, columns = list(G = "gMeanSignal", Gb = "gBGUsed", R =
"gProcessedSignal", Rb = "gBGMedianSignal"), annotation= c("Row", "Col", "FeatureNum", "ProbeUID", "ControlType",
"ProbeName", "GeneName", "SystematicName"))
# Do not background subtract
maData.nobg = backgroundCorrect(maData, method="none")
#G values only. Use Quantile Normalization for gProcessedSignal
MAq = normalizeBetweenArrays(maData.nobg, method="Rquantile")
TS <- paste(targets$Patient, targets$Day, sep=".")
TS <- factor(TS, levels=c("SepsisAlive.D0","SepsisAlive.D7","SepsisDead.D0","SepsisDead.D7","Control.Control"))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(MAq, design)
cont.matrix <- makeContrasts(SepsisDead.D0-SepsisDead.D7, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
nrow(topTable(fit2, coef=1, number=99999, lfc=1.7))
probeset.list <- topTable(fit2, coef=1, number=99999, lfc=1.7)
write.table(probeset.list, "results.txt", sep="\t", quote=FALSE)
Best regards,
Leite
Well, you're analyzing it on a completely different platform than them, so I wouldn't expect the exact same parameters to work.
limma
is not the Partek Suite, and their methods for finding DEGs are likely not exactly the same.What I'd recommend is removing your
lfc
restriction and seeing what the lfc looks like for their DEGs in your analysis. This will allow you to determine if you need to tweak the lfc for your reanalysis if those genes must absolutely be included.Dear by jared.andrews07,
I agree with you that two different platforms will generate different results, however it is a very big difference in 0 ~ 10 genes that I am finding for 300 found in the paper.
I'll think about what you said.
Thanks,
Leite
The statistical methodologies that you and the others employed are very different. The original authors used ANOVA, whilst you fitted a least squares linear regression line (via limma) to the data and derived statistics from this. ANOVA has different assumptions than regression. Thus, the results are likely going to be very different.
I disagree with the choice of ANOVA by the original authors. I also would not recommended trying to build a complex model with limma due to the low sample numbers.
Hey Kevin,
Thank you so much!
Leite