The commands below are the R script I use to analyze my microarray data. I want to know which lines are responsible for (1) replacing replicated probes with their mean and (2) background correction.
# Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)
# load series and platform data from GEO
gset <- getGEO("GSE116959", GSEMatrix =TRUE, AnnotGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL17077", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
# make proper column names to match toptable
fvarLabels(gset) <- make.names(fvarLabels(gset))
# group names for all samples
gsms <- "00010000000100000000000000000000011110000010000000000000000011010010"
sml <- strsplit(gsms, split = "")[[1]]
# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=TRUE))
LogC <- (qx[5] > 100) ||
  (qx[6]-qx[1] > 50 && qx[2] > 0) ||
  (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) {
  ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex)
}
# set up the data and proceed with analysis
sml <- paste("G", sml, sep="") # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)
Thanks for your answer. Can we say that whenever we obtain a dataset with the getGEO function, the data are already background corrected and duplicates have already been replaced with their mean? In other words, if we run our own analysis in R (without GEO2R) on data retrieved through this function, is there no need for background correction or for replacing duplicates with the mean?
It would not be safe to say that background correction and duplicate substitution have been done. That is determined by the original authors who submitted the data to GEO, because GEO does not require any particular data processing method. For many common expression platforms background subtraction has likely been done, and for some GEO submissions duplicate processing might have been done as well. But to determine with certainty whether that processing has been done for a specific GEO dataset, you need to read the associated manuscript or the data processing description on the GEO record.
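If the GEO record does not say that these steps were done, they can be applied by hand. Note that background correction needs the raw foreground and background intensities, which are typically not part of the processed values that getGEO returns. The following is only a minimal sketch on toy numbers, assuming the raw intensities have already been read into matrices. The matrices E and Eb, the probe names, and all values are hypothetical; backgroundCorrect.matrix and avereps are limma functions, and neither is called anywhere in the script in the question.

```r
library(limma)

# Toy stand-ins for raw intensities (hypothetical values, not from GSE116959):
# E = foreground, Eb = local background; rows are probes, columns are samples.
# "probeA" appears twice to mimic a replicated probe.
E <- matrix(c(120, 140, 300,
              110, 150, 310),
            nrow = 3,
            dimnames = list(c("probeA", "probeA", "probeB"), c("s1", "s2")))
Eb <- matrix(20, nrow = 3, ncol = 2, dimnames = dimnames(E))

# 1) Background correction: plain subtraction of background from foreground.
bc <- backgroundCorrect.matrix(E, Eb, method = "subtract")

# 2) Log2 transform, then replace rows sharing a probe ID with their mean.
avg <- avereps(log2(bc), ID = rownames(bc))

print(avg)
```

Simple subtraction is used here only because it is easy to check by eye; limma offers other background correction methods (for example "normexp"), and which one is appropriate depends on the platform and on how the raw data were produced.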