Differential expression analysis with limma

Question

Large fold change when analyzing expression profiling from microarray

0

Entering edit mode

18 months ago

Yibin • 0

Hi all

I am trying to analyze a expression profiling dataset from GEO, which was detected by Affymetrix microarray. Its data processing method was described as "The data were analyzed with Microarray Suite GCOS using Affymetrix default analysis settings and global scaling as normalization method. The trimmed mean target intensity of each array was arbitrarily set to 100."

I performed simply log2 transformation and then differential expression analysis, but got very large FC value(beyond 100). It was so weird. And I used the online tool GEO2R to analyze the same dataset and the result looked much more normal. And I looked at its R script(showed below). It seems that transformations are performed only under certain conditions.

I wondered if this was reasonable and usual to do like that. Can I just use the result or script to do my research?

Looking forward to your reply. Thank you~

# log2 transformation

ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) 

LogC <- (qx[5] > 100) || (qx[6]-qx[1] > 50 && qx[2] > 0)
if (LogC) { ex[which(ex <= 0)] <- NaN

exprs(gset) <- log2(ex) }

limma DEG RNA • 745 views

ADD COMMENT • link 18 months ago by Yibin • 0

0

Entering edit mode

You need to share full relevant code to diagnose the issue. This code snippet here does thy to figure out whether the data Geo2R is using are already on log scale or not. If the values are high then it applies log transformation.

ADD REPLY • link 18 months ago by ATpoint 87k

0

Entering edit mode

Thank you for your reply. I just used the online tool in the GEO database. I understand it applied transformation when the value too high. But I am just curious about whether it is make sense if I use this result in my assay

Here is the full R script

# Version info: R 4.2.2, Biobase 2.58.0, GEOquery 2.66.0, limma 3.54.0

Differential expression analysis with limma

library(GEOquery) library(limma) library(umap)

load series and platform data from GEO

gset <- getGEO("GSE26051", GSEMatrix =TRUE, AnnotGPL=TRUE) if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1 gset <- gset[[idx]]

make proper column names to match toptable

fvarLabels(gset) <- make.names(fvarLabels(gset))

group membership for all samples

gsms <- "111111XXXXXXXXXXXXXXXXX0000000XXXXXXXXXXXXXXXX" sml <- strsplit(gsms, split="")[[1]]

filter out excluded samples (marked as "X")

sel <- which(sml != "X") sml <- sml[sel] gset <- gset[ ,sel]

log2 transformation

ex <- exprs(gset) qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T)) LogC <- (qx[5] > 100) || (qx[6]-qx[1] > 50 && qx[2] > 0) if (LogC) { ex[which(ex <= 0)] <- NaN exprs(gset) <- log2(ex) }

assign samples to groups and set up design matrix

gs <- factor(sml) groups <- make.names(c("lesion","non-lesion")) levels(gs) <- groups gset$group <- gs design <- model.matrix(~group + 0, gset) colnames(design) <- levels(gs)

gset <- gset[complete.cases(exprs(gset)), ] # skip missing values

fit <- lmFit(gset, design) # fit linear model

set up contrasts of interest and recalculate model coefficients

cts <- paste(groups[1], groups[2], sep="-") cont.matrix <- makeContrasts(contrasts=cts, levels=design) fit2 <- contrasts.fit(fit, cont.matrix)

compute statistics and table of top significant genes

fit2 <- eBayes(fit2, 0.01) tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","GB_ACC","SPOT_ID","Gene.Symbol","Gene.symbol","Gene.title")) write.table(tT, file=stdout(), row.names=F, sep="\t")

ADD REPLY • link 18 months ago by Yibin • 0