Differentially expressed genes in Microarray data by Wilcoxon-test
10.0 years ago

Hi,

I have two Affymetrix microarray datasets, which I have normalized using the gcrma package.

One dataset has 27 samples and the other has 66. I want to identify the genes that are differentially expressed between the two datasets by running wilcox.test in R; since I have no idea about the distribution of the gene expression values, I cannot use t.test. After running wilcox.test, the p-values for most of the genes are less than 0.001!

How can I identify the genes with significant differential expression between these datasets?

Thanks in advance.

10.0 years ago

Questions:

Did you normalize your datasets separately and then combine the expression values? If the datasets were not normalized together, their expression values could differ a lot between datasets.

Is the platform the same for the two datasets? (species, platform version)


Both datasets are from the same platform, but I normalized each dataset separately: after reading the CEL files with the affy package, I normalized each one with the gcrma package. Is that a problem?


You should read the raw data for each dataset (R package affy or Affymetrix Power Tools), combine the unnormalized data, and then normalize everything together with gcrma. Which parameters are you adjusting for?
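
To make the "combine first, then normalize" suggestion concrete, here is a minimal sketch. It assumes the Bioconductor packages affy, gcrma and Biobase are installed and that both datasets come from the same array type. The function name normalize_together and its arguments are made up for illustration; they are not part of any package.

```r
# Hypothetical helper: read the CEL files of both datasets in one call and
# normalize them together, so that all arrays share one gcrma normalization.
# Assumes affy, gcrma and Biobase are installed, plus the CDF package that
# matches `cdfname` if a custom CDF is requested.
normalize_together <- function(cel_files_a, cel_files_b, cdfname = NULL) {
  cels <- c(cel_files_a, cel_files_b)

  # One AffyBatch holding every array from both datasets.
  raw <- affy::ReadAffy(filenames = cels, cdfname = cdfname, verbose = TRUE)

  # Background correction, normalization and summarization across all arrays.
  eset <- gcrma::gcrma(raw)

  list(
    expr  = Biobase::exprs(eset),  # probe sets x samples matrix (log2 scale)
    group = factor(rep(c("A", "B"),
                       times = c(length(cel_files_a), length(cel_files_b))))
  )
}
```

It would be called with the two vectors of CEL file names, for example normalize_together(cels1, cels2, cdfname = "HGU133A_HS_ENTREZG"), where cels1 and cels2 are placeholders for your own file lists. The returned group factor records which dataset each column came from.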


I ran the following commands on each dataset separately (cels contains the names of the .CEL files for that dataset):

raw.data=ReadAffy(verbose=TRUE, filenames=cels, cdfname="HGU133A_HS_ENTREZG")

data.gcrma.norm=gcrma(raw.data)

Is it correct to read the CEL files of both datasets in a single ReadAffy call and then normalize them with gcrma?


Hi Maxime, I have a similar challenge. I am trying to run a DEG analysis on two different datasets, and I want to normalize them TOGETHER and then combine the expression values. However, the platforms are different: the first dataset is a hybrid of GPL96 (HG-U133A) and GPL97 (HG-U133B), while the second dataset's platform is GPL10558. How do I combine the two datasets so that I can normalize them properly before running the DEG analysis?

10.0 years ago

Is it correct to read the CEL files of both datasets in a single ReadAffy call and then normalize them with gcrma?

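You can try that. If it does not work, do your normalization separately and then adjust for the cohort:

library(MASS)    # provides rlm()

# data: data frame with the cohort label in column 1 ("Cohort") and one column per gene.
# Cohort is assumed to be coded numerically (e.g. 0/1).
adjusted <- as.data.frame(matrix(NA, nrow = nrow(data), ncol = ncol(data)))
colnames(adjusted) <- colnames(data)
adjusted[,1] <- data[,1]    # Cohort

for (i in 2:ncol(data)) {
    # Robust (MM) regression of each gene on cohort; the residuals are the adjusted values.
    res <- residuals(rlm(as.numeric(as.matrix(data[,i])) ~ as.numeric(as.matrix(data$Cohort)), method="MM", na.action=na.exclude))
    adjusted[,i] <- res
}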
9.9 years ago
TriS ★ 4.7k

I have not any idea about the distribution of gene expressions I can not use the t.test

If you cannot assume that your data are normally distributed, parametric tests such as t.test are not appropriate; that is why you run a non-parametric test like wilcox.test.

After you obtain the p-values, you should correct them for multiple testing. I'd suggest trying FDR correction (Benjamini-Hochberg method). Bonferroni is more stringent, so you might end up with fewer genes.
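
As a minimal sketch of that workflow (a per-gene Wilcoxon test followed by p-value correction), using base R only: the expression matrix below is simulated, so everything except the group sizes of 27 and 66 is an assumption made for illustration. Replace expr and group with your own normalized data and sample labels.

```r
# Simulated stand-in for a normalized expression matrix (genes x samples).
# The group sizes match the question; the values themselves are made up.
set.seed(1)
n_genes <- 200
group <- factor(rep(c("A", "B"), times = c(27, 66)))
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Shift the first 10 genes in group B so there is something to detect.
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 2

# One Wilcoxon rank-sum test per gene (row).
p_raw <- apply(expr, 1, function(x) {
  wilcox.test(x[group == "A"], x[group == "B"])$p.value
})

# Multiple-testing correction: Benjamini-Hochberg FDR and, for comparison,
# the more stringent Bonferroni correction.
p_bh   <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Genes called significant at a 5% FDR.
significant <- names(p_bh)[p_bh < 0.05]
```

The Bonferroni-adjusted p-value is never smaller than the BH-adjusted one for the same gene, which is why Bonferroni tends to return a shorter gene list.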

Another approach would be, after normalizing with affy, to analyze the data with the limma package.
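
A rough sketch of what a two-group limma analysis could look like, assuming the Bioconductor package limma is installed. The expression matrix is simulated again and the group labels "A" and "B" are placeholders, so treat this as an outline, not a finished analysis.

```r
# Assumes the Bioconductor package limma is installed.
library(limma)

# Simulated log-scale expression matrix (genes x samples); values are made up.
set.seed(1)
group <- factor(rep(c("A", "B"), times = c(27, 66)))
expr <- matrix(rnorm(200 * length(group)), nrow = 200,
               dimnames = list(paste0("gene", 1:200),
                               paste0("s", seq_along(group))))
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 2

# Design matrix: intercept plus a "groupB" column for the B-vs-A difference.
design <- model.matrix(~ group)

# Gene-wise linear models followed by empirical Bayes moderation.
fit <- eBayes(lmFit(expr, design))

# Full result table for the group effect, with BH-adjusted p-values.
top <- topTable(fit, coef = "groupB", number = Inf, adjust.method = "BH")
```

The adj.P.Val column of top holds the BH-adjusted p-values, and logFC gives the estimated difference between the groups for each gene.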
