Dear colleagues, I have raw data from a GEO dataset. I downloaded the CEL files; the platform is the Affymetrix Human Clariom D Assay, and there are no biological replicates. I normalized with RMA and then, to filter out lowly expressed genes, drew a histogram of the median expression, following the filtering method in this article: https://f1000research.com/articles/5-1384/v1 My question: I expected my data to be normally distributed because (1) they have been normalized and (2) the central limit theorem should apply (?). However, the histogram is skewed.
Here is the histogram of my data: https://ibb.co/0ynx0SF What should I do in this case?
My code is:
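library(pd.clariom.d.human)
# GSE103965 is assumed to hold the raw data already read from the CEL files
GSE103965_norm <- oligo::rma(GSE103965, target = "core")
# filtering low intensity genes: median log2 intensity per row across arrays
GSE103965_f <- Biobase::rowMedians(Biobase::exprs(GSE103965_norm))
# close a previously open graphics device, if any (a bare dev.off() errors when none is open)
if (dev.cur() > 1) dev.off()
hist_res <- hist(GSE103965_f, 100, col = "cornsilk1", freq = FALSE,
main = "Histogram of the median intensities",
border = "antiquewhite4",
xlab = "Median intensities")
# reference normal curve: centre at the tallest bin, spread from half the MAD
emp_mu <- hist_res$breaks[which.max(hist_res$density)]
emp_sd <- BiocGenerics::mad(GSE103965_f) / 2
prop_central <- 0.50
lines(sort(GSE103965_f), prop_central * dnorm(sort(GSE103965_f),
mean = emp_mu, sd = emp_sd),
col = "grey10", lwd = 4)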
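The code above only draws the histogram; the filtering itself still needs a cutoff. A minimal sketch of that step, in the spirit of the linked workflow, might look like the following. It runs on simulated intensities rather than GSE103965, and both `man_threshold` and `samples_cutoff` are illustrative values that would have to be read off the real histogram and the real study design.

```r
# Simulated stand-in for exprs(GSE103965_norm): rows = probesets, columns = arrays.
# These are NOT the real data; the two groups mimic "background" and "expressed" probesets.
set.seed(1)
n_low <- 600
n_high <- 400
n_arrays <- 6
expr <- rbind(
  matrix(rnorm(n_low * n_arrays, mean = 2.5, sd = 0.4), nrow = n_low),
  matrix(rnorm(n_high * n_arrays, mean = 7, sd = 1.5), nrow = n_high)
)
rownames(expr) <- paste0("probe_", seq_len(nrow(expr)))

# Hypothetical cutoffs: intensity threshold chosen by eye from the histogram,
# and the minimum number of arrays that must exceed it.
man_threshold <- 4
samples_cutoff <- 3

# Keep a probeset if enough arrays show intensity above the threshold.
n_above <- rowSums(expr > man_threshold)
keep <- n_above >= samples_cutoff
expr_filtered <- expr[keep, , drop = FALSE]

table(keep)
```

With a real ExpressionSet, the same logical vector could be used to subset the object by row instead of subsetting the matrix.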
The histogram of the normalized data was also left-skewed (similar to the medians): https://ibb.co/jfV9Z1Q Will that affect the downstream analysis? I tried other normalization methods, such as normalizeVSN, but the result was still the same. I have another question: I am doing a microarray meta-analysis in which two studies have biological replicates and the remaining ones have none. Can I average the replicates, treat each average as a single sample, and do the meta-analysis with the MetaIntegrator package?
I want to thank you for your posts and answers here on Biostars; they have helped me a lot in understanding microarray analysis.
Ah, I see, it should still be okay. I have seen a histogram like that just this morning. The distribution is going to be array-dependent, i.e., depending on the array design, and also the mode of summarisation. Generally, it will follow that typical 'bell curve'. If you increase the number of breaks, how does it look?
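To illustrate the point about breaks, here is a small sketch on simulated values (not the actual GSE103965 medians; the mixture parameters are made up). It draws the same vector with a coarse and a fine binning so the shape can be compared side by side.

```r
# Simulated median intensities: a tight low-intensity peak plus a broad expressed tail.
# The parameters are invented for illustration only.
set.seed(42)
med <- c(rnorm(6000, mean = 2.5, sd = 0.4), rnorm(4000, mean = 6, sd = 1.8))

# Same data, two binnings; 'breaks' is a suggestion that hist() rounds to pretty values.
op <- par(mfrow = c(1, 2))
h_coarse <- hist(med, breaks = 20, freq = FALSE,
                 main = "20 breaks", xlab = "Median intensities")
h_fine <- hist(med, breaks = 200, freq = FALSE,
               main = "200 breaks", xlab = "Median intensities")
par(op)

# Number of bins actually used in each histogram
c(coarse = length(h_coarse$counts), fine = length(h_fine$counts))
```

With the real data, `med` would be replaced by the vector of row medians, for example `GSE103965_f` from the question.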
Also check a box-and-whiskers plot:
boxplot(exprs(GSE103965_norm))
Sure thing. Just staying busy to avoid contemplating on life and the Universe.