Affymetrix microarray data do not have a normal distribution after normalization (left-skewed)
3.8 years ago

Dear colleagues, I have the raw data (CEL files) of a GEO dataset generated on the Affymetrix Human Clariom D Assay, with no biological replicates. I normalized the data with RMA and then, to filter out lowly expressed genes, drew a histogram of the median expression, following the filtering method in this article: https://f1000research.com/articles/5-1384/v1

My question: I expected the data to be normally distributed because 1) they have been normalized and 2) of the central limit theorem. However, the histogram is skewed.

Here is the histogram of my data: https://ibb.co/0ynx0SF What should I do in this case?

My code is:

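library(oligo)               # provides rma(); attaches Biobase
library(pd.clariom.d.human)  # platform annotation for Clariom D

# GSE103965 is assumed to be the raw-data object read from the CEL files
GSE103965_norm <- oligo::rma(GSE103965, target = "core")

# Filtering low-intensity genes: median intensity of each row across arrays
GSE103965_f <- rowMedians(Biobase::exprs(GSE103965_norm))
if (dev.cur() > 1) dev.off()  # close an open graphics device, if there is one
hist_res <- hist(GSE103965_f, 100, col = "cornsilk1", freq = FALSE,
            main = "Histogram of the median intensities",
            border = "antiquewhite4",
            xlab = "Median intensities")

# Overlay a normal curve centred on the histogram mode, with MAD/2 as the spread
emp_mu <- hist_res$breaks[which.max(hist_res$density)]
emp_sd <- BiocGenerics::mad(GSE103965_f)/2
prop_central <- 0.50

lines(sort(GSE103965_f), prop_central*dnorm(sort(GSE103965_f),
                 mean = emp_mu , sd = emp_sd),
                 col = "grey10", lwd = 4)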
3.8 years ago

Here, you are plotting a histogram of the median intensities of genes. The reason for doing this is to determine a filter cut-off, as per the section 'Filtering based on intensity' in the F1000 published work to which you link in your question.

If you just try hist(exprs(GSE103965_norm)), you should see a histogram that looks more 'normal', both in the statistical and general sense of the word.
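As a rough illustration of the difference between the two plots, here is a minimal sketch that draws both histograms side by side. It runs on simulated values rather than on the real dataset: `expr_sim` is a hypothetical stand-in for `exprs(GSE103965_norm)`, and the mixture of "background" and "expressed" genes is an assumption made only so that the example runs on its own.

```r
# Simulated stand-in for exprs(GSE103965_norm): 2000 rows (genes) x 6 arrays.
# All numbers here are made up for illustration.
set.seed(1)
n_background <- 1200
n_expressed  <- 800
n_genes  <- n_background + n_expressed
n_arrays <- 6

# Per-gene level on a log2-like scale: a background peak plus expressed genes
gene_level <- c(rnorm(n_background, mean = 3, sd = 0.5),
                rnorm(n_expressed,  mean = 7, sd = 1.5))

# Each array = gene level + array-specific noise (matrix is filled column-wise)
expr_sim <- matrix(rep(gene_level, n_arrays) +
                     rnorm(n_genes * n_arrays, sd = 0.3),
                   nrow = n_genes)

# One median per gene: this is what the intensity-filtering histogram shows
gene_medians <- apply(expr_sim, 1, median)

op <- par(mfrow = c(1, 2))
hist(as.vector(expr_sim), breaks = 100, freq = FALSE,
     main = "All expression values", xlab = "Intensity")
hist(gene_medians, breaks = 100, freq = FALSE,
     main = "Per-gene medians", xlab = "Median intensity")
par(op)
```

With the real object, the equivalent calls would be `hist(exprs(GSE103965_norm))` for the left panel and the `rowMedians` histogram from the question for the right panel.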

Kevin


The histogram of the normalized data was also left-skewed (similar to the histogram of the medians): https://ibb.co/jfV9Z1Q Will that affect the downstream analysis? I tried other normalization methods, such as normalizeVSN, but the result is the same.

I have another question: I am doing a microarray meta-analysis in which two studies have biological replicates and the remaining ones have none. Can I average the replicates, treat each average as a single sample, and run the meta-analysis with the MetaIntegrator package?

I want to thank you for your posts and your answers here in Biostars, it helped me a lot in understanding microarray analysis, it is really helpful.


Ah, I see, it should still be okay. I have seen a histogram like that just this morning. The distribution is going to be array-dependent, i.e., depending on the array design, and also the mode of summarisation. Generally, it will follow that typical 'bell curve'. If you increase the number of breaks, how does it look?

Also check a box-and-whiskers plot (boxplot(exprs(GSE103965_norm))).
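To make the checks above concrete, here is a minimal sketch on simulated data (not the real dataset). It assumes skewed, gamma-distributed values for three hypothetical arrays and uses a hand-rolled quantile normalization as a simplified stand-in for the normalization step inside RMA. The point it illustrates is that this kind of normalization makes the arrays share one distribution, which the boxplot shows, without forcing that distribution to be Gaussian.

```r
set.seed(42)
n_genes <- 1000

# Three hypothetical arrays with a skewed shape and different scales/offsets
raw <- cbind(a1 = rgamma(n_genes, shape = 2, rate = 0.5),
             a2 = rgamma(n_genes, shape = 2, rate = 0.4) + 1,
             a3 = rgamma(n_genes, shape = 2, rate = 0.6) + 0.5)

# Simplified quantile normalization: each value is replaced by the mean,
# across arrays, of the values holding the same rank
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

normed <- quantile_normalize(raw)

# Sample skewness: about 0 for a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

op <- par(mfrow = c(1, 2))
boxplot(normed, main = "After normalization", ylab = "Intensity")
hist(normed[, 1], breaks = 200, freq = FALSE,
     main = "Array a1, 200 breaks", xlab = "Intensity")
par(op)

# The arrays now agree with each other, yet the skew is still there
apply(normed, 2, skewness)
```

On the real data, aligned boxes in `boxplot(exprs(GSE103965_norm))` would suggest the normalization did its job, even if the histogram stays skewed.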

"I want to thank you for your posts and your answers here in Biostars, it helped me a lot in understanding microarray analysis, it is really helpful."

Sure thing. Just staying busy to avoid contemplating on life and the Universe.
