Dear all,
I am working on a dataset containing 700 samples and 50,000 genes, which has been rank normalized. For my purposes, I calculated the Pearson residuals of each gene, including covariates and technical confounders in the model. However, the resulting data are not normally distributed. Bear in mind that I checked normality for each gene with the Shapiro-Wilk test, which is very sensitive with large samples (do you think that 700 observations counts as a large sample in this case?) and can detect deviations from normality that do not actually affect the results. Therefore, the data might be fine after all. I was wondering whether it is advisable to normalize the data again, or whether that is unnecessary. I searched online for examples or an explanation of the use of a second normalization step, but I could not find anything useful.
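One common way to handle this, sketched here under several assumptions, is to regress each gene on the covariates, keep the residuals, and then apply a rank-based inverse normal transform (INT) to each gene's residuals as the second normalization step. The helpers `residualize`, `inverse_normal_transform` and `skewness` are hypothetical, "Pearson residuals" is taken to mean ordinary least-squares residuals divided by the residual standard deviation, and the data are simulated (700 samples, 5 genes, right-skewed noise), not the real matrix. Skewness is printed instead of a Shapiro-Wilk p-value because, at n = 700, the size of the departure arguably says more than its significance.

```python
import numpy as np
from statistics import NormalDist

def residualize(expr, covariates):
    """Regress every gene (column of expr) on the covariates by OLS and return
    the residuals divided by each gene's residual standard deviation
    (assumed meaning of 'Pearson residuals' for a linear model)."""
    n = expr.shape[0]
    X = np.column_stack([np.ones(n), covariates])    # intercept + covariates
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)  # one fit for all genes
    resid = expr - X @ beta
    dof = n - X.shape[1]
    sd = np.sqrt((resid ** 2).sum(axis=0) / dof)
    return resid / sd

def inverse_normal_transform(x, c=3.0 / 8):
    """Rank-based INT of a 1-D array with Blom's offset c = 3/8.
    Ties are broken by order of appearance (assumes continuous data)."""
    n = len(x)
    ranks = np.empty(n)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf((float(r) - c) / (n - 2 * c + 1)) for r in ranks])

def skewness(v):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Simulated data (hypothetical): 700 samples, 5 genes, 3 covariates, skewed noise
rng = np.random.default_rng(0)
n_samples, n_genes = 700, 5
covs = rng.normal(size=(n_samples, 3))
coef = rng.normal(size=(3, n_genes))
noise = rng.exponential(scale=1.0, size=(n_samples, n_genes))
expr = covs @ coef + noise

resid = residualize(expr, covs)
renorm = np.apply_along_axis(inverse_normal_transform, 0, resid)  # per gene

for g in range(n_genes):
    print(f"gene {g}: skew before {skewness(resid[:, g]):.2f}, "
          f"after {skewness(renorm[:, g]):.2f}")
```

Because the INT is monotone, it keeps each sample's ordering within a gene and only reshapes the distribution; whether that is desirable depends on the downstream test.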
I would really appreciate any answers or comments on this.
Best wishes
Rank normalization itself is very stringent, so it should take care of everything. (By rank normalization, I am assuming every gene in a sample is forced to a value between 0 and 1.)
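The interpretation above (every gene within a sample mapped into the interval between 0 and 1) can be sketched as follows. The function `rank_scale` is hypothetical: it uses rank / (n + 1) so that no value is exactly 0 or 1, and tied values share their average rank; actual pipelines may use a different offset or tie rule.

```python
def rank_scale(values):
    """Map values to ranks scaled into (0, 1) as rank / (n + 1).
    Tied values receive the average of their 1-based ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            scaled[order[k]] = avg_rank / (n + 1)
        i = j + 1
    return scaled

# One sample's values (toy numbers); in practice this would be applied
# to each sample's 50,000 gene values separately.
print(rank_scale([5.0, 1.0, 3.0, 3.0]))  # prints [0.8, 0.2, 0.5, 0.5]
```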
Hi Ron,
Thanks for your comment! Actually, the values are not between 0 and 1 but between -3 and +3. I am not sure exactly how the normalization was done, because I received the file already normalized.
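A range of roughly -3 to +3 is what a rank-based inverse normal transform produces for about 700 observations. The sketch below assumes Blom's offset (c = 3/8) and uses a hypothetical helper `blom_scores` to compute the most extreme normal scores for n = 700 and, for comparison, n = 50,000. If the extremes for 700 are just over 3 while those for 50,000 exceed 4, a ±3 range would be more consistent with each gene having been ranked across the 700 samples than with ranking within each sample across all genes. This is an inference only, not a confirmation of how the file was produced.

```python
from statistics import NormalDist

def blom_scores(n, c=3.0 / 8):
    """Normal scores Phi^-1((r - c) / (n - 2c + 1)) for ranks r = 1..n
    (Blom's scores when c = 3/8)."""
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in range(1, n + 1)]

for n in (700, 50_000):
    scores = blom_scores(n)
    print(f"n = {n}: min {min(scores):.2f}, max {max(scores):.2f}")
```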