Hi,
I have a gene expression microarray dataset with dimensionality 427 x ~40,000.
I wish to test whether this data follows a multivariate normal distribution. In R, the mshapiro.test() function (multivariate Shapiro-Wilk test) from the mvnormtest library only permits vectors no longer than 5000 entries.
I also tried the squared Mahalanobis distance (when plotted on a QQ-plot against chi-squared quantiles, the distances should fall close to a straight line if the data are multivariate normal). However, this requires calculating a covariance matrix, which is not feasible for a data set this large (or wide).
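For reference, here is a minimal sketch of the Mahalanobis QQ-plot check in base R. It uses simulated data, not my microarray, and a hypothetical subset of `p = 20` columns, because with 427 samples the covariance matrix can only be inverted when far fewer variables than samples are used. The names `X`, `n`, `p`, `d2` and `q` are purely illustrative, and the chi-squared reference is only approximate once the mean and covariance are estimated from the data.

```r
# Simulated stand-in for the expression matrix: 427 samples (rows) x 20 variables (columns).
set.seed(1)
n <- 427
p <- 20  # assumed subset size; must be well below n so that cov(X) is invertible
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Squared Mahalanobis distance of each sample from the sample mean.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Theoretical chi-squared quantiles with p degrees of freedom.
q <- qchisq(ppoints(n), df = p)

# QQ-plot: points near the line y = x are consistent with multivariate normality.
plot(q, sort(d2),
     xlab = "Chi-squared quantiles",
     ylab = "Squared Mahalanobis distance")
abline(0, 1)
```

With all ~40,000 columns the same call fails, since the sample covariance matrix is singular whenever there are more variables than samples.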
Do you have any suggestions for alternative tests of multivariate normality for a large dataset, preferably but not necessarily in R?
Regards, S ;-)
I doubt that calculating the Shapiro-Wilk statistic makes sense for the whole dataset. I will try to explain this in an answer later.