Hello,
I have been analysing the dataset GSE140829, which is made up of three groups in the diagnostic variable: AD, MCI and Control.
As a preliminary quality check, I ran plotMDS on the normalised data to see how the groups behave and how the samples are distributed.
As you can see in the image, the three groups are intermingled.
I am still learning. My questions are:
1) How should I interpret this? Can I say that there are practically no differences between the groups and no outliers?
2) Should I apply any corrections when doing the lmFit analysis? For example, arrayWeights()?
3) Does the size of the groups influence the plotMDS? There are more than 600 samples.
Thank you very much for your help.
arrayWeights is imo always a good idea with human (or generally large) cohorts. You can also use something like sva to estimate factors of unwanted variation. I have never done this for microarrays, but for RNA-seq here is a great read from the DESeq2 author: https://github.com/mikelove/preNivolumabOnNivolumab/blob/main/preNivolumabOnNivolumab.knit.md
Generally, you can be almost certain that human cohorts are confounded by a lot of factors, including age, dietary status, medication, and disease status beyond the actual disease/condition you're investigating, so a strictly univariate analysis needs a very clear effect, a large n, or both.
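A rough sketch of how the two suggestions above (arrayWeights and sva) could be combined in limma. It runs on simulated data rather than the real GSE140829 matrix, so the object names (`expr`, `pheno`, `design_sv`), the simulated hidden batch effect and the intercept-style design are all assumptions for illustration only:

```r
# Sketch only: simulated data stand in for the normalised expression matrix.
library(limma)
library(sva)

set.seed(1)
n_genes <- 500
n_samples <- 60
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_len(n_samples))))

# Hypothetical phenotype table with the three diagnostic groups
pheno <- data.frame(
  diagnosis = factor(rep(c("Control", "MCI", "AD"), each = 20),
                     levels = c("Control", "MCI", "AD"))
)

# Simulate an unrecorded batch effect on the first 200 genes
batch <- rep(c(0, 1), times = n_samples / 2)
expr[1:200, ] <- expr[1:200, ] + outer(rnorm(200, sd = 2), batch)

# Full model (variable of interest) and null model (intercept only)
mod  <- model.matrix(~ diagnosis, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# Estimate surrogate variables for unwanted variation
sv_fit <- sva(expr, mod = mod, mod0 = mod0)

# Add the surrogate variables to the design if any were found
design_sv <- mod
if (sv_fit$n.sv > 0) {
  sv_mat <- as.matrix(sv_fit$sv)
  colnames(sv_mat) <- paste0("SV", seq_len(ncol(sv_mat)))
  design_sv <- cbind(mod, sv_mat)
}

# Per-sample quality weights, then the weighted linear model
aw  <- arrayWeights(expr, design = design_sv)
fit <- lmFit(expr, design = design_sv, weights = aw)
fit <- eBayes(fit)

# AD versus Control (Control is the reference level of the factor)
top <- topTable(fit, coef = "diagnosisAD", number = 10)
print(top)
```

With real data, known covariates such as age or sex would normally go into both `mod` and `mod0`, so that sva only estimates whatever variation is left unexplained.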