In my data, the way the samples separate along one specific principal component (PC) is biologically interesting. Samples that received high amounts of a compound fall on one side of the PC, those that received low amounts fall on the other side, and those that received a moderate amount sit in the middle.
I would like to find out which genes are responsible for this and, more specifically, which biological functions are enriched among those genes.
So I thought of running gene set enrichment analysis on the gene loadings for that specific PC. Genes that contribute strongly to the PC will have a loading that is large in absolute value, either positive or negative.
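To make the idea concrete, here is a minimal sketch of how I would pull out the loadings and turn them into a ranked gene list. The matrix is simulated as a stand-in for my rlog-transformed data, and the object names (mat, pc, ranked) are only placeholders for illustration.

```r
# Simulated stand-in for an rlog-transformed matrix (genes in rows, samples in columns).
# This is hypothetical data, not my real experiment.
set.seed(1)
n_genes <- 200
n_samples <- 12
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))

# prcomp expects samples in rows and variables (genes) in columns, so transpose.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Index of the PC of interest (assumed to be PC1 here).
pc <- 1

# The rotation matrix holds the loadings: one row per gene, one column per PC.
loadings <- pca$rotation[, pc]

# Rank genes from most positive to most negative loading.
ranked <- sort(loadings, decreasing = TRUE)

head(ranked)  # genes pulling samples towards the positive side of the PC
tail(ranked)  # genes pulling samples towards the negative side
```

A named, sorted vector like ranked is the kind of input that preranked enrichment tools generally expect, although the exact format depends on the tool.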
My question is: must the genes be standardized prior to the PCA? Usually, before PCA I apply DESeq2's rlog function but do not standardize (that is, I do not convert each gene to Z-scores). The effect is more pronounced when I do not standardize the data, i.e. prcomp(scale. = FALSE).
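For reference, this is roughly how the two options can be compared side by side. It is only a sketch on simulated data: the dose vector, the effect sizes and the noise level are all made up, and real rlog values may behave differently.

```r
# Hypothetical design: 4 samples each at low (0), moderate (1) and high (2) dose.
set.seed(42)
dose <- rep(c(0, 1, 2), each = 4)
n_genes <- 100

# Assumption: the first 30 genes respond to dose with increasing effect size,
# and the remaining genes are pure noise.
effect <- c(seq(0.2, 3, length.out = 30), rep(0, n_genes - 30))
mat <- outer(effect, dose) +
  matrix(rnorm(n_genes * length(dose), sd = 0.3), nrow = n_genes)
rownames(mat) <- paste0("gene", seq_len(n_genes))

# Same data, centered in both cases, with and without per-gene scaling.
pca_raw <- prcomp(t(mat), center = TRUE, scale. = FALSE)
pca_std <- prcomp(t(mat), center = TRUE, scale. = TRUE)

load_raw <- pca_raw$rotation[, 1]
load_std <- pca_std$rotation[, 1]

# Proportion of total variance captured by PC1 in each analysis.
var_raw <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
var_std <- pca_std$sdev[1]^2 / sum(pca_std$sdev^2)

# The sign of a PC is arbitrary, so compare absolute loadings by rank.
rank_agreement <- cor(abs(load_raw), abs(load_std), method = "spearman")

print(c(var_raw = var_raw, var_std = var_std, rank_agreement = rank_agreement))
```

Without scaling, genes with larger variance carry more weight in the PCA; with scale. = TRUE every gene enters with unit variance, so the gene ranking by loading can change between the two runs.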