My understanding is that PCA on Gaussian-distributed data (such as suitably transformed RNA-Seq data) yields factors that are not only uncorrelated but also independent. I am having difficulty understanding what "Gaussian distribution" refers to here.
For example, take gene expression data. I have a matrix in which each row is a gene and each column is an individual. If I perform a PCA on this matrix hoping to uncover several expression patterns, what does "Gaussian distribution" refer to? Is it the distribution of expression across genes within a given pattern, or the distribution of a single gene's expression across individuals?
The reason I am asking is that I want to compare the rationale of PCA and ICA for separating expression patterns in RNA-Seq data. ICA extracts independent factors, while PCA extracts linearly uncorrelated factors. If transformed RNA-Seq data approximate a Gaussian distribution (the raw counts are overdispersed and usually modeled as negative binomial), then PCA on such approximately Gaussian data should uncover factors that are both uncorrelated and independent.
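To make the uncorrelated-versus-independent distinction concrete, here is a minimal sketch on simulated data (not RNA-Seq). It assumes two hidden sources mixed by a hypothetical matrix `mixing`, with rows as observations and columns as the two measured variables. The correlation between squared PC scores is used as a crude dependence check; all names and values are illustrative.

```python
import numpy as np

def pca_scores(X):
    """Return principal-component scores for X (rows = observations)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n = 20000

# Hypothetical non-orthogonal mixing: a 45-degree rotation followed by
# unequal scaling of the two measured variables.
c = np.sqrt(0.5)
rotation = np.array([[c, -c], [c, c]])
mixing = np.diag([2.0, 1.0]) @ rotation

# Two sets of independent, unit-variance sources: Gaussian and uniform.
sources = {
    "gaussian": rng.standard_normal((n, 2)),
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2)),
}

results = {}
for name, S in sources.items():
    X = S @ mixing.T                  # observed, mixed data
    pcs = pca_scores(X)
    linear = corr(pcs[:, 0], pcs[:, 1])             # zero by construction
    squared = corr(pcs[:, 0] ** 2, pcs[:, 1] ** 2)  # nonzero if dependent
    results[name] = (linear, squared)
    print(f"{name:8s} corr(PC1, PC2) = {linear:+.3f}   "
          f"corr(PC1^2, PC2^2) = {squared:+.3f}")
```

In both cases the PC scores are uncorrelated. Only for the Gaussian sources is the second number also close to zero; for the uniform sources it is clearly nonzero, so the PCs are uncorrelated but not independent.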
PCA necessarily finds uncorrelated (orthogonal) components; that's literally how it works. Those components are also statistically independent only when the data are jointly Gaussian, and they may or may not be related to anything biologically relevant, of course. ICA is a bit harder to compute and generally of lower utility except when you KNOW that the resulting components have a biological interpretation. As an example, I've used ICA to look at samples that were mixtures of multiple cell types. I knew how many cell types there were, so this was useful. Normally one uses PCA for generic QC, where ICA offers no real benefit.
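The mixture scenario can be sketched with a toy example. This is not the analysis described above and not a production ICA such as FastICA: it assumes exactly two hypothetical cell-type signatures (`S`, simulated as Laplace-distributed values over genes), two samples that are linear mixtures of them (the made-up matrix `A`), and a brute-force search for the rotation of the whitened data that maximizes absolute excess kurtosis.

```python
import numpy as np

def whiten(X):
    """Center X and rescale it so its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ evecs / np.sqrt(evals)

def excess_kurtosis(z):
    """Sample excess kurtosis (about 0 for Gaussian data)."""
    z = (z - z.mean()) / z.std()
    return float(np.mean(z ** 4) - 3.0)

def toy_ica_2d(X, n_angles=180):
    """Toy 2-D ICA: whiten, then pick the rotation whose two output
    components are jointly the most non-Gaussian."""
    Z = whiten(X)
    best_score, best_Y = -np.inf, Z
    # Angles in [0, pi/2) suffice: components are only defined up to
    # order and sign.
    for angle in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(angle), np.sin(angle)
        Y = Z @ np.array([[c, -s], [s, c]]).T
        score = abs(excess_kurtosis(Y[:, 0])) + abs(excess_kurtosis(Y[:, 1]))
        if score > best_score:
            best_score, best_Y = score, Y
    return best_Y

rng = np.random.default_rng(1)
n_genes = 20000

# Hypothetical signatures: one column per cell type, one row per gene.
S = rng.laplace(size=(n_genes, 2))

# Hypothetical mixing proportions: sample 1 is 70/30, sample 2 is 20/80.
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
X = S @ A.T                      # genes x samples, the observed mixtures

recovered = toy_ica_2d(X)

# |correlation| between each recovered component (rows) and each true
# signature (columns).
match = np.abs(np.corrcoef(recovered.T, S.T)[:2, 2:])
print(np.round(match, 2))
```

Each recovered component should line up with one of the simulated signatures, up to sign and order. This only works because the number of sources is known in advance and the sources are non-Gaussian, which is the situation described above.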