Question

In PCA on RNA-Seq data, what does Gaussian distribution refer to?

5.1 years ago

CY ▴ 750

My understanding is that PCA on Gaussian-distributed data (such as RNA-Seq) yields factors that are independent as well as uncorrelated. I am having difficulty understanding what 'Gaussian distribution' refers to here.

Take gene expression data as an example. I have a matrix in which each row is a gene and each column is an individual. If I perform a PCA on this matrix hoping to uncover several expression patterns, what does 'Gaussian distribution' refer to? Is it the expression of each gene within a specific pattern, or the expression of a single gene across individuals?

RNA-Seq PCA Gaussian • 1.3k views

updated 5.1 years ago by Devon Ryan 105k • written 5.1 years ago by CY ▴ 750

score 0 · Answer 1 · 2020-01-21

5.1 years ago

Devon Ryan 105k

The data does not need to be Gaussian, and RNA-seq data is not Gaussian unless you transform it. PCA has more utility (in that the results look nicer) when you can transform the data such that there's Gaussian variance across genes, but that isn't a requirement.
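As a rough illustration of the "transform, then PCA" workflow mentioned above, here is a minimal sketch using only NumPy. Everything in it is an assumption made for illustration: the counts are simulated from a negative binomial, the dispersion value and the group effect are made up, and log2(count + 1) stands in for whichever variance-stabilising transform you actually prefer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 genes (rows) x 12 samples (columns).
n_genes, n_samples = 200, 12
base_mean = rng.uniform(5, 500, size=(n_genes, 1))   # per-gene mean count

# Assumed group effect: the first 20 genes are 4x higher in the last 6 samples.
fold = np.ones((n_genes, n_samples))
fold[:20, 6:] = 4.0
mean = base_mean * fold

# Negative binomial with variance = mean + dispersion * mean**2.
dispersion = 0.2                      # assumed value, shared by all genes
n = 1.0 / dispersion
p = n / (n + mean)                    # NumPy's (n, p) parameterisation
counts = rng.negative_binomial(n, p)  # shape (n_genes, n_samples)

# Simple transform towards a more Gaussian-like scale.
log_expr = np.log2(counts + 1.0)

# PCA on samples: put samples in rows, centre each gene, then take the SVD.
X = log_expr.T
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

scores = U * S                        # sample coordinates on each PC
explained = S**2 / np.sum(S**2)       # fraction of variance per PC

print("variance explained by PC1-PC3:", np.round(explained[:3], 3))
print("PC1 scores per sample:", np.round(scores[:, 0], 2))
```

Running the same steps on the raw counts instead of `log_expr` is an easy way to see why the transformed version usually "looks nicer": without the transform, the handful of most highly expressed genes tends to dominate the leading components.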

5.1 years ago by Devon Ryan 105k

The reason I am asking is to compare the rationale of PCA and ICA for separating expression patterns in RNA-Seq data. ICA extracts independent factors, while PCA extracts linearly uncorrelated factors. If transformed RNA-Seq data can approximate a Gaussian distribution (the raw counts being negative binomial, i.e. with greater variance), then PCA on such approximately Gaussian data uncovers factors that are independent as well as uncorrelated.
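The difference between "uncorrelated" and "independent" can be checked numerically. The sketch below is an illustration on synthetic data, not RNA-Seq: two independent sources are mixed with a made-up matrix `A`, and PCA is run once with Gaussian sources and once with uniform (non-Gaussian) sources. The correlation between the squared PC scores is used here as a crude dependence check; it is only one possible diagnostic, chosen for simplicity.

```python
import numpy as np

def rotation(deg):
    """2x2 rotation matrix for an angle given in degrees."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

def pca_scores(X):
    """PC scores of X (observations in rows) via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U * S

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
n_obs = 20000

# Hypothetical mixing matrix (arbitrary choice for this illustration).
A = rotation(30) @ np.diag([2.0, 1.0]) @ rotation(45).T

# Two independent unit-variance sources per case.
sources = {
    "gaussian": rng.standard_normal((n_obs, 2)),
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_obs, 2)),
}

results = {}
for name, src in sources.items():
    X = src @ A.T                     # observed mixtures
    Y = pca_scores(X)
    linear = corr(Y[:, 0], Y[:, 1])           # zero by construction
    squared = corr(Y[:, 0]**2, Y[:, 1]**2)    # non-zero signals dependence
    results[name] = (linear, squared)
    print(f"{name:8s} corr(PC1, PC2) = {linear:+.3f}   "
          f"corr(PC1^2, PC2^2) = {squared:+.3f}")
```

In both cases the PC scores are uncorrelated. With Gaussian sources the squared scores are also essentially uncorrelated, consistent with independence, whereas with uniform sources the squared scores are clearly correlated, so the PCs are uncorrelated but not independent. That residual dependence is what ICA tries to remove.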

5.1 years ago by CY ▴ 750

PCA necessarily finds uncorrelated (orthogonal) factors; that's literally how it works. They are also statistically independent only when the data are jointly Gaussian. Those factors may or may not be related to anything biologically relevant, of course. ICA is a bit harder to compute and generally of lower utility except when you KNOW that the resulting components have a biological interpretation. As an example, I've used ICA to look at samples that were mixtures of multiple cell types. I knew how many cell types there were, so this was useful. Normally one uses PCA for generic QC, where ICA doesn't give you any additional benefit.

5.1 years ago by Devon Ryan 105k