Hello,
I am currently reading about single-cell sequencing and am having a hard time understanding something that I think is very basic.
Everywhere I look, I read that scRNA-seq count tables are heteroskedastic (highly expressed genes have higher variance), which makes analyzing them somewhat challenging. The goal is therefore to transform the counts so that the variance is "stable", i.e. does not depend on the mean. A common plot to visualize this is shown below:
My first question is: why do highly expressed genes actually vary more than lowly expressed genes?
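For reference, the mean-variance trend itself is easy to reproduce with simulated counts. This is only a minimal sketch: it assumes pure Poisson sampling noise (real scRNA-seq counts are usually overdispersed), and the gene means, matrix size, and variable names are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2000 genes x 500 cells, Poisson counts around per-gene means.
true_means = np.logspace(-1, 3, 2000)  # 0.1 ... 1000 expected counts per gene
counts = rng.poisson(true_means[:, None], size=(2000, 500))

# Per-gene mean and variance across cells.
gene_mean = counts.mean(axis=1)
gene_var = counts.var(axis=1, ddof=1)

# Compare the lowest- and highest-expressed quarter of genes.
low, high = slice(0, 500), slice(1500, 2000)
median_var_low = np.median(gene_var[low])
median_var_high = np.median(gene_var[high])
print("median variance, lowly expressed genes :", median_var_low)
print("median variance, highly expressed genes:", median_var_high)
```

Under this assumption the variance of a count is roughly equal to its mean, so the raw variance grows with expression simply because of how counting noise behaves.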
Another thing that confuses me is that if you plot the gene expression of two cells (or patients) against each other, the lowly expressed genes appear to vary more, and the variance shrinks as expression increases. To me, this seems to directly contradict the heteroskedasticity statement above. I have also included a picture.
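One possible way to see how both observations could hold at once is to compare two simulated cells on the raw scale and on a log scale. This is a sketch under assumptions, not a description of any particular dataset: the two cells share the same made-up gene means, the noise is plain Poisson, and `log1p` stands in for whatever log-like transform the scatter plot uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: two cells drawn from the same per-gene expected counts.
true_means = np.logspace(0, 3, 2000)  # 1 ... 1000 expected counts per gene
cell_a = rng.poisson(true_means)
cell_b = rng.poisson(true_means)

# Difference between the two cells on the raw and on the log scale.
raw_diff = cell_a - cell_b
log_diff = np.log1p(cell_a) - np.log1p(cell_b)

# Lowest- and highest-expressed quarter of genes.
low, high = slice(0, 500), slice(1500, 2000)
raw_sd_low, raw_sd_high = raw_diff[low].std(), raw_diff[high].std()
log_sd_low, log_sd_high = log_diff[low].std(), log_diff[high].std()

print("raw scale: sd low =", raw_sd_low, " sd high =", raw_sd_high)
print("log scale: sd low =", log_sd_low, " sd high =", log_sd_high)
```

In this toy setting the absolute spread is larger for highly expressed genes, while the spread after the log transform is larger for lowly expressed genes. If the two-cell scatter plot is drawn on log axes, it is showing relative rather than absolute variation, which may be why the two pictures look contradictory.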
I would really appreciate it if somebody could clear up this confusion for me.
Cheers!
Seurat also offers a plot of residual variance against the geometric mean.