Hi there,
I'm new to scRNA-seq (I use the Seurat pipeline for analysis) and to NMF.
I'd like to run NMF on scRNA-seq data to find different programs (like markers for some cells).
But I don't know which matrix I should use for NMF: normalized counts or scaled counts?
And how do I choose the factorization rank in NMF?
Does anyone have experience with this? Thanks for your help!
Thanks for your advice! By "scaled" I mean the standardization step. On the first question, I tried using the scaled counts with the negative values set to 0, and it seems to give the result we want. I'm not sure, but maybe both normalized counts and scaled counts can be used as input to NMF. Could the information in the scaled counts have been changed by setting the negative values to 0? If you have more comments, I'd welcome them.
If there are only a few negative values, you could treat them as outliers and set them to 0. For standardized data, however, one would expect a significant number of values to be negative, and to be meaningful. Discarding them or setting them to 0 means you're treating every value below the mean (i.e. 0 after standardization) as bad/wrong, and by setting them to 0 you're artificially increasing the expression level of those entries. If doing so still produces meaningful results, it suggests there's a lot of noise in the low expression levels. In that case, putting a threshold on the normalized counts should have the same effect.
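To make that last point concrete, here is a minimal sketch in Python with NumPy (not Seurat code). It assumes a toy matrix of simulated log-normalized counts and a plain per-gene z-score as the scaling step; the variable names and the simulated data are hypothetical, and Seurat's own scaling may differ in details such as capping large values. It counts how many standardized values are negative, then checks that clipping the scaled matrix at 0 gives the same result as thresholding the normalized matrix at each gene's mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 cells x 50 genes of non-negative, log-normalized expression.
counts = rng.poisson(lam=2.0, size=(200, 50))
normalized = np.log1p(counts)

# Standardize each gene (column) to mean 0 and unit variance (a plain z-score).
gene_mean = normalized.mean(axis=0)
gene_sd = normalized.std(axis=0)
scaled = (normalized - gene_mean) / gene_sd

# A large share of standardized values is negative, so they are not rare outliers.
frac_negative = (scaled < 0).mean()

# Option A: clip the scaled matrix at 0 so it becomes valid NMF input.
clipped_scaled = np.clip(scaled, 0, None)

# Option B: threshold the normalized matrix at each gene's mean, then rescale by the sd.
thresholded = np.clip(normalized - gene_mean, 0, None) / gene_sd

same = np.allclose(clipped_scaled, thresholded)
print(f"fraction of negative values after scaling: {frac_negative:.2f}")
print(f"clipping scaled data equals thresholding at the gene mean: {same}")
```

Under these assumptions the two options coincide, so zeroing the negative scaled values is really a per-gene threshold at the mean of the normalized data. Whether the mean is a sensible threshold for your data is a separate question; the normalized matrix is already non-negative and can be passed to NMF without any clipping.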
Very helpful, thanks a lot.