I have seen several mutation signature analyses, and they always seem to be done using NMF. I am a bit new to this. Why do people always choose NMF for this kind of analysis? Is there an alternative?
I also found that NMF is usually used for mutation signatures and SVD for expression signatures. Is there a biological reason behind this?
I did read about some methods that use LASSO to enforce sparsity, although I am not sure about the biology behind the sparsity assumption. Besides, I know that SVD is usually used for gene expression signatures. Why NMF for mutation signatures and SVD for expression signatures? Is there a biological reason for this?
My thinking is this: sparsity helps you interpret the biological meaning of the metagenes, because only a few coefficients are positive (the rest are zero), which makes it easier to understand the function of that group of genes. You can think of the expression profile or mutation profile as the output of some intrinsic biological processes. Perhaps one metagene corresponds to one or two pathways, or to a subnetwork in a PPI network or gene regulatory network.
There are other ways besides LASSO to impose sparsity, e.g. an L0-norm penalty, and there are also sparse versions of PCA and SVD.
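To make the difference between the two penalties concrete, here is a minimal sketch in plain Python. It is only an illustration: the function names, the coefficient vector and the threshold `lam` are made up for this example and do not come from any particular signature tool. Soft thresholding is the shrinkage step associated with the L1 (LASSO) penalty, and hard thresholding is the L0-style counterpart that keeps or drops a coefficient without shrinking it.

```python
def soft_threshold(x, lam):
    """L1 / LASSO-style shrinkage: move x toward zero by lam, clip at zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def hard_threshold(x, lam):
    """L0-style selection: keep x unchanged if |x| > lam, otherwise drop it."""
    return x if abs(x) > lam else 0.0


# Hypothetical loadings of one metagene on five genes (illustrative values).
coefficients = [0.9, -0.05, 0.3, 0.02, -0.6]
lam = 0.1  # assumed threshold, chosen only for the example

soft = [soft_threshold(c, lam) for c in coefficients]
hard = [hard_threshold(c, lam) for c in coefficients]

print("soft:", soft)
print("hard:", hard)
print("nonzero after soft:", sum(1 for c in soft if c != 0.0))
print("nonzero after hard:", sum(1 for c in hard if c != 0.0))
```

Both rules zero out the small loadings, so the metagene ends up described by fewer genes. The soft rule also shrinks the surviving coefficients, while the hard rule leaves them untouched.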
I think NMF is not limited to mutation signatures; it can certainly work well for gene expression analysis too. For example, a classic paper introduced NMF for analyzing gene expression matrices: Metagenes and molecular pattern discovery using matrix factorization (https://www.ncbi.nlm.nih.gov/pubmed/15016911), and there is also a more recent paper: Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations (https://www.ncbi.nlm.nih.gov/pubmed/29987051).
In addition, the latent components generated by NMF are not required to be orthogonal, unlike those from PCA and SVD. ICA (independent component analysis) is, to some extent, similar to NMF, and you could have a look at it as well.
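If it helps, here is a small self-contained sketch of NMF on a toy mutation count matrix, written in plain Python so it runs without any packages. It uses the standard multiplicative updates for the squared (Frobenius) error. The two "signatures", the exposures and all function names are invented for the illustration, and a real analysis would use an established NMF implementation instead. The point is only that both factors stay non-negative, so each sample reads as an additive mix of signatures, and that the recovered signatures are allowed to overlap (they are not orthogonal).

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start; multiplicative updates never flip signs.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H


# Toy data (made up): two overlapping signatures over four mutation classes,
# and five samples with different exposures to each signature.
signatures = [[0.6, 0.3, 0.1, 0.0],
              [0.0, 0.2, 0.3, 0.5]]
exposures = [[100, 0], [0, 80], [50, 50], [20, 90], [70, 10]]
V = matmul(exposures, signatures)  # samples x mutation classes

W, H = nmf(V, rank=2)  # W: estimated exposures, H: estimated signatures

print("reconstruction error:", frob_error(V, W, H))
print("smallest entry in W and H:", min(min(map(min, W)), min(map(min, H))))
print("dot product of the two signatures:", sum(a * b for a, b in zip(H[0], H[1])))
```

An SVD of the same matrix would give orthogonal components with mixed signs, which are harder to read as "number of mutations contributed by a process". That is the usual practical argument for NMF on count data.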
Thanks for explaining. I did some digging. Some methods justify the sparse solution by noting that most mutagens are highly specific in the type of damage they cause, so the majority of somatic mutational signatures are sparse.
Thanks. It makes sense now.