I want to perform non-negative matrix factorization (NMF) clustering using the NMF package in R on scaled microarray data, which contains negative values due to median centering. As a result, the NMF algorithm returns an error message. I found that some people address this issue by thresholding the data at a small positive value, but this would change the data a lot. I wonder whether this approach will affect the clustering, and whether there is a better way to cluster microarray data using NMF. Thanks!
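To make the concern concrete, here is a minimal sketch (plain Python, not the R NMF package) of how much a small positive floor alters a median-centered row. The toy values, the `eps` floor of 1e-6, and the helper names are all assumptions made up for illustration; they are not taken from any real dataset or from the NMF package.

```python
import random
import statistics

def median_center(row):
    """Subtract the row median so the values are centered on zero."""
    m = statistics.median(row)
    return [x - m for x in row]

def threshold(row, eps=1e-6):
    """Replace every value at or below eps with eps (hypothetical small positive floor)."""
    return [x if x > eps else eps for x in row]

def fraction_changed(before, after):
    """Share of entries whose value was altered by thresholding."""
    changed = sum(1 for b, a in zip(before, after) if b != a)
    return changed / len(before)

random.seed(0)
# Toy "expression" row: 1000 made-up log-intensity values, not real data.
row = [random.gauss(8.0, 1.5) for _ in range(1000)]
centered = median_center(row)
floored = threshold(centered)

print(f"fraction of entries changed: {fraction_changed(centered, floored):.2f}")
```

Because median centering puts roughly half of each row below zero, roughly half of the entries are overwritten by the floor, which is why thresholding is a substantial change rather than a cosmetic one.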
You're starting with scaled and median-normalized microarray data and are only NOW worried about how thresholding might change the data?
I transformed the data in order to remove batch effects. Are you suggesting that I should use the non-transformed data for NMF? Thanks.
I don't know what matrix factorization has to do with clustering; there are many clustering methods that may be more or less appropriate. Clustering results can be very sensitive to the method used, so it is a good idea to try several methods and investigate where their results agree and disagree. I wouldn't advise using completely un-normalized microarray data, especially when you expect a batch effect, as that will dominate the clusters. Still, since the raw data is dominated by batch effect, you have to be very careful in how it is corrected. I was just making a sarcastic joke that your data had already been significantly manipulated before you started to worry that 'small value thresholding' would impact the results. Everything will impact the results, and it's best to have a clear hypothesis at the start so the best methods can be chosen. Assuming NMF is the right way to go might be a problem (but again, I'm not familiar with NMF).
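As a rough illustration of checking how two methods agree, the sketch below computes the Rand index: the fraction of sample pairs that both clusterings treat the same way (both put the pair together, or both put it apart). The label vectors and the names `method_a` and `method_b` are hypothetical placeholders, not output from NMF or any real analysis.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair counts as agreement when both clusterings place the two samples
    in the same cluster, or both place them in different clusters.
    Cluster label names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total

# Hypothetical cluster assignments for six samples from two methods.
method_a = [0, 0, 0, 1, 1, 1]
method_b = [1, 1, 0, 0, 0, 0]

print(f"Rand index: {rand_index(method_a, method_b):.3f}")
```

A value of 1 means the two partitions are identical up to relabeling. The plain Rand index is not corrected for chance agreement, so for real comparisons a chance-adjusted variant is usually preferable.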