Question

NMF clustering on scaled microarray data

10.4 years ago

coolbeerzh • 0

I want to perform non-negative matrix factorization (NMF) clustering with the NMF package in R on scaled microarray data, which contains negative values due to median centering. As a result, the NMF algorithm returns an error. I found that some people address this issue by thresholding the data at a small positive value, but this would change the data a lot. I wonder whether this approach will affect the clustering, and whether there is a better way to cluster microarray data using NMF. Thanks!

R • 5.3k views

Written 10.4 years ago by coolbeerzh; last updated 3.2 years ago by Ram.
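A quick way to see why the worry about thresholding is justified: median centering subtracts each gene's median, so for a gene measured in an even number of samples with no ties, exactly half of its values become negative (about half for an odd number). Thresholding at a small positive value then overwrites all of those with the same constant. The sketch below uses a hypothetical toy matrix and hypothetical helper names (`median_center`, `threshold`) and an arbitrary `eps`; it is only meant to show how much of the matrix gets clipped, not to reproduce any particular preprocessing pipeline.

```python
import numpy as np

def median_center(x):
    """Subtract each gene's (row's) median, as in median centering."""
    return x - np.median(x, axis=1, keepdims=True)

def threshold(x, eps=1e-6):
    """Replace every entry below eps by eps so the matrix is strictly positive."""
    return np.maximum(x, eps)

# Hypothetical toy data: 3 genes (rows) x 4 samples (columns) of log-expression.
expr = np.array([[5.0, 6.0, 7.0, 8.0],
                 [2.0, 2.5, 3.0, 1.0],
                 [9.0, 9.5, 8.0, 10.0]])

centered = median_center(expr)
clipped = threshold(centered)

# Fraction of entries that thresholding replaced with the same constant.
frac_changed = np.mean(centered < 1e-6)
print(f"entries clipped: {frac_changed:.0%}")  # prints "entries clipped: 50%"
```

Every value that was below a gene's median ends up identical after clipping, so the "below median" half of each gene's profile no longer carries any information for the factorization.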

You're starting with scaled and median-normalized microarray data, and only NOW are you worried about how thresholding might change the data?

Written 10.4 years ago by karl.stamm; last updated 3.2 years ago by Ram.

I transformed the data in order to remove batch effects. Are you suggesting that I should use the non-transformed data for NMF? Thanks.

Written 10.4 years ago by coolbeerzh.

I don't know what matrix factorization has to do with clustering; there are many clustering methods that may be more or less appropriate. They can be very sensitive, so trying several methods is a good idea, and then investigating where the results of each method agree and disagree. I wouldn't advise using completely un-normalized microarray data, especially when you expect a batch effect, as that will dominate the clusters. Still, since the raw data is dominated by the batch effect, you have to be very careful in how it is corrected. I was just making a sarcastic joke that your data had already been significantly manipulated before you started to worry that 'small value thresholding' would affect the results. Everything will affect the results, and it's best to have a clear hypothesis at the start so the best methods can be chosen. Assuming NMF is the right way to go might be a problem (but again, I'm not familiar with NMF).

Written 10.4 years ago by karl.stamm; last updated 3.2 years ago by Ram.
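On the question of what matrix factorization has to do with clustering: in the metagene approach of Brunet et al. (2004), a non-negative genes x samples matrix V is factored as V ≈ WH with a small rank k, and each sample (column of H) is assigned to the factor with the largest coefficient. The sketch below illustrates that idea under stated assumptions: it uses the Lee-Seung multiplicative updates for the least-squares (Frobenius) objective rather than the divergence-based updates used by Brunet et al., a single random start instead of the many restarts used for consensus clustering, and hypothetical function names and toy data.

```python
import numpy as np

def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor v (genes x samples, non-negative) as w @ h using Lee-Seung
    multiplicative updates for the Frobenius (least-squares) objective."""
    if (v < 0).any():
        # The updates below would produce negative factors, so refuse outright.
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = v.shape
    w = rng.random((n_genes, rank)) + eps
    h = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

def nmf_clusters(v, rank, **kwargs):
    """Assign each sample (column) to the factor with the largest coefficient in h."""
    _, h = nmf(v, rank, **kwargs)
    return np.argmax(h, axis=0)

# Hypothetical toy data: genes 0-1 are high in samples 0-1, genes 2-3 in samples 2-3.
v = np.array([[9.0, 8.0, 1.0, 0.5],
              [8.0, 9.0, 0.5, 1.0],
              [1.0, 0.5, 9.0, 8.0],
              [0.5, 1.0, 8.0, 9.0]])

labels = nmf_clusters(v, rank=2)
print(labels)  # samples 0 and 1 should share one label, samples 2 and 3 the other
```

Because the whole method rests on V being non-negative, any negative entries have to be dealt with before this step, which is exactly the preprocessing problem raised in the question.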

Ram · Answer 1 · 2015-04-25

NMF requires non-negative data. By applying NMF to a matrix with negative entries, you implicitly assume that those entries are errors (i.e. they should actually be positive). If there are not too many of them, you could deal with them, for example, by removing the corresponding genes from the analysis. Otherwise, you should consider a different preprocessing that doesn't create negative values. Alternatively, you could try the method used in GenePattern: concatenate two versions of your matrix, one with all negative values set to 0, and another with all positive values set to 0 and all negative values replaced by their absolute values.
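Both workarounds can be sketched in a few lines. The function names and toy matrix are hypothetical, and the answer does not say along which axis the two versions are concatenated: this sketch assumes they are stacked as extra rows (genes), so each sample still occupies a single column and can be clustered as before.

```python
import numpy as np

def drop_genes_with_negatives(x):
    """Keep only the genes (rows) that contain no negative entries."""
    keep = (x >= 0).all(axis=1)
    return x[keep], keep

def split_pos_neg(x):
    """Stack the positive part and the absolute negative part of x as extra rows.

    Row i of the result is max(x_i, 0) (negatives set to 0); row n_genes + i
    is max(-x_i, 0) (positives set to 0, negatives made positive). Every entry
    is non-negative, and x equals the top half minus the bottom half.
    """
    pos = np.maximum(x, 0.0)
    neg = np.maximum(-x, 0.0)
    return np.vstack([pos, neg])

# Hypothetical median-centered data: 2 genes x 4 samples.
centered = np.array([[-1.5, -0.5, 0.5, 1.5],
                     [0.2, 0.0, 0.7, 1.1]])

kept, mask = drop_genes_with_negatives(centered)  # only the second gene survives
stacked = split_pos_neg(centered)                 # 4 rows x 4 samples, all >= 0
```

Dropping genes loses information only if few genes are affected, which is why the answer limits it to that case; the split keeps all of the information (the original matrix can be recovered exactly) at the cost of doubling the number of features.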