I have a typical clustering problem with a twist. Imagine a standard data matrix with samples along one axis, features along the other, and a numeric value in each cell. This is essentially the kind of matrix one would get from a gene expression profiling experiment. I would like to cluster both samples and features by similarity, but I also want to incorporate a second matrix that holds an error estimate for each individual value in the data matrix. In other words, my data matrix mixes high-confidence and low-confidence measurements, and I want each to be weighted accordingly in the clustering.
I can imagine an algorithm that weights each term of the distance metric by the combined error of the two values being compared, and I can also imagine an approach for pooling error estimates when two nodes are merged. My question is whether software that does this already exists. Any leads appreciated...
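To make the two ideas above concrete, here is a minimal from-scratch sketch, assuming the error matrix holds independent, strictly positive standard deviations. The distance divides each squared difference by the summed variances of the two values (a chi-square-like statistic), and merged nodes are pooled by inverse-variance weighting. All function names are hypothetical and not from any existing package.

```python
import numpy as np


def weighted_sq_distance(x, sx, y, sy):
    """Error-weighted squared distance between two profiles.

    Each squared difference is divided by the combined variance of the
    two values, so noisy measurements contribute less. Averaged over
    features. Assumes sx and sy are standard deviations > 0.
    """
    var = sx**2 + sy**2
    return float(np.mean((x - y) ** 2 / var))


def pool(x, sx, y, sy):
    """Merge two profiles by inverse-variance weighting.

    Returns the per-feature pooled mean and its standard error, treating
    the two profiles as independent measurements of the same quantity.
    """
    wx, wy = 1.0 / sx**2, 1.0 / sy**2
    mean = (wx * x + wy * y) / (wx + wy)
    se = np.sqrt(1.0 / (wx + wy))
    return mean, se


def error_weighted_agglomerate(data, errors):
    """Naive O(n^3) agglomerative clustering of the rows of `data`.

    Returns a list of merges (cluster_a, cluster_b, distance). Leaves are
    numbered 0..n-1 and the k-th merge creates cluster id n + k.
    """
    n = data.shape[0]
    nodes = {i: (data[i].astype(float), errors[i].astype(float)) for i in range(n)}
    merges = []
    next_id = n
    while len(nodes) > 1:
        ids = sorted(nodes)
        best = None
        # Find the closest pair of current nodes under the weighted distance.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = weighted_sq_distance(*nodes[a], *nodes[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        # Replace the pair with one node carrying pooled values and errors.
        nodes[next_id] = pool(*nodes[a], *nodes[b])
        del nodes[a], nodes[b]
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 4.9]])
    errors = np.full_like(data, 0.5)
    for merge in error_weighted_agglomerate(data, errors):
        print(merge)
```

Because merged nodes are represented by pooled profiles (a centroid-style linkage), merge distances are not guaranteed to increase monotonically, so the resulting tree can contain inversions. To cluster features instead of samples, pass the transposed matrices.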
Thank you, yes, that's helpful. Admittedly, I was hoping for a lighter-weight solution that would essentially be a hack on simple hierarchical clustering, but I can see how a Gaussian mixture model (GMM) fit by expectation-maximization (EM) would be a more elegant way of handling the errors...
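A lighter-weight hack along those lines might look like the following sketch: precompute an error-weighted distance matrix and hand it to SciPy's standard hierarchical clustering. It assumes the errors are strictly positive standard deviations; the helper names are hypothetical. Average linkage is used because methods such as `ward`, `centroid`, and `median` assume Euclidean distances, which this weighted metric is not.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def error_weighted_distance_matrix(data, errors):
    """Pairwise distances between rows, each term scaled by combined variance.

    d(i, j) = sqrt(mean_k (x_ik - x_jk)^2 / (s_ik^2 + s_jk^2))
    Uses O(n^2 * m) memory via broadcasting; chunk this for large matrices.
    """
    var = errors**2
    diff2 = (data[:, None, :] - data[None, :, :]) ** 2
    comb = var[:, None, :] + var[None, :, :]
    d = np.sqrt(np.mean(diff2 / comb, axis=2))
    np.fill_diagonal(d, 0.0)
    return d


def cluster_rows(data, errors, method="average"):
    """Return a SciPy linkage matrix for the rows of `data`."""
    D = error_weighted_distance_matrix(data, errors)
    # linkage() expects a condensed (1-D) distance vector.
    return linkage(squareform(D, checks=False), method=method)


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 4.9]])
    errors = np.full_like(data, 0.5)
    Z_samples = cluster_rows(data, errors)        # cluster samples (rows)
    Z_features = cluster_rows(data.T, errors.T)   # cluster features (columns)
    print(fcluster(Z_samples, t=2, criterion="maxclust"))
```

Unlike the GMM/EM approach, this only down-weights noisy values in the distances; it does not propagate errors into merged clusters.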