Hi everyone!
I have a data set on which I would like to do unsupervised clustering. The data consist of mixed variable types, so I chose Gower distance to build the dissimilarity matrix. However, I have some concerns about the data.
Part of the data describes the localization of a specific measurement: the physicians recorded it at 6 different sites, each with a "yes"/"no" outcome. I am wondering whether I should combine these into one column representing every possible combination, or keep them as six asymmetric binary variables in the Gower distance matrix. I would then run PAM on the distance matrix to look for potential clusters.
I have tried merging the separate binaries into a single factor, e.g. Test, Test2 and Test3 into New_variable:
   Test  Test2  Test3  New_variable
A  0     1      1      011
B  1     1      1      111
C  0     0      0      000
So the question is whether to run the analysis on each binary variable separately or to merge them into a single factor. My guess is that the two approaches answer different questions?
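To make that guess concrete, here is a minimal sketch in plain Python of how the three example rows compare under each encoding. It assumes the usual Gower conventions: a symmetric binary counts every mismatch, an asymmetric binary ignores variables where both rows are 0, and a factor scores 0 for identical levels and 1 otherwise. The function names are my own and not from any package, and only the binary columns are included, so the other variables in the real mixed data would change the actual distances.

```python
from itertools import combinations

# The three example rows from the table (Test, Test2, Test3).
rows = {"A": (0, 1, 1), "B": (1, 1, 1), "C": (0, 0, 0)}


def symmetric_binary(x, y):
    """Share of variables that differ; 0-0 and 1-1 both count as agreement."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def asymmetric_binary(x, y):
    """Like symmetric_binary, but variables where both rows are 0 are skipped."""
    informative = [(a, b) for a, b in zip(x, y) if a or b]
    if not informative:
        # Two all-"no" rows share no informative variable: distance undefined.
        return float("nan")
    return sum(a != b for a, b in informative) / len(informative)


def merged_factor(x, y):
    """One factor per combination: identical pattern -> 0, anything else -> 1."""
    return 0.0 if x == y else 1.0


def pairwise(dist):
    """Apply a distance function to every pair of rows."""
    return {f"{i}-{j}": dist(rows[i], rows[j]) for i, j in combinations(rows, 2)}


results = {
    "symmetric": pairwise(symmetric_binary),
    "asymmetric": pairwise(asymmetric_binary),
    "merged": pairwise(merged_factor),
}

for name, distances in results.items():
    print(name, {pair: round(d, 3) for pair, d in distances.items()})
```

For these rows the merged factor puts every pair at distance 1, so A (011) is no closer to B (111) than to C (000). Keeping the variables separate gives A-B a distance of 1/3 under both binary treatments, while A-C is 2/3 as symmetric binaries and 1 as asymmetric ones.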
Cheers and thanks in advance!