Question about data reduction for Gower distance and unsupervised clustering

3.2 years ago

Nicolai Skovbjerg Arildsen

Hi everyone!

I have a data set on which I want to do unsupervised clustering. The data are of mixed types, so I chose Gower distance to build the dissimilarity matrix. However, I have some concerns about the data.
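For reference, here is a minimal sketch of how Gower dissimilarity combines mixed variables. Numeric variables contribute |x - y| divided by that variable's range over the data set. Nominal variables contribute 0 for a match and 1 otherwise. The per-variable scores are then averaged. The sketch assumes no missing values and equal weights, and the function names and example columns are hypothetical. In R, `cluster::daisy(x, metric = "gower")` computes this for a data frame.

```python
# Minimal sketch of Gower dissimilarity for mixed data (numeric + nominal).
# Assumptions: no missing values, equal weight for every variable.

def gower_distance(x, y, kinds, ranges):
    """Gower dissimilarity between two records.

    kinds  -- per-variable type: "num" (numeric) or "nom" (nominal/factor)
    ranges -- per-variable range (max - min) over the whole data set;
              only used for numeric variables
    """
    total = 0.0
    for xi, yi, kind, r in zip(x, y, kinds, ranges):
        if kind == "num":
            # Range-normalised absolute difference, in [0, 1]
            total += abs(xi - yi) / r if r > 0 else 0.0
        else:
            # Simple matching: 0 if equal, 1 otherwise
            total += 0.0 if xi == yi else 1.0
    # Average over all variables
    return total / len(kinds)


def column_ranges(rows, kinds):
    """Range (max - min) of each numeric column; None for nominal columns."""
    out = []
    for j, kind in enumerate(kinds):
        if kind == "num":
            col = [row[j] for row in rows]
            out.append(max(col) - min(col))
        else:
            out.append(None)
    return out


# Hypothetical example: age (numeric), sex (nominal), score (numeric)
rows = [
    (40, "F", 2.0),
    (60, "M", 4.0),
    (50, "F", 3.0),
]
kinds = ["num", "nom", "num"]
ranges = column_ranges(rows, kinds)
print(gower_distance(rows[0], rows[1], kinds, ranges))
print(gower_distance(rows[0], rows[2], kinds, ranges))
```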

Part of the data describes the localization of a specific measurement. The physicians recorded it at 6 different sites, each with a "yes"/"no" outcome. I am wondering whether to use a single column representing all possible combinations, or to keep each site as a separate asymmetric binary variable in the Gower distance matrix. Afterwards I plan to run PAM (Partitioning Around Medoids) on the distance matrix to find potential clusters.
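This sketch makes the asymmetric option concrete, assuming pure 0/1 site indicators with no missing values. For an asymmetric binary variable, Gower drops a site where both patients are "no" from both the numerator and the denominator. The score is therefore the number of mismatches divided by the number of sites where at least one patient is "yes", which is the Jaccard distance. A symmetric version is included for contrast. In R, the columns can be declared with `type = list(asymm = ...)` in `cluster::daisy()`, and `cluster::pam()` accepts the resulting dissimilarity object. The function names below are hypothetical.

```python
# Sketch: Gower dissimilarity restricted to asymmetric binary variables.
# For an asymmetric binary variable a "no"/"no" (0/0) pair carries no
# information and is left out of both the numerator and the denominator.

def asymmetric_binary_gower(x, y):
    """x, y: equal-length sequences of 0/1 site indicators."""
    informative = 0   # sites where at least one record has a "yes"
    mismatches = 0    # sites where exactly one record has a "yes"
    for xi, yi in zip(x, y):
        if xi == 0 and yi == 0:
            continue          # joint absence: ignored
        informative += 1
        if xi != yi:
            mismatches += 1
    if informative == 0:
        return None           # undefined in this sketch: both records are all "no"
    return mismatches / informative


def symmetric_binary_gower(x, y):
    """Same data treated as symmetric binaries: 0/0 counts as agreement."""
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)


# Hypothetical patients measured at 6 sites (1 = "yes", 0 = "no")
p1 = [1, 0, 0, 0, 0, 0]
p2 = [1, 1, 0, 0, 0, 0]
print(asymmetric_binary_gower(p1, p2))  # 1 mismatch / 2 informative sites
print(symmetric_binary_gower(p1, p2))   # 1 mismatch / 6 sites
```

The symmetric version treats two patients who are both "no" at most sites as very similar. The asymmetric version bases the comparison only on the sites where something was actually found.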

I have tried merging the different binaries into a new factor variable, e.g. Test, Test2 and Test3 into New_variable:

       Test  Test2  Test3  New_variable
    A  0     1      1      011
    B  1     1      1      111
    C  0     0      0      000
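The merge in the table can be sketched as follows. `combine_binaries` is a hypothetical helper, and plain Python lists stand in for the data columns. With 6 sites the merged factor can take up to 2^6 = 64 levels, so some combinations may be shared by very few patients.

```python
# Sketch: collapse several 0/1 columns into one combination label,
# as in the New_variable column of the table.

def combine_binaries(*columns):
    """Concatenate the 0/1 values of each row into a string label."""
    return ["".join(str(v) for v in row) for row in zip(*columns)]


test  = [0, 1, 0]   # rows A, B, C
test2 = [1, 1, 0]
test3 = [1, 1, 0]
new_variable = combine_binaries(test, test2, test3)
print(new_variable)  # ['011', '111', '000']

# Number of possible levels when 6 sites are merged this way
print(2 ** 6)
```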

So the question is: should I run the analysis with each variable separately, or merge them into a single factor? My guess is that the two approaches would answer different questions?
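A small comparison on rows A, B and C from the table suggests the two encodings do behave differently. When each site is treated as an asymmetric binary, the distance is graded, so A comes out closer to B than to C. The merged factor is a single nominal variable, so any two different patterns are at distance 1 regardless of how many sites they share. This sketch looks only at the localization columns, and the function names are hypothetical. In a full Gower matrix the merged factor also counts as one variable in the average, whereas separate sites count as up to three here (six in the real data).

```python
# Sketch: the same three rows (A, B, C) under the two encodings.
from itertools import combinations

rows = {"A": (0, 1, 1), "B": (1, 1, 1), "C": (0, 0, 0)}


def separate_asymmetric(x, y):
    """Each site as its own asymmetric binary variable (0/0 pairs ignored)."""
    pairs = [(a, b) for a, b in zip(x, y) if a or b]
    if not pairs:
        return None  # undefined in this sketch: both rows are all "no"
    return sum(a != b for a, b in pairs) / len(pairs)


def merged_factor(x, y):
    """One nominal variable holding the whole pattern (e.g. "011")."""
    return 0.0 if x == y else 1.0


for i, j in combinations(rows, 2):
    print(i, j,
          separate_asymmetric(rows[i], rows[j]),
          merged_factor(rows[i], rows[j]))
```

A and B differ at one of three informative sites (distance 1/3 with separate variables), yet the merged factor scores them as far apart as A and C.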

Cheers and thanks in advance!

Tags: Partitioning Around Medoids, clustering, distance, R, unsupervised, data reduction, Gower

ADD COMMENT • link 3.2 years ago by Nicolai Skovbjerg Arildsen • 0