Hi,
I ran similarity network fusion on mRNA and miRNA data, and I'm generating several candidate clusterings, each with between 2 and 5 clusters. I'd like to test how similar the cluster memberships are across multiple categorical variables, in particular those that predict the same optimal number of clusters.
For instance, let's say that two categorical variables with 3 levels have the following membership:
Group1
1
1
2
2
3
3
Group2
2
2
1
1
3
3
I want to test how consistently samples group together across these variables. The labels (1, 2, 3) are arbitrary and strictly qualitative. In this instance, it would be a perfect match, because label 1 in Group1 corresponds exactly to label 2 in Group2 and vice versa, while label 3 is the same in both.
Is there a statistical test that I can apply here? I had read that chi-square might be appropriate, but I'm still a little fuzzy on how to interpret it in my application, since I don't think it accounts for the equivalence between 1 and 2 in the different groups.
Any suggestions?
Well? Does anyone have any suggestions? Come on, this isn't Stack Exchange, guys.
The simplest thing to do is to make a stacked bar plot of your data and look at the grouped distribution. You should recode your cluster labels so that equivalent clusters share the same label, which avoids the semantic issue. I don't think any statistic will 'help' you in this matter. At this point your data seem to be purely frequency counts over a small number of groups and a small number of samples...
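One way to do the recoding, as a rough sketch rather than a definitive method: try every relabeling of one variable and keep the one that agrees with the other variable on the most samples. The function name `best_relabeling` is made up for illustration, and the brute-force search assumes both variables have the same small number of levels (fine for 2 to 5 clusters, too slow for many more).

```python
from itertools import permutations

def best_relabeling(reference, other):
    """Map labels of `other` onto labels of `reference` so that the
    number of samples with matching labels is as large as possible.
    Hypothetical helper: brute force over all label permutations."""
    ref_labels = sorted(set(reference))
    other_labels = sorted(set(other))
    if len(ref_labels) != len(other_labels):
        raise ValueError("both variables must have the same number of levels")
    best_map, best_hits = None, -1
    for perm in permutations(ref_labels):
        mapping = dict(zip(other_labels, perm))
        # Count samples whose recoded label equals the reference label.
        hits = sum(mapping[o] == r for r, o in zip(reference, other))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map, best_hits

# Memberships from the example in the question.
group1 = [1, 1, 2, 2, 3, 3]
group2 = [2, 2, 1, 1, 3, 3]

mapping, hits = best_relabeling(group1, group2)
recoded = [mapping[x] for x in group2]
print(mapping)   # {1: 2, 2: 1, 3: 3}
print(hits)      # 6
print(recoded)   # [1, 1, 2, 2, 3, 3]
```

After recoding, the two variables can be compared sample by sample or plotted side by side without the label names getting in the way.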
Sounds reasonable. If I could recode them properly, I could even do a contingency table.
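For what it's worth, here is a minimal sketch of that contingency table for the example memberships, using only the Python standard library (`math.comb` needs Python 3.8 or later). It also computes the adjusted Rand index, which nobody in this thread proposed; I'm adding it as one possible measure because it depends only on which samples are grouped together, so the label names don't matter. It is a descriptive agreement score (1 means identical partitions, values near 0 mean chance-level agreement), not a significance test. The function names are illustrative, and the code assumes at least two samples.

```python
from collections import Counter
from math import comb

def contingency_table(a, b):
    """Rows are the sorted levels of `a`, columns the sorted levels of `b`."""
    counts = Counter(zip(a, b))
    rows, cols = sorted(set(a)), sorted(set(b))
    return [[counts[(r, c)] for c in cols] for r in rows]

def adjusted_rand_index(a, b):
    """Adjusted Rand index computed from the contingency table.
    Assumes len(a) == len(b) >= 2."""
    table = contingency_table(a, b)
    n = len(a)
    # Pairs of samples placed together in both partitions.
    sum_cells = sum(comb(x, 2) for row in table for x in row)
    # Pairs placed together within each partition separately.
    sum_rows = sum(comb(sum(row), 2) for row in table)
    sum_cols = sum(comb(sum(col), 2) for col in zip(*table))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions are one single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

group1 = [1, 1, 2, 2, 3, 3]
group2 = [2, 2, 1, 1, 3, 3]

table = contingency_table(group1, group2)
for row in table:
    print(row)
# [0, 2, 0]
# [2, 0, 0]
# [0, 0, 2]

ari = adjusted_rand_index(group1, group2)
print(ari)  # 1.0
```

Each row and each column of the table has a single non-zero cell, which is what a perfect match looks like even though the labels differ.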