Question

resolving specific clusters from data (spectral clustering)

2.4 years ago

benformatics 4.0k

I'm trying to cluster this genetic data, but even with several different methods (k-means, k-medoids, spectral clustering, etc.) I can't seem to resolve the two big clusters. Any suggestions? I would really like to be able to identify the central cluster around y=1. The goal is to run this on multiple datasets.

[image: scatter plot of the data]

clustering • 791 views

updated 2.4 years ago by Mensur Dlakic ★ 28k • written 2.4 years ago by benformatics 4.0k

What are x and y? What kind of data is that? Did you try graph-based clustering on these two dimensions, i.e. building a KNN/SNN graph first and then clustering that with igraph (e.g. Louvain)?

2.4 years ago by ATpoint 85k

I like your suggestions, I will try that. It's gene coverage at specific coordinates.

2.4 years ago by benformatics 4.0k

score 0 · Answer 1 · 2022-07-05

I think Gaussian mixture models will work well on this type of scatter, though you will likely end up with more than two clusters. If you provide [X, Y] coordinates for data points, I could tell you for sure.

You could literally plug your data into the script below in place of the random points it generates:

https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm.html
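
For a rough idea of what that might look like, here is a minimal sketch using scikit-learn's `GaussianMixture`. Everything about the data is an assumption on my part: the points are synthetic stand-ins (one broad cluster plus a tight one near y=1), the helper name `fit_gmm_by_bic` is made up for illustration, and choosing the number of components by lowest BIC is just one common heuristic, not necessarily the best one for coverage data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_by_bic(points, max_components=6, seed=0):
    """Fit GMMs with 1..max_components components; return the lowest-BIC model.

    Hypothetical helper for illustration only.
    """
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        model = GaussianMixture(
            n_components=k,
            covariance_type="full",  # lets each cluster be stretched or tilted
            n_init=3,                # a few restarts to avoid a poor local optimum
            random_state=seed,
        )
        model.fit(points)
        bic = model.bic(points)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model


# Synthetic stand-in data (assumption): replace `points` with your own
# (n_samples, 2) array of [x, y] values.
rng = np.random.default_rng(0)
broad = rng.normal(loc=[0.0, 3.0], scale=[2.0, 0.4], size=(400, 2))
tight = rng.normal(loc=[0.0, 1.0], scale=[0.3, 0.1], size=(200, 2))
points = np.vstack([broad, tight])

model = fit_gmm_by_bic(points)
labels = model.predict(points)

# Pick out the component whose mean lies closest to the region of interest.
# The target (0, 1) is an assumption; only "around y=1" comes from the question.
target = np.array([0.0, 1.0])
central = int(np.argmin(np.linalg.norm(model.means_ - target, axis=1)))
central_mask = labels == central

print("components chosen by BIC:", model.n_components)
print("points assigned to the central cluster:", int(central_mask.sum()))
```

Since the goal is to run this on multiple datasets, selecting the component by its proximity to a target location (rather than by label number) keeps the result comparable between runs, because GMM component indices are arbitrary.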