Question

How to cluster microbial samples after principal coordinate analysis based on beta-diversity

7 months ago

Bertalan_Takacs ▴ 100

Hi!

I am working on the analysis of microbial data (relative abundances) from a large cohort. I have a matrix of Bray-Curtis distances between our samples, and I've run PCoA on the distance matrix. Our samples seem to fall into two main clusters; interestingly, this clustering doesn't seem to be driven by any metadata feature we are aware of. Our plot looks like this:

[Figure: PCoA plot of the samples based on Bray-Curtis distances]
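For readers who want to reproduce the PCoA step, here is a minimal sketch of classical PCoA (Gower double-centering plus eigendecomposition) on a Bray-Curtis matrix using NumPy and SciPy. It assumes a samples × taxa relative-abundance array. The data are random placeholders and the `pcoa` helper is hypothetical; packages such as scikit-bio ship their own implementation (`skbio.stats.ordination.pcoa`).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pcoa(dist: np.ndarray, n_axes: int = 2):
    """Hypothetical helper: classical PCoA of a square distance matrix.

    Returns sample coordinates on the first `n_axes` axes and the fraction
    of (positive) variance explained by each of those axes.
    """
    n = dist.shape[0]
    # Gower double-centering: B = -1/2 * J D^2 J, with J = I - 11'/n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # B is symmetric, so eigh applies; sort eigenvalues in descending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Bray-Curtis is not Euclidean, so some eigenvalues can be negative;
    # only the positive ones correspond to real axes
    positive = eigvals > 1e-12
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = eigvals[:n_axes] / eigvals[positive].sum()
    return coords, explained


# Placeholder relative-abundance table: 20 samples x 6 taxa, rows sum to 1
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(20, 6)).astype(float)
rel_abund = counts / counts.sum(axis=1, keepdims=True)

# Square Bray-Curtis distance matrix (zeros on the diagonal)
bc = squareform(pdist(rel_abund, metric="braycurtis"))
coords, explained = pcoa(bc, n_axes=2)
print(coords.shape)  # (20, 2)
print(explained)
```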

I would like to define the two clusters using k-means (or some other clustering method; k-means just seems to be popular for this kind of analysis), draw the cluster borders on the plot, and get a list of the samples belonging to each cluster. Currently I feel a bit stuck at this step. Is there an R or Python package for such an analysis that would make my work easier?
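One possible way to do this in Python, sketched under assumptions: run k-means with k = 2 on the first two PCoA axes, list the samples in each cluster, and take the convex hull of each cluster as its border. The coordinates and `sample_ids` below are made-up placeholders (in practice they would come from your own PCoA output), and convex hulls are only one choice of border. Note that clustering on two axes ignores whatever variance sits on the remaining axes.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import KMeans

# Placeholder PCoA coordinates (PC1, PC2) for 40 samples in two groups
rng = np.random.default_rng(42)
coords = np.vstack([
    rng.normal(loc=(-0.3, 0.0), scale=0.05, size=(20, 2)),
    rng.normal(loc=(0.3, 0.0), scale=0.05, size=(20, 2)),
])
# Hypothetical sample names
sample_ids = np.array([f"sample_{i:02d}" for i in range(len(coords))])

# k-means with k=2; n_init=10 reruns from different starts and keeps the best
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(coords)

# Per-cluster sample lists
clusters = {k: sample_ids[labels == k].tolist() for k in np.unique(labels)}
for k, members in clusters.items():
    print(f"cluster {k}: {len(members)} samples")

# Cluster "borders" as convex hulls: vertices in hull order, shape (n, 2)
borders = {}
for k in np.unique(labels):
    pts = coords[labels == k]
    hull = ConvexHull(pts)
    borders[k] = pts[hull.vertices]
```

To draw the borders, each `borders[k]` array can be passed to matplotlib on top of the scatter plot, e.g. `plt.fill(b[:, 0], b[:, 1], alpha=0.2)`.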

Thanks in advance!

Tags: clustering · bray-curtis · microbiome · pcoa

Updated 7 months ago by Mensur Dlakic ★ 28k • written 7 months ago by Bertalan_Takacs ▴ 100

score 2 · Answer 1 · 2024-05-31

The scikit-learn package in Python has many clustering methods that should work out of the box on your data:

https://scikit-learn.org/stable/modules/clustering.html

In addition to k-means, where you specify the number of clusters yourself, there are methods that let the algorithm determine the optimal number of clusters. I recommend these two: