Question

How to evaluate MMseqs2 clusters?

15 months ago

Bioinformatics_begginner ▴ 20

Hi!

I have a FASTA file with large proteins from different bacteria. I clustered them with the MMseqs2 easy-cluster workflow like this:

mmseqs easy-cluster bacteria.fasta clusterResBacteria tmp --min-seq-id 0.5 -c 0.8 --cov-mode 1

From this I got three files:

clusterResBacteria_all_seqs.fasta clusterResBacteria_cluster.tsv clusterResBacteria_rep_seq.fasta

I managed to reduce the number of proteins by 80%, and half of the clusters were singletons (one-element clusters).
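For reference, here is a minimal sketch of how figures like these could be computed. It assumes the cluster assignments are available as a two-column, tab-separated table (representative, then member), one row per sequence. The function name `cluster_summary` and the toy input are hypothetical, made up for illustration only.

```python
from collections import Counter


def cluster_summary(tsv_lines):
    """Summarise a two-column (representative<TAB>member) cluster table.

    Assumption: every sequence appears exactly once as a member,
    and the representative is listed as a member of its own cluster.
    """
    sizes = Counter()
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, _member = line.split("\t")[:2]
        sizes[rep] += 1

    n_seqs = sum(sizes.values())
    n_clusters = len(sizes)
    n_singletons = sum(1 for s in sizes.values() if s == 1)
    return {
        "sequences": n_seqs,
        "clusters": n_clusters,
        # share of clusters that contain a single sequence
        "singleton_fraction": n_singletons / n_clusters if n_clusters else 0.0,
        # share of sequences removed by keeping one representative per cluster
        "reduction": 1 - n_clusters / n_seqs if n_seqs else 0.0,
    }


# Toy input (hypothetical IDs): clusters {A, B, C}, {D}, {E, F}
example = ["A\tA", "A\tB", "A\tC", "D\tD", "E\tE", "E\tF"]
summary = cluster_summary(example)
print(summary)
```

On a real run, the same function could be fed an open file handle instead of the toy list, since it only iterates over lines.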

I would like to evaluate these clusters with some metric, such as AUC, intracluster distance, or intercluster distance. Having read the MMseqs2 documentation (https://mmseqs.com/latest/userguide.pdf), I found no way to obtain these results directly, nor a way to get a consensus matrix. By comparison, ConsensusClusterPlus for gene expression (https://bioconductor.org/packages/devel/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf) outputs the graphs directly. I would like to know whether the number of clusters and the quality of the clusters are good.
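To make the kind of metric I mean concrete, here is a rough sketch of a silhouette score, which combines intracluster and intercluster distance per sequence. It is not something MMseqs2 provides. It assumes a full pairwise distance matrix is already available (for example 1 minus fractional sequence identity from an all-vs-all alignment, which is an assumption on my side), and `silhouette_scores`, `dist`, and `labels` are hypothetical names.

```python
def silhouette_scores(dist, labels):
    """Per-item silhouette from a square distance matrix and cluster labels.

    a = mean distance to the other members of the item's own cluster
    b = lowest mean distance to the members of any other cluster
    s = (b - a) / max(a, b); singletons get 0.0 by convention.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # undefined for singletons; 0.0 by convention
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy example: two tight, well-separated clusters (made-up distances)
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
labels = [0, 0, 1, 1]
scores = silhouette_scores(dist, labels)
print(sum(scores) / len(scores))

# A singleton cluster scores 0.0 under this convention
singleton_scores = silhouette_scores(
    [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.9, 0.8, 0.0]], [0, 0, 1]
)
```

Values near 1 indicate compact, well-separated clusters, and values near 0 or below indicate overlap. With half of the clusters being singletons, many scores would be 0 under this convention, so the mean alone may be hard to interpret.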

Thank you and best regards.

MMSeq FASTA Clustering Protein MMseqs2
