How to evaluate MMseqs2 clusters?
14 months ago

Hi!

I have a FASTA file of large proteins from different bacteria. I clustered them with MMseqs2 in its default clustering mode, like this:

mmseqs easy-cluster bacteria.fasta clusterResBacteria tmp --min-seq-id 0.5 -c 0.8 --cov-mode 1

From this I got three different files:

clusterResBacteria_all_seqs.fasta, clusterResBacteria_cluster.tsv and clusterResBacteria_rep_seq.fasta

This reduced the number of proteins by 80%, and half of the clusters were singletons (single-member clusters).
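The figures above (80% reduction, half singletons) can be recomputed from the `_cluster.tsv` file that `easy-cluster` writes. That file has one `representative<TAB>member` line per sequence, and each representative is also listed as a member of its own cluster. A minimal Python sketch, assuming that two-column layout; the function name `cluster_stats` is hypothetical:

```python
from collections import Counter, defaultdict


def cluster_stats(tsv_lines):
    """Summarize MMseqs2 *_cluster.tsv lines of the form 'representative<TAB>member'."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    if not clusters:
        raise ValueError("no clusters found")

    sizes = [len(members) for members in clusters.values()]
    n_seqs = sum(sizes)
    return {
        "sequences": n_seqs,
        "clusters": len(sizes),
        # fraction of sequences removed when keeping only representatives
        "reduction": 1 - len(sizes) / n_seqs,
        "singleton_fraction": sizes.count(1) / len(sizes),
        "largest_cluster": max(sizes),
        # cluster size -> number of clusters of that size
        "size_histogram": dict(sorted(Counter(sizes).items())),
    }
```

For example, `with open("clusterResBacteria_cluster.tsv") as fh: print(cluster_stats(fh))` should report a `reduction` near 0.8 and a `singleton_fraction` near 0.5 if the figures above are right. The size histogram is a quick first check of whether the clustering is dominated by singletons or by a few very large clusters.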

I would like to evaluate these clusters with some metric, such as AUC, intra-cluster distance or inter-cluster distance. After reading the MMseqs2 user guide (https://mmseqs.com/latest/userguide.pdf), I found no way to obtain such results directly, unlike ConsensusClusterPlus for gene-expression data (https://bioconductor.org/packages/devel/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf), which outputs the plots directly. I also found no way to get a consensus matrix. I would like to know whether the number of clusters and the quality of the clusters are good.
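MMseqs2 does not report intra- or inter-cluster distances itself, but they can be approximated from an all-vs-all alignment of the input. One possible source is `mmseqs easy-search bacteria.fasta bacteria.fasta allvsall.tsv tmp --format-output "query,target,fident"`, which is expensive for large sets and only reports pairs that pass its search thresholds. The Python sketch below assumes such a three-column table with identity as a fraction in [0, 1] (divide by 100 if your output is percent identity). It treats pairs without a reported hit as completely dissimilar and uses a silhouette score as a stand-in for the intra/inter-cluster distance metrics. All function names are hypothetical, and this is not a consensus-matrix method like ConsensusClusterPlus.

```python
from collections import defaultdict
from statistics import mean


def load_assignment(tsv_lines):
    """Map each member to its representative, from MMseqs2 *_cluster.tsv lines."""
    assign = {}
    for line in tsv_lines:
        line = line.strip()
        if line:
            rep, member = line.split("\t")[:2]
            assign[member] = rep
    return assign


def load_identities(hit_lines):
    """Read tab-separated hits. ASSUMPTION: columns are query, target, identity in [0, 1]."""
    ident = {}
    for line in hit_lines:
        line = line.strip()
        if not line:
            continue
        query, target, fid = line.split("\t")[:3]
        if query == target:
            continue  # ignore self-hits
        pair = frozenset((query, target))
        # A pair may be reported in both directions; keep the higher identity.
        ident[pair] = max(ident.get(pair, 0.0), float(fid))
    return ident


def intra_inter_identity(assign, ident):
    """Mean identity of aligned pairs inside one cluster vs. pairs spanning two clusters."""
    intra, inter = [], []
    for pair, fid in ident.items():
        a, b = tuple(pair)
        if a in assign and b in assign:
            (intra if assign[a] == assign[b] else inter).append(fid)
    return (mean(intra) if intra else None, mean(inter) if inter else None)


def mean_silhouette(assign, ident):
    """Mean silhouette with distance = 1 - identity.

    ASSUMPTION: pairs with no reported hit get distance 1.0; singletons score 0.
    Runs in O(n^2) time, so use it on a subsample for large sets.
    """
    clusters = defaultdict(list)
    for member, rep in assign.items():
        clusters[rep].append(member)
    if len(clusters) < 2:
        return None  # silhouette is undefined for a single cluster

    def dist(x, y):
        return 1.0 - ident.get(frozenset((x, y)), 0.0)

    scores = []
    for rep, members in clusters.items():
        for x in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the rest of x's own cluster
            a = mean(dist(x, y) for y in members if y != x)
            # b: mean distance to the nearest other cluster
            b = min(mean(dist(x, y) for y in others)
                    for other_rep, others in clusters.items() if other_rep != rep)
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)
```

Higher intra- than inter-cluster identity and a clearly positive mean silhouette suggest compact, well-separated clusters. Because singletons are scored 0 here, a run where half the clusters are singletons will have its mean silhouette pulled toward 0, so it may be worth reporting it separately for clusters with two or more members.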

Thank you and best regards.
