Hi!
I have a FASTA file with large proteins from different bacteria. I clustered them using MMseqs2 (default clustering mode) like this:
mmseqs easy-cluster bacteria.fasta clusterResBacteria tmp --min-seq-id 0.5 -c 0.8 --cov-mode 1
From this I got three different files:
clusterResBacteria_all_seqs.fasta clusterResBacteria_cluster.tsv clusterResBacteria_rep_seq.fasta
I managed to reduce the number of proteins by 80%, and half of the clusters were singletons (one-element clusters).
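As a rough sketch of how these two numbers can be recomputed (this is not an MMseqs2 feature), the snippet below counts cluster sizes from a two-column, tab-separated table of representative and member IDs, which is the layout of the `_cluster.tsv` file that easy-cluster normally writes. The function names and the toy IDs are made up for illustration; in practice the lines would be read from the real file.

```python
from collections import Counter


def cluster_sizes(tsv_lines):
    """Count members per representative from 'rep<TAB>member' lines."""
    sizes = Counter()
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, _member = line.split("\t")[:2]
        sizes[rep] += 1
    return sizes


def summarize(sizes):
    """Summarize a clustering: reduction and fraction of singleton clusters."""
    n_seqs = sum(sizes.values())
    n_clusters = len(sizes)
    n_singletons = sum(1 for s in sizes.values() if s == 1)
    return {
        "sequences": n_seqs,
        "clusters": n_clusters,
        "reduction": 1 - n_clusters / n_seqs,
        "singleton_fraction": n_singletons / n_clusters,
    }


# Toy input (hypothetical IDs): three clusters, one of them a singleton.
toy = ["A\tA", "A\tB", "A\tC", "D\tD", "E\tE", "E\tF"]
stats = summarize(cluster_sizes(toy))
print(stats)
```

With a real file, `cluster_sizes(open(path))` should work the same way, since the function only iterates over lines.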
I would like to evaluate these clusters with some metric, such as AUC, intra-cluster distance, or inter-cluster distance. Having read the MMseqs2 user guide (https://mmseqs.com/latest/userguide.pdf), I found no way to obtain these results directly, nor a way to get a consensus matrix. By comparison, ConsensusClusterPlus for gene expression (https://bioconductor.org/packages/devel/bioc/vignettes/ConsensusClusterPlus/inst/doc/ConsensusClusterPlus.pdf) outputs the graphs directly. How can I tell whether the number of clusters and the quality of the clusters are good?
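To make concrete what I mean by intra-cluster versus inter-cluster distance, here is a minimal sketch, assuming pairwise sequence identities are already available as (query, target, identity) triples, for example from an all-vs-all search of the FASTA file against itself. The helper `split_identities` and the toy values are hypothetical. Pairs with no reported hit are simply absent rather than counted as zero, so the inter-cluster mean is likely to be overestimated.

```python
from statistics import mean


def split_identities(membership, hits):
    """Split pairwise identities into within-cluster and between-cluster lists.

    membership: dict mapping sequence ID -> cluster representative ID
    hits: iterable of (query, target, identity) with identity in [0, 1]
    """
    intra, inter = [], []
    for query, target, identity in hits:
        if query == target:
            continue  # ignore self-hits
        if query not in membership or target not in membership:
            continue  # ignore sequences that were not clustered
        if membership[query] == membership[target]:
            intra.append(identity)
        else:
            inter.append(identity)
    return intra, inter


# Toy data (hypothetical): same three clusters as a small example.
membership = {"A": "A", "B": "A", "C": "A", "D": "D", "E": "E", "F": "E"}
hits = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("E", "F", 0.6), ("A", "E", 0.3), ("A", "A", 1.0),
]

intra, inter = split_identities(membership, hits)
mean_intra = mean(intra)
mean_inter = mean(inter)
print(f"mean intra-cluster identity: {mean_intra:.2f}")
print(f"mean inter-cluster identity: {mean_inter:.2f}")
```

A large gap between the two means would suggest well-separated clusters, but I do not know whether this is an accepted way to judge MMseqs2 output.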
Thank you and best regards.