Hi all!
I am analyzing a single-cell dataset of one cell line under four conditions (control, treatment1, treatment2, combination). I integrated the four samples with the Seurat package and want to cluster the integrated data to assess changes in cluster proportions between samples.
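The per-sample cluster percentages are a contingency table of sample × cluster labels, normalized within each sample. Below is a minimal Python sketch. It assumes you have exported per-cell sample and cluster labels from the Seurat object; the label values are made up for illustration. In R, `prop.table(table(sample, cluster), margin = 1)` from base R gives the same row-wise proportions.

```python
from collections import Counter, defaultdict

def cluster_percentages(samples, clusters):
    """Percentage of each sample's cells that fall into each cluster.

    samples, clusters: per-cell labels of equal length (exported from Seurat).
    Clusters with no cells in a given sample are simply absent from its dict.
    """
    totals = Counter(samples)          # number of cells per sample
    counts = defaultdict(Counter)      # counts[sample][cluster] -> cells
    for s, c in zip(samples, clusters):
        counts[s][c] += 1
    return {
        s: {c: 100.0 * n / totals[s] for c, n in sorted(counts[s].items())}
        for s in totals
    }

# Toy example with made-up labels (not real data)
samples = ["control"] * 4 + ["treatment1"] * 4
clusters = [0, 0, 1, 2, 0, 1, 1, 1]
pct = cluster_percentages(samples, clusters)
print(pct["control"])     # {0: 50.0, 1: 25.0, 2: 25.0}
print(pct["treatment1"])  # {0: 25.0, 1: 75.0}
```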
However, I have no principled way to choose a resolution. When clustering at different resolutions, I observed that some clusters share the same function (based on enrichment analysis). Others apparently combine several functions, such as mitotic division, OXPHOS and immune response. Depending on the resolution, clusters with the same enriched function can either increase or decrease in size when comparing treatment vs. control. Because I already have hypotheses about what happens under the treatments (based on bulk sequencing, proteomics and PCR data), choosing the "optimal" resolution from the functional analysis seemed biased to me. Furthermore, since this is a single cell line rather than a mixture of cell types, I don't expect to find distinct marker-defined subpopulations, as in a typical single-cell study.
I then computed the mean silhouette score at different resolutions, but it was only about 0.25 (weak clustering structure). As the resolution increased, many clusters had negative mean silhouette values.
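For reference, the silhouette of cell i is s(i) = (b − a) / max(a, b). Here a is the mean distance from i to the other cells in its own cluster, and b is the smallest mean distance from i to the cells of any other cluster. A cluster's score is the mean of s(i) over its cells. The sketch below computes both the overall mean and the per-cluster means, assuming Euclidean distances in the embedding used for clustering. That embedding could be, for example, the integrated PCA coordinates from `Embeddings(obj, reduction = "pca")`, with one label set per `FindClusters(..., resolution = ...)` run. The coordinates and label sets are made up. scikit-learn's `silhouette_samples` computes the same per-cell values.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Per-cell silhouette s(i) = (b - a) / max(a, b) with Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    uniq = np.unique(labels)
    if len(uniq) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s

def silhouette_summary(X, labels):
    """Return (mean silhouette over all cells, {cluster: mean silhouette of its cells})."""
    s = silhouette_samples(X, labels)
    labels = np.asarray(labels)
    per_cluster = {c.item(): float(s[labels == c].mean()) for c in np.unique(labels)}
    return float(s.mean()), per_cluster

# Toy embedding (made-up coordinates): two tight, well-separated groups.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
# Two hypothetical label sets, standing in for two clustering resolutions.
labels_by_resolution = {0.2: [0, 0, 1, 1], 0.8: [0, 1, 2, 2]}
results = {res: silhouette_summary(X, lab) for res, lab in labels_by_resolution.items()}
for res, (mean_s, per_cluster) in results.items():
    print(res, round(mean_s, 3), {c: round(v, 3) for c, v in per_cluster.items()})
```

The full distance matrix needs O(n²) memory. With tens of thousands of cells, you would normally compute silhouettes on a random subsample of cells.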
So I decided to implement the following iterative clustering scheme. For each resolution, I calculate the mean silhouette score of each cluster. I pick the cluster with the highest score across all resolutions and accept it as a new final cluster. I then repeat the procedure on the remaining cells. I stop when none of the remaining subclusters reaches a silhouette score of 0.5, the level I take as reasonable clustering.
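The iterative scheme above can be sketched in Python as follows. The post leaves several details open, so this sketch makes the following assumptions:

- `cluster_fn(X, resolution)` is a hypothetical callable that returns labels for the remaining cells. In practice it would be Leiden or Louvain at that resolution.
- `toy_cluster_fn` is a deliberately simple stand-in, included only so the sketch runs. It takes connected components of the graph that links cells closer than 1/resolution.
- Cells that are never accepted keep the label -1.

The coordinates are made up.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between all rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

def silhouette_samples(X, labels):
    """Per-cell silhouette s(i) = (b - a) / max(a, b); singleton clusters get 0."""
    D = pairwise_dist(X)
    uniq = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == c].mean() for c in uniq if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s

def toy_cluster_fn(X, resolution):
    """HYPOTHETICAL stand-in for Leiden/Louvain: connected components of the graph
    linking cells closer than 1/resolution (higher resolution -> more clusters)."""
    D = pairwise_dist(X)
    labels = -np.ones(len(X), dtype=int)
    k = 0
    for start in range(len(X)):
        if labels[start] >= 0:
            continue
        labels[start] = k
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.where((D[i] < 1.0 / resolution) & (labels < 0))[0]:
                labels[j] = k
                stack.append(j)
        k += 1
    return labels

def iterative_silhouette_clustering(X, cluster_fn, resolutions, threshold=0.5):
    """Repeatedly accept the single best-scoring cluster (over all resolutions),
    remove its cells and recluster the rest; stop when no cluster reaches threshold."""
    X = np.asarray(X, dtype=float)
    remaining = np.arange(len(X))          # indices of cells not yet assigned
    final = -np.ones(len(X), dtype=int)    # -1 = never accepted (assumption)
    next_id = 0
    while len(remaining) > 1:
        best_score, best_cells = -np.inf, None
        for res in resolutions:
            labels = np.asarray(cluster_fn(X[remaining], res))
            if len(np.unique(labels)) < 2:
                continue  # silhouette is undefined for a single cluster
            s = silhouette_samples(X[remaining], labels)
            for c in np.unique(labels):
                score = s[labels == c].mean()
                if score > best_score:  # ties keep the first candidate found
                    best_score, best_cells = score, remaining[labels == c]
        if best_cells is None or best_score < threshold:
            break  # every candidate subcluster is below the threshold
        final[best_cells] = next_id
        next_id += 1
        remaining = np.setdiff1d(remaining, best_cells)
    return final

# Toy data (made-up coordinates): three tight, well-separated pairs of cells.
X = [[0, 0], [0, 0.5], [10, 0], [10, 0.5], [20, 0], [20, 0.5]]
final = iterative_silhouette_clustering(X, toy_cluster_fn, resolutions=[0.5, 1.0, 3.0])
print(final.tolist())
```

In this sketch, each candidate's silhouette is computed only against the cells still remaining at that iteration. The group left over at the end cannot be scored on its own, because silhouette needs at least two clusters. In the toy run, that group therefore stays labeled -1.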
As I haven't seen this approach anywhere, I'm wondering: is it mathematically and biologically correct?
Looking forward to your suggestions/replies! Thanks!