I have scRNA-seq data from 12 samples across 3 conditions, with 4 samples per condition. After integrating and clustering the data, I observed that cells in cluster X predominantly originate from condition 1. Specifically, cluster X contains 1000 cells in total: 300 cells each from samples 1, 2, and 3 of condition 1, but only 100 cells from sample 4 of the same condition.
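To make the imbalance concrete, the per-sample composition of cluster X can be tabulated directly from the counts above. This is a minimal sketch using only the numbers stated here. The dictionary keys (`sample1` to `sample4`) are illustrative labels, not identifiers from any real dataset.

```python
# Cells in cluster X per sample, as stated above (all four are condition 1 samples).
# The sample labels are illustrative placeholders.
cluster_x_counts = {"sample1": 300, "sample2": 300, "sample3": 300, "sample4": 100}

# Total cells in cluster X and each sample's share of that total.
total = sum(cluster_x_counts.values())
shares = {s: n / total for s, n in cluster_x_counts.items()}

print(total)   # 1000
print(shares)  # {'sample1': 0.3, 'sample2': 0.3, 'sample3': 0.3, 'sample4': 0.1}
```

Sample 4 therefore contributes 10% of the cluster, compared with 30% from each of the other three condition 1 samples.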
How can I statistically test whether cluster X is significantly associated with condition 1, given that one of its samples (sample 4) contributes relatively few cells to this cluster?