I have a new single-nucleus RNA sequencing dataset from mouse brain tissue in two conditions (3 treated vs 3 untreated samples) that I'm analyzing with Seurat. After annotation, I am trying to run a differential expression analysis between the conditions for each subtype, but some subtypes have very different numbers of cells in each condition (for example, 30 cells in the treated group vs 350 in the untreated group).
I usually pseudobulk the data before running the DE analysis with DESeq2, but I'm afraid that the imbalance in cell numbers between conditions might be driving the results I am seeing, since the Seurat::AggregateExpression function just sums the counts within each group without taking the number of cells into account.
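To make the concern concrete, here is a minimal toy sketch in plain Python of what sum-based pseudobulking does when two groups differ only in cell number. It is not Seurat or DESeq2 code: the function names `pseudobulk` and `size_factors`, the three-gene counts, and the use of one sample per condition are all made up for illustration. The size-factor step is a simplified median-of-ratios calculation in the style of DESeq2's default normalization.

```python
import math
from statistics import median

def pseudobulk(cells):
    """Sum raw counts gene-wise over all cells of one sample."""
    return [sum(gene_counts) for gene_counts in zip(*cells)]

def size_factors(samples):
    """Simplified median-of-ratios size factors (DESeq2-style, illustrative)."""
    n_genes = len(next(iter(samples.values())))
    # Log geometric mean of each gene across samples; genes with a zero are skipped.
    log_geo = []
    for g in range(n_genes):
        vals = [counts[g] for counts in samples.values()]
        if all(v > 0 for v in vals):
            log_geo.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo.append(None)
    factors = {}
    for name, counts in samples.items():
        log_ratios = [math.log(counts[g]) - log_geo[g]
                      for g in range(n_genes) if log_geo[g] is not None]
        factors[name] = math.exp(median(log_ratios))
    return factors

# Hypothetical toy data: every cell has the same profile over three genes,
# so the two groups differ only in how many cells they contain.
per_cell = [2, 1, 4]
cells_by_sample = {
    "treated": [per_cell] * 30,
    "untreated": [per_cell] * 350,
}

pb = {name: pseudobulk(cells) for name, cells in cells_by_sample.items()}
sf = size_factors(pb)
norm = {name: [c / sf[name] for c in counts] for name, counts in pb.items()}

print(pb)    # raw pseudobulk sums scale with the number of cells
print(norm)  # after size-factor normalization the profiles coincide
```

In this idealized case the raw sums differ by the cell-number ratio, but the normalized profiles are identical, so the difference behaves like a sequencing-depth difference. What the sketch does not capture is that a pseudobulk built from 30 cells has far fewer total counts than one built from 350, so its estimates are noisier, especially for lowly expressed genes.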
What would people recommend in this situation? Should I use a different pseudobulking method? I followed the approach recommended in the Seurat vignette, but I'm the only one in my lab doing this kind of analysis and I am not sure it is the most appropriate method. Thank you very much!
Thanks! I just realized that the level of grouping matters: analyzing the cells as one major subtype versus splitting them into the minor subtypes that make it up changes the number of differentially expressed genes between conditions.
Is there any way to determine which level of grouping is most appropriate for these comparisons?