Hello,
I am new to analyzing single-cell data. I am currently working on a two-group comparison with 4 biological replicates each. I followed the standard protocol: NormalizeData, FindVariableFeatures (2000), FindIntegrationAnchors, IntegrateData, ScaleData, RunPCA, RunUMAP, FindNeighbors, and FindClusters. After defining my clusters, I ran FindMarkers with default settings. The output is where I find myself lost. I have filtered on average log2FC >0.25 or <-0.25, min.pct of 0.25, and padj<0.00001. I still get 14000 genes that are significantly different between groups. How should I approach that number? Is it usual to see such a high number? Am I missing something in my analysis? I would appreciate any suggestions and advice on this. Thank you.
Thank you for the useful information, I will dig deeper into it.
I've tried pseudobulk analysis using DESeq2, and one of my clusters has only 44 DEGs (which seems fine), but the other has only 1 at padj<0.05. The adjusted p-values increase rapidly to 0.998. I do take the batch into consideration, but that only seems to increase the number of genes by 5-10. Do you have any suggestions? Thank you again for your input.
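For readers unfamiliar with the term, pseudobulk means summing raw counts over all cells of one cluster within one biological replicate, so that the replicate (not the cell) becomes the unit of comparison. A minimal sketch of that aggregation step follows, in plain Python for illustration only; the function name `pseudobulk`, the gene names, and the toy data are hypothetical, and a real analysis would do this on the count matrix in R before handing it to DESeq2.

```python
from collections import defaultdict

def pseudobulk(counts, cell_sample, cell_cluster):
    """Sum raw counts over all cells sharing a (cluster, sample) pair.

    counts:       list of per-cell dicts {gene: raw_count}
    cell_sample:  per-cell sample (replicate) labels, same order as counts
    cell_cluster: per-cell cluster labels, same order as counts
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, sample, cluster in zip(counts, cell_sample, cell_cluster):
        for gene, n in cell.items():
            bulk[(cluster, sample)][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {key: dict(genes) for key, genes in bulk.items()}

# Toy data (hypothetical): 4 cells, 2 samples, 1 cluster
toy_counts = [
    {"GeneA": 3, "GeneB": 0},
    {"GeneA": 1, "GeneB": 2},
    {"GeneA": 0, "GeneB": 5},
    {"GeneA": 2, "GeneB": 1},
]
toy_samples = ["ctrl_1", "ctrl_1", "treat_1", "treat_1"]
toy_clusters = ["c0", "c0", "c0", "c0"]

pb = pseudobulk(toy_counts, toy_samples, toy_clusters)
for key in sorted(pb):
    print(key, pb[key])
```

With 4 replicates per group, each cluster ends up with at most 8 pseudobulk profiles, which is why the number of DEGs is so much smaller than in a per-cell test.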
Nothing concrete, unfortunately. You can try some other methods (limma, edgeR), but you'll likely get pretty similar results. I'd try relaxing the significance threshold to padj<0.1.
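To make the effect of the cutoff concrete, here is a small sketch of the Benjamini-Hochberg adjustment (the default adjustment method in DESeq2) applied to a handful of made-up p-values, counting how many pass at 0.05 versus 0.1. The function name `bh_adjust` and the p-values are hypothetical and only for illustration; this is plain Python, not the DESeq2 implementation.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p-values for five genes
raw_p = [0.001, 0.01, 0.05, 0.06, 0.5]
padj = bh_adjust(raw_p)

n_strict = sum(p < 0.05 for p in padj)
n_relaxed = sum(p < 0.1 for p in padj)
print(padj)
print("padj < 0.05:", n_strict, "| padj < 0.1:", n_relaxed)
```

A looser cutoff accepts a higher expected false discovery rate in exchange for more candidates, so it is worth reporting which threshold was used.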
I have also noticed that very small clusters tend to have poor sensitivity, likely due to higher variability among the pseudobulks. So for clusters with <20 cells per sample, you may have a tough time.
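A quick way to spot such clusters is to tabulate cells per sample within each cluster before running the test. The sketch below does this in plain Python on made-up labels; the function name `low_count_clusters` is hypothetical, and the default of 20 is just the rule of thumb mentioned above, not a hard limit.

```python
from collections import Counter

def low_count_clusters(cell_cluster, cell_sample, min_cells=20):
    """Return {cluster: {sample: n_cells}} for samples below min_cells."""
    per_pair = Counter(zip(cell_cluster, cell_sample))
    all_samples = sorted(set(cell_sample))
    flagged = {}
    for cluster in sorted(set(cell_cluster)):
        # A missing (cluster, sample) pair counts as 0 cells
        small = {
            s: per_pair[(cluster, s)]
            for s in all_samples
            if per_pair[(cluster, s)] < min_cells
        }
        if small:
            flagged[cluster] = small
    return flagged

# Made-up labels: cluster c0 is well populated, c1 is sparse in sample s2
labels_cluster = ["c0"] * 50 + ["c1"] * 25
labels_sample = ["s1"] * 30 + ["s2"] * 20 + ["s1"] * 20 + ["s2"] * 5

flagged = low_count_clusters(labels_cluster, labels_sample)
print(flagged)
```

Clusters that show up here are the ones where a low DEG count is more likely to reflect limited power than a true absence of differences.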