When analyzing scRNA-seq data, why do people pool all their data across treatments and run UMAP on the combined dataset rather than running a separate UMAP on each treatment group? For example, say you're looking at scRNA-seq data on immune cells from mice that did and did not receive some immunotherapy treatment. If you're expecting cell types to transcriptionally change in response to treatment, wouldn't it make more sense to run UMAP on just cells that share the same treatment? And yet in every paper I see, people pool their data across treatments and run UMAP on all their data. I'm sure there's a logical reason people do this rather than what I suggested, just curious what it is.
"If you're expecting cell types to transcriptionally change in response to treatment, wouldn't it make more sense to run UMAP on just cells that share the same treatment?"
No, please don't do this. If a population of Th2 cells changes in expression between immunotherapy treatment and control, why would you put them on different plots? How will you see that they're different?
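To make that concrete, here is a toy sketch. It is not UMAP: a simple mean-centering projection stands in for any embedding that is fit to whatever data it is given, and the marker-gene values are made up. The point it illustrates is that fitting per treatment gives each group its own coordinate system, so the shift between groups disappears, while fitting on the pool keeps it visible.

```python
from statistics import mean

def fit_center(values):
    """'Fit' a toy 1-D embedding: learn the center of the data it is given."""
    return mean(values)

def embed(values, center):
    """Project values into the coordinate system defined by `center`."""
    return [v - center for v in values]

# Hypothetical expression of one marker gene in Th2 cells (made-up numbers).
control = [1.0, 1.5, 0.5, 1.0]
treated = [3.0, 3.5, 2.5, 3.0]

# Separate fits: each group gets its own coordinate system.
sep_control = embed(control, fit_center(control))
sep_treated = embed(treated, fit_center(treated))

# Pooled fit: one coordinate system shared by both groups.
pooled_center = fit_center(control + treated)
pool_control = embed(control, pooled_center)
pool_treated = embed(treated, pooled_center)

# Distance between the group means in each setup.
separate_gap = mean(sep_treated) - mean(sep_control)
pooled_gap = mean(pool_treated) - mean(pool_control)
print(f"gap between groups, separate fits: {separate_gap:.2f}")
print(f"gap between groups, pooled fit:    {pooled_gap:.2f}")
```

Real UMAP is nonlinear and stochastic, so coordinates from two independent runs are even less comparable than in this toy case.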
You could put them on different plots to increase the purity of each cluster: mixing Th2 cells that received different treatments would give you larger, more heterogeneous clusters that are more likely to include non-Th2 cells by mistake. Once you have Th2 clusters from the separate UMAP plots, you could run, say, a differential expression analysis between those two groups of cells to see any differences.
Since nobody does this, I'm sure there's a reason not to; I just want to know what that reason is.
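On the differential expression step mentioned above, here is a minimal sketch, assuming each cell already carries a cell-type label and a treatment label. The cell records, gene values, and the `log2_fold_change` helper are all hypothetical, and this computes only an effect size, not a statistical test. It shows that the comparison needs the two labels and nothing from the plot itself.

```python
from math import log2
from statistics import mean

# Hypothetical cells: (cell_type, treatment, {gene: normalized expression}).
cells = [
    ("Th2", "control", {"Gata3": 4.0, "Il4": 1.0}),
    ("Th2", "control", {"Gata3": 4.0, "Il4": 3.0}),
    ("Th2", "treated", {"Gata3": 4.0, "Il4": 7.0}),
    ("Th2", "treated", {"Gata3": 4.0, "Il4": 9.0}),
    ("Th1", "treated", {"Gata3": 0.5, "Il4": 0.5}),
]

def log2_fold_change(cells, cell_type, gene, pseudocount=1.0):
    """log2 ratio of mean expression (treated / control) within one cell type."""
    def group_mean(treatment):
        values = [expr[gene] for ct, tr, expr in cells
                  if ct == cell_type and tr == treatment]
        return mean(values)
    # The pseudocount avoids division by zero for unexpressed genes.
    return log2((group_mean("treated") + pseudocount) /
                (group_mean("control") + pseudocount))

# Compare treated vs control Th2 cells gene by gene.
lfc = {gene: log2_fold_change(cells, "Th2", gene) for gene in ("Gata3", "Il4")}
print(lfc)
```

In this made-up data, `Gata3` is unchanged between groups while `Il4` is higher in treated cells, so only `Il4` gets a nonzero fold change.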