Question

Why run UMAP on all scRNA-seq samples together rather than a separate UMAP for each treatment?

2

3.3 years ago

rtrende ▴ 80

When analyzing scRNA-seq data, why do people pool all their data across treatments and run UMAP on the combined dataset rather than running a separate UMAP on each treatment group? For example, say you're looking at scRNA-seq data on immune cells from mice that did and did not receive some immunotherapy treatment. If you're expecting cell types to transcriptionally change in response to treatment, wouldn't it make more sense to run UMAP on just cells that share the same treatment? And yet in every paper I see, people pool their data across treatments and run UMAP on all of it. I'm sure there's a logical reason people do this rather than what I suggested; I'm just curious what it is.

scRNA-seq umap • 2.0k views

updated 2.5 years ago by Ram 45k • written 3.3 years ago by rtrende ▴ 80

0

"If you're expecting cell types to transcriptionally change in response to treatment, wouldn't it make more sense to run UMAP on just cells that share the same treatment?"

No, please don't do this. If a population of Th2 cells changes in expression between immunotherapy treatment and control, why would you put them on different plots? How will you see that they're different?
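To make the point concrete: when all cells share one embedding and one clustering, a cluster label means the same thing in both conditions, so a cluster-by-treatment table can be read directly. Here is a minimal sketch in plain Python; the cell annotations are made up for illustration and the `composition` helper is hypothetical, not part of any scRNA-seq package.

```python
from collections import Counter

# Hypothetical per-cell annotations:
# (cluster label from ONE shared clustering, treatment group).
cells = [
    ("Th2", "control"), ("Th2", "control"), ("Th2", "treated"),
    ("Th2", "treated"), ("Th2", "treated"),
    ("Treg", "control"), ("Treg", "control"), ("Treg", "treated"),
]

def composition(cells):
    """Return {cluster: {treatment: fraction of that treatment's cells}}."""
    per_treatment = Counter(treatment for _, treatment in cells)
    counts = Counter(cells)
    table = {}
    for (cluster, treatment), n in counts.items():
        table.setdefault(cluster, {})[treatment] = n / per_treatment[treatment]
    return table

table = composition(cells)
for cluster, row in sorted(table.items()):
    print(cluster, {t: round(f, 2) for t, f in sorted(row.items())})
```

Because the labels are shared, a shift such as the Th2 fraction going from 0.5 in control to 0.75 in treated cells is visible without any extra matching step.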

3.3 years ago by dsull ★ 7.6k

1

You could put them on different plots because that would increase the purity of each cluster: mixing Th2 cells that received different treatments gives you larger, more heterogeneous clusters that are more likely to include non-Th2 cells by mistake. Once you have Th2 clusters from the separate UMAP plots, you could run, say, a differential expression analysis between those two groups of cells to find any differences.
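A sketch of one extra step this workflow would seem to need: cluster IDs from two independent runs are arbitrary, so the "Th2 cluster" in one run has to be paired with its counterpart in the other before any comparison. The example below assumes made-up mean expression values for three marker genes, and `match_clusters` is a hypothetical helper that pairs clusters by nearest centroid; a real analysis would likely need something more careful.

```python
import math

# Hypothetical mean expression of three marker genes per cluster, from two
# clusterings run separately on each treatment group. All values are made up.
genes = ["Gata3", "Foxp3", "Cd19"]
control = {"0": [5.0, 0.2, 0.1], "1": [0.3, 4.8, 0.1], "2": [0.1, 0.1, 6.0]}
treated = {"0": [0.2, 0.1, 5.7], "1": [6.5, 0.3, 0.2], "2": [0.4, 5.1, 0.1]}

def match_clusters(a, b):
    """Pair each cluster in `a` with its nearest centroid in `b` (Euclidean).

    Greedy nearest-centroid matching; it does not guarantee a one-to-one pairing.
    """
    return {ka: min(b, key=lambda kb: math.dist(va, b[kb])) for ka, va in a.items()}

pairs = match_clusters(control, treated)
print(pairs)  # control cluster ID -> treated cluster ID
```

Note that the matching relies on clusters looking alike across treatments, so a strong treatment effect makes this step less reliable. A shared embedding and clustering avoids the step altogether.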

Because nobody does this, I'm sure there's a reason not to do it; I just want to know what that reason is.

3.0 years ago by rtrende ▴ 80

score 1 · Answer 1 · 2022-03-09

1

3.3 years ago

Mensur Dlakic ★ 29k

If all the cells are from the same tissue and from the same treatment, the UMAP plot should theoretically be a tight cluster of points. In practice it is usually not that tight because of normal cell stochasticity, experimental error, and other reasons, but it should still be a single cluster. I would expect multiple clusters from the same treatment only if the cells were intrinsically different, for example if they were from different tissues.

When the cells are from different treatments, the separation of points is expected to come from their differential responses to those treatments. That holds both for cells from the same tissue and for cells from different tissues.

3.3 years ago by Mensur Dlakic ★ 29k

1

I see where you're coming from. But that doesn't seem to address why one shouldn't run multiple UMAPs. To make my train of thought more explicit:

1. Like you said, because of biological and technical variability, UMAP clusters can get a little loose.
2. If you include cells with different treatments in the same UMAP, there's more variability, and your clusters will be looser and thus less clean.
3. Therefore, if you made a UMAP for each treatment, you'd decrease the variability within each UMAP, get cleaner clusters with fewer cells misclassified, and as a result your downstream analysis would be more accurate.
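The second step above can be checked with a small worked example. For two equally sized groups, the pooled (population) variance equals the average within-group variance plus a quarter of the squared shift between the group means. The numbers below are made up and stand in for one gene's expression in one cell type.

```python
from statistics import mean, pvariance

# Made-up expression values for one gene in one cell type.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]  # same spread, shifted by the treatment

within = mean([pvariance(control), pvariance(treated)])  # average within-group variance
pooled = pvariance(control + treated)                    # variance after pooling
shift = mean(treated) - mean(control)                    # difference in group means

# For equal group sizes: pooled == within + shift**2 / 4
print(within, pooled, within + shift**2 / 4)
```

So pooling does add variance, but the added term is exactly the treatment shift, which is the signal a treatment-versus-control comparison is looking for.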

I know nobody does this, so I'm sure there's some flaw in my reasoning above; I just want to know what it is.

3.0 years ago by rtrende ▴ 80