Question

How do I know if I'm overcorrecting my single-cell data?


2.3 years ago

paria ▴ 100

Hello all, I have an snRNA-seq dataset of 20 samples (ALS and controls) that I integrated with Seurat. In my committee meeting, I was told to check whether I'm overcorrecting my data by adding an unrelated sample and checking whether it mixes with the other samples. I integrated my samples together with one extra sample, PBMC. Below is the UMAP plot of that dataset plus PBMC:

[Image: UMAP plot of the ALS/control dataset plus PBMC after the first Seurat integration]

I conclude that the PBMC sample, which is not supposed to mix with the other samples, is mixed, so Seurat integration is overcorrecting my dataset. I read that Seurat's CCA integration tends to do this and that reciprocal PCA (RPCA) would mitigate the overcorrection. So I integrated my dataset using Seurat RPCA. Below is the same UMAP plot for the same dataset (ALS and controls plus PBMC) after RPCA integration: [Image: UMAP plot of the same dataset plus PBMC after Seurat RPCA integration]

In this integration it looks as if there are fewer cells than in the previous one; however, the number is the same. Based on this I concluded that reciprocal PCA is the better approach. However, when I did the clustering, there was not much difference in how PBMC mixes with the other samples. The only difference is that in the first UMAP the PBMC cells are spread widely across clusters, as the number of PBMC cells in each cluster shows, whereas in the second they are mixed into fewer clusters. In neither of them do I see PBMC clustering separately from the rest of the dataset.

My questions are as follows:

First, does it make sense to conclude that the second integration is less overcorrected than the first?

Second, is this a proper way to evaluate overcorrection?

Third, is there a more principled approach to evaluating overcorrection?

FYI, I tried Harmony integration as well but didn't end up using it because I did not get enough stable clusters. There is a more principled approach to determining whether the data is overcorrected here: http://bioconductor.org/books/3.15/OSCA.multisample/correction-diagnostics.html#preserving-biological-heterogeneity However, it is reliable only if the samples include the same cell types. In my case it does not work.

I would really appreciate any comments on my questions.

Thanks, Paria

Rpca sn-RNA overcorrection seq seurat CCA • 2.6k views

updated 2.3 years ago by jared.andrews07 ★ 18k • written 2.3 years ago by paria ▴ 100


I think it makes no sense to combine these data. They are completely different cell types, on top of the batch effect that comes from different studies. Why do you want to do that anyway? Assuming the PBMCs and your data had been generated perfectly in the same experiment, which analysis would you run on them?

2.3 years ago by ATpoint 87k


Thanks for your response. I don't need to study the integration of PBMC with my dataset. It was just to see whether my data is overcorrected. Because PBMC is a very different dataset, I expected to see it as a separate island in my UMAP plot. However, it does not cluster separately.

2.3 years ago by paria ▴ 100

score 1 · Answer 1 · 2022-11-09


2.3 years ago

jared.andrews07 ★ 18k

You're trying to apply methods that expect at least some overlap between datasets to datasets that have no overlap. I don't find it surprising that they take their best guess and cram things together, as these methods really require some common ground to work effectively.

A better approach may be to use a "leave one out" method whereby you remove a given cell type in one sample, integrate, and show that cell type still stands alone in the other sample. This is easy to do and does a reasonable job answering your committee member's question. Do this with multiple cell types for increased confidence.
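One way to put a number on "still stands alone" after such a leave-one-out run is sketched below in plain Python. It assumes you have exported, for every cell, the cluster assignment and the cell type label from the integrated object. The function name `held_out_separation` and the toy labels are hypothetical, and the two scores are only an illustrative choice, not an established metric.

```python
from collections import Counter


def held_out_separation(clusters, cell_types, held_out_type):
    """Summarise how cleanly the held-out cell type separates after integration.

    Returns (dominant_cluster, share, purity), where
      share  = fraction of held-out cells that land in their dominant cluster
      purity = fraction of that cluster made up of held-out cells
    """
    if len(clusters) != len(cell_types):
        raise ValueError("clusters and cell_types must have the same length")

    # Count, per cluster, how many cells of the held-out type it contains.
    held_out_counts = Counter(
        c for c, t in zip(clusters, cell_types) if t == held_out_type
    )
    if not held_out_counts:
        raise ValueError(f"no cells labelled {held_out_type!r}")

    dominant, n_in_dominant = held_out_counts.most_common(1)[0]
    cluster_size = sum(1 for c in clusters if c == dominant)

    share = n_in_dominant / sum(held_out_counts.values())
    purity = n_in_dominant / cluster_size
    return dominant, share, purity


# Toy example with made-up labels (not real data).
toy_clusters = ["0", "0", "1", "1", "1", "2"]
toy_types = ["Astro", "Astro", "Oligo", "Oligo", "Oligo", "Astro"]
print(held_out_separation(toy_clusters, toy_types, "Astro"))
```

A share and purity both close to 1 would suggest the held-out cell type kept its own cluster; low values would suggest it was pulled into clusters dominated by other cell types, which is what overcorrection would look like in this test.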

2.3 years ago by jared.andrews07 ★ 18k


Thanks for your response. What you recommended makes sense. However, if I remove one cell type from a single sample, the other 19 samples still contain it, so I think it is quite likely that what you describe will happen and the cell type will stand alone. Do you think it makes more sense to remove one cell type from 19 (out of 20) samples and see whether the few remaining cells of that type still stand alone? Moreover, when you say remove one cell type, that will be after integration and clustering. I was wondering how I could remove a specific cell type and go back to do the integration from scratch? Thanks, Paria

2.3 years ago by paria ▴ 100


Do you think it makes more sense to remove one cell type from 19 (out of 20) samples and see whether the few remaining cells of that type still stand alone?

Yes, that is what I'd try.

Moreover, when you say remove one cell type, that will be after integration and clustering. I was wondering how I could remove a specific cell type and go back to do the integration from scratch?

Just re-run the same steps on your Seurat object after adding cell type labels and subsetting. The labels should be retained.
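The subsetting step can be sketched in plain Python as a keep/drop mask over cells. This is only an illustration under assumptions: it presumes per-cell sample IDs and cell type labels are available as two equal-length lists, and `leave_one_out_mask` and the toy values are hypothetical names, not part of Seurat. The resulting mask (or the matching cell barcodes) would then be used to subset the object before repeating normalization, integration and clustering from scratch.

```python
def leave_one_out_mask(samples, cell_types, held_out_type, keep_in_sample):
    """Return a list of booleans, True for every cell that should be kept.

    Cells of `held_out_type` are dropped from every sample except
    `keep_in_sample`; all other cells are kept.
    """
    if len(samples) != len(cell_types):
        raise ValueError("samples and cell_types must have the same length")
    return [
        not (t == held_out_type and s != keep_in_sample)
        for s, t in zip(samples, cell_types)
    ]


# Toy example with made-up labels (not real data).
toy_samples = ["S1", "S1", "S2", "S2", "S3"]
toy_types = ["Astro", "Oligo", "Astro", "Oligo", "Astro"]
mask = leave_one_out_mask(toy_samples, toy_types, "Astro", keep_in_sample="S1")
print(mask)
```

Here the held-out cell type survives only in sample S1, which corresponds to the "remove it from 19 of 20 samples" variant discussed above.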

2.3 years ago by jared.andrews07 ★ 18k