Question

Integration of scRNA-seq datasets from different tissues


8 months ago

Immreg • 0

I have three datasets from three different tissues of the same donor, all processed the same way and sequenced on the same platform. I am interested in exploring potential differences in the cell subsets and their expression profiles across these tissues. According to the various scRNA-seq analysis tutorials, working with multiple datasets usually calls for integration. However, if I understand correctly, the point of integrating different datasets is to remove batch effects associated with a given covariate.

I was wondering whether data integration makes sense at all in my case, since each “batch” that I have is a different tissue. That is a covariate I do not want to eliminate, because I am interested in the variation arising from it. At the same time, I understand that despite a similar protocol and the same sequencing platform, there could still be unwanted technical variation. But if I do integrate these datasets, biological variation (due to tissue differences) would automatically also be eliminated, correct?

What would be the correct workflow in a scenario like this? Should I first perform the analysis without integration, and integrate only if I clearly see tissue-specific clusters?

scRNA-seq • 744 views

updated 7 months ago by OmnibusX • written 8 months ago by Immreg

score 2 · Answer 1 · 2024-08-15

I have worked on identifying cell subtypes across multiple studies and tissues. In most cases, I found that labeling cell subtypes independently in each dataset and then combining them for comparison analyses, such as comparing the expression of blood NK cells versus tissue-resident NK cells, works best. Batch correction usually mixes cell subtypes together, whereas analyzing each study alone allows them to form distinct clusters. In these cases, batch correction only serves to visualize the cell subtype clusters together.
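As a rough sketch of the "label independently, then compare" step, the snippet below computes a log2 fold change of mean expression between two independently labelled groups. It is plain Python, not the API of any scRNA-seq package; the record layout, the function name `log2_fold_change`, the gene names, and the values are all made up for illustration.

```python
import math
from statistics import mean

# Hypothetical per-cell records. Each cell was labelled within its own
# tissue, without any batch correction. Genes and values are invented.
cells = [
    {"tissue": "blood", "label": "NK", "expr": {"GENE_A": 1.0, "GENE_B": 3.0}},
    {"tissue": "blood", "label": "NK", "expr": {"GENE_A": 1.0, "GENE_B": 3.0}},
    {"tissue": "tissue", "label": "NK", "expr": {"GENE_A": 7.0, "GENE_B": 3.0}},
    {"tissue": "tissue", "label": "NK", "expr": {"GENE_A": 7.0, "GENE_B": 3.0}},
]


def log2_fold_change(cells, gene, group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression of `gene` in group_a versus group_b.

    Each group is a (tissue, label) pair. The pseudocount avoids
    division by zero when a gene is not expressed in one group.
    """
    def group_mean(group):
        values = [c["expr"][gene] for c in cells
                  if (c["tissue"], c["label"]) == group]
        if not values:
            raise ValueError(f"no cells found for group {group}")
        return mean(values)

    return math.log2((group_mean(group_a) + pseudocount)
                     / (group_mean(group_b) + pseudocount))


# Tissue-resident NK versus blood NK for each gene
for gene in ("GENE_A", "GENE_B"):
    lfc = log2_fold_change(cells, gene, ("tissue", "NK"), ("blood", "NK"))
    print(gene, round(lfc, 3))
```

In a real analysis a proper differential expression test would replace the bare fold change; the point here is only that the comparison uses the per-dataset labels rather than batch-corrected values.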

However, I have observed cases where a cell subtype population is too small to be noticeable in a single study. Only when I combine enough studies that contain this subtype does the number of cells become large enough to form a small, human-detectable cluster. In these cases, I extract and combine only cells from the same major cell type, to avoid over-correction caused by the presence of different major cell types.
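The pooling step described here can be sketched as follows, assuming each cell already carries a major cell type annotation. The dictionary layout, the helper names, and the labels are hypothetical, and the actual batch correction and clustering of the pooled cells are not shown.

```python
from collections import Counter

# Hypothetical annotated cells per study (labels are invented)
studies = {
    "study_1": [
        {"major_type": "NK", "subtype": "NK_common"},
        {"major_type": "NK", "subtype": "NK_rare"},
        {"major_type": "T", "subtype": "T_common"},
    ],
    "study_2": [
        {"major_type": "NK", "subtype": "NK_rare"},
        {"major_type": "NK", "subtype": "NK_common"},
        {"major_type": "B", "subtype": "B_common"},
    ],
    "study_3": [
        {"major_type": "NK", "subtype": "NK_rare"},
        {"major_type": "T", "subtype": "T_common"},
    ],
}


def pool_major_type(studies, major_type):
    """Collect the cells of one major cell type from every study."""
    return [cell
            for cells in studies.values()
            for cell in cells
            if cell["major_type"] == major_type]


def subtype_counts(cells):
    """Count how many cells carry each subtype label."""
    return Counter(cell["subtype"] for cell in cells)


# Only NK cells are pooled; T and B cells are left out so that they
# cannot drive the correction applied to the pooled data.
pooled_nk = pool_major_type(studies, "NK")
print(subtype_counts(pooled_nk))
```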

Hope my experience can be of help.

score 1 · Answer 2 · 2024-07-23

This depends on what your intended analysis is. You said you're interested in cell subsets, in which case you wouldn't really need to integrate your data. You could identify cell types in each tissue separately. If you wanted to do some kind of unsupervised analysis of clusters in each tissue, you could find marker genes for the clusters and compare the clusters between tissues using something like Jaccard similarity.
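A minimal sketch of the Jaccard comparison, assuming you already have a set of marker genes per cluster for each tissue (the cluster names and gene names below are placeholders, and `compare_clusters` and `best_matches` are illustrative helpers, not functions from any package):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of genes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical marker genes per cluster, found separately in each tissue
markers_tissue_a = {
    "c0": {"G1", "G2", "G3"},
    "c1": {"G4", "G5", "G6"},
}
markers_tissue_b = {
    "k0": {"G4", "G5", "G7"},
    "k1": {"G1", "G2", "G3", "G8"},
}


def compare_clusters(markers_a, markers_b):
    """Jaccard similarity for every pair of clusters across two tissues."""
    return {(ca, cb): jaccard(ga, gb)
            for ca, ga in markers_a.items()
            for cb, gb in markers_b.items()}


def best_matches(markers_a, markers_b):
    """For each cluster in tissue A, the most similar cluster in tissue B."""
    scores = compare_clusters(markers_a, markers_b)
    return {ca: max(markers_b, key=lambda cb: scores[(ca, cb)])
            for ca in markers_a}


scores = compare_clusters(markers_tissue_a, markers_tissue_b)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
print(best_matches(markers_tissue_a, markers_tissue_b))
```

A cluster whose best match still has a low score is a candidate tissue-specific population. The result depends on how many markers you keep per cluster, so it helps to use the same cutoff in every tissue.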

You could also combine the three tissues, normalise, and regress out covariates such as mitochondrial percentage and cell cycle, both with and without integration, and compare what the results look like.
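To make "regress out" concrete, here is a deliberately simplified sketch for one gene and one covariate: fit a straight line of expression against the covariate and keep the residuals. Analysis toolkits typically fit several covariates at once for every gene; the function name `regress_out` and the numbers below are only for illustration.

```python
from statistics import mean


def regress_out(values, covariate):
    """Residuals of an ordinary least-squares fit of `values` on `covariate`.

    The fit includes an intercept, so the residuals are centred on zero.
    """
    if len(values) != len(covariate):
        raise ValueError("values and covariate must have the same length")
    x_bar, y_bar = mean(covariate), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    if sxx == 0:
        # A constant covariate explains nothing, so only centre the values
        return [y - y_bar for y in values]
    slope = sum((x - x_bar) * (y - y_bar)
                for x, y in zip(covariate, values)) / sxx
    return [(y - y_bar) - slope * (x - x_bar)
            for x, y in zip(covariate, values)]


# Made-up values: normalised expression of one gene in four cells,
# and the mitochondrial percentage of the same cells
percent_mito = [1.0, 2.0, 3.0, 4.0]
expression = [1.0, 1.0, 3.0, 3.0]

residuals = regress_out(expression, percent_mito)
print([round(r, 3) for r in residuals])
```

Whatever part of the gene's expression follows the covariate linearly is removed, and the residuals are what goes into downstream steps. The same caveat as for integration applies: if the covariate differs systematically between tissues, regressing it out can also remove some of the tissue signal.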