Question

scRNA-seq dataset comparison

17 months ago

Hughie ▴ 30

Hi everyone,
I have recently been analyzing a scRNA-seq dataset, X. After clustering and annotation, I have my landscape A.
Next, I want to compare my scRNA-seq clusters with a public dataset, Y. I have tried two methods:

One is to integrate datasets X and Y directly with Seurat's IntegrateData function, which I think is the most direct way to compare them. But this gives me a new landscape B, which I need to re-annotate, and then I have to map the original annotations of X and Y onto it.
The other is to use AddModuleScore. For example, using the top 50 markers for cluster 1 in landscape A, I can calculate a module score for each cluster in dataset Y. But this is not very accurate, because many clusters show a high module score.
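
For reference, the idea behind a module score can be sketched in a few lines of Python. This is a simplified stand-in, not Seurat's implementation: AddModuleScore draws its control genes from expression-matched bins, whereas this sketch samples them uniformly at random. The function names, the toy expression values, and the cluster labels are all made up for illustration.

```python
import random
from statistics import mean


def module_score(expr, genes, n_control=50, seed=0):
    """Per-cell score: mean of the marker genes minus mean of random control genes.

    expr  : dict mapping gene name -> list of normalized values, one per cell.
    genes : marker genes, e.g. the top 50 markers of one cluster in landscape A.
    Assumption: controls are sampled uniformly, not from expression-matched bins.
    """
    markers = {g for g in genes if g in expr}
    if not markers:
        raise ValueError("none of the marker genes are present in expr")
    background = sorted(g for g in expr if g not in markers)
    rng = random.Random(seed)
    controls = rng.sample(background, min(n_control, len(background)))
    n_cells = len(next(iter(expr.values())))
    scores = []
    for i in range(n_cells):
        marker_mean = mean(expr[g][i] for g in markers)
        control_mean = mean(expr[g][i] for g in controls) if controls else 0.0
        scores.append(marker_mean - control_mean)
    return scores


def mean_score_by_cluster(scores, clusters):
    """Average the per-cell scores within each cluster of the other dataset."""
    grouped = {}
    for score, cluster in zip(scores, clusters):
        grouped.setdefault(cluster, []).append(score)
    return {cluster: mean(values) for cluster, values in grouped.items()}


# Toy data: four cells from "dataset Y", two per cluster (values are invented).
expr = {
    "CD3E": [5.0, 4.0, 0.0, 0.0],
    "CD3D": [4.0, 5.0, 0.0, 1.0],
    "MS4A1": [0.0, 0.0, 5.0, 4.0],
    "ACTB": [3.0, 3.0, 3.0, 3.0],
    "GAPDH": [2.0, 2.0, 2.0, 2.0],
}
clusters = ["T", "T", "B", "B"]

scores = module_score(expr, ["CD3E", "CD3D"], n_control=2)
by_cluster = mean_score_by_cluster(scores, clusters)
print(by_cluster)
```

Comparing the per-cluster means side by side, rather than applying a fixed threshold, makes it easier to see whether one cluster stands out or whether several clusters score similarly.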

Recently, I read several papers introducing "label transfer" methods, including Scanorama and scNym. But several benchmark studies compared Seurat and Harmony with these label transfer methods, so I'm confused about how integration methods differ from label transfer methods.

I would much appreciate it if anyone could help discuss this question, and I would also like to hear how others approach this kind of comparison.
Thank you very much.

transfer integration label scRNA-seq data • 1.2k views

updated 16 months ago by Nitin Narwade ★ 1.6k • written 17 months ago by Hughie ▴ 30

I think it depends entirely on the question you are asking of your data.

If I wanted to check where a cell state from a publicly available dataset is enriched in my dataset, I would go for your strategy 2: take the top DEG list from the public dataset (the top 50, or a list filtered by significance and log2FC), calculate the score, and plot it over the UMAP of my dataset.
If I were interested in cell type annotation based on an atlas or a well-curated reference dataset (a public dataset), I would go for label transfer, or a cell type annotation tool that uses the public dataset as a reference.
If I needed to increase the number of cells, or if I were interested in trajectory analysis, I would go for dataset integration. That lets me pick out the population of interest in both datasets (similar cell states), subset it, build a trajectory, and perform downstream analysis on the integrated dataset.
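
To make the difference from integration concrete, here is a minimal sketch of what label transfer does at its core. It is a plain k-nearest-neighbour vote, not the algorithm of Seurat, Scanorama, or scNym. It assumes the reference and query cells have already been placed in a shared low-dimensional space (for example, the query projected onto the reference's principal components). The coordinates, labels, and function name are invented for illustration.

```python
import math
from collections import Counter


def transfer_labels(ref_coords, ref_labels, query_coords, k=5):
    """Give each query cell the majority label among its k nearest reference cells.

    Returns a list of (label, confidence) pairs, where confidence is the
    fraction of the k neighbours that carry the assigned label.
    Assumption: ref_coords and query_coords live in the same embedding.
    """
    if len(ref_coords) != len(ref_labels):
        raise ValueError("ref_coords and ref_labels must have the same length")
    k = min(k, len(ref_coords))
    results = []
    for q in query_coords:
        # Indices of the k reference cells closest to this query cell.
        nearest = sorted(
            range(len(ref_coords)), key=lambda i: math.dist(q, ref_coords[i])
        )[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        results.append((label, count / k))
    return results


# Toy reference (annotated public dataset) and query (own dataset).
ref_coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query_coords = [(0.2, 0.3), (5.4, 5.1)]

transferred = transfer_labels(ref_coords, ref_labels, query_coords, k=3)
print(transferred)
```

The query cells keep their own coordinates and only receive labels, whereas integration produces a new joint embedding that has to be clustered and annotated again. Cells with a low confidence value are candidates for manual inspection.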

Regards,

Nitin N.

17 months ago by Nitin Narwade ★ 1.6k

Thank you, Narwade, for your kind and clear reply!
There is another scenario that I want to discuss further. I have a control dataset, a drug A response dataset, and a drug B response dataset.

Normally, I would first integrate the control with the drug A dataset to get UMAP_A and check the changes in cell percentages due to drug A treatment. Some experiments will be conducted to validate this.
When it comes to the drug B section of the paper, I would integrate the control with the drug B dataset to get another UMAP_B, then check the changes in cell percentages due to drug B treatment. However, the cell types for the same control dataset will be annotated slightly differently in the two UMAPs. I wonder whether, in this case, I can manually map the control annotation from UMAP_A to UMAP_B by matching cell IDs.
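
Matching by cell ID amounts to a dictionary lookup, sketched below in Python. The cell IDs, the labels, and the "_1" suffix are hypothetical. The suffix handling is there because some pipelines rename barcodes when objects are merged, so it is worth checking whether the control IDs are still identical in both objects before relying on the lookup.

```python
from collections import Counter


def carry_over_annotation(labels_a, cells_b, strip_suffix=None):
    """Look up the UMAP_A label for every cell of the UMAP_B object.

    labels_a     : dict cell ID -> cell type from the control + drug A analysis.
    cells_b      : iterable of cell IDs in the control + drug B analysis.
    strip_suffix : optional suffix to remove from the B cell IDs before matching
                   (hypothetical example: "_1").
    Returns (mapped, unmatched): labels keyed by the B cell ID, and the B cells
    with no counterpart in A (expected for the drug B cells).
    """
    mapped, unmatched = {}, []
    for cell in cells_b:
        key = cell
        if strip_suffix and cell.endswith(strip_suffix):
            key = cell[: -len(strip_suffix)]
        if key in labels_a:
            mapped[cell] = labels_a[key]
        else:
            unmatched.append(cell)
    return mapped, unmatched


def annotation_crosstab(mapped, labels_b):
    """Count (label in A, label in B) pairs for the shared control cells."""
    return Counter((mapped[c], labels_b[c]) for c in mapped if c in labels_b)


# Toy annotations; all cell IDs and labels are invented.
labels_a = {"ctrl_AAAC": "T cell", "ctrl_AAAG": "B cell", "drugA_TTTG": "T cell"}
labels_b = {"ctrl_AAAC_1": "T cell", "ctrl_AAAG_1": "Plasma", "drugB_CCCA_1": "T cell"}

mapped, unmatched = carry_over_annotation(labels_a, labels_b, strip_suffix="_1")
crosstab = annotation_crosstab(mapped, labels_b)
print(mapped)
print(unmatched)
print(crosstab)
```

The cross-tabulation shows which control cells are labelled differently in the two analyses, which is a quick way to see how far the two annotations have drifted apart.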

I know it would be more intuitive and straightforward to integrate all 3 datasets at the beginning and then do the comparison. However, some minor changes due to drug A treatment might then be masked by integrating the drug B dataset as well.
Or do you have other methods for this problem?
Thank you very much!

16 months ago by Hughie ▴ 30

score 0 · Answer 1 · 2023-07-09

As you said, the straightforward way is to integrate the 3 conditions and perform downstream analysis. But I think in your case (based on your requirements) a reference-query mapping approach would be more appropriate.

Create a reference map from the control sample, annotate it properly, and then map each condition (drug A, drug B) onto this reference.

In this approach you do not have to find pairwise anchors between the datasets as in integration, so the differences and similarities are always calculated relative to your control sample.
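
A rough Python sketch of this workflow is below. It is untested on real data and rests on several assumptions: the control cells are annotated once, each drug dataset has already been projected into the control's embedding, and a query cell is assigned to the nearest cell-type centroid. Nearest-centroid assignment is a deliberately simple stand-in for anchor-based mapping, and all coordinates and labels are invented.

```python
import math
from collections import Counter, defaultdict


def centroids(coords, labels):
    """Mean position of each annotated cell type in the reference embedding."""
    groups = defaultdict(list)
    for point, label in zip(coords, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(dim) / len(points) for dim in zip(*points))
        for label, points in groups.items()
    }


def map_to_reference(ref_centroids, query_coords):
    """Assign each query cell to the closest reference cell-type centroid."""
    return [
        min(ref_centroids, key=lambda lab: math.dist(q, ref_centroids[lab]))
        for q in query_coords
    ]


def proportions(labels):
    """Fraction of cells per cell type."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Toy control reference, annotated once and reused for both drugs.
control_coords = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
control_labels = ["T cell"] * 4 + ["B cell"] * 4
reference = centroids(control_coords, control_labels)

# Toy query cells, assumed to be projected into the control embedding.
drug_a_coords = [(0.2, 0.1), (0.8, 0.9), (5.1, 5.2), (0.4, 0.6)]
drug_b_coords = [(5.3, 5.9), (5.8, 5.2), (0.3, 0.3), (6.1, 5.7)]

drug_a_labels = map_to_reference(reference, drug_a_coords)
drug_b_labels = map_to_reference(reference, drug_b_coords)

control_prop = proportions(control_labels)
for name, labels in [("drug A", drug_a_labels), ("drug B", drug_b_labels)]:
    prop = proportions(labels)
    # Change in each cell type's share relative to the control.
    shift = {ct: prop.get(ct, 0.0) - control_prop[ct] for ct in control_prop}
    print(name, shift)
```

Because both drug datasets are labelled against the same fixed control annotation, the cell-type names are identical in the drug A and drug B comparisons, and adding a third treatment later does not change the earlier results.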

See the Seurat vignette "Mapping and annotating query datasets".

NOTE: This response is based purely on reasoning; I have not tried this in any of my projects.

Regards,

Nitin N.