Hi everyone,
I have recently been analyzing a scRNA-seq dataset X. After clustering and annotation, I have my landscape A.
Next, I want to compare my scRNA-seq clusters with a public dataset Y. I have tried two methods:
- One is to integrate datasets X and Y directly using the Seurat IntegrateData function. I think this is the most direct way to compare them, but it gives me a new landscape B, which I then need to re-annotate, mapping the original annotations of X and Y onto it.
- The other is to use AddModuleScore. For example, using the top 50 markers for cluster 1 in landscape A, I can calculate a module score for each cluster in dataset Y. However, this does not seem very accurate, because many clusters show a high module score.
Recently, I read several papers introducing "label transfer" methods, including Scanorama and scNym. But several benchmark studies compared Seurat and Harmony with these label transfer methods, so I'm confused about how integration methods differ from label transfer methods.
I would appreciate it very much if anyone could help discuss this question. I would also like to hear how others handle this kind of task.
Thank you very much.
I think it depends entirely on the question you are asking of your data.
If I want to check where a cell state from a publicly available dataset is enriched in my dataset, I would go for your strategy 2. Simply take the top DEG list from the public dataset (the top 50, or a list filtered by significance and log2FC), calculate the score, and plot it over the UMAP of my dataset.
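To make the scoring step concrete, here is a rough sketch of the idea in plain Python: for each cell, average the expression of the marker genes and subtract the average of a set of control genes. This is a simplification and not Seurat's implementation (AddModuleScore picks its control genes from expression-matched bins). The function name `module_score`, the gene lists, and the expression values are all made up for illustration.

```python
from statistics import mean

def module_score(expr, markers, controls):
    """Score each cell as mean(marker expression) - mean(control expression).

    expr: dict mapping cell ID -> dict of gene -> normalized expression.
    Genes missing from a cell are treated as 0 (an assumption of this sketch).
    """
    scores = {}
    for cell, genes in expr.items():
        marker_mean = mean(genes.get(g, 0.0) for g in markers)
        control_mean = mean(genes.get(g, 0.0) for g in controls)
        scores[cell] = marker_mean - control_mean
    return scores

# Toy data: two cells, two marker genes, two control genes (all hypothetical).
expr = {
    "cell_1": {"CD3E": 2.0, "CD3D": 1.0, "ACTB": 0.5, "GAPDH": 0.5},
    "cell_2": {"CD3E": 0.0, "CD3D": 0.0, "ACTB": 0.5, "GAPDH": 0.5},
}
scores = module_score(expr, markers=["CD3E", "CD3D"], controls=["ACTB", "GAPDH"])
print(scores)
```

Because the score is relative to the control genes, a signature built from broadly expressed markers can come out high in many clusters, which may be part of what was seen in the original question.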
If I am interested in cell type annotation based on an atlas or a well-curated reference dataset (the public dataset), I would go for label transfer, or use a cell type annotation tool with the public dataset as the reference.
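As a toy illustration of what label transfer does conceptually, the sketch below assigns each query cell the majority label of its k nearest reference cells in a shared low-dimensional space. This is a plain k-nearest-neighbour vote, not the anchor-based procedure that Seurat uses (FindTransferAnchors followed by TransferData), and it assumes both datasets have already been projected into the same space. The function name `transfer_labels`, the coordinates, and the labels are hypothetical.

```python
from collections import Counter
from math import dist

def transfer_labels(ref_points, ref_labels, query_points, k=3):
    """Return (label, vote fraction) for each query cell via k-NN majority vote."""
    predictions = []
    for q in query_points:
        # Indices of the k reference cells closest to this query cell.
        nearest = sorted(range(len(ref_points)),
                         key=lambda i: dist(q, ref_points[i]))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        # The vote fraction acts as a crude confidence score.
        predictions.append((label, count / len(nearest)))
    return predictions

# Toy reference: six annotated cells in a 2-D embedding (made-up values).
ref_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query_points = [(0.1, 0.1), (5.0, 5.1)]

predictions = transfer_labels(ref_points, ref_labels, query_points, k=3)
print(predictions)
```

The point to notice is that the query cells keep their own coordinates and only receive labels. Integration, in contrast, produces a new joint embedding that then has to be clustered and annotated again.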
If I need to increase the number of cells, or if I am interested in trajectory analysis, I would go for dataset integration. That gives me the option to pick out the population of interest (similar cell states) in both datasets, subset it, build a trajectory, and perform downstream analysis on the integrated dataset.
Regards,
Nitin N.
Thank you, Narwade, for your kind and clear reply!
There is another scenario that I want to discuss further. I have a control dataset, a drug A response dataset, and a drug B response dataset.
I know it would be intuitive and straightforward to integrate these 3 datasets directly at the beginning and then do the comparison. However, I am concerned that some minor changes caused by drug A treatment may be masked or dominated once the drug B dataset is integrated as well.
Do you have other methods to suggest for this problem?
Thank you very much!