Hi everyone, I'm working with single-cell data from iPSC-derived cells, which are essentially microglia. I have four samples: KO1, WT1, KO3, and WT3. They were processed in two batches: the '1' samples are in batch 1 and the '3' samples are in batch 2.
In the first (unintegrated) UMAP I could see batch effects; a heatmap of some marker genes is also shown.
So I tried several integration methods available in Seurat (CCA, Harmony, and FastMNN) to integrate by batch. Below are the UMAP and heatmap after FastMNN integration.
What I find confusing is that after integration, the heatmap shows inconsistent patterns within some clusters, for example cluster 0. The unintegrated heatmap does not show such distinct patterns within a single cluster. All the integration methods gave me similar results.
I wonder whether I need integration at all in my case, why the pattern looks like this, and whether there are better integration methods I should try.
Thank you so much for any information.
I'll assume your dataset is scRNA-seq, as it isn't mentioned in your post. Could you also share your commands and your Seurat version?
What is your knockout experiment? Do you expect a large change in microglial gene expression?
Are you sure you haven't swapped KO1 and WT3 when labeling your cells?
Your integration is too harsh: it removes both your batch effect and your condition effect.
The inconsistent heatmap pattern comes from plotting uncorrected expression values (the data or scale.data layers) grouped by clusters that were defined on the integrated embedding. Cells from both batches end up in the same cluster, but their uncorrected values still carry the batch difference, so the heatmap rows split within a cluster. Integration only helps find consistent clusters across samples; for downstream analysis you should stay as close as possible to the unmodified counts. If you draw your heatmap from the integrated matrix instead, you will see the classic marker pattern for each cluster.
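To make the mechanism concrete, here is a minimal sketch in Python/NumPy rather than R/Seurat, on simulated toy data (not from this dataset). It assumes one marker gene in a single integrated cluster, with batch 2 carrying a purely technical offset. Scaling across all cells, which is roughly what a default scale.data heatmap shows, splits the cluster by batch, whereas scaling within each batch removes the split. The closest Seurat option, as far as I know, is the split.by (or vars.to.regress) argument of ScaleData(); please check the documentation for your version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (simulated): one marker gene in 200 cells that all
# belong to the same integrated cluster, 100 cells per batch.
n_per_batch = 100
batch = np.array([1] * n_per_batch + [2] * n_per_batch)
expr = rng.normal(loc=2.0, scale=0.3, size=2 * n_per_batch)
expr[batch == 2] += 1.5  # assumed technical (batch) offset, no biology

def zscore(x):
    """Center and scale a vector to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

# Global scaling: one mean/sd across all cells, so the batch offset survives.
global_z = zscore(expr)

# Within-batch scaling: each batch is centered and scaled on its own.
per_batch_z = np.empty_like(expr)
for b in np.unique(batch):
    mask = batch == b
    per_batch_z[mask] = zscore(expr[mask])

# Mean scaled value per batch inside the cluster (what a heatmap row shows).
for name, z in [("global", global_z), ("per-batch", per_batch_z)]:
    print(name, [round(float(z[batch == b].mean()), 2) for b in (1, 2)])
```

With global scaling, the batch 1 cells sit clearly below zero and the batch 2 cells clearly above it, which is the "two patterns in one cluster" effect. With per-batch scaling, both batches average zero. Note that per-batch scaling is only for display; it would also hide a real condition difference if condition were confounded with batch.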
Hi Bastien, thank you so much for your reply, I really appreciate it. I should have given more information.
Yes, it's scRNA-seq data. The experiment knocks out a gene we are interested in. We are not sure whether there will be a large change in microglial gene expression between knockout and control, but we do expect differences between batch 1 and batch 2.
Here is the R code I used for the no-integration and integration steps and for the marker-gene heatmap.
I'm using Seurat v5.
I'm positive that there is no label swapping.
And thank you for explaining the inconsistent heatmap patterns; now I understand where they come from. However, I learned that Seurat v5 no longer provides an integrated (corrected) expression matrix, only an integrated dimensional reduction embedding, so I only have the RNA assay. In that case, how should I show marker genes in a heatmap? Or should I switch to Seurat v4, which provides an integrated assay?
Maybe the more important question, as you said, concerns the integration step. In previous projects I still used the RNA assay for heatmaps after integration and did not see such inconsistent patterns, so I suspect the current heatmap means something is wrong with my analysis. You mentioned that my integration is too harsh; do you have any suggestions on how I should do the integration instead?
Thank you so much for your time and input!