single-cell data: Batch effects still there after integration
0
0
Entering edit mode
1 day ago
Penny • 0

Hi everyone, I'm dealing with single cell data from iPSC. They are basically microglia. I have four samples: KO1, WT1, KO3, WT3. They were done in two batches, those '1's are in batch 1, and '3's are in batch 2.

From the first UMAP (unintegrated) I detected batch effects as well, some marker genes heatmap is also showed. UMAP (unintegrated) heatmap (unintegrated)

So I tried different integration methods provided in Seurat (CCA, Harmony, and FastMNN) to integrate by batch. Below is UMAP and heatmap after FastMNN on the integrated data. enter image description here enter image description here enter image description here

What i found confusing is that after the integration, the heatmap is showing some inconsistent patterns in, for example, cluster 0. And even the unintegrated heatmap doesnt show this distinct patterns within a single cluster. All the integration methods gave me similar results.

I wonder if I need to do integration in my case, or why the pattern is like this, and if there are other better integration methods i should try.

Thank you so much for any information.

single-cell integration batch-effect • 159 views
ADD COMMENT
0
Entering edit mode

I will assume your dataset is scRNA-seq as it is not mentionned in your post. Could you share your command lines and your Seurat version as well ?

What is your knock out experiment ? Are you anticipating a huge change in microglia gene expression ?

Are you sure you haven't swap KO1 and WT3, when labeling your cells ?

Your integration is too harsh, it mitigates both your batch effect and your condition effect.

The inconsistant heatmap pattern is coming from the plotting of untransformed counts (data or scale.data layers) based on integrated clustering. The integration only helps in finding consistant clusters between sample. For downstream analysis one should stay as close as possible to unmodified counts. If you set your layer on the integrated matrix to draw your heatmap, you will see your classic marker patterns for each cluster.

ADD REPLY
0
Entering edit mode

Hi Bastien, Thank you so much for your reply. Really appreciate it. I should have mentioned more info.

Yes, I'm using scRNAseq data. The experiment is knocking out a gene that we are interested in. We are not so sure whether there would be a huge change in microglia gene expression between knock out vs. ctrl. But we anticipate difference between batch 1 and batch 2.

Here are the R codes I used for no-integration / integration step and heatmap showing marker genes.

I'm using Seurat V5.

package.version('Seurat')
[1] "5.1.0"


seu[['RNA']]<- split(seu[['RNA']], f=seu$batch)
seu<- NormalizeData(seu)
seu<- FindVariableFeatures(seu)
seu<- ScaleData(seu)
seu<- RunPCA(seu)

#------------------------
#' no integration version
#' ----------------------

seu <- FindNeighbors(seu, dims = 1:30, reduction = "pca")
seu <- FindClusters(seu, resolution = 1.2, cluster.name = "unintegrated_clusters")
seu <- RunUMAP(seu, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(seu, reduction = "umap.unintegrated", group.by = c("sample", "unintegrated_clusters"), label = T)

seu <- JoinLayers(seu)
seu

#------------------------
#'integration version
#' ----------------------
seu <- IntegrateLayers(
object = seu, method = FastMNNIntegration,
orig.reduction = "pca", new.reduction = "FastMNNIntegration",
verbose = FALSE
)

seu <- FindNeighbors(seu, reduction = "FastMNNIntegration", dims = 1:30)
seu <- FindClusters(seu, resolution = 0.8, cluster.name = "FastMNNIntegration_clusters")
seu <- RunUMAP(seu, reduction = "FastMNNIntegration", dims = 1:30, reduction.name = "FastMNNIntegration")

p1<- DimPlot(
  seu,
  reduction = "FastMNNIntegration",
  group.by = c("FastMNNIntegration_clusters"),
  combine = FALSE, label.size = 3, label = T
)

p2<- DimPlot(
      seu,
      reduction = "FastMNNIntegration",
      group.by = c("sample"),
      combine = FALSE, label.size = 3, label = T
    )

seu <- JoinLayers(seu)
seu

#----------------------------------
# markers genes by heatmap
#----------------------------------

DefaultAssay(seu) ##RNA
all.genes <- rownames(seu)
seu <- ScaleData(seu, features = all.genes)

pdf('feature_heatmap.pdf', width = 20, height = 12)
DoHeatmap(seu, features = my.mark.clps1 )
graphics.off()

I'm positive that there is no label swapping.

And thank you for pointing out the reasons for the inconsistent patterns in heatmap. Now I know how it comes. However, I learned that Seurat V5 no longer provides an integrated/corrected matrix but an integrated dimensional reduction embedding, so I only have RNA assay. In that case, how would I show marker genes in heatmap? Or if i should switch to Seurat V4 which provides integrated assay.

Maybe a more important question is in, as you said, the integration step. Previously in other projects, I still used RNA assay for heatmap after integration but did not see such inconsistent patterns. So I assume the current heatmap may indicate that my data analysis is not correct. You mentioned my integration is too harsh, I would love to learn if you have any suggestions on how I should do the integration instead?

Thank you so much for your time and input!

ADD REPLY

Login before adding your answer.

Traffic: 2783 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6