Question

single-cell data: Batch effects still there after integration

0

Entering edit mode

6 months ago

Penny • 0

Hi everyone, I'm dealing with single cell data from iPSC. They are basically microglia. I have four samples: KO1, WT1, KO3, WT3. They were done in two batches, those '1's are in batch 1, and '3's are in batch 2.

From the first UMAP (unintegrated) I detected batch effects as well, some marker genes heatmap is also showed. heatmap (unintegrated)

So I tried different integration methods provided in Seurat (CCA, Harmony, and FastMNN) to integrate by batch. Below is UMAP and heatmap after FastMNN on the integrated data. enter image description here

What i found confusing is that after the integration, the heatmap is showing some inconsistent patterns in, for example, cluster 0. And even the unintegrated heatmap doesnt show this distinct patterns within a single cluster. All the integration methods gave me similar results.

I wonder if I need to do integration in my case, or why the pattern is like this, and if there are other better integration methods i should try.

Thank you so much for any information.

single-cell integration batch-effect • 1.3k views

ADD COMMENT • link 6 months ago by Penny • 0

2

Entering edit mode

I will assume your dataset is scRNA-seq as it is not mentionned in your post. Could you share your command lines and your Seurat version as well ?

What is your knock out experiment ? Are you anticipating a huge change in microglia gene expression ?

Are you sure you haven't swap KO1 and WT3, when labeling your cells ?

Your integration is too harsh, it mitigates both your batch effect and your condition effect.

The inconsistant heatmap pattern is coming from the plotting of untransformed counts (data or scale.data layers) based on integrated clustering. The integration only helps in finding consistant clusters between sample. For downstream analysis one should stay as close as possible to unmodified counts. If you set your layer on the integrated matrix to draw your heatmap, you will see your classic marker patterns for each cluster.

ADD REPLY • link 6 months ago by Bastien Hervé 6.4k

0

Entering edit mode

Hi Bastien, Thank you so much for your reply. Really appreciate it. I should have mentioned more info.

Yes, I'm using scRNAseq data. The experiment is knocking out a gene that we are interested in. We are not so sure whether there would be a huge change in microglia gene expression between knock out vs. ctrl. But we anticipate difference between batch 1 and batch 2.

Here are the R codes I used for no-integration / integration step and heatmap showing marker genes.

I'm using Seurat V5.

package.version('Seurat')
[1] "5.1.0"


seu[['RNA']]<- split(seu[['RNA']], f=seu$batch)
seu<- NormalizeData(seu)
seu<- FindVariableFeatures(seu)
seu<- ScaleData(seu)
seu<- RunPCA(seu)

#------------------------
#' no integration version
#' ----------------------

seu <- FindNeighbors(seu, dims = 1:30, reduction = "pca")
seu <- FindClusters(seu, resolution = 1.2, cluster.name = "unintegrated_clusters")
seu <- RunUMAP(seu, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(seu, reduction = "umap.unintegrated", group.by = c("sample", "unintegrated_clusters"), label = T)

seu <- JoinLayers(seu)
seu

#------------------------
#'integration version
#' ----------------------
seu <- IntegrateLayers(
object = seu, method = FastMNNIntegration,
orig.reduction = "pca", new.reduction = "FastMNNIntegration",
verbose = FALSE
)

seu <- FindNeighbors(seu, reduction = "FastMNNIntegration", dims = 1:30)
seu <- FindClusters(seu, resolution = 0.8, cluster.name = "FastMNNIntegration_clusters")
seu <- RunUMAP(seu, reduction = "FastMNNIntegration", dims = 1:30, reduction.name = "FastMNNIntegration")

p1<- DimPlot(
  seu,
  reduction = "FastMNNIntegration",
  group.by = c("FastMNNIntegration_clusters"),
  combine = FALSE, label.size = 3, label = T
)

p2<- DimPlot(
      seu,
      reduction = "FastMNNIntegration",
      group.by = c("sample"),
      combine = FALSE, label.size = 3, label = T
    )

seu <- JoinLayers(seu)
seu

#----------------------------------
# markers genes by heatmap
#----------------------------------

DefaultAssay(seu) ##RNA
all.genes <- rownames(seu)
seu <- ScaleData(seu, features = all.genes)

pdf('feature_heatmap.pdf', width = 20, height = 12)
DoHeatmap(seu, features = my.mark.clps1 )
graphics.off()

I'm positive that there is no label swapping.

And thank you for pointing out the reasons for the inconsistent patterns in heatmap. Now I know how it comes. However, I learned that Seurat V5 no longer provides an integrated/corrected matrix but an integrated dimensional reduction embedding, so I only have RNA assay. In that case, how would I show marker genes in heatmap? Or if i should switch to Seurat V4 which provides integrated assay.

Maybe a more important question is in, as you said, the integration step. Previously in other projects, I still used RNA assay for heatmap after integration but did not see such inconsistent patterns. So I assume the current heatmap may indicate that my data analysis is not correct. You mentioned my integration is too harsh, I would love to learn if you have any suggestions on how I should do the integration instead?

Thank you so much for your time and input!

ADD REPLY • link 6 months ago by Penny • 0

2

Entering edit mode

If your labels are not swapped, from your first figure, one can assume that your batch effect is higher than your KO effect, as the different conditions overlap quite well withing the same batch and that the 2 batches are not overlapping whatever the condition.

I always found Harmony to be a bit more subtle on batch effect correction while integration much more harsh. Integration is fine if you are looking for obvious change of expression like different cell types or disease/control conditions. In KO experiment sometimes one does not see any gene expression fluctuation or little, which will be completely masked if you integrate.

You can try this :

seu <- RunPCA(seu)
seu <- RunHarmony(object = seu, group.by.vars = "batch", theta = 2, lambda = 1, sigma = 0.1)
seu <- RunUMAP(seurat_obj_harm, reduction = "harmony")

Play around with the theta, lambda and sigma parameters to adjust the strength of the correction.

Seurat V5 no longer provides an integrated/corrected matrix

Well I think that is a "good thing", it will prevent bad practice on using this new integrated count matrix for downstream analysis.

After integration or Harmony reduction your count matrix stay the same, only the dimensional reduction is corrected, so you get "better" clustering but the count matrix stays the same. One can skew the count matrix for visualization but not for count analysis.

how would I show marker genes in heatmap ?

Like you did with the "weird" pattern, you can add at the top the different sample labels annotation on top of the clusters.

I still used RNA assay for heatmap after integration but did not see such inconsistent patterns

If you used your unintegrated_clusters you will not see the patterns, if you used your FastMNNIntegration_clusters you will see the patterns.

If you want to find gene markers specific the your KO condition while correcting for your batch effect, you should try a pseudobulk analysis approach using DESeq2, EdgeR or Limma-voom, including your batch effect in your design.

ADD REPLY • link 6 months ago by Bastien Hervé 6.4k

0

Entering edit mode

Thank you, Bastien, for the very detailed instructions! I will try and follow. Thanks for your time and input!

ADD REPLY • link 6 months ago by Penny • 0