Question

scRNA-seq downstream analysis on raw vs batch corrected counts

14 months ago

codezero • 0

I am a student and am performing downstream snRNA-seq analysis (differential expression, amongst other things) on data gathered from multiple batches.

I am confused as to whether I should perform this on raw counts (normalized and log-transformed) or on corrected counts (batch correction with Harmony or Scanorama, for example). Generally, I believe this is done on raw counts. However, I don't understand why batch effects wouldn't skew the results. Isn't the whole point to remove the unwanted technical variation so that we can search for the biological variability? Performing downstream analysis on raw counts would seem to include technical variability in the findings.

What am I missing?

Thanks!

scRNA-seq batch-correction • 991 views

score 3 · Accepted Answer · 2024-02-08

You're hitting imo the most misunderstood concept in single-cell RNA-seq:

Yes, batch correction with Harmony aims to remove unwanted technical variation, but at the per-cell level. That is, the correction places each cell as a whole into a corrected space, so that you can do clustering, UMAP and things like that without batch effects.

That is not the same as "traditional" batch regression, which directly corrects the expression level of each gene and is therefore a per-gene technique.

You cannot and should not use the corrected counts for any quantitative per-gene comparison, see:

https://bioconductor.org/books/release/OSCA.multisample/using-corrected-values.html

So yes, downstream analysis that works per gene (e.g. differential expression) should be done on the non-corrected counts, and you need to adjust for batch effects in that analysis itself, e.g. by including batch as a term in your linear model (this assumes batch is not confounded with the condition of interest). Other analyses, such as PCA/UMAP, would be done on the corrected values, since they are per-cell, not per-gene.
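To make the "batch as part of your linear model" point concrete, here is a minimal sketch on simulated data for a single gene. Everything in it is an assumption for illustration: the toy effect sizes, the variable names, and plain least squares standing in for a real DE framework (in practice you would more likely put batch in the design of a dedicated tool, e.g. something like `~ batch + condition` on pseudobulk counts). The design is deliberately unbalanced so that ignoring batch distorts the estimated condition effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one gene, 200 cells, two batches, two conditions.
# Batch 0 is mostly "treated" and batch 1 mostly "control", so batch and
# condition are correlated but not fully confounded.
n = 200
batch = np.repeat([0.0, 1.0], n // 2)
condition = np.where(batch == 0,
                     rng.random(n) < 0.8,
                     rng.random(n) < 0.2).astype(float)

true_condition_effect = 1.0   # the biology we want to recover
true_batch_effect = 2.0       # technical offset of batch 1
log_expr = (5.0
            + true_condition_effect * condition
            + true_batch_effect * batch
            + rng.normal(0.0, 0.5, n))


def fit_ols(design, y):
    """Ordinary least squares; returns one coefficient per design column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


intercept = np.ones(n)
naive_design = np.column_stack([intercept, condition])            # ignores batch
adjusted_design = np.column_stack([intercept, condition, batch])  # batch as covariate

# Confounding check: if batch and condition were perfectly aligned, the design
# would lose rank and the two effects could not be separated.
assert np.linalg.matrix_rank(adjusted_design) == adjusted_design.shape[1]

naive_effect = fit_ols(naive_design, log_expr)[1]
adjusted_effect = fit_ols(adjusted_design, log_expr)[1]

print(f"condition effect, batch ignored:      {naive_effect:.2f}")
print(f"condition effect, batch in the model: {adjusted_effect:.2f}")
```

The expression values themselves are never modified here; the batch term simply absorbs the technical offset while the condition coefficient is estimated. The rank check is a crude stand-in for the "not confounded" caveat: if every treated cell came from one batch and every control cell from the other, no model could tell the two effects apart.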