Question

Is it correct to normalize batch-corrected data?

0

23 months ago

star ▴ 350

I have gathered single-cell RNA-seq data from several studies and applied scGen to address the batch effects among them. I chose this method because it gave improved clustering and returns a tabular output.

Before I calculate expression levels (the average expression of each gene across cells), I have the following questions:

1. Is it appropriate to normalize the data with NormalizeData() after batch correction? I believe it is needed, since batch correction addresses unwanted technical variables, not library size.
2. scGen yields negative values (likely because it requires log-normalized data to train the model). Should I be concerned about these negative values, or are they inconsequential given that my goal is a normalized count table?

I really appreciate any help you can provide!

batch-correction single-cell RUV scGen nomalization • 1.6k views

score 2 · Answer 1 · 2023-08-10

23 months ago

bk11 ★ 3.1k

For the first part of your question, see what Sean Davis said previously in the linked thread:

In Which Order Use Normalization And Batch Effects Removal?

0

nice find - but, feels bad man. miss sean davis being here.

23 months ago by LauferVA 4.8k

score 1 · Answer 2 · 2023-08-10

These integration methods return values that should not be used for anything other than clustering and dimensionality reduction. The correction is per cell, not per gene, so these methods will happily change the magnitude and sign of expression values to embed each cell properly in the corrected space. That means the per-gene values have essentially no meaning: you cannot compare them individually between cells or datasets.

One reference, which to my knowledge applies to all of these integration methods, is http://bioconductor.org/books/release/OSCA.multisample/using-corrected-values.html

I do not know the particular method you used, so my answer might not apply in that case, but it probably does. So no, you should not additionally normalize these values, nor use them to compare expression.
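To make that concrete, here is a minimal sketch (not scGen's or Seurat's actual code) of one way to get per-gene averages without touching the corrected values: take the cluster labels from the integrated data, but compute the averages from the original counts after ordinary library-size log-normalization. The function names, the toy matrix, and the `cluster_labels` vector are all made up for illustration; the normalization formula is assumed to be the usual log1p(count / cell total * 10000).

```python
import numpy as np


def library_size_log_normalize(counts, scale_factor=1e4):
    """Log-normalize a cells x genes count matrix.

    Each cell is divided by its total count, multiplied by scale_factor,
    then transformed with log1p, so the result is never negative.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale_factor)


def mean_expression_per_group(expr, labels):
    """Average every gene over the cells belonging to each group label."""
    labels = np.asarray(labels)
    return {g: expr[labels == g].mean(axis=0) for g in np.unique(labels).tolist()}


# Toy data: 6 cells x 4 genes of raw (uncorrected) counts.
rng = np.random.default_rng(0)
raw_counts = rng.poisson(5, size=(6, 4))

# Hypothetical labels, standing in for clusters found on the corrected values.
cluster_labels = np.array([0, 0, 1, 1, 0, 1])

log_norm = library_size_log_normalize(raw_counts)
avg_by_cluster = mean_expression_per_group(log_norm, cluster_labels)
```

The corrected matrix is only used to decide which cells belong together; the numbers that get averaged come from the uncorrected data. If batch still matters for the per-gene comparison, the linked OSCA chapter discusses handling it in the downstream model rather than in the values themselves.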