Question

Pseudobulk for scRNA-seq: sum or mean?

5 months ago

adele • 0

I often look at gene expression levels per sample and cell type, and I am wondering which aggregation method is actually the 'most appropriate'. This is not for DGE analysis, but for visualising a gene of interest across conditions and finding correlated genes.

I usually analyse scRNA-seq data with the Seurat R package. Originally there was the function AverageExpression, and I use the following formula, based on that function, to calculate the mean expression of normalised counts:

log1p(mean(expm1(expr)))

However, there is now also the function AggregateExpression, which sums counts. As I understand it, this is the suggested method for DGE analysis, but what about other analyses (e.g., correlation analyses)?
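To make the sum-versus-mean distinction concrete, here is a minimal sketch in plain Python (not Seurat code). The toy counts, the sample and cell-type labels, and the helper name `pseudobulk` are all hypothetical and only illustrate the arithmetic for a single gene.

```python
from collections import defaultdict

# Hypothetical raw counts for one gene: (sample, cell_type, count) per cell.
cells = [
    ("s1", "T", 2), ("s1", "T", 0), ("s1", "T", 4),
    ("s2", "T", 3), ("s2", "T", 3),
]

def pseudobulk(cells):
    """Return {(sample, cell_type): (sum, mean)} of raw counts per group."""
    groups = defaultdict(list)
    for sample, cell_type, count in cells:
        groups[(sample, cell_type)].append(count)
    return {
        key: (sum(values), sum(values) / len(values))
        for key, values in groups.items()
    }

result = pseudobulk(cells)
for key, (total, mean) in sorted(result.items()):
    print(key, "sum =", total, "mean =", mean)
```

In this toy case both samples have the same summed count, but the per-cell mean differs because s1 contributes three cells and s2 only two. A summed pseudobulk therefore depends on how many cells each group has, which is one reason summed counts are usually normalised for library size before groups are compared.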

scRNA-seq pseudobulk • 639 views

updated 5 months ago by Ming Tommy Tang • written 5 months ago by adele

I used AggregateExpression in my work.

Reply 5 months ago by Ming Tommy Tang

score 0 · Answer 1 · 2024-07-25

This depends on the details of your analysis approach and on what you want the correlation to capture. The AggregateExpression function in Seurat sums counts across a specified grouping variable, whereas AverageExpression returns the average and takes the scale of the data into account. From the docs: "If slot is set to 'data', this function assumes that the data has been log normalized and therefore feature values are exponentiated prior to averaging so that averaging is done in non-log space."
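As a rough illustration of what "averaging in non-log space" means, the sketch below mirrors the log1p(mean(expm1(expr))) formula from the question in plain Python, using only the standard `math` module. It is not Seurat's implementation; the function names and the three toy values are assumptions made for the example.

```python
import math

def mean_in_non_log_space(log_values):
    """Python analogue of log1p(mean(expm1(expr))) for one gene."""
    # Undo the log1p transform, average on the linear scale, then re-apply it.
    linear = [math.expm1(v) for v in log_values]
    return math.log1p(sum(linear) / len(linear))

def naive_log_mean(log_values):
    """Average the log1p values directly, without back-transforming."""
    return sum(log_values) / len(log_values)

# Hypothetical log1p-normalised expression of one gene in three cells.
expr = [math.log1p(0.0), math.log1p(1.0), math.log1p(8.0)]

non_log = mean_in_non_log_space(expr)  # equals log1p(3), up to rounding
naive = naive_log_mean(expr)
print(non_log, naive)
```

Because log1p is concave, the naive mean of log values is never larger than the mean taken in non-log space, and the gap widens when a few cells express the gene strongly. Which of the two is preferable for a correlation analysis depends on whether those high-expressing cells should dominate the group value.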

If your overall objective is to look at co-regulation in single-cell data, then I'd suggest reading about the hdWGCNA methods, as they address some of the challenges in this space, namely the lack of biological replication and how to approximate it with balanced random bootstrapping within neighborhoods.