Pseudobulk for scRNA-seq: sum or mean?
Asked by adele, 5 months ago

I often look at gene expression levels per sample and cell type, and I am wondering what the most appropriate way to aggregate the cells is. It is not for DGE analysis, but to visualise a gene of interest across conditions and to find correlated genes.

I usually analyse scRNA-seq data with the Seurat R package. Originally there was the function AverageExpression, and based on it I use the following formula to calculate the mean expression of the normalised counts, where expr holds the log-normalised values:

log1p(mean(expm1(expr)))
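As a minimal sketch of what that formula computes, here is a NumPy version, assuming `expr` holds one gene's log1p-normalised values for the cells of one sample and cell type. The helper names `mean_in_linear_space` and `mean_in_log_space` are hypothetical, and the numbers are illustrative only; the comparison shows that averaging in non-log space gives a different (larger) value than a plain mean of the log values.

```python
import numpy as np

def mean_in_linear_space(expr):
    """Mean of log1p-normalised values computed in non-log space.

    Mirrors log1p(mean(expm1(expr))): undo the log1p transform,
    average the normalised values, then log-transform the average.
    """
    expr = np.asarray(expr, dtype=float)
    return np.log1p(np.mean(np.expm1(expr)))

def mean_in_log_space(expr):
    """Plain arithmetic mean of the log values, for comparison."""
    return float(np.mean(np.asarray(expr, dtype=float)))

# Log-normalised values for one gene in one sample/cell type group
# (made-up numbers: normalised values 0, 0, 4 and 12).
expr = np.log1p([0.0, 0.0, 4.0, 12.0])
print(mean_in_linear_space(expr))  # log1p(4), about 1.609
print(mean_in_log_space(expr))     # (log 5 + log 13) / 4, about 1.044
```

The gap between the two comes from the log being concave: a few highly expressing cells pull the non-log mean up much more than the log-space mean.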

However, there is now also the function AggregateExpression, which sums the counts, and as I understand it this is the recommended method for DGE analysis. What about other analyses, e.g. correlation analyses?
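As a rough sketch of the summing approach (a NumPy stand-in, not Seurat's implementation; `pseudobulk_sum` and `log_cpm` are hypothetical helper names), raw counts are summed per sample × cell type label, and the summed profiles can then be library-size normalised, here to log counts per million as one common choice, before plotting or correlating.

```python
import numpy as np

def pseudobulk_sum(counts, groups):
    """Sum raw counts over cells sharing a group label.

    counts: (n_cells, n_genes) array of raw UMI counts.
    groups: length-n_cells sequence of labels, e.g. "sample1_Tcell".
    Returns (labels, summed) with summed of shape (n_groups, n_genes).
    """
    counts = np.asarray(counts)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    summed = np.zeros((labels.size, counts.shape[1]), dtype=counts.dtype)
    np.add.at(summed, idx, counts)  # add each cell's row into its group's row
    return labels, summed

def log_cpm(summed):
    """Scale each pseudobulk profile to counts per million, then log1p.

    Assumes every group has a non-zero library size.
    """
    lib_sizes = summed.sum(axis=1, keepdims=True)
    return np.log1p(summed / lib_sizes * 1e6)

# Four cells, three genes, two hypothetical sample/cell type groups.
counts = np.array([[1, 0, 3],
                   [2, 1, 0],
                   [0, 4, 4],
                   [5, 0, 1]])
labels, summed = pseudobulk_sum(counts, ["s1", "s1", "s2", "s2"])
print(labels, summed)
print(log_cpm(summed))
```

Normalising after summing matters for correlation: raw sums grow with the number of cells in a group, so unnormalised sums would partly track cell counts rather than expression.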

Tags: scRNA-seq, pseudobulk

Comment: I used AggregateExpression in my work.

Answer (5 months ago):

This depends on the details of your analysis approach and on what you want the correlation to capture. The AggregateExpression function in Seurat sums counts within each group defined by a chosen variable, whereas AverageExpression returns the group average and takes the scale of the input data into account. From the docs: "If slot is set to 'data', this function assumes that the data has been log normalized and therefore feature values are exponentiated prior to averaging so that averaging is done in non-log space."
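To illustrate why the details matter, here is a small NumPy sketch with made-up counts for a single group. The helper `cpm` is hypothetical, and counts per million stands in for Seurat's own normalisation (which uses a scale factor of 10,000 by default). It shows that summing and averaging raw counts give identical profiles once each pseudobulk is library-size normalised, whereas averaging per-cell normalised values, roughly what the quoted 'data'-slot behaviour amounts to, weights every cell equally rather than every count and so generally gives a different profile.

```python
import numpy as np

def cpm(profile):
    """Scale a count profile so it sums to one million."""
    profile = np.asarray(profile, dtype=float)
    return profile / profile.sum() * 1e6

# Raw counts (cells x genes) for one sample/cell type group; illustrative numbers.
counts = np.array([[10, 0, 5],
                   [ 2, 8, 0],
                   [ 0, 1, 4]], dtype=float)

summed = counts.sum(axis=0)     # summed pseudobulk, as with AggregateExpression
averaged = counts.mean(axis=0)  # mean of raw counts = summed / n_cells

# After library-size normalisation the 1 / n_cells factor cancels,
# so sum and mean of raw counts give the same profile.
assert np.allclose(cpm(summed), cpm(averaged))

# Normalise each cell first, then average in non-log space.
per_cell_cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
cell_weighted = per_cell_cpm.mean(axis=0)

print(cpm(summed))     # gene shares 0.4, 0.3, 0.3 of 1e6
print(cell_weighted)   # differs: low-depth cells count as much as high-depth ones
```

Neither weighting is wrong in itself; the choice decides whether deeply sequenced cells dominate the group profile.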

If your overall objective is to study co-regulation in single-cell data, I'd suggest reading up on the hdWGCNA methods, which address some of the challenges in this space: namely, the lack of biological replication, and how to approximate it with balanced random bootstrapping within neighborhoods.
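To give a feel for the neighbourhood idea, here is a toy sketch under stated assumptions; it is not hdWGCNA's implementation. The function `neighbourhood_metacells` is hypothetical: within one group of cells it randomly draws seed cells, finds each seed's k nearest cells by Euclidean distance in an embedding (e.g. PCA coordinates), and sums their raw counts into a "metacell", giving several pseudo-replicates per group.

```python
import numpy as np

def neighbourhood_metacells(counts, embedding, n_metacells, k, seed=0):
    """Toy metacell construction within one group of cells (hypothetical helper).

    counts: (n_cells, n_genes) raw counts; embedding: (n_cells, n_dims).
    Draws n_metacells distinct seed cells at random and sums the counts of
    each seed's k nearest cells (Euclidean distance in `embedding`).
    Returns (metacells, members): (n_metacells, n_genes) and (n_metacells, k).
    """
    counts = np.asarray(counts)
    embedding = np.asarray(embedding, dtype=float)
    rng = np.random.default_rng(seed)
    seeds = rng.choice(counts.shape[0], size=n_metacells, replace=False)
    # Distances from each seed (rows) to every cell (columns).
    diffs = embedding[seeds][:, None, :] - embedding[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # k nearest cells per seed; the seed itself sits at distance 0.
    members = np.argsort(dists, axis=1, kind="stable")[:, :k]
    metacells = counts[members].sum(axis=1)
    return metacells, members

# Simulated group of 50 cells, 6 genes, 2-D embedding (illustrative only).
rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(50, 6))
embedding = rng.normal(size=(50, 2))
metacells, members = neighbourhood_metacells(counts, embedding, n_metacells=5, k=4)
print(metacells.shape, members.shape)
```

Calling it separately for each sample × cell type group with the same `n_metacells`, and repeating with different `seed` values, is one way to read "balanced random bootstrapping"; the hdWGCNA documentation describes what the package actually does.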

