How to aggregate pseudobulks: Normalization & Log-Transformation
19 months ago
Tadeoye ▴ 30

I am working on a single-cell data analysis project and need to aggregate single-cell data into pseudobulks as input for GSVA. GSVA only accepts a gene × subject matrix, so the cells of each subject must first be collapsed into one pseudobulk profile. I have come across two different approaches to this aggregation and am unsure which one to use.

In a recent paper by Blanchard et al., the data were normalized and log-transformed before aggregation: the authors first computed normalized gene expression profiles using ACTIONet and then averaged them to obtain individual-cell-type-level aggregated expression profiles. A single-cell tutorial, on the other hand, suggests aggregating the raw counts first and then applying normalization and log transformation. The choice matters because the Gaussian kernel I intend to use in GSVA expects continuous expression values on a logarithmic scale, such as RNA-seq log-CPMs, log-RPKMs, or log-TPMs.

Which approach should I take: normalize and log-transform each cell first and then aggregate, or aggregate the raw counts first and then normalize and log-transform? I would greatly appreciate any guidance or insights on this matter.
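
To make the difference between the two orders concrete, here is a toy sketch with made-up counts (two hypothetical cells from one subject, two genes). The scale factor of 1e4 and the use of `log1p` are assumptions chosen for illustration, not the settings of either the paper or the tutorial.

```python
import numpy as np

# Made-up raw counts: rows are cells, columns are genes
counts = np.array([[0.0, 100.0],
                   [10.0, 90.0]])

# Order A: normalize and log-transform each cell, then average across cells
per_cell = counts / counts.sum(axis=1, keepdims=True) * 1e4
mean_of_logs = np.log1p(per_cell).mean(axis=0)

# Order B: sum raw counts across cells, then normalize and log-transform once
summed = counts.sum(axis=0)
log_of_sum = np.log1p(summed / summed.sum() * 1e4)

print("normalize -> log -> average:", mean_of_logs)
print("sum -> normalize -> log:   ", log_of_sum)
```

Because the logarithm is non-linear, the two orders generally give different values, and the gap tends to be largest for genes that are zero in many cells.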

scRNA-seq pseudobulk GSVA pseudoreplicate • 1.8k views
19 months ago
ATpoint 85k

The approach I know and use is to sum the raw counts per cluster, group, or whatever unit you want to aggregate over, and then treat each sum as a regular bulk RNA-seq sample. The scuttle package provides scuttle::aggregateAcrossCells() to automate the aggregation. Downstream normalization can then be done with edgeR or DESeq2.
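
As a rough illustration of the sum-then-normalize order in Python, the sketch below sums raw counts per group with pandas and then computes a simple log2(CPM + 1). The function name `pseudobulk_log_cpm`, the toy counts, and the prior count of 1 are all assumptions for illustration; this is a plain library-size normalization, not a reimplementation of scuttle, edgeR, or DESeq2.

```python
import numpy as np
import pandas as pd

def pseudobulk_log_cpm(counts, groups, prior_count=1.0):
    """Sum raw counts per group, then return log2(CPM + prior_count).

    counts: cells x genes DataFrame of raw counts
    groups: one label per cell (e.g. subject ID), same order as counts rows
    Returns a genes x groups DataFrame (hypothetical helper, not a library API).
    """
    # Sum raw counts over all cells sharing a label -> groups x genes
    summed = counts.groupby(np.asarray(groups)).sum()
    # Library size of each pseudobulk sample
    lib_size = summed.sum(axis=1)
    # Counts per million, then log2 with a small offset to avoid log(0)
    cpm = summed.div(lib_size, axis=0) * 1e6
    log_cpm = np.log2(cpm + prior_count)
    # Transpose to genes x samples
    return log_cpm.T

# Toy data: four cells, three genes, two subjects (made-up values)
counts = pd.DataFrame(
    [[1, 0, 3], [2, 1, 0], [0, 4, 1], [5, 0, 0]],
    index=["c1", "c2", "c3", "c4"],
    columns=["geneA", "geneB", "geneC"],
)
groups = ["s1", "s1", "s2", "s2"]
log_cpm = pseudobulk_log_cpm(counts, groups)
print(log_cpm)
```

The result is oriented genes × subjects, matching the matrix layout described in the question.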


Thanks!

My pipelines are designed to run in Python. Would the equivalent be to run decoupler.get_pseudobulk() from the decoupler package, followed by scanpy.pp.normalize_total() and scanpy.pp.log1p() from scanpy?
