Question

Am I accounting for replicates correctly in pseudobulk analysis?

0

Entering edit mode

2 days ago

Aspire ▴ 370

I am analyzing a single-cell RNA-seq dataset with:

4 patients
2 conditions per patient
2 cell types of interest per patient-condition combination

I want to perform pseudobulk differential expression analysis to examine:

Condition effects (differences due to treatment/condition).
Cell type effects (differences between the two cell types).

Approach in question

I plan to create 16 pseudobulk samples (4 patients × 2 conditions × 2 cell types) and run DESeq2 with these models:

Condition effects ~ Patient_CellType_combination + Condition
Cell type effects ~ Patient_Condition_combination + CellType

Each formula accounts for patient-specific effects by grouping the data accordingly:

There are 8 Patient_CellType_combinations (4 patients × 2 cell types).
There are 8 Patient_Condition_combinations (4 patients × 2 conditions).

Concerns

While this approach maximizes the use of single-cell data, I worry that pseudobulk samples within each patient-treatment-cell type combination are not true biological replicates. This could lead to inflated false positives because samples from the same patient and treatment may be more similar than expected in traditional bulk RNA-seq.
For example, in the first comparison (across conditions), Patient1-Cluster1-Condition1 may be more similar to Patient1-Cluster2-Condition1 than independent biological samples would be.

Question

Is this the correct approach for pseudobulk analysis?

deseq2 single-cell pseudobulk • 215 views

ADD COMMENT • link updated 1 day ago by ATpoint 87k • written 2 days ago by Aspire ▴ 370

0

Entering edit mode

You have paired data since it's all within each patient so make good use of that, it's statistically powerful. I would aggregate indeed by patient-celltype-condition and for the design I would make a new group Celltype_Condition and then ~Celltype_Condition + Patient. You can then contrast via Celltype_Condition as needed, both for between-celltype, or between-condition effects. Since a paired analysis you do not need to worry that the cells within patient are more correlated than between patient.

ADD REPLY • link 1 day ago by ATpoint 87k

0

Entering edit mode

If I use Celltype_Condition as a contrast, there would be four levels : CelltypeA_Control, CelltypeA_Treated, CelltypeB_Control, CelltypeB_Treated.

However, there is no comparison within these that enables one to see the overall effect of Celltype (while accounting for treatment), nor the overall effect of treatment (across both Celltypes).

Instead, all comparisons would be only between specific Celltype_Condition combinations.

ADD REPLY • link 1 day ago by Aspire ▴ 370

0

Entering edit mode

I would test the celltype and treatment effect here via the averages, for example for the celltype effect as below. I think that's one way of doing this others might correct me if wrong:

library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(m = 8)
dds$celltype <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
dds$treatment <- factor(rep(c("Control", "Treated"), 4))
dds$patient <- factor(c(1,1,2,2,1,1,2,2))

# combine celltype and treatment
dds$ct <- factor(paste(dds$celltype, dds$treatment, sep = "_"))

design(dds) <- ~0 + ct + patient
dds <- DESeq(dds)
res <- results(dds, contrast = list(c("ctA_Treated", "ctA_Control"), c("ctB_Treated", "ctB_Control")), listValues = c(.5, -.5))

ADD REPLY • link 1 day ago by ATpoint 87k