Question

Do I need to regress out uninteresting sources of variation before pseudobulk DESeq2 differential expression analysis in single-cell data?

0

6 months ago

alvaroperona1 • 0

I have combined a series of datasets containing a specific cell type. These datasets come from different (but related) tumors, studies, and timepoints. I want to perform a differential expression analysis on these cells based on whether the patient survived or not. The problem is that, during an initial exploration with PCA, I observed that the cells cluster differently depending on some of these variables. Should I regress out these sources of variation (using the SCTransform option vars.to.regress) before performing the differential expression analysis with DESeq2? Or should I perform the differential expression analysis on the raw counts, even though they show these differences?

Another question I have is whether I can use the vars.to.regress option freely with as many variables as I want, or if doing so comes at a cost to accuracy, interpretability, etc.

scRNA-seq regression DESeq2 pseudobulk • 711 views

updated 6 months ago by Ram 45k • written 6 months ago by alvaroperona1 • 0

GenoMax · Answer 1 · 2024-11-16

1

6 months ago

fracarb8 ★ 1.7k

Two things here:

DESeq2, like the negbinom and poisson tests, requires raw counts, not scaled values.

vars.to.regress affects the PCA, but not DE testing. If you want to regress out confounders, use latent.vars (see the FindMarkers documentation).
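Since DESeq2 needs raw counts, the pseudobulk input is usually built by summing the raw counts of all cells from the same sample. A minimal base-R sketch of that aggregation step follows; the simulated `counts` matrix and the `cell_sample` vector are hypothetical stand-ins for the raw count matrix and the per-cell sample labels of a real object.

```r
# Simulated raw counts: 5 genes x 12 cells (hypothetical data).
set.seed(1)
counts <- matrix(
  rpois(5 * 12, lambda = 3),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12))
)

# Hypothetical per-cell metadata: the patient sample each cell came from.
cell_sample <- rep(c("P1", "P2", "P3", "P4"), each = 3)

# rowsum() sums rows by group, so transpose to cells x genes,
# sum per sample, then transpose back to genes x samples.
pseudobulk <- t(rowsum(t(counts), group = cell_sample))

print(pseudobulk)
```

The result has one column per sample and still contains unscaled integer counts, which is the form DESeq2 expects.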

0

From what I have been told, FindMarkers inflates significance (its p-values come out too small) because it treats cells from the same sample as independent replicates. After investigating, I found that the design parameter of DESeq2 can account for the variance that comes from other variables, by specifying something like: design = ~ variable1 + variable2 + variable3 + variable_of_interest. This way, I can use the raw counts without worrying about variables 1, 2, and 3 masking or altering the effect of the variable of interest. Is this true?
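For illustration, here is a minimal sketch of that kind of design on simulated pseudobulk counts. The covariate `study`, the `outcome` levels, and all sample names are hypothetical, and the data are random, so the results carry no biological meaning.

```r
library(DESeq2)

set.seed(42)
n_genes <- 200

# Hypothetical sample-level metadata: one row per pseudobulk sample.
coldata <- data.frame(
  study   = factor(rep(c("A", "B", "C"), each = 4)),
  outcome = factor(rep(c("survived", "deceased"), times = 6),
                   levels = c("survived", "deceased")),
  row.names = paste0("sample", 1:12)
)

# Simulated raw pseudobulk counts (genes x samples), with per-gene means.
gene_mu <- 2^runif(n_genes, min = 4, max = 10)
counts <- matrix(
  rnbinom(n_genes * 12, mu = gene_mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), rownames(coldata))
)

# Covariates first, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ study + outcome)
dds <- DESeq(dds)

# Effect of outcome, adjusted for study.
res <- results(dds, contrast = c("outcome", "deceased", "survived"))
print(head(res[order(res$pvalue), ]))
```

One caveat with this approach: a covariate can only be adjusted for if it is not completely confounded with the variable of interest. If, say, every deceased patient came from a single study, the design matrix would not be full rank and DESeq2 would stop with an error.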

updated 6 months ago by GenoMax 151k • written 6 months ago by alvaroperona1 • 0