I have combined several datasets containing a specific cell type. These datasets come from different (but related) tumors, studies, and timepoints. I want to perform a differential expression analysis on these cells, comparing patients who survived with patients who did not. The problem is that, during an initial exploration with PCA, I observed that the cells cluster according to some of these variables. Should I regress out these sources of variation (using the vars.to.regress argument of SCTransform) before running the differential expression analysis with DESeq2? Or should I run the differential expression analysis on the raw counts, even though these differences are present?
Another question is whether I can pass as many variables to vars.to.regress as I want, or whether each additional variable comes at a cost to accuracy, interpretability, etc.
From what I have been told, FindMarkers underestimates p-values (that is, it overstates significance) because it treats cells coming from the same origin as independent samples. After investigating, I found that the design argument of DESeq2 can account for the variance contributed by other variables, by specifying something like:
design = ~ variable1 + variable2 + variable3 + variable_of_interest
This way, I could use the raw counts (which DESeq2 normalizes internally) without worrying about variables 1, 2, and 3 masking or altering the effect of the variable of interest. Is this true?
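A minimal sketch of the design-formula approach described above, assuming the count matrix already holds one column per biological sample (for example, cells summed per patient). The counts are simulated, and study and survived are hypothetical stand-ins for variable1 and the variable of interest. It only shows how a covariate enters the DESeq2 design; it does not establish whether adjusting for that covariate is appropriate for this dataset.

```r
# Minimal DESeq2 sketch with simulated counts. `study` and `survived` are
# hypothetical column names standing in for "variable1" and the variable of interest.
suppressPackageStartupMessages(library(DESeq2))

set.seed(42)
n_genes   <- 200
n_samples <- 8  # assumed: one column per biological sample (e.g. per patient)

# Sample metadata: study is crossed with survival, so the design is full rank
coldata <- data.frame(
  study    = factor(rep(c("A", "B"), each = 4)),
  survived = factor(rep(c("no", "yes"), times = 4), levels = c("no", "yes"))
)
rownames(coldata) <- paste0("sample", seq_len(n_samples))

# Simulated negative binomial counts with gene-specific means
gene_means <- exp(runif(n_genes, log(10), log(1000)))
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_means, times = n_samples), size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# The variable of interest is listed last (results() uses it by default);
# its coefficient is estimated while adjusting for study
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ study + survived)
dds <- DESeq(dds)  # size factors, dispersion estimates, Wald tests

print(resultsNames(dds))
res <- results(dds, name = "survived_yes_vs_no")
summary(res)
```

If a covariate were completely confounded with survival (for example, every patient from one study survived), the model matrix would not be full rank and DESeqDataSetFromMatrix would stop with an error rather than separating the two effects.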