I am a bit confused about the use of the RNA vs. SCT assays for DGE analysis, and am wondering if anybody who uses Seurat could shed some light on this. I've been performing the Seurat v3 integration method with SCTransform by simply following their vignette. According to some discussions and the vignette, the Seurat team indicated that the RNA assay (rather than the integrated or SCT assays) should be used for the DotPlot and FindMarkers functions when comparing and exploring gene expression differences across cell types. But the RNA assay has raw count data, while the SCT assay has scaled and normalized data. It seems to me that the values in the SCT assay are more appropriate for comparing DGE among cell types. Am I missing something?
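You can also normalize and scale the data in the RNA assay. In Seurat v3 the RNA assay keeps the raw counts in its counts slot, and after NormalizeData its data slot holds log-normalized values; FindMarkers works on the data slot by default. There are numerous resources on this, but Aaron Lun explains quite well here why the original log-normalized values should be used for DE and for visualizing expression: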
For gene-based procedures like differential expression (DE) analyses or gene network construction, it is desirable to use the original log-expression values or counts. The corrected values are only used to obtain cell-level results such as clusters or trajectories. Batch effects are handled explicitly using blocking terms or via a meta-analysis across batches. We do not use the corrected values directly in gene-based analyses, for various reasons:
- It is usually inappropriate to perform DE analyses on batch-corrected values, due to the failure to model the uncertainty of the correction. This usually results in loss of type I error control, i.e., more false positives than expected.
- The correction does not preserve the mean-variance relationship. Applications of common DE methods like edgeR or limma are unlikely to be valid.
- Batch correction may (correctly) remove biological differences between batches in the course of mapping all cells onto a common coordinate system. Returning to the uncorrected expression values provides an opportunity for detecting such differences if they are of interest. Conversely, if the batch correction made a mistake, the use of the uncorrected expression values provides an important sanity check.
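A minimal sketch of what "handled explicitly using blocking terms" can look like, written in Python with NumPy purely for illustration (in practice you would do this in R with Seurat, limma or edgeR). It assumes a genes x cells count matrix and hypothetical 0/1 encodings for cell type (`group`) and batch (`batch`). `log_normalize` mirrors the LogNormalize idea (divide by the cell's total counts, multiply by a scale factor of 10,000, then log1p), and `group_effect_blocked` fits an ordinary least-squares model per gene with batch as a covariate, so the cell-type effect is estimated on uncorrected values rather than on batch-corrected ones. It is not a substitute for a proper DE test.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000.0):
    """LogNormalize-style transform of a genes x cells count matrix.

    Each cell's counts are divided by that cell's total, multiplied by
    scale_factor, and passed through log1p (natural log of 1 + x).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)  # library size of each cell
    return np.log1p(counts / totals * scale_factor)


def group_effect_blocked(logexpr, group, batch):
    """Per-gene least-squares fit of  expr ~ 1 + group + batch.

    logexpr: genes x cells matrix of uncorrected log-normalized values.
    group, batch: hypothetical 0/1 indicator vectors, one entry per cell.
    Returns the estimated group (cell-type) coefficient for every gene.
    """
    logexpr = np.asarray(logexpr, dtype=float)
    n_cells = logexpr.shape[1]
    design = np.column_stack([np.ones(n_cells), group, batch])
    # Solve all genes at once: each column of logexpr.T is one gene.
    coef, *_ = np.linalg.lstsq(design, logexpr.T, rcond=None)
    return coef[1]  # row 0 = intercept, row 1 = group, row 2 = batch


if __name__ == "__main__":
    # Toy design where batch is unevenly spread across the two groups.
    group = np.array([0, 0, 0, 1, 1, 1])
    batch = np.array([0, 0, 1, 0, 1, 1])
    # Toy log-expression with known effects: gene 0 has group +2, batch +3;
    # gene 1 has group -1, batch +0.5.
    logexpr = np.vstack([1 + 2 * group + 3 * batch,
                         5 - 1 * group + 0.5 * batch])
    naive = logexpr[:, group == 1].mean(axis=1) - logexpr[:, group == 0].mean(axis=1)
    print("naive difference:", naive)  # inflated by the batch imbalance
    print("batch-blocked:   ", group_effect_blocked(logexpr, group, batch))
```

In this toy design the naive difference in means mixes the batch effect into the cell-type effect (about 3 and -0.83), while the model with a batch term recovers the true values of about 2 and -1.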
In addition, the values in the SCT and integrated assays don't necessarily correspond to per-gene expression values anyway; in the scale.data slot of each, they are residuals rather than expression levels.
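To make the residuals point concrete, here is a simplified Python sketch of negative binomial Pearson residuals, the kind of value sctransform places in the SCT assay's scale.data slot. As an assumption for illustration, the expected count `mu` for each gene and cell is (gene total x cell total) / grand total, and `theta` is a single fixed, hypothetical dispersion; sctransform itself estimates `mu` from a regularized negative binomial regression. A residual measures how far a count sits from the model's expectation, not how highly the gene is expressed.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Simplified NB Pearson residuals for a genes x cells count matrix.

    Assumption for this sketch: mu = gene_total * cell_total / grand_total
    with one fixed theta. Genes with zero total counts give 0/0 here and
    should be filtered out first.
    """
    counts = np.asarray(counts, dtype=float)
    gene_totals = counts.sum(axis=1, keepdims=True)
    cell_totals = counts.sum(axis=0, keepdims=True)
    mu = gene_totals * cell_totals / counts.sum()
    # NB variance under this parameterization: mu + mu^2 / theta
    return (counts - mu) / np.sqrt(mu + mu ** 2 / theta)


if __name__ == "__main__":
    # Gene 0 is 100x more abundant than gene 1, but both scale exactly with
    # cell size, so neither deviates from the model's expectation.
    counts = np.outer([100, 1], [1, 2, 3])
    print(np.round(pearson_residuals(counts), 6))  # all (near) zero
```

Both genes end up with residuals of about zero even though one is far more highly expressed, which is why these values are not a drop-in replacement for log-normalized expression in DE or DotPlot.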