I am following the Seurat vignette immune_alignment with my own treated vs. control datasets.
The vignette offers two methods for identifying differentially expressed genes across conditions. One is based on comparing averaged expression between conditions, and the other is based on `FindMarkers`, which eventually computes fold-change between conditions. The vignette shows many genes common to both methods (for one specific cluster).
In my datasets, the two resulting sets of DE genes are quite different. What could be the reason for this? More generally, what is the core difference between the methods?
In my case, for the cluster under examination, there are about 7 times as many cells mapped to control as cells mapped to treatment. Could this be a contributing factor?
Technically, the vignette authors are not comparing DE results obtained with the averaged-expression approach to those obtained with the `FindMarkers` approach. The former is applied to `CD4 Naive T cells` and `CD14 Monocytes` for the `stim` vs. `ctrl` comparison, and the latter to `B cells`, again for `stim` vs. `ctrl`. Nevertheless, at first glance all three DE lists overlap to some extent, which the authors speculate results from "a conserved interferon response pathway".

Actually, as the vignette authors state, the averaged-expression approach is not a DE analysis per se but just "a way to look broadly at these changes": it does not produce a list of DE genes, only a scatter plot of average expression in one condition against the other, from which DE-related information can be inferred. `FindMarkers()` can be run with one of 9 statistical tests (`test.use`, default `"wilcox"`). These operate on the expression values of individual cells rather than only on averages, and some of them explicitly model variance/dispersion. `FindMarkers()` should therefore be the way to go for deciding which genes are DE between the cell clusters/types/states of interest.
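To make the core difference concrete, here is a minimal Python sketch on made-up toy data. It is not Seurat code, and the gene names, values and helper functions are hypothetical. It assumes log1p-normalized expression values and, as in the Seurat v3 version of the vignette, takes the per-condition average in linear space (`expm1`) before applying `log1p`. For the `FindMarkers` side it implements a two-sided Wilcoxon rank-sum test (the idea behind the default `test.use = "wilcox"`) with the normal approximation and tie correction, but without R's continuity correction. The 700 control vs. 100 treated cells mirror the 7:1 imbalance from the question.

```python
import math
from statistics import fmean


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (like numpy.linspace)."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def avg_expression(values):
    """Vignette-style average: mean in linear space (expm1), then log1p.
    Assumes `values` are log1p-normalized expression values for one condition."""
    return math.log1p(fmean([math.expm1(v) for v in values]))


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test of x vs y.
    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool values, tagging each with its group (0 = x, 1 = y), and sort.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j].
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical toy data: 700 control vs 100 treated cells (7:1, as in the question).
# For simplicity both genes share the same control distribution.
ctrl = linspace(0.9, 1.1, 700)
# gene_A: every treated cell is shifted up slightly.
gene_a_treat = linspace(1.0, 1.2, 100)
# gene_B: 95 treated cells look like control, 5 are extremely high.
gene_b_treat = linspace(0.9, 1.1, 95) + linspace(5.0, 5.4, 5)

for name, treat in [("gene_A", gene_a_treat), ("gene_B", gene_b_treat)]:
    diff = avg_expression(treat) - avg_expression(ctrl)
    _, p = wilcoxon_rank_sum(treat, ctrl)
    print(f"{name}: averaged-expression difference = {diff:.2f}, rank-sum p = {p:.2g}")
```

In this toy setup, gene_A (every treated cell shifted up slightly) gives a small averaged-expression difference (about 0.1) but a tiny p-value. Gene_B (five extreme treated cells) gives a large averaged difference (about 1.5) but a p-value around 0.4. The two approaches can therefore rank genes very differently. Cell numbers also enter only the test. Under the normal approximation, the z-score scales with sqrt(n1·n2/(n1 + n2 + 1)), so with a 7:1 imbalance the smaller (treated) group largely limits power, while the averages ignore cell counts entirely.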