Simple question, sorry if it is obvious, but I was unable to find an answer to this exact question.
I have a single cell RNAseq dataset and I'm performing differential gene expression at the cluster level, comparing transcript expression based on experimentally defined treatments (+/- IFNg). One gene in particular, CIITA, is literally absent from the dataset in the control condition, but detectable in a sizeable subset of IFNg-treated cells, in all clusters.
Differential expression testing with both Seurat and DESeq2 gives me p-values and log2 fold changes that are significant for some clusters and not significant for others, despite obvious upregulation of CIITA in a subset of cells in each IFNg-treated cluster (5-20% of cells with non-zero expression per cluster).
My question is: when one group starts at zero and the comparison group is non-zero, are these statistical tests valid? Clearly the log2 fold change is meaningless, as going from 0 expression to anything should be infinite. I'm guessing that a finite log2 fold change can only be returned because the counts are offset by a small value to avoid log(0) errors. But this offset probably also influences the p-value.
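To illustrate what I mean about the offset, here is a minimal Python sketch of the general pseudocount idea. The means and pseudocount values are made up for illustration, and I am not claiming this is what Seurat or DESeq2 do internally; it only shows that when the control mean is exactly zero, the reported log2 fold change is determined largely by the choice of offset.

```python
import math


def log2_fold_change(mean_treated, mean_control, pseudocount):
    """log2 ratio of group means after adding the same pseudocount to both."""
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Hypothetical values: the gene is never detected in control cells,
# and has a small average expression in treated cells.
mean_control = 0.0
mean_treated = 0.2

# The same data give very different log2 fold changes depending on the offset.
for pseudocount in (1.0, 0.1, 0.01, 0.001):
    lfc = log2_fold_change(mean_treated, mean_control, pseudocount)
    print(f"pseudocount={pseudocount:<6} log2FC={lfc:.2f}")
```

The smaller the offset, the larger the fold change, with no change in the underlying data.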
Any thoughts on how I can proceed? Are there any packages/methods that address this issue? Is it even meaningful to compare genes where expression is completely absent from the control group?
Many thanks
An MA plot may help you identify such lowly expressed genes with large log2 fold changes.
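As a rough sketch of what an MA plot shows, the Python snippet below computes the two coordinates for a gene: the average expression (x-axis) and the log2 fold change (y-axis). The gene names, counts, pseudocount, and thresholds are all hypothetical, and the averaging is a simplification rather than the exact quantity any particular package plots.

```python
import math


def ma_coordinates(control_counts, treated_counts, pseudocount=1.0):
    """Return (A, M): average of the two group means and a pseudocount-offset log2 fold change."""
    mean_control = sum(control_counts) / len(control_counts)
    mean_treated = sum(treated_counts) / len(treated_counts)
    a = (mean_control + mean_treated) / 2
    m = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    return a, m


def is_low_expression_outlier(a, m, max_mean=1.0, min_abs_lfc=1.0):
    """Flag genes with low average expression but a large fold change (assumed thresholds)."""
    return a <= max_mean and abs(m) >= min_abs_lfc


# Hypothetical counts: GENE_A is absent in control, GENE_B is stable and well expressed.
genes = {
    "GENE_A": ([0, 0, 0, 0], [0, 0, 1, 3]),
    "GENE_B": ([10, 12, 9, 11], [11, 10, 12, 9]),
}

for name, (control, treated) in genes.items():
    a, m = ma_coordinates(control, treated)
    flagged = is_low_expression_outlier(a, m)
    print(f"{name}: A={a:.2f} M={m:.2f} flagged={flagged}")
```

Genes like the hypothetical GENE_A sit at the far left of the plot with a large M value, which is where fold change estimates are least reliable.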
See also https://support.bioconductor.org/p/108491/