As I understand it, in its initial steps Seurat scales and centers the data with the ScaleData() function so that the data resembles a normal distribution. If that is the case, why is the default test for differential expression the Wilcoxon test, which is non-parametric? Wouldn't it be better to use DESeq2 instead and trust its results?
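On the premise of the question: centering and scaling each gene, which is what ScaleData does apart from clipping very large scaled values, is a linear transformation. It shifts and rescales the values but does not change the shape of their distribution, so a skewed, zero-inflated gene stays skewed after scaling. The minimal sketch below illustrates this with hypothetical values for a single gene; the helper names are made up for the example, and it uses the population standard deviation, a detail that does not affect the point.

```python
import statistics


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def skewness(values):
    """Population skewness (third standardized moment); 0 for a symmetric distribution."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)


# Hypothetical log-normalized expression of one gene in 10 cells:
# mostly zeros (dropouts) plus a few expressing cells, so strongly right-skewed.
expr = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 2.5, 3.1]
scaled = zscore(expr)

print(f"sd after scaling: {statistics.pstdev(scaled):.3f}")
print(f"skewness before: {skewness(expr):.3f}, after: {skewness(scaled):.3f}")
```

The scaled values have mean 0 and standard deviation 1, but the skewness is identical before and after, so the data are no closer to a normal distribution.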
Please correct me if I'm wrong anywhere in my understanding.
Thank you. Suvi
If you treat the hundreds of cells you have per sample as replicates, there shouldn't be much need for the sophisticated modelling that DESeq2 does to overcome the typical limitation of bulk RNA-seq data (namely, lack of replicates). A t-test (or, alternatively, a Wilcoxon test) usually works fine if you have hundreds of replicates per gene. Also note that DESeq2 works on the raw read counts, not on the scaled data.
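To make the "hundreds of cells as replicates" point concrete, here is a minimal sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test in plain Python, using the normal approximation with a tie correction and no continuity correction. It roughly corresponds to R's wilcox.test(exact = FALSE, correct = FALSE); p-values may differ slightly from R's defaults and from Seurat's own implementation. The function names and counts are hypothetical. Because the test only looks at ranks, any strictly increasing per-gene transform (log1p here, or centering and scaling with a positive standard deviation) gives exactly the same result.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction and no continuity correction.
    Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # tie correction: sum of (t^3 - t) over groups of t tied values
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical UMI counts of one gene in 200 cells per group.
group_a = [i % 5 for i in range(200)]      # counts 0..4
group_b = [i % 5 + 1 for i in range(200)]  # same pattern shifted up by one

u_raw, p_raw = wilcoxon_rank_sum(group_a, group_b)
# Ranks are unchanged by any strictly increasing transform, e.g. log1p.
u_log, p_log = wilcoxon_rank_sum([math.log1p(v) for v in group_a],
                                 [math.log1p(v) for v in group_b])
print(f"U={u_raw}, p={p_raw:.3g}; after log1p: U={u_log}, p={p_log:.3g}")
```

With 200 cells per group, even a one-count shift gives a very small p-value, and the log1p-transformed input yields the identical U and p-value.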
So which results should I report, the ones from the Wilcoxon test or from DESeq2? Also, DESeq2 gives me far more differentially expressed genes than the Wilcoxon test, so I don't know which ones to trust. In particular, the gene I'm interested in is only detected with DESeq2, not with the Wilcoxon test, so I don't know what to do.
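When the two gene lists differ in size, it is worth checking how each tool adjusts for multiple testing and which genes it tests at all. To my knowledge, Seurat's FindMarkers reports p_val_adj using a Bonferroni correction over all genes in the dataset and skips genes that fail its logfc.threshold and min.pct filters, while DESeq2's results() reports padj using Benjamini-Hochberg by default; please verify this against the versions you are running. The sketch below applies both corrections to the same hypothetical p-values (gene names and values are made up) to show that the stricter Bonferroni rule alone can shrink the list.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR adjustment (same rule as R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for 10 genes, already computed by some test.
genes = [f"gene_{k}" for k in range(1, 11)]
pvals = [0.0001, 0.0008, 0.002, 0.004, 0.006, 0.04, 0.2, 0.5, 0.7, 0.9]
alpha = 0.05

hits_bonf = {g for g, q in zip(genes, bonferroni(pvals)) if q < alpha}
hits_bh = {g for g, q in zip(genes, benjamini_hochberg(pvals)) if q < alpha}
print("Bonferroni:", sorted(hits_bonf, key=genes.index))
print("BH:", sorted(hits_bh, key=genes.index))
print("only with BH:", sorted(hits_bh - hits_bonf, key=genes.index))
```

Here gene_5 passes under Benjamini-Hochberg but not under Bonferroni. With tens of thousands of genes the Bonferroni multiplier is much larger, so the gap between the two lists can grow accordingly.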
In silico there isn't that much you can do at a single-gene level, but if you're interested in just one gene, I would strongly recommend looking at its expression pattern in the groups you're comparing. Try to understand why that gene seems to be borderline DE, given that one of the methods misses it. That means looking at both the raw counts and the normalized data may be helpful.
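As a starting point for that kind of inspection, here is a minimal sketch that summarizes one gene per group from raw counts and from log-normalized values computed as log1p(count / cell total × 10,000), which I believe matches Seurat's default LogNormalize. The counts, library sizes and function names are hypothetical. In this toy example the raw mean is higher in group A, driven by two high-count cells, while the fraction of expressing cells, the median and the log-normalized mean are all higher in group B; a pattern like this is one way a count model that compares means and a rank-based test can disagree about the same gene.

```python
import math
import statistics


def log_normalize(count, cell_total, scale_factor=10_000):
    """Seurat-style LogNormalize of one value: log1p(count / cell_total * scale_factor)."""
    return math.log1p(count / cell_total * scale_factor)


def summarize_gene(counts, totals):
    """Summarize one gene in one group of cells.

    counts: raw UMI counts of the gene, one per cell
    totals: total UMI counts of each cell (library sizes), same order
    """
    lognorm = [log_normalize(c, t) for c, t in zip(counts, totals)]
    return {
        "pct_expressing": 100 * sum(c > 0 for c in counts) / len(counts),
        "mean_raw": statistics.fmean(counts),
        "median_raw": statistics.median(counts),
        "mean_lognorm": statistics.fmean(lognorm),
    }


# Hypothetical data: 10 cells per group, each with 2,000 total UMIs.
totals = [2000] * 10
group_a = [0, 0, 0, 0, 0, 0, 0, 0, 12, 15]  # two cells with high counts
group_b = [1, 1, 2, 1, 0, 2, 1, 1, 2, 1]    # low but widespread expression

for name, counts in [("A", group_a), ("B", group_b)]:
    stats = summarize_gene(counts, totals)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In real data you would pull the same per-cell vectors for your gene from the count matrix and the normalized data of your object; plotting them per group (for example as violin plots) conveys the same information visually.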
The only way to know whether your gene is biologically important for whatever conditions you're looking at is to set up an appropriate experiment.
I see, thank you for the explanation. My question still remains: which one do I pick for differential expression testing, the Wilcoxon test or DESeq2?
My argument would be that it does not matter. It is more important to understand why the tests disagree for your specific gene IMO.
So the only way to find that out would be to do what you suggested in your previous comments?
I'm sure there are other ways, but that's how I would start, yes.