I am facing a comparable situation, so I will share some of my reasoning. I will try to structure this answer properly so that it is easier to read. Apologies if it still seems confusing.
The major problem is the discrepancy between expectation and result (many replicates <-> high effort and investment <-> high expected power to detect SDEs). Also, the way transcriptomics studies using DGE are currently published does not reward the case where there are no or few SDEs. In general, scientific publishing rewards rejection of the null hypothesis. So what if there is nothing to see here?
Checklist
Are the data sane?
Am I using state-of-the-art tools?
Is there enough statistical power?
Are there other confounding factors?
Are the groups artificial?
Are there other ways to look at the data?
Are the data sane?
Rigorous QC should come first, excluding contaminated or low-quality data. Then an exploratory analysis should be performed, such as an MDS plot, correspondence analysis, or PCA. Does the MDS/CA plot separate the groups properly, and are there potential outliers or mislabelled samples that should be removed?
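As a rough complement to the MDS/PCA plot, the mislabelling question can also be checked numerically. The sketch below is a simplified stand-in, not what any of the DGE packages do: it flags samples whose expression profile correlates better, on average, with another group than with their own. The function names, sample names and toy values are all made up for illustration; real input would be normalized, log-transformed expression values.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def suspicious_samples(profiles, labels):
    """Return (sample, own_group, closest_group) for samples whose mean
    correlation is highest with a group other than their own."""
    flagged = []
    for sample, profile in profiles.items():
        per_group = {}
        for other, other_profile in profiles.items():
            if other == sample:
                continue  # do not compare a sample with itself
            per_group.setdefault(labels[other], []).append(
                pearson(profile, other_profile))
        means = {g: sum(v) / len(v) for g, v in per_group.items()}
        closest = max(means, key=means.get)
        if closest != labels[sample]:
            flagged.append((sample, labels[sample], closest))
    return flagged

# Toy data (hypothetical): s5 is labelled "A" but looks like group "B".
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.2, 2.9, 4.1, 5.2],
    "s3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "s4": [5.2, 3.9, 3.1, 2.2, 0.9],
    "s5": [4.9, 4.1, 2.8, 2.1, 1.2],
}
labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "A"}
flagged = suspicious_samples(profiles, labels)
print(flagged)
```

A flagged sample is only a candidate for a closer look (metadata, sex genes, etc.), not proof of a swap.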
Am I using state-of-the-art tools?
According to Schurch et al. 2016, you are using the best tools for standard DGE in terms of keeping the FDR below the selected cut-off. Also, all methods seem to become more sensitive the more replicates they get. However, if you are using single-cell sequencing or something else, then other methods might be more appropriate.
One concern remains: the tools could be too conservative with large sample sizes. In the comparison, a tool-specific gold standard was generated for each software based on the full set of replicates; also, the variation in clonal yeast cultures might be artificially low in comparison to what you are facing. That might mean that the total number of SDEs differs quite a lot between tools. Most of the time, replicate numbers are still low (<12), and therefore packages might be optimized for those small numbers; at least that is why specialized methods exist in the first place.
You have enough replicates, though, for a Mann-Whitney test or a simple t-test. It seems unlikely to change the picture, but it would not hurt to try.
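If you want to try it, a per-gene Mann-Whitney test is short enough to write out. This is a minimal sketch using the normal approximation (reasonable for group sizes like yours, not for tiny groups) with mid-ranks for ties; the function name and the example values are invented. In practice you would call an existing implementation from a statistics library, run it gene by gene, and correct the resulting p-values for multiple testing.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, p-value). Tied values get mid-ranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this block of tied values
        midrank = (i + 1 + j) / 2.0    # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                  # all values identical: no evidence
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return u, p

# Made-up expression values for one gene in two groups.
group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [9, 10, 11, 12, 13, 14, 15, 16]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, p_value)
```

With four stages you would either run this for selected pairs of stages or use a rank-based test across all groups instead.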
Is there enough statistical power?
That is hard to tell, but if you are using personal patient data, the individual variation might be very high. If your data are about tumor progression or something similar, the differences between stages could be subtle, and they will also be gradual and possibly paired. Maybe check different contrasts, like T1 vs. T4. By the way, it is not clear which experimental factors you compared.
Are there other confounding factors?
Check for batch effects or convolution (mixed cell populations). Are there differences related to sex, age, treatment, etc.? Tumor samples are often complex mixtures of cells. Could deconvolution be applied? Could the samples be stratified appropriately?
Are the groups artificial?
Groupings could be somewhat subjective (e.g., classification of disorders, staging of disease progression), with fuzzy boundaries adding to the variation within groups. Are there other, more quantitative response variables that could be investigated (survival, relapse, ...)?
Are there other ways to look at the data?
Possibly, assigning hard categories is not always the most productive approach. Could the data instead be used as a predictor of the outcome? Could feature selection be applied to detect marker genes of stages, progression, etc.? Last resort: network analysis :)
Hey! I noticed that you have a lot of replicates, so are you working with single-cell RNA-seq or bulk RNA-seq? For scRNA-seq, you have to do more than simple normalization of your counts before testing for DEGs.
Also, you mentioned "all of my adj-pvalues were surprisingly the same", which does not seem plausible to me. In this case, it's best to re-check your inputs and attach examples so that we can help you check.
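One caveat: many identical adjusted p-values are not necessarily a bug. With the Benjamini-Hochberg adjustment (the default in DESeq2), each adjusted value is capped by the adjusted values of all larger raw p-values, so when few raw p-values are really small, runs of genes end up sharing the same adjusted value. A small sketch with made-up raw p-values (the function name is my own, not a library call):

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that an adjusted value can
    # never exceed the adjusted value of any larger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.02, 0.04, 0.05, 0.20, 0.50]   # hypothetical raw p-values
adj = bh_adjust(raw)
print(adj)
```

Here the three smallest raw p-values all get the same adjusted value (0.25/3, about 0.083), because 0.05 * 5 / 3 is smaller than what the two stronger genes would get on their own (0.1 each). So it is still worth checking the inputs, but also look at the raw p-values before concluding that something is broken.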
Dear statfa, Hi
I did not quite understand: did you get these "strange results" from a DEG analysis of exactly the same samples/data? If the samples are different, maybe there are simply no DEGs in your new samples.
Hi,
Thanks for your reply
In my previous studies, my results seemed OK. In my new study I have 4 conditions (or treatment groups) with 36, 71, 97 and 67 replicates, respectively. I'm getting strange results: DESeq2 gives no DEGs, edgeR gives 100 genes, and voom gives only 2 genes as differentially expressed.
I find the results of this new study strange because in my previous studies these models all behaved similarly and gave me thousands of DEGs (for example 8000 DEGs, 9000 DEGs, and so on).
What sort of experimental model is this? Differentiation, a knock-out, etc? In general 8000 DE genes is over the top. The number of DE genes you get back will correlate with the effect size of the group differences, so if the alteration is very subtle you may get no DE genes.
Thanks for your reply. The study is about four stages of a disease, and I want to run a differential analysis to find DEGs between these stages using the table of read counts. I have 36, 71, 97 and 67 replicates in the four conditions and 54000 genes. I filtered out lowly expressed genes, and 25000 genes remained to analyze. I then ran DESeq2, and all of my adj-pvalues were surprisingly the same. I changed the Cook's distance cut-off, but only a few adj-pvalues changed. DESeq2 didn't report any DE genes.
I then ran edgeR and got only about 100 DE genes based on adj-pvalues.
Voom gave me only 2 differentially expressed genes.
A friend ran DESeq (not DESeq2) with default settings and got 1750 differentially expressed genes. Why are the results so different? And why do I get no or few genes using DESeq2, edgeR and voom?
There is no reason to use DESeq; you should ignore everything your friend is telling you, as he/she isn't competent in RNA-seq analysis.
You have so few DE genes because either the effect size is smaller than you're expecting or because the variance is much larger. It's possible that you have some sample swaps, so have a look at how the samples are clustering.
It seems a bit churlish to suggest someone has no idea what they are doing simply because they are using an outdated tool.