Dear all,
I am currently using DESeq2 to identify differentially expressed genes in an RNA-seq dataset comparing two conditions: 7 stressed rats and 20 control rats. After running the analysis, I noticed a highly skewed distribution of differentially expressed (DE) genes. Specifically, 514 genes are down-regulated (log2 Fold Change < 0), while 1152 genes are up-regulated (log2 Fold Change > 0). This uneven distribution has raised concerns about the reliability of my results.
To investigate further, I made sure the data were properly normalized and examined the PCA plots, which did not reveal any obvious outliers. I then attempted to address the imbalance between the groups by randomly sampling 7 control rats to match the number of stressed rats. I repeated this random sampling four times and compared the overlap of DE genes across the subsets. However, the results were highly inconsistent, with only around 10% overlap in DE genes across the different subsamples.
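The subsampling comparison described above can be sketched as follows. This is a minimal illustration, not the DESeq2 analysis itself: the post does not say how "overlap" was computed, so the Jaccard index (shared genes divided by all genes called in either run) is an assumption, and the control IDs, gene IDs, and function names are all hypothetical.

```python
import random
from itertools import combinations


def draw_subsamples(controls, k, n_draws, seed=0):
    """Draw n_draws random subsets of k controls each (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(controls, k)) for _ in range(n_draws)]


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_summary(de_sets):
    """Return pairwise Jaccard indices and the fraction of genes called in every run."""
    pairwise = {
        (i, j): jaccard(de_sets[i], de_sets[j])
        for i, j in combinations(range(len(de_sets)), 2)
    }
    shared = set.intersection(*(set(s) for s in de_sets))
    union = set.union(*(set(s) for s in de_sets))
    shared_fraction = (len(shared) / len(union)) if union else 1.0
    return pairwise, shared_fraction


# Hypothetical control IDs: 20 controls, subsampled to 7, four times.
controls = [f"C{i:02d}" for i in range(1, 21)]
subsets = draw_subsamples(controls, k=7, n_draws=4, seed=1)

# Toy DE gene lists standing in for the real per-subsample results.
toy_de_sets = [
    {"g1", "g2", "g3", "g4"},
    {"g3", "g4", "g5"},
    {"g4", "g6"},
]
pairwise, shared_fraction = overlap_summary(toy_de_sets)
```

Reporting both the pairwise values and the all-runs fraction helps, because "10% overlap" can mean quite different things depending on which of the two is being quoted.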
This variation in results has left me uncertain about how to proceed. Should I abandon subsampling and treat the results obtained with all 20 controls as more reliable? Or are there alternative strategies I should explore to address this issue?
I would appreciate any advice, or a pointer to a relevant discussion if this topic has already been covered.
Thank you in advance for your help!
Best regards,
MG
Please show plotMA(). Anyway, a 2:1 ratio is not really concerning; there is no guarantee DE profiles are symmetrical.
What is the specific concern here, and who is raising it? Is the idea that each up-regulated gene should have a corresponding down-regulated gene? Why should that be?