I am doing a gene expression analysis on a set of 228 patients (52 positive and 176 negative). I found 17 differentially expressed genes (DEGs) that met my significance thresholds (|log fold change| > 1 and p-value < 0.1). I later found out that I can't include 30 patients from my negative class because they don't meet certain criteria, so I removed them and redid the analysis. Now I get fewer DEGs (12), and all 12 overlap with the original 17. The remaining five genes that were no longer selected as DEGs had a |log fold change| very close to 1.
I don't understand a couple of things:
- What is the relation between sample size and log fold change? Why does it change when I remove samples?
- Is there a statistical test I can use to show that the 5 genes no longer called as DEGs in the second analysis dropped out because of random variation and nothing else?
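
To make the first question concrete, here is a minimal sketch on simulated data (not my real data). It assumes the log fold change is simply log2 of the ratio of the two group means, which may not be exactly what my pipeline computes. The function name `log2_fold_change`, the expression levels, and the choice of which 30 negatives to drop are all made up for illustration.

```python
import math
import random
import statistics


def log2_fold_change(pos, neg):
    """log2 of the ratio of group means (values on the linear expression scale)."""
    return math.log2(statistics.fmean(pos) / statistics.fmean(neg))


rng = random.Random(0)  # fixed seed so the run is reproducible

# Simulated expression for ONE hypothetical gene; these are not real measurements.
# Group sizes mirror the cohort: 52 positives, 176 negatives.
pos = [rng.lognormvariate(math.log(210), 0.5) for _ in range(52)]
neg = [rng.lognormvariate(math.log(100), 0.5) for _ in range(176)]

lfc_all = log2_fold_change(pos, neg)

# Drop 30 negatives (here simply the last 30, an arbitrary choice) and recompute.
neg_kept = neg[:146]
lfc_subset = log2_fold_change(pos, neg_kept)

print(f"log2FC with 176 negatives: {lfc_all:.3f}")
print(f"log2FC with 146 negatives: {lfc_subset:.3f}")
```

In this sketch the two values differ only because the mean of the negative group is recomputed from a different set of samples. A gene sitting close to the |log fold change| = 1 cutoff could cross it in either direction this way.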