Hi all,
I am analysing RNA-seq data to identify differentially expressed genes across a large number of samples (~200). As a first pass, I ran STAR on 40 samples (20 affected and 20 unaffected) and built a count matrix with featureCounts. I am now running DESeq2. When I run DESeq2 on any 6 samples (3 affected and 3 unaffected), it reports some up- and down-regulated genes, but when I increase the number of samples to 8 or 10, it no longer reports any differentially expressed genes.
Please see the PCA plots. In PCA plot 2, which shows only 6 samples in total (3 affected, 3 unaffected), the groups separate better (though still not well), and with this small number of samples I do find up- and down-regulated genes. With 40 samples, however, no genes are reported as differentially expressed. Please see the statistics and PCA plots below.
How can we do DGE analysis when we have more samples?
Thanks in advance
DESeq2 analysis with 6 samples:
> summary(res)
out of 20707 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 27, 0.13%
LFC < 0 (down) : 59, 0.28%
outliers [1] : 0, 0%
low counts [2] : 17677, 85%
DESeq2 analysis with 40 samples:
> summary(res)
out of 42196 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 0, 0%
LFC < 0 (down) : 0, 0%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
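A side note on the two summaries: the totals differ (20707 vs 42196 genes) most likely because summary() counts only genes with a nonzero total read count across the samples in that run, so adding samples can only keep or raise that number. Here is a minimal Python sketch of that effect with made-up counts. The gene names, the values and the helper genes_with_nonzero_total are hypothetical and are not taken from the data above.

```python
# Hypothetical toy counts: each key is a gene, each list holds one count per sample.
counts = {
    "geneA": [12, 0, 7, 3, 9, 15],
    "geneB": [0, 0, 0, 0, 2, 0],
    "geneC": [0, 0, 0, 0, 0, 0],
    "geneD": [0, 0, 0, 1, 0, 4],
}


def genes_with_nonzero_total(counts, sample_idx):
    """Return the genes whose summed count over the chosen samples is above zero."""
    sample_idx = list(sample_idx)
    return [
        gene
        for gene, row in counts.items()
        if sum(row[i] for i in sample_idx) > 0
    ]


# Only the first three samples versus all six samples.
small = genes_with_nonzero_total(counts, [0, 1, 2])
full = genes_with_nonzero_total(counts, range(6))
print(len(small), len(full))  # prints "1 3"
```

So the two runs are not testing the same set of genes, and the percentages in the two summaries are not directly comparable.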
You will not know until you add all 200 samples. I would suggest doing that instead of dipping your toe in with small sample sizes. Judging by your first PCA plot, it is not surprising that there are no DE genes; however, the other 160 samples may separate better on the PCA plot (:
Have a read of the thread below; it suggests some troubleshooting steps: "No differentially expressed genes using DESeq2"
I think it is understandable that no differential expression shows up with a larger number of samples, since these were picked at random (a mix of samples) from the patients and the normal controls. Yes! I would run all 200 samples and see.
You are posting this question prematurely, it seems. Please proceed with your experiment and then return if there are still 'issues'. By the way, just looking at your first PCA bi-plot, I would regard the control sample on the right as an outlier, and remove it. However, it may not appear as an outlier once you have profiled more samples.
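If you would rather not judge the outlier by eye, one rough way to put a number on it is sketched below. Note that this is a different technique from the PCA plot: it flags samples whose median Pearson correlation with the other samples, on log2(count + 1) values, falls below a cutoff. The sample names, the counts, the function names and the 0.8 cutoff are all assumptions made for illustration, not values from this dataset.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_outliers(samples, threshold=0.8):
    """Flag samples whose median correlation with all other samples is low.

    samples: dict of sample name -> raw counts, same gene order in every list.
    threshold: assumed, arbitrary cutoff; tune it for real data.
    """
    logged = {name: [math.log2(c + 1) for c in row] for name, row in samples.items()}
    flagged = []
    for name, vec in logged.items():
        cors = [pearson(vec, other) for o, other in logged.items() if o != name]
        if median(cors) < threshold:
            flagged.append(name)
    return flagged


# Hypothetical toy data: three similar samples and one that looks different.
samples = {
    "ctrl1": [100, 50, 10, 200, 5],
    "ctrl2": [110, 45, 12, 190, 6],
    "case1": [95, 55, 9, 210, 4],
    "ctrl_odd": [5, 200, 150, 8, 90],
}
print(flag_outliers(samples))  # prints "['ctrl_odd']"
```

As noted above, a sample that looks odd among 40 may look ordinary among 200, so it is worth re-running a check like this on the full set before dropping anything.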