Question

DESeq2 analysis with high number of samples

4.9 years ago

Kumar ▴ 170

Hi all,

I am analysing RNA-seq data to identify differentially expressed genes across a large number of samples (~200). Initially, I ran STAR on 40 samples (20 affected and 20 unaffected) and obtained a featureCounts matrix. When I run DESeq2 on any 6 of these samples (3 affected and 3 unaffected), it reports some up- and down-regulated genes, but once I increase the number of samples to 8 or 10, it no longer reports any differential expression.

Please see the PCA plots. In PCA plot 2, which shows only 6 samples in total (3 affected, 3 unaffected), the groups separate better (though still not well), and with this small number of samples I find up- and down-regulated genes. With 40 samples, however, no genes are differentially expressed. Please see the statistical summaries and PCA plots below.

How can we do DGE analysis if we have more samples?

Thanks in advance

DESeq2 analysis with 6 samples:

> summary (res)
out of 20707 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 27, 0.13%
LFC < 0 (down)     : 59, 0.28%
outliers [1]       : 0, 0%
low counts [2]     : 17677, 85%

DESeq2 analysis with 40 samples:

> summary (res)
out of 42196 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%

RNA-Seq DESEQ2 next-gen • 1.8k views

updated 4.9 years ago by Ram 45k • written 4.9 years ago by Kumar ▴ 170

You will never know until you add all 200 samples. I would suggest doing that instead of dipping your toe in with small sample sizes. Judging by your first PCA plot, it is not surprising that there are no DE genes; however, the other 160 samples may separate better on the PCA plot (:

Have a read of the thread below, which suggests some troubleshooting steps: "No differentially expressed genes using DESeq2".
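
For reference, here is a minimal sketch of running the full sample set through DESeq2 in one go. Everything named here is an assumption rather than something taken from the original post: `counts` stands for the featureCounts matrix (genes in rows, samples in columns), `coldata` for a sample table whose row names match the column names of `counts`, and `condition` for a column holding "affected" / "unaffected". The pre-filter (at least 10 reads in at least 3 samples) is a commonly used rule of thumb, included because the 40-sample summary reports 42196 genes with nonzero counts against 20707 in the 6-sample run. It may or may not change the outcome.

```r
# Hypothetical inputs (not from the original post):
#   counts  - integer matrix from featureCounts, genes in rows, samples in columns
#   coldata - data.frame, one row per sample, rownames identical to colnames(counts),
#             with a column `condition` holding "unaffected" or "affected"

# Keep genes with at least `min_count` reads in at least `min_samples` samples.
prefilter_counts <- function(counts, min_count = 10, min_samples = 3) {
  keep <- rowSums(counts >= min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

# Run DESeq2 on all samples at once and return the results table.
run_deseq2 <- function(counts, coldata, alpha = 0.1, min_samples = 3) {
  if (!requireNamespace("DESeq2", quietly = TRUE)) {
    stop("DESeq2 is not installed")
  }
  # Sample order must agree between the count matrix and the sample table.
  stopifnot(identical(colnames(counts), rownames(coldata)))
  coldata$condition <- factor(coldata$condition,
                              levels = c("unaffected", "affected"))

  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = prefilter_counts(counts, min_samples = min_samples),
    colData   = coldata,
    design    = ~ condition
  )
  dds <- DESeq2::DESeq(dds)
  # Log2 fold changes are reported as affected relative to unaffected.
  DESeq2::results(dds,
                  contrast = c("condition", "affected", "unaffected"),
                  alpha = alpha)
}

# Toy check of the pre-filter only (runs without DESeq2 installed).
toy <- matrix(c( 0,  0,  0,  0,
                12, 15, 11,  0,
                50, 40, 60, 70),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))
filtered <- prefilter_counts(toy)
rownames(filtered)
```

With real data you would call `res <- run_deseq2(counts, coldata)` and then, after `library(DESeq2)`, inspect it with `summary(res)` as in the question.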

4.9 years ago by Barry Digby ★ 1.3k

I think it is understandable that no differential expression shows up with a larger number of samples, since these were picked at random (a mix of samples) from the patients and the normal controls. Yes, I will run all 200 samples and see.

4.9 years ago by Kumar ▴ 170

You are posting this question prematurely, it seems. Please proceed with your experiment and then return if there are still 'issues'. By the way, just looking at your first PCA bi-plot, I would regard the control sample on the right as an outlier, and remove it. However, it may not appear as an outlier once you have profiled more samples.
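
If you do decide to drop a sample, a small sketch like the one below keeps the count matrix and the sample table in sync. The helper `drop_samples` and the sample ID `ctrl_2` are hypothetical and only for illustration; substitute the ID of whichever sample looks like an outlier on your own PCA plot.

```r
# Remove the named samples from the count matrix and the sample table together,
# so that colnames(counts) and rownames(coldata) stay aligned.
drop_samples <- function(counts, coldata, sample_ids) {
  missing_ids <- setdiff(sample_ids, colnames(counts))
  if (length(missing_ids) > 0) {
    stop("Unknown sample(s): ", paste(missing_ids, collapse = ", "))
  }
  keep <- !(colnames(counts) %in% sample_ids)
  list(
    counts  = counts[, keep, drop = FALSE],
    coldata = coldata[colnames(counts)[keep], , drop = FALSE]
  )
}

# Toy data with made-up sample names.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("g", 1:3),
                                 c("ctrl_1", "ctrl_2", "case_1", "case_2")))
coldata <- data.frame(
  condition = c("unaffected", "unaffected", "affected", "affected"),
  row.names = colnames(counts)
)

cleaned <- drop_samples(counts, coldata, "ctrl_2")
colnames(cleaned$counts)
```

The DESeq2 object should then be rebuilt from `cleaned$counts` and `cleaned$coldata`, so that size factors and dispersions are re-estimated without the removed sample.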

4.9 years ago by Kevin Blighe 89k