Hello, I am new to bioinformatics, and I am trying to do DEG analysis using DESeq2. I followed the DESeq2 vignette, and I did this
res <- results(dds, lfcThreshold = log2(2), alpha = 0.05)
summary(res)
out of 17813 with nonzero total read count.
adjusted p-value < 0.05
LFC > 1.00 (up) : 78, 0.44%
LFC < -1.00 (down) : 106, 0.6%
outliers [1] : 0, 0%
low counts [2] : 2073, 12%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
So, my question is, how do I get my up-regulated and down-regulated genes from this output? I am setting my log2fc=1 for the cutoff threshold. I also read this thread, but I still could not find the answer I am looking for. Thank you very much.
Thank you Barry. With the above code I got 146 genes for up-regulated, and 190 genes for down-regulated. I am concerned because they are less than expected. Then I also tried these codes from a training website- DESeq2 workshop
padj.cutoff <- 0.05. lfc.cutoff <- 1
With the above code, I got 590 genes for up and down-regulated genes.
What is wrong here? Which one should I take? why?
Thanks.
Hi Aynur,
The code I gave you filters by
pvalue
, notpadj
, it will return more genes.The code example you provided did not filter by
log2FoldChange
when I tested it, I am not overly familiar with dplyr though..If you want to replicate the results of
summary(res)
then pay attention to the output and replicate it step by step:It says it used LFC > 0 and padj < 0.05.
We need to change the
LFC > 0
part to something biologically plausible, i.e +/- 1. This should explain why you 'got less genes than expected'.Thank you very much for your clarification. For the significantly up-regulated and down-regulated gene list, I also got N/As for padj, do I also need to filter them out for my final list or is it irrelevant? I want to validate those genes using RT-PCR, so I need a significantly differentially expressed gene list based on my log fold change cutoff.
Regards,
You specifically want to evaluate genes with
NAs
in the wet lab? Don't.Try this filtering method, it should help prevent NA's and spurious (very high/negative Log2FC) calls showing up in your up/down regulated results:
You can now subset the
LFC
dataframe like in the previous comments.IHW: Instead of finding a threshold on the mean of scaled counts that optimizes the number of rejected hypotheses following Benjamini-Hochberg correction, the IHW package finds an optimal weighting of the hypotheses that maximizes power while still controlling the FDR. - source
apeglm: in a nutshell, preserves 'true' LFC differences and removes genes that could have presented seemingly valid Log2FCs due to noise.