When I put cutoff of padj or FDR of 0.05 DESeq2 gives 650 DE and edgeR gives 490 DE genes
But when I apply foldchange cutoff (1) also, DESeq2 give 0 DE genes and edgeR gives 352 genes.
Without showing the code you used it is impossible to reproduce if the difference in results is based on the conceptual differences between the tools or your choice of strategy/parameters. Please share representative code and explain also how the lowlevel processing was done (alignment/quantification etc.).
Did you use a foldchange cutoff at the end by filtering or did you generate the result table using the treshold? That should make a difference in DESeq2
For the fold change, both of them uses log2 fold change by default but if you are using DESeq2 for estimating your expression from readcounts then it may differ from edgeR. For the comparison I would first estimate the expression value (most likely in TPM; why TPM is better is explained here https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/), then use this for differential analysis with both program.
DESeq2 does Foldchange-shrinkage. In the manual they say this would be "useful for visualization and ranking of genes".
So basically, if I'm not mistaken, DESeq2 tests per default, whether the expression between to treatments (or other sample groups) is significantly different. If a gene has a certain gene expression, you would expect that in some biological replicate the expression is a little above the average and in another sample the expression is a little less. With thousands of biological replicates you would expect the expression values to be normally distributed around the average expression value.
This information, the average and the dispertion around the average, is needed in order to test, whether the same gene is significantly different expressed in another treatment (or other sample group). Therefore, DESeq2 estimates the dispertion of the theoretical distribution, which you would get with thousands of replicates. However, it can only use 3 expressen values most of the times due to money reasons. If you falsly estimates the dispertion to low, you could get sometimes false positives in the results. Thats why DESeq2 tries to avoid this by comparing the dispersion between genes of similar gene expression levels under the assumption that genes of the same expression show a similar amount of variation from the average expression.
So first DESeq2 estimates the dispertion for every gene. Subsequently, DESeq2 compares the dispertions of similar-strong expressed genes and shrinkes the estimated dispertion of every gene into the direction of the average dispertion of all genes with the same expression level.
You can still use the foldchanges of DESeq2, just be aware, that they are a little changed. But that doesn't mean that they are less "true". The propotions stay the same. I don't know how edgeR does it
edgeR and DESeq2 both apply shrinkage to the logFCs, but the default shrinkage applied by edgeR is (deliberately) light touch whereas the DESeq2 shrinkage is heavier. So logFCs from edgeR will often be larger than those from DESeq2 and the result you see is to be expected.
This is an example of why applying a FC cutoff is a bad idea, especially if you don't know what the estimated logFC values mean. edgeR and DESeq2 are sophisticated programs that carefully assess DE for you. Applying your own very ad hoc rule over the top can't be a good idea!
If you really are worried about the size of the FCs, see the glmTreat function in edgeR, which will do a much better job that you're doing now.
Finally, you should add a filtering step to your edgeR pipeline, e.g., using filterByExpr before estimateDisp.
I know DESeq2 does its own filtering, but would it be a good idea in general to apply edgeR::filterByExpr to the raw counts data before passing them on to DESeq2::DESeq?
Wouldn't it make sense to do it for parity in the event of the counts being filtered elsewhere in the analysis?
Context: my data clusters very poorly on the PCA plots (replicates are all over the place), and I will (probably) use RUVSeq to try and account for this noise. In the RUVSeq vignette the counts data are filtered before being passed to the RUVx functions. So I presumed I should just filter the data that'll be fed to DESeq2 as well. Or can I use the additional variables estimated by RUVSeq using the filtered data on the full data set?
If you have problems with unwanted variation then your problem is probably not solved by filtering some genes as those you'd filter would be lowly- or not expressed at all and therefore cannot be the drivers of your unwanted variation. I cannot comment without context. You can open a new question if you want, but try to follow the manuals first. If RUV recommends to filter to estimate the surrogates then do it.
While i understand that applying ad hoc rules can be a bad idea, some biologists start their exploration at the largest differences between treatments (i.e., large logFC values).
As I understand, DESeq2 does not apply FC shrinkage by default anymore, so what could be driving the pattern similarly described in the OP when this is no longer factor?
Could this be an effect of the stringency of dispersion shrinkage instead?
Without showing the code you used it is impossible to reproduce if the difference in results is based on the conceptual differences between the tools or your choice of strategy/parameters. Please share representative code and explain also how the lowlevel processing was done (alignment/quantification etc.).
count from txt file
for deseq2
for edger
if you want to use an treshold of adj.p of 0.05 in DESeq2 you should already use it in the result function.
if you want to only consider genes which changes in expression by plus or minus 50% you could additionally do this:
thanks for reply.
I was filtering it after exporting to csv file. I am considering 2 fold change, so I was taking lfc 1 and pvalue 0.05
what about edgeR.
Did you use a foldchange cutoff at the end by filtering or did you generate the result table using the treshold? That should make a difference in DESeq2
after getting results I export it and then filter it by padj and logFC
That's different. Use the tresholds in the result function