Why is there a difference between the report I get from dba.report with thresholds of P-value = 0.05 and fold=0.13750352375 [log2(1.1), i.e. an absolute fold change of 1.1], and what I get when I export the data without any threshold (all differentially bound sites) and apply the same filter in Excel?
I used:
DBA$config$bUsePval = TRUE
DBA$config$th = 0.05
and
dba.report(DBA, fold=0.13750352375, bNormalized=TRUE)
With this I got 430 significant sites.
--
Whereas, with this setting:
DBA$config$th = 1
and
dba.report(DBA, bNormalized=TRUE)
I got all the sites. But after exporting this report to Excel and applying the same filter (keeping sites with P-value at or below 0.05 and an absolute fold change beyond 1.1), there are 2135 significant sites.
Yes, this is indeed what is happening. The null hypothesis is different when you set the fold parameter.
For a DESeq2 analysis, the fold value is sent using the lfcThreshold parameter to DESeq2::results. DESeq2::lfcShrink is used to compute fold changes independently of p-value testing (but will respect the fold setting).
For an edgeR analysis, if fold is not zero, edgeR::glmTreat is called with the lfc parameter set to fold; when fold is zero, edgeR::glmQLFTest is used for testing.
Apologies since I am still fairly new to DiffBind, so what I understand is that I am getting correct results in both cases (I am using edgeR for my analyses), just that the threshold parameters work differently in DiffBind to how I would want to set them in an Excel sheet? Which one should I trust?
DiffBind is doing it the way edgeR recommends when you want to filter for fold change magnitudes greater than zero. One way to think about this in your case is that there are more sites with high confidence that their differences are greater than zero, and fewer sites with high confidence that their difference is greater than 1.1x.
Hi Rory, I hope you're doing well, apologies for reactivating this thread again. I am comparing two groups of ChIP data using DiffBind and applying a threshold parameter of fold=0.1375 and pval=0.05. When I output the results to a csv file, I get a certain number of significantly changed peaks, let's say 1400.
However, when I remove the parameters from the output csv file and apply the threshold manually in Excel, I get a different number of peaks. The number of significant peaks increases, with hundreds of additional peaks that were not deemed significant by DiffBind. For example, the new total number of significant peaks becomes ~1800 instead of the initial 1400.
As ATpoint pointed out, the "testing is asking whether the logFC is different than 1.1 rather than zero". Would this be resolved if I just used a higher fold change threshold parameter?
Thank you in advance!
I think this issue is covered in this thread already. When you specify a fold in DiffBind, this changes the null hypothesis and hence the calculation of all the p-values (and FDR values). If you do not specify the fold, the null hypothesis is based on fold=0. So if you output this to a spreadsheet and apply your own thresholds, you will get different results. Specifically, you would expect to get more sites identified as being significantly different.
Current best practice suggests that if you are going to apply a fold threshold, you should use the adjusted confidence statistics and not just apply a fold and pval threshold to the baseline data.