Question

DiffBind analysis report gives me two different outputs depending on when I apply a filtering threshold [e.g. P-value = 0.05, (abs)FC = 1.1]

0

2.6 years ago

prachejp • 0

Why do I get different report outputs from dba.report with a threshold of P-value = 0.05 and fold=0.13750352375 [i.e. log2(1.1), to get an (abs) fold change of 1.1], compared to exporting the data without any threshold (getting all sites) and applying the same filter in Excel?

I used:

DBA$config$bUsePval = TRUE

DBA$config$th = 0.05

and

dba.report(DBA, fold=0.13750352375, bNormalized=TRUE)

with this I got 430 significant sites.

--

Whereas, with this setting:

DBA$config$th = 1

and

dba.report(DBA, bNormalized=TRUE)

I got all the sites. But when I export this report to Excel and apply the same filter there, keeping sites with a P-value equal to or below 0.05 and an (abs) fold change of at least 1.1, I get 2135 significant sites.
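As a quick sanity check on the numbers in the question, the value passed as fold is on the log2 scale. The sketch below uses base R only; the variable names are just for illustration.

```r
# The `fold` argument is on the log2 scale, so a 1.1x fold change
# has to be converted before it is passed in.
fc_cut <- 1.1
fold_arg <- log2(fc_cut)    # about 0.1375, the value used in the question

# Converting back recovers the fold change on the linear scale.
fc_back <- 2^fold_arg

# For comparison: a log2 fold change of 1.1 would be a much larger
# effect, roughly 2.14x on the linear scale.
fc_if_logfc_1_1 <- 2^1.1

print(c(fold_arg = fold_arg, fc_back = fc_back, fc_if_logfc_1_1 = fc_if_logfc_1_1))
```

So the 0.1375 in the call corresponds to a fold change of 1.1, not to a log fold change of 1.1.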

DiffBind • 1.6k views

ADD COMMENT • link updated 21 months ago by Rory Stark ★ 2.1k • written 2.6 years ago by prachejp • 0

score 2 · Answer 1 · 2022-05-04

2

2.6 years ago

ATpoint 85k

My guess is that specifying a fold threshold for testing triggers use of the lfc argument in lfcShrink of DESeq2 (which is used internally, or is it edgeR? In any case, same principle), which then sets the null hypothesis to 1.1 rather than the default 0. Meaning the testing is asking whether the logFC is different than 1.1 rather than zero, and this is more stringent at the same FDR cutoff than testing against 0 and then post-filtering, as in your second example.

ADD COMMENT • link 2.6 years ago by ATpoint 85k

2

Yes, this is indeed what is happening. The null hypothesis is different when you set the fold parameter.

For a DESeq2 analysis, the fold value is sent using the lfcThreshold parameter to DESeq2::results. DESeq2::lfcShrink is used to compute fold changes independently of p-value testing (but will respect the fold setting).

For an edgeR analysis, if fold is not zero, edgeR::glmTreat is called with the lfc parameter set to fold; when fold is zero, edgeR::glmQLFTest is used for testing.
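For anyone who wants to see the edgeR side of this outside DiffBind, here is a minimal sketch on simulated counts. The data, group labels and effect sizes are all made up for illustration, and this is not DiffBind's internal code; it only calls the two edgeR functions named above on the same fit.

```r
library(edgeR)

set.seed(1)
n_sites <- 2000
group <- factor(rep(c("A", "B"), each = 3))

# Simulated read counts (hypothetical data, not from this thread).
# The first 400 sites get a modest true change (1.3x) in group B.
true_fc <- c(rep(1.3, 400), rep(1, n_sites - 400))
counts <- cbind(
  matrix(rnbinom(n_sites * 3, mu = 200, size = 20), ncol = 3),
  matrix(rnbinom(n_sites * 3, mu = 200 * true_fc, size = 20), ncol = 3)
)
rownames(counts) <- paste0("site", seq_len(n_sites))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

lfc_cut <- log2(1.1)

# Approach 1: test against zero, then filter on fold change afterwards
res_zero <- topTags(glmQLFTest(fit, coef = 2), n = Inf, sort.by = "none")$table
n_postfilter <- sum(res_zero$PValue <= 0.05 & abs(res_zero$logFC) >= lfc_cut)

# Approach 2: build the fold threshold into the test itself
res_treat <- topTags(glmTreat(fit, coef = 2, lfc = lfc_cut), n = Inf, sort.by = "none")$table
n_treat <- sum(res_treat$PValue <= 0.05)

cat("post-filter:", n_postfilter, " glmTreat:", n_treat, "\n")
```

The p-values in the two tables answer different questions, so the two counts would not be expected to agree.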

ADD REPLY • link 2.6 years ago by Rory Stark ★ 2.1k

0

Apologies, I am still fairly new to DiffBind. What I understand is that I am getting correct results in both cases (I am using edgeR for my analyses); it is just that the threshold parameters work differently in DiffBind from how I would set them in an Excel sheet? Which one should I trust?

ADD REPLY • link 2.6 years ago by prachejp • 0

1

DiffBind is doing it the way edgeR recommends when you want to filter for fold change magnitudes greater than zero.

One way to think about this in your case is that there are more sites with high confidence that their differences are greater than zero, and fewer sites with high confidence that their difference is greater than 1.1x.
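To make that intuition concrete, here is a simplified base R sketch. It is not what edgeR or DiffBind compute internally: it assumes normally distributed log2 fold estimates with a known standard error, and uses a TREAT-style p-value for the thresholded null hypothesis. All numbers are simulated.

```r
set.seed(42)
n_sites <- 5000
tau <- log2(1.1)   # fold threshold on the log2 scale
se <- 0.15         # assumed (known) standard error of each estimate

# Hypothetical true log2 fold changes: most sites unchanged, some changed
true_lfc <- c(rep(0, 4000), rnorm(1000, mean = 0, sd = 0.6))
est_lfc <- true_lfc + rnorm(n_sites, sd = se)

# Ordinary two-sided p-value: null hypothesis is "log2 fold change = 0"
p_zero <- 2 * pnorm(abs(est_lfc) / se, lower.tail = FALSE)

# TREAT-style p-value: null hypothesis is "|log2 fold change| <= tau"
p_treat <- pnorm((abs(est_lfc) - tau) / se, lower.tail = FALSE) +
  pnorm((abs(est_lfc) + tau) / se, lower.tail = FALSE)

# Spreadsheet approach: test against zero, then filter on the estimate
n_postfilter <- sum(p_zero <= 0.05 & abs(est_lfc) >= tau)

# Thresholded test: the fold cutoff is part of the hypothesis
n_thresholded <- sum(p_treat <= 0.05)

cat("post-filter:", n_postfilter, " thresholded test:", n_thresholded, "\n")
```

In this simplified setting the thresholded p-value is never smaller than the ordinary one, so the thresholded test cannot report more sites than the post-filter does.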

ADD REPLY • link 2.5 years ago by Rory Stark ★ 2.1k

0

Hi Rory, I hope you're doing well, apologies for reactivating this thread again. I am comparing two groups of ChIP data using DiffBind and applying a threshold parameter of fold=0.1375 and pval=0.05. When I output the results to a csv file, I get a certain number of significantly changed peaks, let's say 1400.

However, when I remove the parameters from the output csv file and apply the threshold manually in Excel, I get a different number of peaks. The number of significant peaks increases, with hundreds of additional peaks that were not deemed significant by DiffBind. For example, the new total number of significant peaks becomes ~1800 instead of the initial 1400.

ATpoint pointed out that the "testing is asking whether the logFC is different than 1.1 rather than zero". Would this be resolved if I just used a higher fold change threshold parameter?

Thank you in advance!

ADD REPLY • link 21 months ago by prachejp • 0

0

I think this issue is covered in this thread already. When you specify a fold in DiffBind, this changes the null hypothesis and hence the calculation of all the p-values (and FDR values). If you do not specify the fold, the null hypothesis is based on fold=0, so if you output this to a spreadsheet and apply your own thresholds you will get different results. Specifically, you would expect to get more sites identified as being significantly different.

Current best practice suggests that if you are going to apply a fold threshold, you should use the adjusted confidence statistics and not just apply a fold and p-value threshold to the baseline data.
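For reference, the two routes discussed in this thread might be sketched as below. The helper names are hypothetical, dba_obj is assumed to be a DBA object that has already been through dba.contrast() and dba.analyze(), and the name of the p-value column in the report is an assumption (it is matched loosely so that either "p-value" or "p.value" is picked up).

```r
# Hypothetical helpers; nothing here is run against real data.

# Route 1: fold built into the report call, so the p-values and FDR
# are computed for the thresholded null hypothesis.
report_fold_aware <- function(dba_obj, fold_cut = log2(1.1), th = 0.05,
                              use_pval = TRUE) {
  DiffBind::dba.report(dba_obj, th = th, bUsePval = use_pval,
                       fold = fold_cut, bNormalized = TRUE)
}

# Route 2: baseline report with every site (fold = 0, th = 1),
# returned as a data frame, as it would be exported to a spreadsheet.
report_all_sites <- function(dba_obj) {
  DiffBind::dba.report(dba_obj, th = 1, bUsePval = TRUE, fold = 0,
                       bNormalized = TRUE,
                       DataType = DiffBind::DBA_DATA_FRAME)
}

# Spreadsheet-style post-filter applied to a baseline report data frame.
post_filter <- function(report_df, fold_cut = log2(1.1), p_cut = 0.05) {
  # Assumed column name: "p-value" (or "p.value" after conversion)
  p_col <- grep("^p.value$", names(report_df), value = TRUE)[1]
  keep <- report_df[[p_col]] <= p_cut & abs(report_df$Fold) >= fold_cut
  report_df[keep, , drop = FALSE]
}
```

With a real object, report_fold_aware(DBA) corresponds to the recommended route, while post_filter(report_all_sites(DBA)) reproduces the spreadsheet filter and would be expected to return more sites.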

ADD REPLY • link 21 months ago by Rory Stark ★ 2.1k