Question

DiffBind reports no significant differentially enriched peaks

1

Entering edit mode

2.5 years ago

Marco Pannone ▴ 810

Hello

I have been using DiffBind to perform differential enrichment analysis on my ChIP-seq dataset where I have 2 sample groups, WT and KO, with 4 replicates in each sample group.

The analysis reports only 2 significantly differential enriched peaks at FDR < 0.05, and the situation does not change even if, hypothetically, I choose as threshold FDR < 0.99. Basically, all the remaining peaks in the consensus peakset report FDR=1.

I used a consensus peakset generated by myself, basically merging all peaks called (by MACS2, FDR<0.05) for WT and KO against input. This way, I could create a peakset including all significant peaks among all samples. However, peak calling has been performed on merged samples, where all WTs have been merged, and the same for the KO samples.

When I performed some exploratory analysis, such as PCA, I noticed some variation between replicates of each sample group, but nothing too dramatic.

I am still unsure if my decision to feed into the analysis my own consensus peakset, generated as explained above, is correct.

In the Genome Browser, I can clearly notice how some loci show different enrichment for this mark, but I am looking to find a quick way to identify loci that significantly differ in enrichment between WT and KO. Also in the peak calling results, from merged samples, there is a clear difference in the number of called peaks between the two sample groups. But anyway, in DiffBind I do not really get any significant differentially enriched peaks since almost all of them have FDR=1.

Any comment or suggestion would be highly appreciated, either about my approach with DiffBind or recommending other tools.

Thanks in advance!

ChIP-seq DiffBind • 3.3k views

ADD COMMENT • link updated 19 months ago by hiroyuki.kato.911 • 0 • written 2.5 years ago by Marco Pannone ▴ 810

0

Entering edit mode

Did you find the reason? I meet a similar problem and I got 0 significantly differential enriched peaks at FDR < 0.05, nothing change even with a threshold FDR<0.5. And my result of plotMA() is just a flat line with no point upper or lowwer than log2fc0.

ADD REPLY • link 2.2 years ago by Ri ▴ 30

0

Entering edit mode

Ri and Marco, did you solve the problem? If yes, How?

I am facing the sample issue excatly the same as Ri described. In addition the p-values looks reasonable (a range from <0.05 to >0.05), however, they all have FDR=1.

I am using DiffBind 3.10.0.

Thanks!

ADD REPLY • link 23 months ago by chiefcat ▴ 180

0

Entering edit mode

Hmmn, this has now been reported three times, which is concerning.

If someone can send me their DBA object after analysis (or a link to where I can download it) I can have a look at what is happening.

ADD REPLY • link 23 months ago by Rory Stark ★ 2.1k

0

Entering edit mode

Dear Dr. Strack,

Sorry for my late reply! I have just email the DiffBind outputs to you. I will post what we come up here.

Thank you very much for helping! :)

ADD REPLY • link 21 months ago by chiefcat ▴ 180

0

Entering edit mode

Hi, Has this been figured out? I have duplicate CHIPseq data, which have obvious difference in IGV and significant FDR, logFC by usual edgeR or DESEQ analysis using consensus peak sets.

However, using diffbind,

dba.plotMA(DBdata, bNormalized=FALSE)

shows clear difference but

dba.plotMA(DBdata, bNormalized=TRUE)

shows just a flat line at 0 (all the logFC difference seems to be normalized to 0)

I guess this is a normalization problem.

Any solution for this? I'm using DiffBind 3.2.

Thanks,

ADD REPLY • link 19 months ago by hiroyuki.kato.911 • 0

score 1 · Answer 1 · 2023-03-07

Using a consensus peak set derived from all called peaks should be fine (you can do this within DiffBind by setting minOverlap=1).

In the past when a user has reported a completely flat MA plot, the reason turned out to be that they had passed in the same bam file for all the samples (or for all the samples in a sample group, such as a merged bam file use for peak calling). However if you are seeing some variance between replicates in the sample groups, this is probably not what is happening (in the first question).

You can check that you are getting different read counts in a few ways. One way is to retrieve the binding matrix (dba.peakset() with bRetrieve=TRUE) and look at the read counts. You can also compare raw read counts with normalized ones to make sure you aren't over-normalizing and removing the biological signal.

Another way is to retrieve the report with all of the peaks and counts (dba.report() with th=1 and bCounts=TRUE). If you have loci you think should be differentially bound, find them in the report and see if the concentrations and counts are consistent with what you see in the browser.