Question

Trying to understand basic DiffBind usage

10 weeks ago

alejandrocastillaibeas ▴ 20

Hello Dear community,

I am trying to use the DiffBind package to perform a binding analysis of two TFs from the same family, TF1 and TF2. My goal is to understand to what extent these two TFs act cooperatively, so we have ChIP-seq data for both, with 2 replicates each. I have been carefully reading DiffBind's documentation, some other tutorials and forum posts like this one, but I am still very confused about how DiffBind works.

My question is: why does the output of the differential binding analysis not include all consensus peaks? In my analysis, a couple hundred are missing.

If I want to obtain peaks bound by both TFs, I guess these missing peaks from the consensus peakset should be included too. Am I correct?

Is doing a differential analysis and retrieving the least changing peaks the best way to obtain the two TFs' common peaks?

Thanks in advance

Alejandro

DiffBind ChIP-seq

9 weeks ago by alejandrocastillaibeas ▴ 20

score 2 · Answer 1 · 2024-09-11

10 weeks ago

jared.andrews07 ★ 18k

Is doing a differential analysis and retrieving the least changing peaks the best way to obtain the two TFs' common peaks?

See bedtools intersect. There's no need to use DiffBind at all if you're just interested in common peaks.
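
For the common-peak approach suggested above, the logic of `bedtools intersect -u` (keep entries of A that overlap at least one entry of B) and `-v` (keep entries of A that overlap none) can be sketched in Python. This is a minimal, quadratic illustration assuming BED-style half-open (chrom, start, end) tuples and a 1 bp minimum overlap; the peak coordinates are made up, and on real data bedtools itself is the tool to use.

```python
def overlaps(a, b):
    """True if half-open intervals a and b are on the same chromosome and share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_u(a_peaks, b_peaks):
    """Peaks in A overlapping at least one peak in B (the idea behind `bedtools intersect -u`)."""
    return [a for a in a_peaks if any(overlaps(a, b) for b in b_peaks)]


def intersect_v(a_peaks, b_peaks):
    """Peaks in A overlapping no peak in B (the idea behind `bedtools intersect -v`)."""
    return [a for a in a_peaks if not any(overlaps(a, b) for b in b_peaks)]


# Hypothetical peaks as (chrom, start, end) in BED-style half-open coordinates.
tf1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
tf2 = [("chr1", 150, 250), ("chr2", 400, 500)]

common = intersect_u(tf1, tf2)    # TF1 peaks that overlap a TF2 peak
tf1_only = intersect_v(tf1, tf2)  # TF1-specific peaks
print(common)    # [('chr1', 100, 200)]
print(tf1_only)  # [('chr1', 500, 600), ('chr2', 50, 150)]
```

Swapping the arguments gives the TF2-specific set, so the three groups (common, TF1-only, TF2-only) come from the same two functions.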

By default, DiffBind builds the consensus peakset from peaks that overlap in at least two samples (see the minOverlap parameter of the dba() function). Since it sounds like you only have two samples, the consensus peakset DiffBind uses consists of the common (merged) peaks.
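
A rough sketch of how a minOverlap-style consensus works: pool the peaks from all samples, merge overlapping intervals, and keep merged regions supported by at least `min_overlap` distinct samples. This is an approximation for illustration under stated assumptions (half-open BED coordinates, merging on at least 1 bp of overlap), not DiffBind's actual implementation; the function name, sample names and coordinates are hypothetical. With more than two samples (e.g. two replicates per TF), a region supported by any two samples passes, even when both come from the same TF, as the second region in the example shows.

```python
from collections import defaultdict


def consensus(samples, min_overlap=2):
    """samples: dict mapping sample name -> list of (chrom, start, end) half-open peaks.
    Returns merged regions as (chrom, start, end, n_samples) for regions
    supported by at least min_overlap distinct samples."""
    by_chrom = defaultdict(list)
    for name, peaks in samples.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, name))

    result = []
    for chrom in sorted(by_chrom):
        ivs = sorted(by_chrom[chrom])
        cur_start, cur_end, support = ivs[0][0], ivs[0][1], {ivs[0][2]}
        for start, end, name in ivs[1:]:
            if start < cur_end:  # overlaps the current merged region by >= 1 bp
                cur_end = max(cur_end, end)
                support.add(name)
            else:  # close the current region and start a new one
                if len(support) >= min_overlap:
                    result.append((chrom, cur_start, cur_end, len(support)))
                cur_start, cur_end, support = start, end, {name}
        if len(support) >= min_overlap:
            result.append((chrom, cur_start, cur_end, len(support)))
    return result


samples = {
    "TF1_rep1": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "TF1_rep2": [("chr1", 150, 260)],
    "TF2_rep1": [("chr1", 240, 300), ("chr1", 5000, 5100)],
    "TF2_rep2": [("chr1", 5050, 5150)],
}
print(consensus(samples))  # [('chr1', 100, 300, 3), ('chr1', 5000, 5150, 2)]
```

The region at chr1:5000-5150 is supported only by the two TF2 replicates, yet it enters the consensus; the single-sample peak at chr1:1000-1100 does not.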

10 weeks ago by jared.andrews07 ★ 18k

Thank you for your response.

See bedtools intersect. There's no need to use DiffBind at all if you're just interested in common peaks.

To be honest, bedtools intersect does not seem to reflect the real binding of my TFs. When I plot heatmaps over the bedtools intersection of my called peaks and over the rest of each TF's specific peakset, I see considerable binding of the other TF within one TF's specific peakset. That's why I thought DiffBind could be advantageous. The output I obtain from DiffBind seems closer to reality than the intersection, but if I don't understand the output I cannot go ahead.

10 weeks ago by alejandrocastillaibeas ▴ 20

That indicates a problem with peak calling rather than with either piece of software. Peak calling often needs fine-tuning or post-hoc filtering. Having replicates for each group also makes identifying robust peaksets much easier.
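
As one example of post-hoc filtering, the sketch below keeps peaks that pass a q-value cutoff and are supported by both replicates. It assumes MACS2-style narrowPeak files, where column 9 holds -log10(q); the cutoff of 2 (q ≤ 0.01), the function names and the example lines are hypothetical choices for illustration, not a recommendation.

```python
def parse_narrowpeak(lines):
    """Yield (chrom, start, end, neg_log10_q) from narrowPeak lines (q-value is column 9)."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue  # skip blank, track and comment lines
        f = line.rstrip("\n").split("\t")
        yield f[0], int(f[1]), int(f[2]), float(f[8])


def reproducible(rep1, rep2, min_neg_log10_q=2.0):
    """Keep rep1 peaks that pass the q-value cutoff and overlap a passing rep2 peak."""
    r1 = [p for p in rep1 if p[3] >= min_neg_log10_q]
    r2 = [p for p in rep2 if p[3] >= min_neg_log10_q]
    return [p for p in r1
            if any(p[0] == q[0] and p[1] < q[2] and q[1] < p[2] for q in r2)]


# Hypothetical narrowPeak lines (10 tab-separated columns).
rep1_lines = [
    "chr1\t100\t200\tp1\t0\t.\t5.0\t6.0\t4.0\t50",
    "chr1\t500\t600\tp2\t0\t.\t2.0\t1.5\t1.0\t50",
    "chr1\t900\t1000\tp3\t0\t.\t8.0\t9.0\t7.0\t50",
]
rep2_lines = [
    "chr1\t150\t250\tq1\t0\t.\t5.0\t6.0\t3.0\t50",
    "chr1\t520\t580\tq2\t0\t.\t5.0\t6.0\t3.0\t50",
]

rep1 = list(parse_narrowpeak(rep1_lines))
rep2 = list(parse_narrowpeak(rep2_lines))
print(reproducible(rep1, rep2))  # [('chr1', 100, 200, 4.0)]
```

Here p2 fails the cutoff (its matching rep2 peak is irrelevant), and p3 passes the cutoff but has no counterpart in the second replicate.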

10 weeks ago by jared.andrews07 ★ 18k

Thanks for your response. I will check the peak calling. I am still puzzled by the DiffBind output, though; I can't make sense of it.

9 weeks ago by alejandrocastillaibeas ▴ 20

score 1 · Answer 2 · 2024-09-11

10 weeks ago

Ian 6.1k

I have been using DiffBind for years and recently discovered that the default method of forming a union of peaks does not require a minimum overlap of 1 bp: two regions separated by a 1 bp gap are also merged. I detailed this on BioC and am waiting for confirmation of my finding:

https://support.bioconductor.org/p/9159586/
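
To make the difference between the two merging rules concrete, here is a small Python sketch that merges intervals whenever the gap between them is at most `max_gap` bases. In half-open BED coordinates, `max_gap=-1` requires at least 1 bp of overlap, `0` also merges book-ended peaks, and `1` additionally merges peaks separated by one uncovered base, which is the behaviour described above. The function and parameter names are hypothetical, and how DiffBind itself counts the gap depends on its coordinate convention, which this sketch does not try to reproduce.

```python
def merge(peaks, max_gap):
    """Merge (chrom, start, end) half-open intervals on the same chromosome.
    gap = next.start - current.end: negative = overlap, 0 = book-ended,
    1 = one uncovered base in between. Intervals merge when gap <= max_gap."""
    merged = []
    for chrom, start, end in sorted(peaks):  # sorts by chrom, then start
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks: one uncovered base (position 200) separates them.
adjacent = [("chr1", 201, 300), ("chr1", 100, 200)]

print(merge(adjacent, max_gap=-1))  # [('chr1', 100, 200), ('chr1', 201, 300)]
print(merge(adjacent, max_gap=1))   # [('chr1', 100, 300)]
```

Under the "1 bp overlap" rule the two peaks stay separate; allowing a 1 bp gap joins them into one wider region, which changes both the number and the width of consensus peaks.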

The advantage of using the DiffBind union is that, if you have more than two replicates, you can specify a minimum number of overlaps. You might also want to set summits=FALSE to keep the footprint of the overlapping regions rather than re-centering them on a new summit.
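
The difference between re-centering on a summit and keeping the merged footprint can be illustrated with a few lines of Python. This sketch assumes the summit position is already known (DiffBind derives it from the read pileup; here it is just a hypothetical input) and extends a fixed `half_width` on each side. It mirrors the idea behind a numeric `summits` value without claiming to reproduce DiffBind's exact peak widths; the function name is hypothetical.

```python
def recenter(peak, summit, half_width=200):
    """Re-centre a merged (chrom, start, end) half-open peak on a summit position,
    extending half_width bp on each side (width 2 * half_width + 1)."""
    chrom, start, end = peak
    if not (start <= summit < end):
        raise ValueError("summit must lie inside the peak")
    return (chrom, max(0, summit - half_width), summit + half_width + 1)


merged = ("chr1", 1000, 2500)        # hypothetical merged consensus region
footprint = merged                    # keeping the footprint: region unchanged
centred = recenter(merged, summit=1300)

print(footprint, footprint[2] - footprint[1])  # ('chr1', 1000, 2500) 1500
print(centred, centred[2] - centred[1])        # ('chr1', 1100, 1501) 401
```

Re-centering gives every region the same width, which suits counting, whereas keeping the footprint preserves the full extent of broad or merged binding.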

10 weeks ago by Ian 6.1k

Thank you for your response.

That's interesting. Do you mean that peaks need to be separated by at most 1 bp to be merged into a union?

The advantage of using the DiffBind union is that, if you have more than two replicates, you can specify a minimum number of overlaps. You might also want to set summits=FALSE to keep the footprint of the overlapping regions rather than re-centering them on a new summit.

Thank you for the suggestion about the summits. Indeed, this choice may have a broad impact on downstream analysis.

Do you know why the output of the differential binding analysis does not include all consensus peaks? In my analysis, a couple hundred are missing.

10 weeks ago by alejandrocastillaibeas ▴ 20

My understanding was that the minimum requirement for two regions to be joined was a 1 bp overlap. My observation is that there can be a 1 bp gap between two adjacent regions. I would like this clarified by the author. I don't know why some of your regions are missing. You could also check whether DiffBind's internal blacklist filtering has removed them.
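
To narrow down which consensus peaks are missing from the report and whether blacklist filtering could explain them, a simple set comparison is enough. This sketch assumes the consensus peakset, the reported peaks and the blacklist have already been exported to plain (chrom, start, end) coordinates (for example as BED files) and that reported peaks keep the consensus coordinates exactly; the function name and all example values are hypothetical.

```python
def missing_peaks(consensus, reported, blacklist):
    """Return consensus peaks absent from the report, each paired with a flag saying
    whether it overlaps a blacklisted region. All intervals are half-open
    (chrom, start, end) tuples; reported peaks are matched by exact coordinates."""
    reported_set = set(reported)

    def in_blacklist(p):
        return any(p[0] == b[0] and p[1] < b[2] and b[1] < p[2] for b in blacklist)

    return [(p, in_blacklist(p)) for p in consensus if p not in reported_set]


consensus = [("chr1", 100, 300), ("chr1", 5000, 5150), ("chr2", 700, 900)]
reported = [("chr1", 100, 300)]
blacklist = [("chr2", 650, 750)]

for peak, blacklisted in missing_peaks(consensus, reported, blacklist):
    print(peak, "overlaps blacklist" if blacklisted else "not blacklisted")
# ('chr1', 5000, 5150) not blacklisted
# ('chr2', 700, 900) overlaps blacklist
```

Missing peaks that do not overlap the blacklist would need some other explanation.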

10 weeks ago by Ian 6.1k