Hi, I guess this is a somewhat naive question. Some papers show Venn diagrams of the overlap between TF or histone-modification peaks, and I was wondering how that overlap is computed. I checked the Methods sections but could not find this information. Maybe it is so rudimentary that people do not even bother to write it down.
Now I have ChIP-seq data for a DNA-binding protein and for H3K27ac, and I want to see to what extent their peaks overlap. To that end, I called peaks with "macs2" using the broad-peak option ("--broad", which writes .broadPeak files), and tried to get the overlapping regions with:
bedtools intersect -a proteinA_cellX_peaks.broadPeaks -b H3K27ac_cellX_peaks.broadPeaks
This printed a long list of lines like:
chr7 127471196 127472363 Pos1 0 + 127471196 127472363 255,0,0
...
My questions are: Is this the right way to get the regions bound by both factors? And instead of printing the result to the screen, can I write it to an output BED file? (I searched the bedtools manual, but to no avail.) Thanks in advance.
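On the second question: bedtools writes its results to standard output, so ordinary shell redirection saves them to a file, e.g. "bedtools intersect -a proteinA_cellX_peaks.broadPeaks -b H3K27ac_cellX_peaks.broadPeaks > overlap.bed". To make clear what the default command reports (the shared sub-interval for every overlapping A/B pair), here is a minimal pure-Python sketch. It assumes plain BED3 intervals (chrom, start, end) in 0-based half-open coordinates; the function names, toy peaks and the "overlap.bed" file name are hypothetical, and this is an illustration, not how bedtools is implemented.

```python
import os
import tempfile

# A BED interval as (chrom, start, end); BED uses 0-based, half-open coordinates.
Interval = tuple[str, int, int]


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Report the shared sub-interval for every overlapping (A, B) pair,
    mimicking the default (no-flag) output of `bedtools intersect`.
    Naive O(len(a) * len(b)) loop: fine for illustration, not for real data."""
    shared = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            # Half-open intervals overlap only if each starts before the other ends.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                shared.append((chrom_a, max(start_a, start_b), min(end_a, end_b)))
    return shared


def write_bed(path: str, intervals: list[Interval]) -> None:
    """Write intervals as a tab-separated BED3 file (the analogue of `> overlap.bed`)."""
    with open(path, "w") as fh:
        for chrom, start, end in intervals:
            fh.write(f"{chrom}\t{start}\t{end}\n")


# Hypothetical toy peaks, not real data.
protein_a = [("chr7", 100, 500), ("chr7", 1000, 1200)]
h3k27ac = [("chr7", 400, 900), ("chr1", 100, 500)]

result = intersect(protein_a, h3k27ac)
out_path = os.path.join(tempfile.gettempdir(), "overlap.bed")
write_bed(out_path, result)
print(result)  # [('chr7', 400, 500)]
```

Only the first protein peak overlaps an H3K27ac peak, so one trimmed interval is written.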
@EagleEye, thanks, it works! PS: I have another small issue. After I called:
I tried to confirm the result with:
the total numbers of intervals did not add up. Why would this happen, and how can I resolve the discrepancy? Sorry to bother you again...
A single peak in file 1 can overlap multiple peaks in file 2. For example, if peak1F1 from file 1 overlaps both peak6F2 and peak10F2 in file 2, the result will contain two entries for peak1F1.
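The duplicate-row effect is easy to reproduce. The pure-Python sketch below uses hypothetical toy peaks named after the example above. It counts rows the way default "bedtools intersect" does (one per overlapping pair) and contrasts that with reporting each file-1 peak at most once, which is what the bedtools "-u" flag does; "-v" reports the file-1 peaks with no overlap. Assuming the same inputs and options otherwise, the "-u" and "-v" counts partition file 1, so they should add up to its total number of peaks. The helper names are hypothetical.

```python
# Each peak: (name, chrom, start, end), BED-style half-open coordinates.
Peak = tuple[str, str, int, int]


def overlaps(p: Peak, q: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return p[1] == q[1] and p[2] < q[3] and q[2] < p[3]


def pairwise_hits(a: list[Peak], b: list[Peak]) -> list[tuple[str, str]]:
    """One row per overlapping (A, B) pair, like default `bedtools intersect`."""
    return [(pa[0], pb[0]) for pa in a for pb in b if overlaps(pa, pb)]


def a_with_hit(a: list[Peak], b: list[Peak]) -> list[str]:
    """Each A peak reported at most once if it hits anything in B (like `-u`)."""
    return [pa[0] for pa in a if any(overlaps(pa, pb) for pb in b)]


def a_without_hit(a: list[Peak], b: list[Peak]) -> list[str]:
    """A peaks with no overlap in B (like `-v`)."""
    return [pa[0] for pa in a if not any(overlaps(pa, pb) for pb in b)]


# Hypothetical toy data: peak1F1 spans two file-2 peaks, peak2F1 spans none.
file1 = [("peak1F1", "chr7", 100, 1000), ("peak2F1", "chr7", 5000, 5100)]
file2 = [("peak6F2", "chr7", 150, 300), ("peak10F2", "chr7", 800, 900)]

rows = pairwise_hits(file1, file2)
print(len(rows), rows)              # 2 rows, both for peak1F1
print(a_with_hit(file1, file2))     # ['peak1F1']
print(a_without_hit(file1, file2))  # ['peak2F1']
```

Here the default output has 2 rows although only 1 of the 2 file-1 peaks overlaps anything; 1 (with hit) + 1 (without hit) recovers the 2 peaks in file 1.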
@EagleEye, thanks for the explanation. I think I fell into a logical trap. Normally, people present this kind of data in three groups, like:
PeakA only (a number): group a
PeakB only (a number): group b
PeakA & PeakB (a number): group c
Groups a and b seem easy to define: they are the peaks in one set that do not intersect any interval in the other set. But what about group c? Imagine an extreme example: PeakA has only 2 peaks, and PeakA1 intersects PeakB1 through PeakBn. How large is group c then: 1 or n?
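One way to see why there is no single answer: the size of group c depends on whose peaks are being counted. The pure-Python sketch below builds the extreme case with hypothetical toy data (n = 3) and counts the overlap three ways: A peaks with any B overlap, B peaks with any A overlap, and merged clusters of mutually overlapping A and B peaks that contain both. Which convention a given Venn diagram uses is an assumption to check for each paper; this is an illustration, not a standard definition, and the helper names are hypothetical.

```python
# Peak: (name, chrom, start, end), BED-style half-open coordinates.
Peak = tuple[str, str, int, int]


def overlaps(p: Peak, q: Peak) -> bool:
    """Half-open intervals on the same chromosome sharing >= 1 bp."""
    return p[1] == q[1] and p[2] < q[3] and q[2] < p[3]


def count_hit(xs: list[Peak], ys: list[Peak]) -> int:
    """Number of peaks in xs that overlap at least one peak in ys."""
    return sum(any(overlaps(x, y) for y in ys) for x in xs)


def shared_clusters(a: list[Peak], b: list[Peak]) -> int:
    """Pool both sets, merge overlapping intervals into clusters,
    and count the clusters that contain peaks from both A and B."""
    pooled = sorted([(c, s, e, "A") for _, c, s, e in a] +
                    [(c, s, e, "B") for _, c, s, e in b])
    clusters = []  # each entry: [chrom, cluster_end, set of sources]
    for chrom, start, end, source in pooled:
        if clusters and clusters[-1][0] == chrom and start < clusters[-1][1]:
            clusters[-1][1] = max(clusters[-1][1], end)  # extend current cluster
            clusters[-1][2].add(source)
        else:
            clusters.append([chrom, end, {source}])      # start a new cluster
    return sum(1 for c in clusters if c[2] == {"A", "B"})


# Hypothetical extreme case: A1 spans B1..B3, A2 overlaps nothing.
peaks_a = [("A1", "chr1", 100, 1000), ("A2", "chr1", 5000, 5200)]
peaks_b = [("B1", "chr1", 150, 200), ("B2", "chr1", 400, 450),
           ("B3", "chr1", 900, 950)]

print("A peaks with a B hit:", count_hit(peaks_a, peaks_b))         # 1
print("B peaks with an A hit:", count_hit(peaks_b, peaks_a))        # 3
print("shared merged clusters:", shared_clusters(peaks_a, peaks_b))  # 1
```

Because these counts differ (1, n and 1 here), the "A only" and "B only" groups also depend on the choice, e.g. len(peaks_a) - count_hit(peaks_a, peaks_b) for A only under the first convention.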