Question

Identification of commonly bound regions by TFs/histone modification/other features

8.5 years ago

Wet&DryImmunology ▴ 240

Hi, I guess this is a somewhat naive question. Some papers show Venn diagrams of the overlap between peaks bound by TFs or marked by histone modifications, and I was wondering how that overlap is computed. I checked the methods sections but could not find the information; maybe it is so rudimentary that people do not bother to write it down.

Now I have ChIP-seq data for a DNA-binding protein and for H3K27ac, and I want to see to what extent they overlap. To that end, I called peaks with "macs2" using the "--broad" option and tried to get the overlapping regions with:

bedtools intersect -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak

Then I could get a long list of something like:

chr7    127471196  127472363  Pos1  0  +  127471196  127472363  255,0,0
...

My questions are: Is this the right way to get regions commonly bound by two factors? And instead of printing the result to the screen, could I write it to an output BED file? (I searched the bedtools manual, but to no avail.) Thanks in advance.

ChIP-Seq Bedtools sequence • 1.9k views

ADD COMMENT • link updated 8.5 years ago by EagleEye 7.6k • written 8.5 years ago by Wet&DryImmunology ▴ 240

score 1 · Answer 1 · 2017-02-13

8.5 years ago

EagleEye 7.6k

bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

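The file 'common_regions.txt' will contain the overlap information: with -wao, every peak from -a is reported alongside each overlapping peak from -b, and the last column gives the number of overlapping bases (0 for -a peaks with no overlap).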

Note: To avoid confusion, reduce your input '.broadPeak' files to the minimal columns [chr, start, end, peak_name], with no extra columns.
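
One way to get that minimal layout is to keep only the first four tab-separated columns with `cut`. The sketch below uses a single made-up 9-column broadPeak line (the peak name and score values are invented for illustration) rather than a real MACS2 file:

```shell
# Toy 9-column broadPeak line; all values are made up for illustration.
full=$(mktemp)
printf 'chr7\t127471196\t127472363\tproteinA_peak_1\t55\t.\t4.2\t8.1\t5.6\n' > "$full"

# Keep only chr, start, end, peak_name (columns 1-4).
minimal=$(cut -f1-4 "$full")

# Count the remaining columns as a sanity check.
n_cols=$(printf '%s\n' "$minimal" | awk -F'\t' '{print NF}')

echo "$minimal"
echo "columns kept: $n_cols"
rm -f "$full"
```

On a real file the same idea would be `cut -f1-4 proteinA_cellX_peaks.broadPeak > proteinA_minimal.bed`, where the output name is only a placeholder.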

ADD COMMENT • link 8.5 years ago by EagleEye 7.6k

@EagleEye, thanks, it works! PS: I have another small issue. After I ran:

bedtools intersect -wao -a proteinA_cellX_peaks.broadPeak -b H3K27ac_cellX_peaks.broadPeak > common_regions.txt

I tried to confirm the result with:

wc -l proteinA_cellX_peaks.broadPeak
19559 proteinA_cellX_peaks.broadPeak
wc -l common_regions.txt
19604 common_regions.txt

The total number of intervals does not add up. Why would this happen, and how could I resolve the discrepancy? Sorry to bother you again...

ADD REPLY • link 8.5 years ago by Wet&DryImmunology ▴ 240

A single peak in file 1 can overlap multiple peaks in file 2. For example, if peak1F1 from file 1 overlaps both peak6F2 and peak10F2 in file 2, the results will contain two entries for peak1F1.
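
The effect can be reproduced on a small hand-written table. The sketch below assumes 4-column BED inputs, so a `-wao` result has 9 columns (4 from -a, 4 from -b, then the overlap in bp); the coordinates are invented and the file stands in for 'common_regions.txt':

```shell
# Toy stand-in for a `bedtools intersect -wao` result (coordinates are invented).
demo=$(mktemp)
printf 'chr1\t100\t200\tpeak1F1\tchr1\t150\t180\tpeak6F2\t30\n'  >  "$demo"
printf 'chr1\t100\t200\tpeak1F1\tchr1\t190\t250\tpeak10F2\t10\n' >> "$demo"
printf 'chr1\t500\t600\tpeak2F1\t.\t-1\t-1\t.\t0\n'              >> "$demo"

# One line per (file 1 peak, file 2 peak) pair, plus one line per unmatched file 1 peak.
total_lines=$(wc -l < "$demo" | tr -d ' ')

# Collapsing on the file 1 coordinates recovers the number of file 1 peaks.
unique_a=$(cut -f1-3 "$demo" | sort -u | wc -l | tr -d ' ')

echo "lines: $total_lines, distinct file 1 peaks: $unique_a"
rm -f "$demo"
```

Here the table has 3 lines but only 2 distinct file 1 peaks, because peak1F1 appears once per overlapping file 2 peak. Running the same `cut | sort -u | wc -l` on the real 'common_regions.txt' should, under the same column assumption, give back the number of distinct intervals in the -a file.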

ADD REPLY • link 8.5 years ago by EagleEye 7.6k

@EagleEye, thanks for the explanation. I think I fell into a logical trap. Normally, people present the data like this:

PeakA only (a number) -- group a
PeakB only (a number) -- group b
PeakA & PeakB (a number) -- group c

I think groups a and b are easy to determine: they are the PeakA and PeakB intervals that do not intersect any interval in the other set. But what about group c? Imagine an extreme example: PeakA has only 2 peaks, and PeakA1 intersects PeakB1 through PeakBn. How large is group c then, 1 or n?

ADD REPLY • link 8.5 years ago by Wet&DryImmunology ▴ 240
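
The thread does not settle which number to report, but the extreme example can be worked through on a toy `-wao`-style table to show that the answer depends on which file the count is taken from. This is only a sketch under assumptions: 4-column BED inputs, invented coordinates, and n = 3.

```shell
# Toy -wao-style table: PeakA1 overlaps three PeakB intervals, PeakA2 overlaps none.
wao=$(mktemp)
{
  printf 'chr1\t100\t900\tPeakA1\tchr1\t150\t200\tPeakB1\t50\n'
  printf 'chr1\t100\t900\tPeakA1\tchr1\t400\t450\tPeakB2\t50\n'
  printf 'chr1\t100\t900\tPeakA1\tchr1\t700\t800\tPeakB3\t100\n'
  printf 'chr2\t100\t200\tPeakA2\t.\t-1\t-1\t.\t0\n'
} > "$wao"

# Group a: PeakA intervals with no overlap (last column is 0).
group_a=$(awk -F'\t' '$NF == 0' "$wao" | cut -f1-3 | sort -u | wc -l | tr -d ' ')

# Group c counted in PeakA intervals: distinct -a coordinates with any overlap.
group_c_from_a=$(awk -F'\t' '$NF > 0' "$wao" | cut -f1-3 | sort -u | wc -l | tr -d ' ')

# Group c counted in PeakB intervals: distinct -b coordinates with any overlap.
group_c_from_b=$(awk -F'\t' '$NF > 0' "$wao" | cut -f5-7 | sort -u | wc -l | tr -d ' ')

echo "a: $group_a, c (PeakA view): $group_c_from_a, c (PeakB view): $group_c_from_b"
rm -f "$wao"
```

With this table, group c is 1 when counted in PeakA intervals and 3 when counted in PeakB intervals, so both answers are defensible as long as the figure states which set is being counted. Group b cannot be read from this table, since PeakB intervals that overlap nothing are not listed; it would need a second run with -a and -b swapped.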