Question

What does "gappedPeak" mean

4

7.7 years ago

Wet&DryImmunology ▴ 240

Hi, I used "macs2" to call peaks from my ChIP-seq data. This is not my first time using macs2, but I still cannot grasp what "gappedPeak" stands for in the macs2 output.

"NAME_peaks.narrowPeak" and "NAME_peaks.broadPeak" are quite intuitive: "narrowPeak" means narrow peaks, which suit TFs, and "broadPeak" means broad peaks, which suit histone modifications spanning wider genomic regions.

But how about "gappedPeak"?

From the macs2 GitHub page:

"NAME_peaks.gappedPeak is in BED12+3 format which contains both the broad region and narrow peaks." It seems gappedPeak contains both categories (narrow and broad). If that is the case, where do the gaps come from?

And https://genome.ucsc.edu/FAQ/FAQformat#format14, for ENCODE gapped peaks (I assumed those peaks were called using macs), explains: "regions of signal enrichment based on pooled, normalized (interpreted) data where the regions may be spliced or incorporate gaps in the genomic sequence". "Regions may be spliced or incorporate gaps": I could understand RNA being spliced, but DNA?

Could anyone explain?

[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.broadPeak 
Chrom ChromStart ChromEnd name                        score strand    signalValue pValue qValue
chr1    4775387 4776044 Histonemark_cellA_peak_1     41      .       3.22266 5.57941 4.13770
chr1    4847525 4848363 Histonemark_cellA_peak_2     38      .       3.03717 5.39983 3.82081
chr1    5073148 5073709 Histonemark_cellA_peak_3     31      .       3.02635 4.72286 3.10498

[Jun@host workingdirectory]$ less Histonemark_cellA_peaks.gappedPeak 
Chrom   ChromStart ChromEnd name                       score   strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts signalValue pValue qValue
chr1    4775387 4776044 Histonemark_cellA_peak_1     41      .       4775387 4776044 0       2       645,1   0,656   3.22266 5.57941 4.13770

Of course, to understand a file, it is always better to look inside it. Looking inside the broadPeak and gappedPeak files, I realized that the key is to understand what "thickStart"/"thickEnd" are. I then found a post trying to address that, but I still could not understand it. In particular, "Thickstart and thickend are the left and the right boundaries of the coding sequence.", as explained by Ido Tamir, confused me even more. What does "boundaries of the coding sequence" mean in the context of ChIP-seq?

ChIP-Seq macs2 ENCODE • 6.7k views

score 6 · Accepted Answer · 2017-03-17

7.7 years ago

GouthamAtla 12k

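A gappedPeak is a representation of narrow peaks as blocks over a broad peak. To trick the visualisation tools, it uses the same format as gene models, but with the narrow peak coordinates as the exon coordinates and the broad peak coordinates as the coding region coordinates. The "gaps" are simply the stretches of the broad region that lie between the blocks, just as introns lie between exons in a gene model.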
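To make the block layout concrete, here is a minimal Python sketch that decodes the gappedPeak record from the question into absolute block and gap coordinates. It assumes the standard BED12 convention that blockStarts are offsets relative to chromStart; the function name `parse_gapped_peak` is my own and is not part of macs2.

```python
def parse_gapped_peak(line):
    """Decode one gappedPeak (BED12+3) line into absolute blocks and gaps.

    Hypothetical helper for illustration only; not part of macs2.
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # BED lists may carry a trailing comma, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if not (block_count == len(sizes) == len(offsets)):
        raise ValueError("blockCount does not match blockSizes/blockStarts")

    # blockStarts are relative to chromStart (BED12 convention).
    blocks = [(start + off, start + off + size)
              for off, size in zip(offsets, sizes)]
    # A gap is the space between the end of one block and the start of the next.
    gaps = [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])
            if right_start > left_end]
    return chrom, (start, end), blocks, gaps


# The record shown in the question (whitespace-separated).
record = ("chr1 4775387 4776044 Histonemark_cellA_peak_1 41 . "
          "4775387 4776044 0 2 645,1 0,656 3.22266 5.57941 4.13770")

chrom, region, blocks, gaps = parse_gapped_peak(record)
print(chrom, region)
print("blocks:", blocks)
print("gaps:  ", gaps)
```

For this record the two blocks come out as 4775387-4776032 and 4776043-4776044, so the gap is the 11 bp between them. A genome browser draws the blocks as thick "exons" and the gap as a thin connecting line.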