What does "pileup" mean in the context of MACS2 peak caller?
7.7 years ago
ariel.balter

I usually think of a "pileup" file as one made from a BAM/SAM alignment that lists the coverage at each coordinate. This is what samtools pileup and samtools mpileup produce.

For instance, here is the output of samtools pileup:

#chr    coord   base    count
1        9998   n       1
1        9999   n       1
1       10000   n       4
1       10001   t       5
1       10002   a       7
1       10003   a       7

macs2 callpeak produces what it calls a "pileup", for example the *_treat_pileup.bdg file (written when callpeak is run with -B/--bdg), which looks like:

1       15098   15104   4.74683
1       15104   15142   5.69620
1       15142   15178   4.74683
1       15178   15188   3.79747
1       15188   15192   4.74683
1       15192   15224   3.79747
1       15224   15245   2.84810
1       15245   15251   1.89873
1       15251   15277   0.94937
1       15277   15303   1.89873
1       15303   15314   2.84810
1       15314   15329   3.79747
1       15329   15335   4.74683
1       15335   15392   3.79747
1       15392   15424   4.74683
1       15424   15450   3.79747
1       15450   15461   2.84810
1       15461   15476   1.89873

Each line covers a coordinate range of varying width, and the fourth column holds fractional numbers rather than integer counts.
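
To compare the two formats directly, the bedGraph intervals can be expanded back to one value per base. This is a minimal sketch that assumes the usual bedGraph convention (0-based, half-open intervals) and the 1-based positions of the per-base listing above; parse_bedgraph and expand_to_bases are hypothetical helper names, not part of MACS2 or samtools.

```python
# Sketch: expand bedGraph intervals (0-based, half-open [start, end))
# into per-base values with 1-based positions, like the listing above.

def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        fields = line.split()
        # Skip header/comment lines and anything too short to be a record.
        if len(fields) < 4 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        yield fields[0], int(fields[1]), int(fields[2]), float(fields[3])

def expand_to_bases(records):
    """Yield (chrom, 1-based position, value) for every base an interval covers."""
    for chrom, start, end, value in records:
        # The interval [start, end) covers 1-based positions start+1 .. end.
        for pos in range(start + 1, end + 1):
            yield chrom, pos, value

example = [
    "1       15098   15104   4.74683",
    "1       15104   15142   5.69620",
]
bases = list(expand_to_bases(parse_bedgraph(example)))
print(len(bases))   # 44 (6 + 38 bases)
print(bases[0])     # ('1', 15099, 4.74683)
print(bases[6])     # ('1', 15105, 5.6962)
```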

However, the README, which is basically the only documentation, does not define this output format.

It's conceptually the same thing. Peaks can have different widths, and the goal is mostly to know how each whole peak is covered on average, rather than the coverage at individual bases.
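
As a rough illustration of how a per-base pileup and a bedGraph can relate (a sketch under assumptions, not MACS2's actual code): coverage is first computed per base, then consecutive bases with the same value are merged into a single interval, which would explain intervals of varying width. The fragment coordinates are made up, and per_base_depth and to_bedgraph_runs are hypothetical helpers.

```python
# Sketch: per-base depth from fragment intervals via a difference array,
# then consecutive bases with identical depth merged into bedGraph-style runs.
# Fragment coordinates below are invented for illustration.

def per_base_depth(fragments, region_start, region_end):
    """Depth at each 0-based position in [region_start, region_end)."""
    diff = [0] * (region_end - region_start + 1)
    for start, end in fragments:              # each fragment is half-open [start, end)
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for d in diff[:-1]:                       # running sum gives depth per base
        running += d
        depth.append(running)
    return depth

def to_bedgraph_runs(chrom, region_start, depth):
    """Merge runs of identical non-zero depth into (chrom, start, end, value)."""
    runs, i = [], 0
    while i < len(depth):
        j = i
        while j < len(depth) and depth[j] == depth[i]:
            j += 1
        if depth[i] != 0:
            runs.append((chrom, region_start + i, region_start + j, depth[i]))
        i = j
    return runs

fragments = [(100, 110), (104, 120), (104, 112)]
depth = per_base_depth(fragments, 100, 120)
runs = to_bedgraph_runs("1", 100, depth)
for run in runs:
    print(*run, sep="\t")
```

With these three fragments the 20 bases collapse into four intervals of widths 4, 6, 2 and 8 (depths 1, 3, 2, 1).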

But those are not counts under peaks (what I would call coverage, as calculated by bedtools coverage). Those ranges are just short stretches of coordinates (0-100 bp). And the 4th column isn't coverage: it's a decimal number that measures _something_, but I don't know what, and it isn't documented.

See this post: https://groups.google.com/forum/#!searchin/macs-announcement/callpeak$20pileup%7Csort:relevance/macs-announcement/F4ZQMqhD-N4/gw2-V6l0CQAJ

The treatment and control bedGraph pileups generated by the callpeak function are automatically scaled to the same depth. By default, the sample with the most reads is scaled down linearly to the same depth as the sample with the fewest reads. This can be reversed with the --to-large option. There is also the --down-sample option if you prefer.
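
The linear scaling described in that quote can be written as a pair of factors, one per library. This is a minimal sketch: depth_scale_factors is a hypothetical helper, the read counts are invented, and MACS2 may compute the depths it compares differently (for example after filtering duplicates).

```python
def depth_scale_factors(treat_reads, control_reads, to_large=False):
    """Return (treatment_factor, control_factor) for linear depth scaling.

    By default the larger library is scaled down to the smaller one;
    with to_large=True the smaller library is scaled up instead.
    """
    if to_large:
        target = max(treat_reads, control_reads)
    else:
        target = min(treat_reads, control_reads)
    return target / treat_reads, target / control_reads

# Hypothetical library sizes, for illustration only.
print(depth_scale_factors(20_000_000, 15_000_000))                 # (0.75, 1.0)
print(depth_scale_factors(20_000_000, 15_000_000, to_large=True))  # (1.0, 1.3333333333333333)
```

Under this scheme, every pileup value in the scaled-down file would be an integer count multiplied by the same factor.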

So the callpeak output should be equivalent to a "pileup", apart from the normalization factor. It is probably also normalized per million mapped reads. Did you use downsampling (--down-sample)?
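
One way to test the "pileup times a normalization factor" idea on the values posted in the question: if they are integer counts multiplied by one factor, dividing by the smallest value should give near-integers. This sketch assumes the smallest non-zero value corresponds to a count of 1, which need not hold in every region.

```python
# Sketch: check whether the 4th-column values from the question are
# integer counts times a single scale factor.

values = [4.74683, 5.69620, 4.74683, 3.79747, 4.74683, 3.79747, 2.84810,
          1.89873, 0.94937, 1.89873, 2.84810, 3.79747, 4.74683, 3.79747,
          4.74683, 3.79747, 2.84810, 1.89873]

unit = min(v for v in values if v > 0)           # candidate scale factor
counts = [v / unit for v in values]              # should be near-integers
max_error = max(abs(c - round(c)) for c in counts)

print(f"scale factor ~ {unit}")
print("implied counts:", [round(c) for c in counts])
print(f"max deviation from an integer: {max_error:.5f}")
```

For the posted values, every number lies within about 2e-5 of an integer multiple of 0.94937 (implied counts 5, 6, 5, 4, 5, 4, 3, 2, 1, ...), which fits integer pileup counts scaled down by a single factor of roughly 0.949.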
