HOMER's annotatePeaks.pl centers tags for counting based on their estimated ChIP-fragment length. To avoid that, the -len/fragLength
parameter can be set to 1 (which is useful for 5' RNA data).
This is described in the help text that annotatePeaks.pl prints when it is run without any parameters.
However, I have found that changing the -len
parameter from auto to 1 changes the results by more than an order of magnitude (in my case).
With -len auto:
annotatePeaks.pl Tiles.pos GRCm38 -norm 1e8 -size 2000 -len auto -hist 50 -ghist -d GSE124804_tagDir/ | head -2 | tail -1
1604 1.43958868894602 1.43958868894602 1.72750642673522 1.87146529562982 1.72750642673522 1.87146529562982 2.59125964010283 2.87917737789203 2.87917737789203 2.59125964010283 2.73521850899743 2.73521850899743 2.73521850899743 3.02313624678663 2.73521850899743 2.87917737789203 3.74293059125964 4.17480719794345 4.46272493573265 4.31876606683805 4.03084832904884 5.18251928020566 5.03856041131105 5.47043701799486 4.75064267352185 3.88688946015424 3.31105398457584 3.02313624678663 3.31105398457584 3.59897172236504 2.44730077120823 1.72750642673522 2.01542416452442 1.87146529562982 2.44730077120823 2.30334190231363 2.15938303341902 2.15938303341902 2.15938303341902 2.15938303341902 2.15938303341902
With -len 1:
annotatePeaks.pl Tiles.pos GRCm38 -norm 1e8 -size 2000 -len 1 -hist 50 -ghist -d GSE124804_tagDir/ | head -2 | tail -1
1604 56 0 0 0 112 56 280 168 112 56 224 56 56 168 224 112 168 -5.6843418860808e-14 224 112 -7.105427357601e-14 560 224 448 504 280 55.9999999999997 -2.91322521661641e-13 -2.91322521661641e-13 55.9999999999997 55.9999999999997 55.9999999999997 112 -2.98427949019242e-13 224 168 112 280 55.9999999999996 55.9999999999996 112
What can explain it?
The tagDir is based on a BAM file of paired-end reads; the 9th field of the BAM file (fragment length) ranges from about -450 to +450.
I would presume that this is due to your normalization.
The mapped read is usually extended to the presumed ChIP-fragment length of several hundred base pairs. By limiting it to 1 bp, you lose more than 99.5% of your total counts. If you still normalize to 1e8 total counts, the remaining counts are multiplied by a correspondingly larger factor to make up for this loss.
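To make the arithmetic concrete, here is a minimal Python sketch of that reasoning. It is not HOMER's actual code: the helper scale_factor and the two totals are hypothetical, chosen only to show how a fixed -norm target inflates the per-count factor when the total drops. The second part only inspects the numbers posted in the question: every value in the -len 1 row is (up to floating-point noise) an integer multiple of 56, which would be consistent with whole tags multiplied by one constant factor. What exactly goes into that factor is not derived here.

```python
def scale_factor(norm_total, observed_total):
    """Factor that rescales observed counts so they sum to norm_total.

    Hypothetical helper for illustration; not taken from HOMER.
    """
    return norm_total / observed_total


# Hypothetical totals (assumptions, not values from the question):
full_total = 20_000_000   # total counts with extended fragments
reduced_total = 100_000   # 0.5% of full_total, i.e. a 99.5% loss

factor_full = scale_factor(1e8, full_total)        # 5.0
factor_reduced = scale_factor(1e8, reduced_total)  # 1000.0
inflation = factor_reduced / factor_full           # 200.0

# Values copied from the "-len 1" row in the question (peak ID 1604 omitted).
len1_row = [
    56, 0, 0, 0, 112, 56, 280, 168, 112, 56, 224, 56, 56, 168, 224, 112, 168,
    -5.6843418860808e-14, 224, 112, -7.105427357601e-14, 560, 224, 448, 504,
    280, 55.9999999999997, -2.91322521661641e-13, -2.91322521661641e-13,
    55.9999999999997, 55.9999999999997, 55.9999999999997, 112,
    -2.98427949019242e-13, 224, 168, 112, 280, 55.9999999999996,
    55.9999999999996, 112,
]

STEP = 56.0  # smallest clearly non-zero value in the row

# Express each bin as a multiple of STEP and check it is (nearly) an integer.
multiples = [round(v / STEP) for v in len1_row]
all_integer_multiples = all(
    abs(v / STEP - m) < 1e-9 for v, m in zip(len1_row, multiples)
)

print("inflation of the factor:", inflation)
print("all bins are multiples of", STEP, ":", all_integer_multiples)
print("bins as multiples of STEP:", multiples)
```

If this reading is right, the -len 1 profile is built from a small number of discrete counts scaled by one large constant, while the -len auto profile spreads each read over its fragment length and therefore looks smooth. Comparing the un-normalized output of both runs would be one way to check this.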