I ran into this issue today while I was plotting TSS occupancy heatmaps. If you take the coordinates of a single transcription start site, extend them by 1000bp in each direction and cut the resulting region into 500bp windows, you end up with 5 windows, not the 4 I had assumed.
Take the TSS of a gene:
# test.bed
chr1 4857693 4857694 Tcea1 1 +
Increase the size by 1000bp upstream and downstream:
bedtools slop -i test.bed -g mm10.chromsizes -b 1000 > test.plusminus1000bp.bed
Check output of bedtools slop:
# test.plusminus1000bp.bed
chr1 4856693 4858694 Tcea1 1 +
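A quick sanity check on the length of the slopped interval helps here. This is only a sketch using awk (my choice, not part of bedtools), with the coordinates from above pasted inline rather than read from the file:

```shell
# Length of a BED interval is end - start
# (BED start coordinates are 0-based, end coordinates are exclusive).
# The line below is the slopped Tcea1 interval from above, given inline.
interval_length=$(printf 'chr1\t4856693\t4858694\tTcea1\t1\t+\n' \
  | awk -F'\t' '{ print $3 - $2 }')
echo "$interval_length"
```

The interval is 2001bp long, not 2000bp, so it cannot divide evenly into 500bp windows.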
Split feature into 500bp windows:
bedtools makewindows -b test.plusminus1000bp.bed -w 500 > test.plusminus1000bp.window500bp.bed
Check output of bedtools makewindows:
# test.plusminus1000bp.window500bp.bed
chr1 4856693 4857193
chr1 4857193 4857693
chr1 4857693 4858193
chr1 4858193 4858693
chr1 4858693 4858694
The last feature in the file is a single base-pair window. I first assumed this was caused by the 0-based coordinate system, but the arithmetic points elsewhere: the TSS is itself a 1bp feature, so adding 1000bp on each side gives a 2001bp interval (4858694 - 4856693), which splits into four 500bp windows plus a 1bp remainder. It isn't obvious that such a window will be produced. Could it change the results of an analysis that assumes all windows are the same length? Would it be better to remove this single base-pair window?
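If equal-sized windows matter downstream, one option is to drop any window shorter than the requested size. A minimal sketch with awk, assuming a window size of 500 and with the makewindows output pasted inline instead of read from the file:

```shell
# Keep only windows whose length (end - start) equals the requested size.
# window_size is an assumption matching the -w 500 used above.
window_size=500
full_windows=$(printf '%s\n' \
  'chr1 4856693 4857193' \
  'chr1 4857193 4857693' \
  'chr1 4857693 4858193' \
  'chr1 4858193 4858693' \
  'chr1 4858693 4858694' \
  | awk -v w="$window_size" '($3 - $2) == w')
echo "$full_windows"

# On the real file the same filter would be:
#   awk -v w=500 '($3 - $2) == w' test.plusminus1000bp.window500bp.bed
```

This keeps the four 500bp windows and discards the 1bp remainder. Another route, which I have not tested here, would be to avoid the remainder in the first place by extending asymmetrically with the -l and -r options of bedtools slop (for example -l 1000 -r 999), so that the interval is exactly 2000bp. Whether dropping or avoiding the extra base is appropriate depends on the analysis.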