In the article "The DNA methylation landscape of human early embryos", the authors mention a 100-bp-tile-based DNA methylation calling algorithm (they used RRBS to detect 5mC/5hmC).
The algorithm is described like this: first, the genome is binned into consecutive 100-bp tiles. The number of reported Cs, divided by the total number of reported Cs and Ts captured in a 100-bp tile, is interpreted as that tile's averaged DNA methylation level. The DNA methylation level of each sample is the average over its 100-bp tiles.
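
To make the description concrete, here is a minimal Python sketch of how I read it. It is not the authors' code: the input layout (chromosome, 0-based position, C count, T count per cytosine), the function names and the toy numbers are all my own assumptions for illustration.

```python
from collections import defaultdict

TILE_SIZE = 100  # tile width in bp, as stated in the paper's description


def tile_methylation(sites, tile_size=TILE_SIZE):
    """Pool C and T read counts per tile and return C / (C + T) for each tile.

    `sites` is an iterable of (chrom, pos, c_count, t_count) tuples.
    This layout and the 0-based `pos` are assumptions, not the paper's format.
    """
    counts = defaultdict(lambda: [0, 0])  # (chrom, tile index) -> [C reads, C + T reads]
    for chrom, pos, c_count, t_count in sites:
        key = (chrom, pos // tile_size)
        counts[key][0] += c_count
        counts[key][1] += c_count + t_count
    # Tiles without any reads carry no information and are skipped.
    return {key: c / total for key, (c, total) in counts.items() if total > 0}


def sample_methylation(tile_levels):
    """Sample-level methylation: the unweighted mean over all covered tiles."""
    return sum(tile_levels.values()) / len(tile_levels)


# Toy data (made up): two cytosines fall in tile 0, one in tile 1.
sites = [
    ("chr1", 10, 8, 2),
    ("chr1", 55, 1, 1),
    ("chr1", 130, 0, 10),
]
tiles = tile_methylation(sites)
sample_level = sample_methylation(tiles)

# For comparison: the plain mean of per-cytosine levels.
per_site_level = sum(c / (c + t) for _, _, c, t in sites) / len(sites)
```

In this toy case the two cytosines in the first tile are pooled by read count before averaging, so the tile-based sample level differs from the plain per-cytosine mean.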
Why can't we just average the methylation level of every individual C? What's the advantage of a sliding window?
Thank you :)
I found that the BSmooth paper (http://www.ncbi.nlm.nih.gov/pubmed/23034175) provides a justification for the use of smoothing:
They then concluded the following:
So my guess is that smoothing/windowing allows lower-coverage sequencing while still keeping the standard errors of the (averaged/smoothed) DNA methylation level low. This of course comes at the cost of resolution, since individual CpGs are no longer resolved.
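
A rough back-of-the-envelope sketch of that trade-off, under strong simplifying assumptions of my own (reads are independent, and every CpG in the window shares the same true methylation level, so the pooled reads behave like one binomial sample). The coverage and window numbers are made up for illustration.

```python
import math


def binomial_se(p, n_reads):
    """Standard error of an estimated proportion p from n_reads independent reads."""
    return math.sqrt(p * (1 - p) / n_reads)


true_level = 0.5       # assumed true methylation level (illustrative)
coverage_per_cpg = 4   # assumed low per-CpG coverage (illustrative)
cpgs_per_window = 5    # assumed number of CpGs pooled in one window (illustrative)

# Estimating a single CpG uses only its own reads.
se_single = binomial_se(true_level, coverage_per_cpg)

# Pooling the window uses the reads of all its CpGs together.
se_window = binomial_se(true_level, coverage_per_cpg * cpgs_per_window)

print(f"single CpG SE: {se_single:.3f}, pooled window SE: {se_window:.3f}")
```

Under these assumptions, pooling k CpGs shrinks the standard error by a factor of sqrt(k). If neighbouring CpGs actually differ in methylation, the pooled value is an average over them, which is the loss of resolution mentioned above.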
My guess is that they once had a dataset with either low coverage or a lot of noise. The sliding window would allow you to handle that and still assign values to focal regions/points. There's no other good reason that I know of to do this and it's not something I would personally do by default.