Hi,
I want to calculate the average methylation % for a targeted region. For example:
Chrom start end Meth_% count_methylated count_unmethylated
chr1 1371409 1371409 0 0 5
chr1 1371539 1371539 0 0 88
chr1 1371567 1371567 2.32 2 84
chr1 1371572 1371572 2.30 2 85
chr1 1371575 1371575 1.12 1 88
The targeted region is chr1:1371409-1371575, and each row corresponds to one cytosine position (1-based bedGraph coverage file from Bismark).
How should I proceed with my calculation?
1) Should I just take the mean of Meth % per locus (sum of Meth % values / number of loci), i.e. (0+0+2.32+2.30+1.12)/5 = 1.148%?
OR
2) total count_methylated / (total count_methylated + total count_unmethylated), i.e. (2+2+1)/(2+2+1+5+88+84+85+88) = 5/355 = 0.014 or 1.4%
This one is more like a weighted mean (correct me if I am wrong).
OR
Is there some other, more appropriate method for this kind of calculation?
I am cross-posting this on the main Bismark GitHub page; hopefully I will get a reply there.
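For reference, the two candidate calculations can be sketched in Python as below. This is only an illustration under stated assumptions: the rows are typed in by hand from the example table, and the function names `mean_of_percentages` and `pooled_percentage` are made up for this post, not part of Bismark.

```python
# Rows copied from the example table:
# (chrom, position, count_methylated, count_unmethylated)
rows = [
    ("chr1", 1371409, 0, 5),
    ("chr1", 1371539, 0, 88),
    ("chr1", 1371567, 2, 84),
    ("chr1", 1371572, 2, 85),
    ("chr1", 1371575, 1, 88),
]


def mean_of_percentages(rows):
    """Option 1: unweighted mean of the per-site methylation percentages."""
    # Sites with zero coverage carry no information, so they are skipped.
    per_site = [100.0 * m / (m + u) for _, _, m, u in rows if m + u > 0]
    return sum(per_site) / len(per_site)


def pooled_percentage(rows):
    """Option 2: pooled counts, i.e. a coverage-weighted mean."""
    methylated = sum(m for _, _, m, _ in rows)
    total = sum(m + u for _, _, m, u in rows)
    return 100.0 * methylated / total


print(f"mean of per-site percentages: {mean_of_percentages(rows):.2f}%")
print(f"pooled (coverage-weighted):   {pooled_percentage(rows):.2f}%")
```

Option 1 recomputed from the raw counts comes out marginally higher than 1.148% because the Meth_% column in the table is shown to only two decimals. Option 2 is the same as weighting each site's percentage by its coverage, which is why deeply covered sites dominate the result.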
Is your goal simply to represent this particular region? Or did you obtain it as a DMR (differentially methylated region) from some kind of analysis?
In the second case, I would represent (if you can) the same thing that the algorithm used. For example, the frequently used BSmooth pipeline does a weighted smoothing which takes into account coverage, so if I had used that algorithm, I would represent the smoothed values.
Another option is to set a coverage threshold (say >10 counts) and consider that all of the CpGs above the threshold have valid information. With that reasoning, I would not weigh the values. There are algorithms which work like this and do not take into account coverage when computing DMRs...
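A minimal sketch of that threshold approach, assuming the counts from the question and a cutoff of more than 10 reads per site. The function name `thresholded_mean` and the `min_coverage` parameter are hypothetical, chosen only for illustration; they do not come from any particular DMR tool.

```python
# Rows copied from the question:
# (chrom, position, count_methylated, count_unmethylated)
rows = [
    ("chr1", 1371409, 0, 5),
    ("chr1", 1371539, 0, 88),
    ("chr1", 1371567, 2, 84),
    ("chr1", 1371572, 2, 85),
    ("chr1", 1371575, 1, 88),
]


def thresholded_mean(rows, min_coverage=10):
    """Unweighted mean of per-site methylation %, keeping only sites
    whose coverage (methylated + unmethylated) exceeds min_coverage."""
    kept = [
        100.0 * m / (m + u)
        for _, _, m, u in rows
        if (m + u) > min_coverage
    ]
    if not kept:
        raise ValueError("no site passes the coverage threshold")
    return sum(kept) / len(kept)


# The first site (coverage 5) is dropped; the other four count equally.
print(f"thresholded unweighted mean: {thresholded_mean(rows):.2f}%")
```

With this example the low-coverage site at position 1371409 is excluded, so the mean is taken over four sites instead of five.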
Hi,
Thanks for your reply. For now I just wanted to get the methylation %, and I have since received an answer. I was meaning to post it here anyway; you can find it at this link: https://github.com/FelixKrueger/Bismark/issues/354
Thanks for the link, it's a great answer by Felix!