Hi!
Does anyone know how common differentially methylated regions (DMRs) (or just methylated regions) are? Say, in a 10,000bp region how many DMRs can one expect?
Also, does anyone know about the distribution of the size/length of DMRs, i.e. how many CpGs are usually in a DMR? Or the usual range? E.g. can a region consisting of just 2 CpGs be considered a methylated region?
Thanks and good day!
Thanks. Perhaps it's more apt to use the term CpG clusters than methylation regions. In this case, the conditions are not relevant. Even a vague range would help.
I'd like to compare methods that detect methylated regions. Instead of using simulated data, I'd like to use actual bisulfite sequencing data. Knowing which method(s) return the most realistic number of methylated regions and the most realistic sizes would help in the comparison.
CpG islands in mammalian genomes are typically hypomethylated, i.e. detecting clusters of CpGs will not necessarily translate into detecting "methylated regions". What type of method do you have in mind anyway? Methylation itself is usually detected via bisulfite sequencing, i.e. the chemical treatment of the DNA. Are you trying to see how accurate the distinction between unmethylated CpGs (= 0% reads) and fully methylated CpGs (= 100% reads) is?
Thanks. I'm looking at an approach to detecting methylated regions that measures the correlation of methylation levels of neighboring CpG sites (CpG sites meeting a certain threshold of correlation are combined into a methylated region). However, this approach returns relatively short regions. This is why I'd like to know what the typical size range of methylated regions is. I'd also like to know roughly how common methylated regions are, to compare with the output of this method.
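To make the approach above concrete, here is a minimal sketch of one way it could be implemented, not the actual method under discussion. It assumes a samples-by-CpGs matrix of methylation levels for a single chromosome, Pearson correlation between adjacent sites, and a hypothetical function name and default thresholds (`min_corr=0.8`, `min_cpgs=2`) chosen only for illustration.

```python
import numpy as np

def correlated_regions(positions, meth, min_corr=0.8, min_cpgs=2):
    """Merge adjacent CpGs whose methylation levels correlate across samples.

    positions: sorted CpG coordinates on one chromosome.
    meth: 2D array (samples x CpGs) of methylation levels in [0, 1].
    min_corr, min_cpgs: illustrative thresholds (assumptions, not recommendations).
    Returns a list of (start, end, n_cpgs) tuples.
    """
    positions = np.asarray(positions)
    meth = np.asarray(meth, dtype=float)
    n = len(positions)
    regions = []
    start = 0  # index of the first CpG in the region being built
    for i in range(1, n + 1):
        linked = False
        if i < n:
            # Pearson correlation between site i-1 and site i across samples.
            # A constant column yields NaN, which is treated as "not linked".
            with np.errstate(invalid="ignore", divide="ignore"):
                r = np.corrcoef(meth[:, i - 1], meth[:, i])[0, 1]
            linked = bool(np.isfinite(r) and r >= min_corr)
        if not linked:
            # Close the current run; keep it only if it has enough CpGs.
            if i - start >= min_cpgs:
                regions.append((int(positions[start]), int(positions[i - 1]), i - start))
            start = i
    return regions
```

Because a region is broken by any single weakly correlated pair of neighbors, a chaining rule like this tends to produce short regions, which may be part of why the output looks fragmented.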
Maybe run another tool that does "methylation region detection" on your data and see what comes up?
Thanks, but how will I know how accurate this other tool is if I don't have an estimate of the "ground truth"?
You need to define "ground truth". What is it that you want to test? What is it that your tool is trying to solve?
Ground truth here would be the range of lengths of DMRs.
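Whatever reference ends up being used, the two quantities asked about in this thread (region length and regions per 10,000 bp) can be summarized the same way for every tool's output. The following is a minimal sketch under assumptions: regions are given as hypothetical `(start, end, n_cpgs)` tuples with inclusive coordinates, and `span_bp` is the total length of sequence that was scanned. The function name and the returned keys are made up for illustration.

```python
import numpy as np

def summarize_regions(regions, span_bp, window_bp=10_000):
    """Summarize the density and size of called regions.

    regions: list of (start, end, n_cpgs) tuples, inclusive coordinates (assumed format).
    span_bp: total number of base pairs that were scanned.
    window_bp: window size used to express density (10,000 bp as in the question).
    """
    if not regions:
        return {"n_regions": 0, "regions_per_window": 0.0}
    lengths = np.array([end - start + 1 for start, end, _ in regions])
    cpgs = np.array([n for _, _, n in regions])
    return {
        "n_regions": len(regions),
        # Average number of regions per window of window_bp base pairs.
        "regions_per_window": len(regions) / (span_bp / window_bp),
        # (min, median, max) of region length in bp and of CpGs per region.
        "length_bp": (int(lengths.min()), float(np.median(lengths)), int(lengths.max())),
        "cpgs": (int(cpgs.min()), float(np.median(cpgs)), int(cpgs.max())),
    }
```

Running this on the calls from each method over the same data gives directly comparable numbers, although it does not by itself say which method is closer to the truth.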