Statistical test for DMR annotation?
9.9 years ago
Amit Lavon ▴ 10

Hello friends,

I see that it was discussed here: A: Dmr (Differentially Methylated Regions) Identification Software but I would like to dig a little deeper into that, because I couldn't find a satisfying answer yet.

So, what statistical test would you choose for DMR (differentially methylated region) annotation? That is, you have a 2×2 table with column labels `WT` and `mutant` and row labels `methylated` and `not methylated`, where each cell holds a count for a single region. You need to test whether methylation depends on the mutation.

I see that `methylKit` uses Fisher's Exact Test, but that test doesn't make sense to me. Why would DMRs follow a hypergeometric distribution? That assumes the background set you sample from is finite, right? That's not the case with methylation: you can (theoretically) sample as much as you want, like coin flipping.
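
To make the setup concrete, here is a minimal sketch of a two-sided Fisher's Exact Test on such a 2×2 table, written with only the Python standard library. The counts are made up for illustration, and the function names are hypothetical; this is not how `methylKit` implements it, just the textbook hypergeometric calculation.

```python
from math import comb

def hypergeom_pmf(k, total, marked, draws):
    """P(X = k) when drawing `draws` items without replacement from
    `total` items, of which `marked` carry the property of interest."""
    return comb(marked, k) * comb(total - marked, draws - k) / comb(total, draws)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    All margins are held fixed; the p-value sums the probabilities of
    every table that is no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    p_obs = hypergeom_pmf(a, total, col1, row1)
    probs = (hypergeom_pmf(k, total, col1, row1) for k in range(lo, hi + 1))
    # Small relative tolerance so ties with the observed table are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Hypothetical counts for one region:
#                  WT  mutant
# methylated       30      12
# not methylated   10      28
p_value = fisher_exact_two_sided(30, 12, 10, 28)
print(f"Fisher two-sided p = {p_value:.3g}")
```

Note that the calculation conditions on both the row and column totals, which is exactly the finite-pool, no-replacement assumption being questioned here.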

Am I right? What test would you use?

Thanks a lot, Amit

statistics methylation DMR • 5.1k views

If the no-replacement aspect of Fisher's test is what you don't like, then just do a binomial test instead. That said, the two approach each other with increasing N. Even so, Charles' answer makes much more sense than either a Fisher's or a binomial test.
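
A rough numerical illustration of that convergence, assuming it refers to the usual result that sampling without replacement (hypergeometric) looks like sampling with replacement (binomial) once the pool is much larger than the number of draws. The function names and the chosen sizes are illustrative only.

```python
from math import comb

def hypergeom_pmf(k, population, marked, draws):
    """P(k marked items) when drawing without replacement."""
    return comb(marked, k) * comb(population - marked, draws - k) / comb(population, draws)

def binom_pmf(k, draws, p):
    """P(k successes) when drawing with replacement (independent trials)."""
    return comb(draws, k) * p**k * (1 - p) ** (draws - k)

def max_pmf_gap(population, fraction_marked=0.5, draws=10):
    """Largest pointwise difference between the two distributions."""
    marked = round(population * fraction_marked)
    p = marked / population
    return max(
        abs(hypergeom_pmf(k, population, marked, draws) - binom_pmf(k, draws, p))
        for k in range(draws + 1)
    )

# The gap shrinks as the pool grows relative to the 10 draws.
gaps = {n: max_pmf_gap(n) for n in (20, 200, 2000, 20000)}
for n, gap in gaps.items():
    print(f"population={n:>6}  max |hypergeom - binom| = {gap:.2e}")
```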


Thank you Devon.

What do you think is the appropriate test for DMRs with a fixed-size window?

Amit

9.9 years ago

I think that there are two types of DMR calculations: those with predefined region boundaries and those without predefined region boundaries.

If you have a predefined window (such as predefined regions of interest on the 450k array, targeted BS-Seq, or any sliding-window-based analysis), I think the main trick is the summarization (at least that is my opinion). For example, COHCAP will average the signal either across CpG sites or across CpG islands, and then use a simple statistical test like an ANOVA on the continuous signal (along with additional filters that try to reflect the fact that the original signal can likely be thought of as a discrete variable, where each CpG site is either homozygous methylated, homozygous unmethylated, or heterozygous). methylKit and IMA also fall in this category. So, the short answer is that you may be able to use one of those other tools (or a similar strategy), but I think there are people out there who are satisfied with the methylKit results.
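
As a rough sketch of the "summarize, then test" idea: average the per-site methylation fractions within a region to get one value per sample, then compare the two groups. This is not COHCAP's code. To stay dependency-free, it swaps in an exact permutation test on the difference in group means in place of the ANOVA mentioned above, and all sample values and function names are made up.

```python
from itertools import combinations
from statistics import mean

def region_means(beta_by_sample):
    """Collapse per-site methylation fractions into one value per sample."""
    return [mean(sites) for sites in beta_by_sample]

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every way of relabelling the pooled samples and counts how
    often the absolute mean difference is at least as large as observed.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Hypothetical region with 5 CpG sites, 4 samples per group.
wt = [[0.82, 0.79, 0.91, 0.85, 0.88], [0.76, 0.81, 0.87, 0.80, 0.84],
      [0.90, 0.83, 0.86, 0.79, 0.85], [0.78, 0.84, 0.89, 0.82, 0.80]]
mutant = [[0.41, 0.38, 0.52, 0.45, 0.47], [0.35, 0.44, 0.40, 0.39, 0.42],
          [0.50, 0.43, 0.48, 0.37, 0.46], [0.36, 0.41, 0.45, 0.40, 0.38]]

p_value = permutation_p_value(region_means(wt), region_means(mutant))
print(f"permutation p = {p_value:.4f}")
```

With only four samples per group, the smallest p-value this test can return is 2/70, so a real analysis would need more replicates or a parametric test.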

DMR tools without predefined boundaries (such as bumphunter in the minfi package or ChAMP) are a totally different beast. A Fisher's Exact Test is unquestionably inappropriate in this situation.

If it helps, there are some script templates and limited benchmarks for a few such programs:

http://www.nature.com/protocolexchange/protocols/2965#/introduction

http://sourceforge.net/projects/cohcap/files/Protocol_Exchange_Example.zip/download

However, the original question was specifically for WGBS data (whereas the links above are for 450k data). Here, methylKit and bsseq are the main options that I know about. MethylSig is another option that I have heard about but not yet tried:

http://sartorlab.ccmb.med.umich.edu/node/17


Thank you for the detailed answer.

My question is more about the validity of specific tests in the case of DMRs.

What do you think is the appropriate test for DMRs with a fixed-size window? Can you give reasons?

Amit


Hi Amit,

I apologize that this is quite a while after your original post.

These would be my thoughts:

1) The situation might be different for Whole Genome Bisulfite Sequencing (WGBS), but I think targeted sequencing has some rough targeted boundaries to begin with. So, I think one problem could be that many tiling windows might have few or no CpG sites within them (and I would recommend looking for regions with at least 4 differentially methylated sites).
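
A small sketch of that filter, assuming you already have the coordinates of the differentially methylated sites on one chromosome. The positions, the window size, and the function name are hypothetical.

```python
from collections import Counter

def windows_with_enough_sites(positions, window_size, min_sites=4):
    """Tile the coordinate axis into fixed-size windows and keep only the
    windows containing at least `min_sites` of the given positions.

    Returns (start, end, n_sites) tuples with half-open [start, end) bounds.
    """
    counts = Counter(pos // window_size for pos in positions)
    return [
        (w * window_size, (w + 1) * window_size, n)
        for w, n in sorted(counts.items())
        if n >= min_sites
    ]

# Hypothetical differentially methylated site positions on one chromosome.
dm_sites = [105, 130, 160, 190, 240, 1020, 1500, 2010, 2050, 2075, 2090, 3300]
kept = windows_with_enough_sites(dm_sites, window_size=1000)
print(kept)
```

Windows that contain no sites never appear in the counter, so sparse tiles are dropped without being enumerated.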

2) Specifically for COHCAP, the ability to define de novo boundaries has been added in the Bioconductor version:

https://www.bioconductor.org/packages/release/bioc/html/COHCAP.html

Examples of parameters to the main functions include:

  • max.cluster.dist in COHCAP.avg.by.island() [typically, my preferred strategy, with some possible parameter changes]
  • max.cluster.dist in COHCAP.avg.by.site()

My default max.cluster.dist is set to NULL, so that the exact annotated region boundaries are used.

There is also a COHCAP.denovo() function, although I have to admit that I wouldn't typically use that (to cluster differentially methylated sites in the absence of any gene annotations).
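
For intuition only, here is a generic sketch of distance-based clustering of sites: consecutive sites are merged into one cluster while the gap between them stays within a maximum distance. This is an assumption about what a parameter like max.cluster.dist typically controls, not COHCAP's actual implementation, and the function name and positions are hypothetical.

```python
def cluster_by_distance(positions, max_dist):
    """Group sorted positions into clusters; a new cluster starts whenever
    the gap to the previous site exceeds `max_dist`."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_dist:
            clusters[-1].append(pos)   # close enough: extend current cluster
        else:
            clusters.append([pos])     # too far (or first site): new cluster
    return clusters

# Hypothetical site positions; gaps of 680 and 2050 split the clusters.
sites = [900, 100, 220, 150, 3000, 950]
clusters = cluster_by_distance(sites, max_dist=200)
print(clusters)
```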

3) There are other methods that can define boundaries for differentially methylated regions. As referenced in the Protocol Exchange link, minfi/bumphunter is one such option for Illumina methylation array data (such as the 450k or EPIC arrays).

7.3 years ago
jordi • 0

Look at the math in informME:

Jenkinson, G., Pujadas, E., Goutsias, J., & Feinberg, A. P. (2017). Potential energy landscapes identify the information-theoretic nature of the epigenome. Nat Genet, 49(5), 719–729. Retrieved from http://dx.doi.org/10.1038/ng.3811

All the other tools either do not account for correlation or, at best, use some sort of smoothing technique. By assuming independence, they are unable to control the false positive rate. In addition, differences in methylation do not necessarily have to be related to differences in mean. It could be that the probability distributions of the methylation state for a given region (a binary vector of a certain length) have the same mean but completely different shapes (a bimodal and a unimodal distribution can have the same mean).
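
A toy illustration of the last point (not informME's model, just two hand-built distributions with hypothetical names): for a region of 4 CpG sites, compare independent sites, each methylated with probability 0.5, against a fully correlated "all or none" region. Both have the same mean methylation level, but the shapes differ.

```python
from math import comb

def independent_sites(n_sites, p):
    """Distribution of the number of methylated sites when every site is
    methylated independently with probability p (unimodal for p = 0.5)."""
    return [comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k)
            for k in range(n_sites + 1)]

def all_or_none(n_sites, p):
    """Fully correlated sites: the whole region is methylated with
    probability p, otherwise fully unmethylated (bimodal)."""
    dist = [0.0] * (n_sites + 1)
    dist[0], dist[n_sites] = 1 - p, p
    return dist

def mean_fraction(dist):
    """Expected fraction of methylated sites in the region."""
    n = len(dist) - 1
    return sum(pk * k / n for k, pk in enumerate(dist))

def variance_fraction(dist):
    """Variance of the fraction of methylated sites in the region."""
    n, m = len(dist) - 1, mean_fraction(dist)
    return sum(pk * (k / n - m) ** 2 for k, pk in enumerate(dist))

unimodal = independent_sites(4, 0.5)
bimodal = all_or_none(4, 0.5)
for name, dist in (("independent", unimodal), ("all-or-none", bimodal)):
    print(f"{name:>12}: mean={mean_fraction(dist):.3f} "
          f"variance={variance_fraction(dist):.4f}")
```

A test that only compares mean methylation levels would see no difference between these two regions.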
