Hi everyone,
I noticed that many CpGs on EPIC v2 chips have a trimodal DNA methylation distribution, which suggests that these probes are affected by SNPs.
I've already used dropLociWithSnps with the most recent annotation, so there must be additional unidentified SNPs. To your knowledge, are there any tools that would enable me to eliminate these probes?
I thought of developing an algorithm that would identify this distribution in 3 groups at 0%, 50% and 100%, but there might be a more precise tool based on the identification of these SNPs according to their genomic positions.
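To make the idea concrete, here is a minimal sketch of such a detector in Python. It is only an illustration under my own assumptions, not an established method: the function names, the probe names, the tolerance of 0.15 around each expected level, the minimum of 2 samples per group and the 5% allowance for samples falling between groups are all hypothetical choices that would need tuning on real data.

```python
from typing import Dict, List, Sequence

# Expected beta values for the three genotype-like groups (0%, 50%, 100%).
CENTERS = (0.0, 0.5, 1.0)


def cluster_counts(betas: Sequence[float], tol: float = 0.15) -> List[int]:
    """Count samples within `tol` of each center; the last slot counts the rest."""
    counts = [0, 0, 0, 0]
    for b in betas:
        for i, c in enumerate(CENTERS):
            if abs(b - c) <= tol:
                counts[i] += 1
                break
        else:
            counts[3] += 1  # sample lies between the expected groups
    return counts


def looks_snp_like(
    betas: Sequence[float],
    tol: float = 0.15,              # assumed half-width of each group
    min_per_cluster: int = 2,       # assumed minimum samples per group
    max_outside_frac: float = 0.05  # assumed allowance for in-between samples
) -> bool:
    """Return True if all three groups are occupied and few samples fall outside."""
    n = len(betas)
    if n == 0:
        return False
    counts = cluster_counts(betas, tol)
    all_occupied = all(c >= min_per_cluster for c in counts[:3])
    return all_occupied and counts[3] / n <= max_outside_frac


def flag_probes(beta_matrix: Dict[str, Sequence[float]], **kwargs) -> List[str]:
    """Return the names of probes whose beta values look trimodal at 0/0.5/1."""
    return [p for p, b in beta_matrix.items() if looks_snp_like(b, **kwargs)]


# Toy data with hypothetical probe names (not real measurements).
example = {
    "probe_trimodal": [0.03, 0.05, 0.48, 0.52, 0.50, 0.95, 0.97, 0.04],
    "probe_continuous": [0.20, 0.25, 0.31, 0.28, 0.22, 0.33, 0.27, 0.30],
}
print(flag_probes(example))  # ['probe_trimodal']
```

Requiring all three groups to be occupied will miss SNPs whose minor allele is rare in the cohort, where only two groups may appear, so this fixed-threshold approach is probably less sensitive than a position-based tool.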
Thank you for your suggestions.
It seems to work pretty well, thank you. Now I have some additional questions: some of the CpGs with a trimodal 0-50-100 distribution, as identified by gaphunter or MethylToSNP, are not located near known SNPs according to the 1000 Genomes Project. Since you seem to have experience in array analyses, I was wondering if you had a biological explanation for why we observe such a trimodal pattern of DNA methylation. No worries if you do not have any hypothesis!
I just read about meQTLs (methylation quantitative trait loci); this is a possible explanation.
I'm glad it worked for you! It's possible that some of them are meQTLs, although I think nearby or associated SNPs do not usually affect methylation strongly enough to generate such a "clean" 0-50-100 distribution. I guess the others must simply be cohort-specific SNPs. Because arrays often have "more than enough" probes to detect biological changes in any study (if they exist), I generally filter these probes out without giving them much thought, really.
Thank you very much, I'm definitely going to test this!