Hi,
I am brand new to working with methylation data, and I have been tasked with preprocessing data from the Illumina EPIC array. I am using ENmix to perform the background, dye-bias, and probe-type bias corrections. From my reading, other steps are also needed to ensure quality results, such as handling SNPs and removing CpG sites on the sex chromosomes and probes that cross-hybridize.
I am a bit confused about how to deal with SNPs. I understand that SNPs overlapping these CpG sites can cause spurious signals, which is why it's important to remove the affected probes, but different papers handle them in different ways. Some papers I read removed all SNP-associated probes [1], based on information provided by Illumina and overlap with dbSNP. Other papers removed only probes with common SNPs [2] (MAF > 0.05) [3], and still others removed probes with rare SNPs (MAF < 0.05) [4] (MAF for different ancestries based upon this paper). I am trying to understand which method is best for handling these SNPs.
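To make sure I am describing the three strategies correctly, here is a minimal sketch of how I understand them. Everything in it is hypothetical: the probe IDs, the `snp_maf` field, the `keep_probes` helper, and the MAF values are made up for illustration and do not come from the Illumina manifest, dbSNP, or any package.

```python
# Hypothetical probe annotation: snp_maf is the minor allele frequency of a
# SNP overlapping the probe, or None when no SNP overlaps it.
# All IDs and values below are made up for illustration.
probes = [
    {"probe_id": "cg_a", "snp_maf": None},
    {"probe_id": "cg_b", "snp_maf": 0.20},
    {"probe_id": "cg_c", "snp_maf": 0.01},
    {"probe_id": "cg_d", "snp_maf": 0.05},
]

MAF_CUTOFF = 0.05


def keep_probes(probes, strategy, cutoff=MAF_CUTOFF):
    """Return the IDs of probes retained under one SNP-filtering strategy.

    strategy:
      "all"    - drop every probe that overlaps any SNP
      "common" - drop probes whose overlapping SNP has MAF > cutoff
      "rare"   - drop probes whose overlapping SNP has MAF < cutoff
    """
    if strategy not in ("all", "common", "rare"):
        raise ValueError(f"unknown strategy: {strategy}")

    kept = []
    for probe in probes:
        maf = probe["snp_maf"]
        if maf is None:
            # No overlapping SNP: the probe is kept under every strategy.
            kept.append(probe["probe_id"])
        elif strategy == "common" and maf <= cutoff:
            kept.append(probe["probe_id"])
        elif strategy == "rare" and maf >= cutoff:
            kept.append(probe["probe_id"])
        # strategy == "all": any probe with a SNP is dropped.
    return kept


for strategy in ("all", "common", "rare"):
    print(strategy, keep_probes(probes, strategy))
```

In practice I assume the MAF would need to come from a reference panel that matches the ancestry of my samples, which is part of why I am unsure which cutoff makes sense.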
I am also considering using MethylToSNP as I am working with a non-European population.
Does the right choice depend on the context? What would be the best approach?
Any advice would be very helpful! Thank you in advance.