QC Illumina EPIC array: Removing SNPs at CpG sites
15 months ago
kaaz • 0

Hi,

I am brand new to working with methylation data, and I have been tasked with preprocessing data from the Illumina EPIC array. I am using ENmix to perform the background, dye bias, and probe-type bias corrections. From my reading on the subject, other steps are also needed to ensure quality results, such as handling SNPs and removing sex-chromosome CpG sites and cross-hybridizing probes.

I am a bit confused about how to deal with SNPs. I understand that SNPs overlapping these CpG sites can produce spurious signals, which is why it is important to remove them, but different papers handle them in different ways. Some papers I read removed all SNP-associated probes [1], based on information provided by Illumina and on overlap with dbSNP. Other papers removed common SNPs [2] (MAF > 0.05) [3], and others removed rare SNPs (MAF < 0.05) [4] (MAF for different ancestries based upon this paper). I am trying to understand which method is best for handling these SNPs.

I am also considering using MethylToSNP as I am working with a non-European population.

Is the choice context-specific? What would be the best approach?

Any advice would be very helpful! Thank you in advance.

SNPs methylation QC Illumina EPIC • 1.4k views
15 months ago
Papyrus ★ 3.0k

IMO the way this is done in the literature is completely arbitrary. Most studies use the internal SNP annotation that comes with the IlluminaHumanMethylationEPICanno.ilm10b4.hg19 package, and also remove cross-reactive or otherwise problematic probes (there are many papers on this, e.g. one, two, three, or the one you mention). Since that is the "typical" pipeline, it may be safest not to deviate too much from it. Nonetheless, the SNP filtering itself is also very arbitrary: studies remove probes with SNPs at MAF > 0.05 or > 0.01 anywhere on the probe, only at the CpG site, only at the single-base extension (SBE) site... You'll even see papers removing almost 50% of the probes on the array before analysis.
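To make the choices above concrete, here is a minimal Python sketch of how the two knobs (the MAF cutoff and which positions on the probe count) change the set of removed probes. The record layout, field names, and probe IDs are made up for illustration; they are not the actual columns of the Illumina annotation package, and in practice you would do this in R against the real annotation.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class ProbeSnpInfo(NamedTuple):
    """Hypothetical per-probe SNP annotation (illustrative field names)."""
    probe_id: str
    cpg_maf: Optional[float]    # MAF of a SNP at the CpG itself, None if no SNP
    sbe_maf: Optional[float]    # MAF of a SNP at the single-base extension site
    probe_maf: Optional[float]  # MAF of a SNP elsewhere in the probe body


def probes_to_drop(
    annotation: Iterable[ProbeSnpInfo],
    positions: Tuple[str, ...] = ("cpg", "sbe"),
    maf_threshold: float = 0.05,
) -> Set[str]:
    """Return IDs of probes with a SNP above the MAF cutoff at any chosen position."""
    drop = set()
    for rec in annotation:
        for pos in positions:
            maf = getattr(rec, f"{pos}_maf")
            if maf is not None and maf > maf_threshold:
                drop.add(rec.probe_id)
                break  # one qualifying SNP is enough to drop the probe
    return drop


# Toy annotation: four probes with SNPs at different positions and frequencies.
annotation = [
    ProbeSnpInfo("probe_A", None, None, None),  # no known SNP
    ProbeSnpInfo("probe_B", 0.20, None, None),  # common SNP at the CpG
    ProbeSnpInfo("probe_C", None, 0.02, None),  # low-frequency SNP at the SBE site
    ProbeSnpInfo("probe_D", None, None, 0.30),  # common SNP in the probe body
]

# CpG and SBE sites only, MAF > 0.05.
typical = probes_to_drop(annotation, positions=("cpg", "sbe"), maf_threshold=0.05)

# Anywhere on the probe, MAF > 0.01.
strict = probes_to_drop(
    annotation, positions=("cpg", "sbe", "probe"), maf_threshold=0.01
)

print(sorted(typical))
print(sorted(strict))
```

On this toy table the first setting removes one probe and the second removes three, which is the kind of spread you see between papers.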

I agree that, on top of this, a SNP-discovery tool is useful because its results are specific to your own cohort. I've used MethylToSNP and also minfi::gaphunter; the latter typically finds more suspicious sites, though this may be data-dependent, and you can play around with the parameters.
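The rough idea behind this kind of data-driven detection is that a SNP under a probe tends to split the samples into two or three discrete beta-value clusters (one per genotype) instead of one continuous distribution. The Python sketch below illustrates only that idea. It is not the algorithm of MethylToSNP or of minfi::gaphunter, and the function names, the 0.05 gap, and the minimum group size are assumptions chosen for the example.

```python
from typing import List, Sequence


def split_on_gaps(betas: Sequence[float], gap_threshold: float = 0.05) -> List[List[float]]:
    """Sort beta values and start a new group wherever neighbours differ by more than the gap."""
    if not betas:
        return []
    ordered = sorted(betas)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap_threshold:
            groups.append([cur])    # gap found: open a new cluster
        else:
            groups[-1].append(cur)  # same cluster as the previous value
    return groups


def looks_snp_like(
    betas: Sequence[float],
    gap_threshold: float = 0.05,
    min_group_size: int = 2,
) -> bool:
    """Flag a probe whose samples fall into two or three sizeable, gap-separated clusters."""
    groups = split_on_gaps(betas, gap_threshold)
    sizeable = [g for g in groups if len(g) >= min_group_size]
    return 2 <= len(sizeable) <= 3


# Toy beta values for two probes across nine and eight samples.
trimodal = [0.02, 0.03, 0.05, 0.48, 0.50, 0.52, 0.95, 0.96, 0.97]
unimodal = [0.78, 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86]

print(looks_snp_like(trimodal))
print(looks_snp_like(unimodal))
```

Changing the gap threshold or the minimum group size changes how many probes get flagged, which is one reason two tools can disagree on the same data.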

If your sample size is large enough, you can also set up discovery and validation cohorts, or use external data, to validate your EWAS findings.

Finally, because DNA methylation tends to be spatially correlated, you always have the option of running a differentially methylated region analysis, which will of course be more protected from the influence of SNPs.

15 months ago
Zhenyu Zhang ★ 1.2k

We use SeSAMe (https://academic.oup.com/nar/article/46/20/e123/5061974) for EPIC analysis. This is a new tool from the same TCGA methylation analysis group.
