Question

Illumina EPIC v2 IlmnIDs and probe names


11 months ago

christine.a.pedersen ▴ 10

I am working with Illumina EPIC BeadChip v1 and v2 data. I see that the v2 data uses different IlmnIDs than the v1 data. Each ID still contains the cg name (cg followed by a number), but in v2 it also has a suffix after the number, like _BC13. I also see that the same probe ID can have several entries, like this:

cg22051776_TC12  cg22051776_TC13  cg22051776_TC14

As far as I understand, these correspond to the same position in the genome, but the probes are slightly different. How should I deal with this? Take the mean of all of them? Use only one? Analyze them as separate probes (I don't think this is the best idea if they really interrogate the same CpG site)?

After filtering, I have 4767 of these, so I cannot go through them all by hand to decide which ones to keep and which ones to discard.



Hi Christine! I am dealing with the same issue. I ran into it because I needed the probe IDs to merge the EPIC v2 data with EPIC v1 data: after removing the _BC13 part, some IDs appear duplicated (some of them more than twice). At first I thought about randomly removing one of the duplicates, but I realized that they don't have exactly the same beta/M values, so I discarded this option. Were you able to solve this somehow? Thank you in advance!

Reply by desicasares, 10 months ago
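Here is a minimal Python sketch of the merging step described above. It strips the v2 suffix to recover the v1-style probe name, then groups the IlmnIDs that collapse onto the same name so the replicated ones can be listed. It assumes every suffix has the shape _[T/B][C/O][1/2][digits]. `strip_suffix` and `group_by_prefix` are hypothetical helper names, and cg12345678_BC21 is a made-up ID used only for illustration.

```python
import re
from collections import defaultdict

# Assumed EPIC v2 suffix shape: "_" + strand (T/B) + C/O + probe type (1/2) + replicate digits.
SUFFIX_RE = re.compile(r"_[TB][CO][12]\d+$")


def strip_suffix(ilmn_id: str) -> str:
    """Return the v1-style probe name by dropping the v2 suffix (hypothetical helper)."""
    return SUFFIX_RE.sub("", ilmn_id)


def group_by_prefix(ilmn_ids):
    """Map each v1-style name to the list of v2 IlmnIDs that share it."""
    groups = defaultdict(list)
    for ilmn_id in ilmn_ids:
        groups[strip_suffix(ilmn_id)].append(ilmn_id)
    return dict(groups)


ids = ["cg22051776_TC12", "cg22051776_TC13", "cg22051776_TC14", "cg12345678_BC21"]
groups = group_by_prefix(ids)
# Keep only names that occur more than once, i.e. the replicated probes.
replicated = {pfx: members for pfx, members in groups.items() if len(members) > 1}
print(replicated)
# {'cg22051776': ['cg22051776_TC12', 'cg22051776_TC13', 'cg22051776_TC14']}
```

The replicated groups still need to be collapsed to a single value before the v1 and v2 data can be joined on probe name.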


Hi, did you by any chance manage to solve this problem?

Reply by rna-seq_researcher, 4 months ago

score 0 · Answer 1 · 2024-05-28


4 months ago

GenoMax 146k

The answer is provided in this Bioconductor thread: https://support.bioconductor.org/p/9156675/


score 0 · Answer 2 · 2024-05-28

The ID format and the other new features of EPIC v2 are explained in Illumina's documentation, in the file "Infinium MethylationEPIC v2.0 Manifest File Release Notes.pdf". Basically:

probeID_[T/B][C/O][1/2][1-10]

[top / bottom strand]
[originally bisulfite converted strand / opposite strand]
[type I / type II probe]
[replicate number]
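The suffix format above can be decoded with a small regular expression. This is a Python sketch, not an Illumina or sesame tool. `parse_ilmn_id` and `ProbeSuffix` are hypothetical names, and the code assumes that the digits after the probe-type digit are the replicate number (consistent with cg22051776_TC12/TC13/TC14 from the question).

```python
import re
from typing import NamedTuple

# Assumed layout: <prefix>_<T|B><C|O><1|2><replicate digits>
ID_RE = re.compile(r"^(?P<prefix>.+)_(?P<strand>[TB])(?P<conv>[CO])(?P<ptype>[12])(?P<rep>\d+)$")


class ProbeSuffix(NamedTuple):
    prefix: str       # v1-style probe name, e.g. "cg22051776"
    strand: str       # "top" or "bottom"
    conversion: str   # "converted" or "opposite"
    probe_type: str   # Infinium "I" or "II"
    replicate: int    # replicate number


def parse_ilmn_id(ilmn_id: str) -> ProbeSuffix:
    """Split an EPIC v2 IlmnID into its parts (hypothetical helper)."""
    m = ID_RE.match(ilmn_id)
    if m is None:
        raise ValueError(f"not an EPIC v2 style IlmnID: {ilmn_id!r}")
    return ProbeSuffix(
        prefix=m["prefix"],
        strand="top" if m["strand"] == "T" else "bottom",
        conversion="converted" if m["conv"] == "C" else "opposite",
        probe_type="I" if m["ptype"] == "1" else "II",
        replicate=int(m["rep"]),
    )


print(parse_ilmn_id("cg22051776_TC12"))
# ProbeSuffix(prefix='cg22051776', strand='top', conversion='converted', probe_type='I', replicate=2)
```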

To select a replicate, you could keep the one with the highest quality (e.g. in terms of detection p-value or signal intensity). You could average the replicates; this is what the sesame package does by default if you call openSesame(collapseToPfx = TRUE). You could even pick one at random, since the replicates have been reported to be highly correlated (see this paper). This is discussed further in this other paper, where the authors go as far as classifying the replicate probes by their performance. Depending on your downstream analysis, you could also keep them as separate probes and then model them in limma or something similar.
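Two of the options above, averaging and keeping the replicate with the lowest detection p-value, are sketched below for a single sample using plain Python dicts keyed by IlmnID. This is not sesame's implementation. `collapse_mean` and `collapse_best` are hypothetical helpers, the beta and p-values are made up, and a missing beta is represented as None.

```python
import re
from collections import defaultdict
from statistics import fmean

# Assumed EPIC v2 suffix shape, stripped to get the probe prefix.
SUFFIX_RE = re.compile(r"_[TB][CO][12]\d+$")


def collapse_mean(betas):
    """Average non-missing replicate betas per prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for ilmn_id, beta in betas.items():
        if beta is not None:
            groups[SUFFIX_RE.sub("", ilmn_id)].append(beta)
    return {pfx: fmean(vals) for pfx, vals in groups.items()}


def collapse_best(betas, det_p):
    """Per prefix, keep the beta of the replicate with the lowest detection p-value."""
    best = {}  # prefix -> (p-value, beta)
    for ilmn_id, beta in betas.items():
        if beta is None:
            continue  # skip missing values
        pfx = SUFFIX_RE.sub("", ilmn_id)
        p = det_p[ilmn_id]
        if pfx not in best or p < best[pfx][0]:
            best[pfx] = (p, beta)
    return {pfx: beta for pfx, (_, beta) in best.items()}


# Made-up values for one sample; cg12345678_BC21 is a hypothetical ID.
betas = {"cg22051776_TC12": 0.25, "cg22051776_TC13": 0.75,
         "cg22051776_TC14": None, "cg12345678_BC21": 0.4}
det_p = {"cg22051776_TC12": 0.01, "cg22051776_TC13": 0.001,
         "cg22051776_TC14": 0.2, "cg12345678_BC21": 0.0}
print(collapse_mean(betas))         # {'cg22051776': 0.5, 'cg12345678': 0.4}
print(collapse_best(betas, det_p))  # {'cg22051776': 0.75, 'cg12345678': 0.4}
```

With several samples, choosing the best replicate separately per sample means different samples may use different probes. An alternative is to pick one replicate per prefix across all samples (e.g. by mean detection p-value) so every sample uses the same probe.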