I was wondering what determines the number of probes reported for a sample?
I'm processing MethylationEPIC data using minfi's preprocessRaw, and it gives me 865859 features/probes.
But the quality notice from Illumina states 866895 probes. Why is there a difference?
Thank you in advance
This is a known quirk of the EPIC / 850K array: the number of probes detected can differ slightly between files, supposedly because of pre-release and final-release versions of the chip. I've seen the same difference but haven't narrowed down a cause, and the probes that differed followed no logical pattern (i.e. they targeted distinctly different loci). See the documentation for the read.metharray function; the details for the force parameter state:
We have seen IDAT files from the same array, but with different number
of probes in the wild. Specifically this is the case for early access
EPIC arrays which have fewer probes than final release EPIC arrays. It
is possible to combine IDAT files from the same inferred array, but
with different number of probes, into the same RGChannelSet by setting
force=TRUE
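If you want to check whether your own IDATs fall into two groups like this, here is a minimal sketch in R. It is an illustration, not a recipe from the minfi documentation. It assumes the illuminaio package is installed, the file paths are hypothetical placeholders, and count_idat_probes is a helper name I made up. Note that the count is the number of bead addresses stored in each file, so it will not necessarily match the feature count you see after preprocessing.

```r
# Sketch: count the addresses stored in each IDAT file so that files
# with differing probe numbers can be spotted before loading them.
library(illuminaio)  # provides readIDAT()

# Hypothetical helper: number of rows in the "Quants" matrix of one IDAT file.
count_idat_probes <- function(idat_file) {
  nrow(readIDAT(idat_file)[["Quants"]])
}

# Hypothetical basenames; replace with the paths from your own sample sheet.
basenames <- c("idat/SENTRIX1_R01C01", "idat/SENTRIX2_R01C01")
green_files <- paste0(basenames, "_Grn.idat")

# One count per file; more than one distinct value means mixed probe numbers.
probe_counts <- vapply(green_files, count_idat_probes, integer(1))
print(table(probe_counts))
```

If the table shows more than one distinct count, that is the situation the force parameter is meant for, e.g. read.metharray(basenames, force = TRUE).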
edit: I should also mention that I've spoken with people who run these arrays, and they suspect there was an issue with the way the DNA dispersed across the chip, though that is purely speculative. I'm inclined to disagree with them because of the self-assembling nature of Illumina's BeadArray chips. It still appears to be a bit of a mystery.
Yes, I was using that function.
RGset <- read.metharray.exp(base = NULL, targets = samplesheet_epic, recursive = TRUE, force = TRUE)
I got confused by those versions as well, but your answer is very helpful, and at least I know I was using the correct parameter.