Question

minfi pre-processing and normalization

7.5 years ago

niutster ▴ 110

I have used minfi for pre-processing and normalization, and I have some questions about minfi's probe filtering (detection p-value, bead count, SNP). As far as I know, minfi removes SNP probes by default, and there is also a way to remove probes with a high detection p-value, but how can I remove probes with certain bead counts? There is no function for bead-count filtering in the minfi tutorial. The second problem is normalization. I need both between-array and within-array normalization, which are supported by preprocessSWAN() and preprocessFunnorm(), but the inputs and outputs of these functions are not consistent. Whether I run SWAN before or after Funnorm, the output of the first function is not a valid input for the second. How can I perform both normalizations?

minfi R pre-processing bead-count normalization • 5.8k views

updated 7.5 years ago by andrew.j.skelton73 6.6k • written 7.5 years ago by niutster ▴ 110

For bead count information you need to read your IDATs in as an extended RGChannelSet (read.metharray.exp() with extended = TRUE) and then use either getNBeads() or wateRmelon::beadcount() to get a matrix of bead counts. I would recommend the latter, because getNBeads() returns bead counts on a per-probe basis rather than a per-CpG-site basis (call dim() on the matrices returned by both functions and you will see what I mean).

Once you have the bead-count information, you can filter probes against your own thresholds using subsetByLoci().
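The bead-count filter described above could be sketched roughly as follows. Treat this as an illustration under assumptions rather than a tested pipeline: the helper `low_bead_probes()`, the toy matrix, the cutoffs and the `base_dir` path are all made up for the example. It relies on wateRmelon::beadcount() marking low bead counts (below 3) as NA, as its documentation describes. The minfi and wateRmelon calls are left commented out because they need real IDAT files.

```r
# Hypothetical helper (not part of minfi): given a probes x samples bead-count
# matrix in which NA marks a low bead count, return the names of probes that
# are NA in more than `max_frac` of the samples.
low_bead_probes <- function(bc, max_frac = 0.05) {
  frac_low <- rowMeans(is.na(bc))
  rownames(bc)[frac_low > max_frac]
}

# Toy bead-count matrix, only to show the thresholding step
bc <- matrix(c(12, NA,  9,
               15, 14, 11,
               NA, NA,  8),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("cg01", "cg02", "cg03"), c("s1", "s2", "s3")))
drop_probes <- low_bead_probes(bc, max_frac = 0.5)

# With real data (not run here; base_dir is a placeholder for your IDAT folder):
# library(minfi)
# rgset_ext   <- read.metharray.exp(base = base_dir, extended = TRUE)
# bc          <- wateRmelon::beadcount(rgset_ext)   # NA = low bead count
# drop_probes <- low_bead_probes(bc, max_frac = 0.05)
# rgset_f     <- subsetByLoci(rgset_ext, excludeLoci = drop_probes)
```

The 5% cutoff in the commented part is only a placeholder; choose the fraction of samples to suit your own data.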

Most people use only one normalization method, chosen according to the characteristics of the dataset and personal preference (at least that's how it seems to me). I wouldn't try to combine two unless you are sure of what you are doing.

Good luck

7.3 years ago by victor.2wy • 0

score 2 · Answer 1 · 2017-06-08

There are definitely some issues with what you're asking, so I'll try to address them one by one.

By default, minfi will not remove probes with SNPs at the CpG site, in the probe sequence, or at the single base extension (SBE) site. You'll need to use the dropLociWithSnps() function, with the additional maf argument, which specifies your minor allele frequency cutoff.
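A minimal sketch of that SNP filter, assuming you already have a genome-mapped object such as the GenomicRatioSet returned by preprocessFunnorm(). The wrapper name `drop_snp_probes()` and the 0.05 cutoff are only illustrative choices; the wrapper just forwards to minfi::dropLociWithSnps().

```r
# Illustrative wrapper (the name is hypothetical): drop loci with a SNP at the
# CpG site or at the single base extension site, above a chosen minor allele
# frequency. Add "Probe" to `snps` to also consider SNPs in the probe body.
drop_snp_probes <- function(gset, maf = 0.05, snps = c("CpG", "SBE")) {
  minfi::dropLociWithSnps(gset, snps = snps, maf = maf)
}

# With real data (not run here; norm_data is a mapped, normalised object):
# norm_data_nosnp <- drop_snp_probes(norm_data, maf = 0.05)
# dim(norm_data); dim(norm_data_nosnp)   # compare probe counts before/after
```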

For detection p-value filtering, the older versions of the minfi guide include some clues on how to do it. The idea is to identify the failing probes before normalisation and remove them after normalisation. Here's an example, where raw_idat is the raw data read with read.metharray.exp(), that removes probes whose detection p-value is > 0.01 in more than 50% of samples:

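library(minfi)
# Detection p-values on the raw data (rows = probes, columns = samples)
lumi_dpval        <- detectionP(raw_idat, type = "m+u")
lumi_failed       <- lumi_dpval > 0.01
# Probes that fail in more than 50% of samples
lumi_dpval_remove <- names(which(rowMeans(lumi_failed) > 0.5))
rm(lumi_dpval, lumi_failed); gc(); set.seed(73)
norm_data         <- preprocessFunnorm(raw_idat, bgCorr = TRUE, dyeCorr = TRUE, verbose = TRUE)
# Drop the failed probes after normalisation; logical indexing stays correct even if none failed
keep              <- !(rownames(norm_data) %in% lumi_dpval_remove)
norm_data_f       <- norm_data[keep, ]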

In terms of normalisation, use a single method; do not combine them except in very specific circumstances. As far as I know, preprocessFunnorm() does not run SWAN: by default it applies noob background and dye-bias correction and then functional normalisation, which regresses out technical variation based on the control probes. It should also be noted that the minfi documentation describes SWAN as using a random subset of probes, so set the seed before calling preprocessSWAN() if you want reproducible results. Setting the seed before preprocessFunnorm(), as in my example above, does no harm either.
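To make the "pick one method" advice concrete, here is a small sketch. The wrapper `normalise_once()` and its default seed are hypothetical, not part of minfi. Both functions take the raw RGChannelSet as input, but preprocessSWAN() returns a MethylSet while preprocessFunnorm() returns a GenomicRatioSet, which is why the output of one cannot simply be passed to the other.

```r
# Hypothetical wrapper: normalise the raw RGChannelSet with exactly one method.
normalise_once <- function(rgset, method = c("funnorm", "swan"), seed = 73) {
  method <- match.arg(method)   # rejects anything other than the two choices
  set.seed(seed)                # fixed seed so repeated runs give the same result
  switch(method,
         funnorm = minfi::preprocessFunnorm(rgset),  # returns a GenomicRatioSet
         swan    = minfi::preprocessSWAN(rgset))     # returns a MethylSet
}

# With real data (not run here; raw_idat comes from read.metharray.exp()):
# norm_data <- normalise_once(raw_idat, method = "funnorm")
# class(norm_data)
```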

If you're still convinced that you should be using both preprocessFunnorm and preprocessSWAN, then please expand on why.