Approaches for normalisation of Illumina 450k data with large global differences in methylation
9.4 years ago
phil.chapman ▴ 100

Hi,

I was wondering if someone could comment on, or point me towards, the considerations when normalising Illumina 450k methylation data where there are large differences in global methylation status. In my experiment the same cell line is used across 24 samples, but with different treatments and timepoints. One of the treatments is decitabine, for example, which results in a marked global demethylation, seen as a leftward shift in the global beta value profile (see Figure 5A at the link below for an example).

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3587326/figure/F5/

I have read around the area a bit, and much of the literature is concerned with exploring differences between cancer/normal or different tissues. This guide from Brent Pedersen was particularly helpful: https://github.com/brentp/450k-analysis-guide

The minfi vignette and associated papers were also useful, but the thing that struck me was this comment from the Dedeurwaerder et al. 2014 review, which is also quoted in the minfi functional normalisation paper:

There is to date no between-array normalization method suited to 450K data that can bring enough benefit to counterbalance the strong impairment of data quality they can cause on some data sets

So, am I best off just doing a bare minimum within-array normalisation using, say, the preprocessRaw function in minfi and not doing any between-array normalisation at all?
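
To make the concern concrete, here is a minimal base R sketch of how a naive between-array quantile normalisation would erase a global demethylation shift. It uses simulated beta values rather than real 450k data, and the `quantile_normalise` helper is hypothetical, written only for this illustration:

```r
set.seed(1)
n_probes <- 5000

# Untreated sample: bimodal betas (mostly unmethylated or mostly methylated)
control <- c(rbeta(n_probes / 2, 2, 20), rbeta(n_probes / 2, 20, 2))
# Treated sample: methylated mode shifted left to mimic global demethylation
treated <- c(rbeta(n_probes / 2, 2, 20), rbeta(n_probes / 2, 8, 6))
betas <- cbind(control = control, treated = treated)

# Naive quantile normalisation: every column is forced onto the same
# reference distribution (the mean of the sorted columns)
quantile_normalise <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}
normalised <- quantile_normalise(betas)

colMeans(betas)       # the treated column has a clearly lower mean
colMeans(normalised)  # the means now agree (up to floating point): shift gone
```

After this kind of normalisation both samples share one distribution, so the biological shift I care about would be removed along with any technical variation.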

Any comments gratefully received.

Phil Chapman, CRUK Manchester Institute

minfi methylation 450k • 3.5k views
9.4 years ago
fortin946 ▴ 190

Hi Phil,

It seems that your dataset is a perfect example of when to use functional normalization. If there are large differences in global methylation status, functional normalization should be able to keep them while removing unwanted technical variation. Functional normalization is a kosher between-array normalization based only on the array control probes. Those, by design, are not associated with the biology of your samples, and therefore global differences in methylation seen between samples should be conserved. In our implementation of functional normalization in minfi, preprocessFunnorm() also applies the 'noob' background correction method (Triche et al., 2013), which significantly improves downstream analysis results.

Let me know if you have any more questions,

Jean-Philippe Fortin


Thanks very much for the reply, Jean-Philippe. From my reading I thought your method would be appropriate, so I compared MDS and density plots of the same dataset either completely un-normalised, or after running preprocessNoob() or preprocessFunnorm(). The groups seem to cluster more tightly after noob but spread out again after funnorm, and the shape of the density plot changes too. I wasn't quite sure how to interpret this, so it would be great to hear your thoughts.
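
For anyone wanting to reproduce this kind of comparison, a rough sketch is below. It assumes the minfi and minfiData Bioconductor packages are installed, and it uses the small `RGsetEx` example dataset from minfiData as a stand-in for my own samples (the `status` grouping column belongs to that example data, not to my experiment):

```r
library(minfi)
library(minfiData)  # example data only: RGsetEx is a small 450k RGChannelSet

rgSet  <- RGsetEx               # stand-in; use your own RGChannelSet here
groups <- pData(rgSet)$status   # grouping column of the example data

# Three levels of preprocessing
raw  <- preprocessRaw(rgSet)      # Meth/Unmeth signal only, no normalisation
noob <- preprocessNoob(rgSet)     # background and dye-bias correction
fun  <- preprocessFunnorm(rgSet)  # noob followed by functional normalisation

betas <- list(raw = getBeta(raw), noob = getBeta(noob), funnorm = getBeta(fun))

# Density and MDS plot for each version of the data
for (name in names(betas)) {
  densityPlot(betas[[name]], sampGroups = groups, main = name)
  mdsPlot(betas[[name]], sampGroups = groups, main = name)
}
```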

Please see a report on RPubs here with some more detail - http://www.rpubs.com/chapmandu2/91237

Thanks again, Phil


Hi Phil,

I just looked at your RPubs report (pretty nice!) -- it does indeed seem that you get tighter clusters with noob. In my experience, when the sample size is small (n=19 in your study, correct?), noob by itself performs best. However, you might want to try preprocessFunnorm() with different numbers of principal components (nPCs = 1, 2, ..., 5). Otherwise, I would use preprocessFunnorm() with the following parameters:

nPCs = 0, bgCorr = TRUE, dyeCorr = TRUE

which calls preprocessNoob() and performs a quantile normalization on the Y chromosome by sex.
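
The nPCs comparison suggested above could be sketched roughly as follows. This assumes minfi and minfiData are installed and uses the 6-sample `RGsetEx` example data as a stand-in for your samples, which is why the loop only goes up to 2 here; the `status` column is specific to that example data:

```r
library(minfi)
library(minfiData)  # example data only

rgSet <- RGsetEx  # stand-in for your own RGChannelSet

# Try several numbers of control-probe principal components
for (k in 1:2) {
  grs <- preprocessFunnorm(rgSet, nPCs = k, bgCorr = TRUE, dyeCorr = TRUE,
                           verbose = FALSE)
  mdsPlot(getBeta(grs), sampGroups = pData(grs)$status,
          main = paste("funnorm, nPCs =", k))
}
```

With 24 samples the loop could be extended to 1:5, comparing how tightly the treatment groups cluster at each setting.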

Hope this helps!

Jean-Philippe


Thanks again Jean-Philippe, that's really useful insight. There are actually 24 samples; it's quite difficult to see with filled circles in the plot. I also found your BioC 2014 tutorial for minfi, which gave some information on the QC features. It seems that two of my samples fell below the expected line, but not by much. I'll try excluding these from the analysis.

A further question I do have is how you would go about looking for differentially methylated regions when you have such a significant global demethylation. What I'd be looking for in a sense is any regions that are differentially methylated more or less than the global shift. Do you think bumphunter could be used in this context in some way? I'm imagining you might add a constant or something to the model?

Thanks again.


Hi Phil,

This is a good reminder that we need to update the minfi vignette (it is more than outdated). The QC line was defined using blood samples with no global hypo/hypermethylation, and is therefore not relevant for your study -- for instance, most tumor samples fall below this line in my experience.

For the DMR analysis, I don't think there is a general answer to your question. It is hard to define what the global shift between your samples is, since it could be a combination of several small regions with large shifts and/or large regions of hypomethylation ("hypomethylation blocks") etc. You might first want to see if there are large blocks of hypomethylation between your different treatments. In the devel version of minfi, there is a piece of code to do that: https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R

If you find blocks, then I would run bumphunter and see if you get DMRs outside of those blocks (you probably will).
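
A very rough sketch of that two-step idea is below, assuming a minfi version that exports cpgCollapse() and blockFinder(). It uses the minfiData example dataset and its `status` column as stand-ins for your samples and treatment, the cutoffs are arbitrary placeholders, and B = 0 skips permutations so no p-values are computed:

```r
library(minfi)
library(minfiData)      # example data only
library(GenomicRanges)

grs <- preprocessFunnorm(RGsetEx, verbose = FALSE)
design <- model.matrix(~ pData(grs)$status)

# Step 1: collapse open-sea probes and look for large-scale blocks
collapsed <- cpgCollapse(grs, what = "Beta")
blocks <- blockFinder(collapsed$object, design = design, coef = 2,
                      what = "Beta", cutoff = 0.1, B = 0)

# Step 2: ordinary DMR search with bumphunter
dmrs <- bumphunter(grs, design = design, coef = 2, cutoff = 0.2,
                   B = 0, type = "Beta")

# Keep only the DMRs that do not overlap any block
# (skipped if either search found nothing)
if (is.data.frame(blocks$table) && is.data.frame(dmrs$table)) {
  block_gr <- makeGRangesFromDataFrame(blocks$table)
  dmr_gr <- makeGRangesFromDataFrame(dmrs$table, keep.extra.columns = TRUE)
  outside <- subsetByOverlaps(dmr_gr, block_gr, invert = TRUE)
  print(outside)
}
```

For a real analysis you would want B > 0 to get permutation-based significance, and cutoffs chosen for your own data.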

Jp


Great thanks, I'll take a look at that code. I just don't want to do a standard DMR analysis because I have a sense that everything will change!! Re the minfi vignette it would be really useful to have some additional insight on normalisation approaches. I didn't use minfi initially for my analysis (used lumi instead) simply because it wasn't too clear to me what a sensible approach was, whereas lumi seemed a bit more explicit. The information was there of course, I just had to read the papers, which I subsequently did, but still it would be really useful to give some sort of sensible overview and advice for what sort of normalisation to use when (perhaps using some of the comments here).

Happy to contribute/comment further on the documentation side if it helps - I can't write good enough R to develop packages but I can do documentation... :)
