Question

Your Thoughts On A "Standard" Pipeline To Process Illumina 450K Data

10

12.6 years ago

Neilfws 49k

I've recently started working with the Illumina 450K methylation platform. There are several software packages available to handle this data including methylumi, lumi, minfi (those 3 from Bioconductor) and IMA. I'm disregarding IMA since it requires text files exported from BeadStudio in a particular format (which I don't have) and I prefer to start from IDAT files.

The packages are similar in that they create an R object based on the eSet class, but they all come with different methods for adjusting colour bias and normalizing. I'm finding the number of choices rather confusing. For example:

methylumi has a rather basic method, normalizeMethyLumiSet(), which does not seem entirely appropriate for the 450K platform
lumi has methods for colour bias correction, background adjustment and normalization; it's not clear to me whether these methods should be applied separately to the type I and type II probes on the 450K platform (and if so, whether I'd then somehow recombine the data)
minfi makes no mention of colour bias but has a method in the development version, preprocessSWAN(), which does normalization accounting for differences in type I/type II probes

So my questions are:

Which package do you use? Or do you use more than one, in combination?
Should I even worry about colour bias adjustment? And if so, should I treat type I and II probes differently? And if so, how?
The "best" method, in your opinion, to normalize? Using lumi - ssn or quantile? Or use minfi? Treat colours separately or not? Treat type I/II probes separately or not?

My current feeling is that preprocessSWAN() in the minfi development version is the way to go, but I'd appreciate your thoughts (and especially, your R code).
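To make the "treat type I/II probes separately, then recombine" option concrete, here is a minimal base R sketch. It is not what lumi or minfi do internally: the intensity matrix, the probe_type vector and the quantile_normalize() helper are all made up for illustration, and plain quantile normalization stands in for whichever method you would actually pick.

```r
# Toy intensity matrix: 6 probes x 3 samples (all values are made up).
set.seed(1)
intensity <- matrix(runif(18, 100, 5000), nrow = 6,
                    dimnames = list(paste0("cg", 1:6), paste0("s", 1:3)))
# Hypothetical probe design annotation (type I vs type II).
probe_type <- c("I", "I", "I", "II", "II", "II")

# Plain quantile normalization: every column ends up with the same distribution.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted_means <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(x)
  out
}

# Normalize each probe type on its own, then write the result back into
# one matrix so the original probe order is preserved.
normalized <- intensity
for (type in unique(probe_type)) {
  rows <- probe_type == type
  normalized[rows, ] <- quantile_normalize(intensity[rows, , drop = FALSE])
}
```

Recombining is then just a matter of writing each subset back into its original rows; whether normalizing the two designs separately is the right choice is exactly the open question above.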

illumina methylation microarray bioconductor • 12k views

ADD COMMENT • link updated 10.8 years ago by Charles Warden 8.3k • written 12.6 years ago by Neilfws 49k

score 4 · Answer 1 · 2012-03-29

4

12.6 years ago

Aaron Statham ★ 1.1k

Below is my code for using minfi. I get a Pearson correlation of 0.95 between beta values from a 450k array and whole-genome bisulfite sequencing (cell line), so at least in my situation I don't know how much more there is to gain.

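library(minfi)
# Read IDATs (newer minfi releases rename this function read.metharray.exp)
RG.raw <- read.450k.exp(base = slide.folder, targets = files.table)
# Background-correct and normalize to control probes, then extract beta values
methyl.norm <- preprocessIllumina(RG.raw, bg.correct = TRUE, normalize = "controls")
beta.table <- getBeta(methyl.norm)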
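For the comparison against bisulfite sequencing, a rough sketch of the correlation step might look like this. The two named vectors are toy data, and the sketch assumes the sequencing calls have already been mapped to the same CpG identifiers as the array probes, which is a separate step not shown here.

```r
# Toy beta values from the array, keyed by CpG probe ID (made-up numbers).
array_beta <- c(cg01 = 0.05, cg02 = 0.92, cg03 = 0.48, cg04 = 0.10, cg05 = 0.85)
# Toy methylation fractions from sequencing for a partly overlapping site set.
seq_meth <- c(cg02 = 0.95, cg03 = 0.55, cg04 = 0.02, cg05 = 0.80, cg06 = 0.40)

# Restrict both vectors to the sites measured on both platforms.
shared <- intersect(names(array_beta), names(seq_meth))

# Pearson correlation over the shared sites only.
r <- cor(array_beta[shared], seq_meth[shared], method = "pearson")
print(round(r, 3))
```

With real data, array_beta would be one column of beta.table and seq_meth the per-site methylation fraction from the sequencing run.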

ADD COMMENT • link 12.6 years ago by Aaron Statham ★ 1.1k

1

Hah, at the moment I'm not paid to worry. There are always improvements to be made, but a 0.85/0.95 correlation is good enough for me until someone does some serious benchmarking.

ADD REPLY • link 12.6 years ago by Aaron Statham ★ 1.1k

0

Wow! That's a great correlation. Is that the norm for 450k?

ADD REPLY • link 12.6 years ago by brentp 24k

0

The worst I've gotten between 450k and bisulfite seq is 0.85, and that was comparing primary cells (grown for a short time in culture) isolated from two different patients, i.e. patient 1 on 450k and patient 2 on bis-seq.

ADD REPLY • link 12.6 years ago by Aaron Statham ★ 1.1k

0

This code is straight from the minfi user guide. I tend to agree though, that it is as good as anything. You don't worry about color bias, treating type I/II probes separately or the SWAN method?

ADD REPLY • link 12.6 years ago by Neilfws 49k

0

Hello! I came across this old post while searching for methylation data analysis. I have data from control samples, one unmethylated and one methylated, from both bisulfite sequencing and 450K. (Ideally the unmethylated control should show 0% methylation and the methylated control 100%, but this is certainly not the case.) I tried to correlate the results between 450K and sequencing, including only the sites that are present in both. I used the percentage of methylation (the beta value in 450K). I did not use any of the above packages but got the data straight from Genome Studio.

I got a ~0.88 correlation for the unmethylated control sample, but only 0.07 for the methylated control. Any idea how this could be? Thanks in advance!

ADD REPLY • link 9.7 years ago by cafelumiere12 ▴ 80

score 1 · Answer 2 · 2014-01-06

I guess this is a somewhat old post, but I would recommend using COHCAP for 450k array analysis:

http://www.ncbi.nlm.nih.gov/pubmed/23598999

http://sourceforge.net/projects/cohcap/

I have used Genome Studio for processing / normalization, and I use COHCAP for QC, differential methylation (for CpG sites as well as CpG islands), and integration with gene expression data (if relevant).

I haven't yet tested the tools described in this link, but I would currently agree that minfi is probably OK for normalization. I think the additional normalization (e.g. SWAN) has a relatively modest effect. It seems to me that the beta values used for analysis differ less than the raw intensity values, and the relative frequencies of I vs. II are pretty different (see Figure 6 in the SWAN paper). My personal opinion is that it is best to consider differentially methylated regions rather than individual probes / sites (for example, I think these relatively minor differences should be averaged out across the multiple probes within the CpG island).
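As a rough illustration of averaging across the probes within a CpG island, here is a small base R sketch. The beta matrix and the probe-to-island mapping are made up, and this is not COHCAP's actual implementation, just the general idea.

```r
# Toy beta matrix: 5 probes x 2 samples (made-up values).
beta <- matrix(c(0.10, 0.20, 0.15, 0.80, 0.90,
                 0.12, 0.18, 0.20, 0.85, 0.75),
               ncol = 2,
               dimnames = list(paste0("cg", 1:5), c("s1", "s2")))
# Hypothetical probe-to-island mapping, one entry per probe (row).
island <- c("islandA", "islandA", "islandA", "islandB", "islandB")

# Average the probes within each island, separately for every sample.
island_beta <- apply(beta, 2, function(sample_beta) tapply(sample_beta, island, mean))
print(island_beta)
```

Small probe-level shifts of the kind discussed above tend to matter less once values are averaged per island like this.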

Also, I think this comes with an assumption of using p-value / FDR alone or using a delta-beta value as a cutoff. I personally like using a methylated and unmethylated cutoff, so that I preferentially look at regions with methylation values that at least roughly follow the bimodal distribution in beta values, especially for cell line experiments. In other words, if you look for sites / regions where the average beta is > 0.7 in one group and <0.3 in the other group (or >0.3 in one group and <0.3 in the other group), it doesn't really matter as much if some of the probes for some of the unmethylated CpG sites show beta values closer to 0.2 than 0.3 (like in Figure 4C or Figure 6 of the SWAN paper)
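The methylated / unmethylated cutoff idea can be sketched in base R like this. The data frame and its column names are hypothetical, the 0.7 and 0.3 thresholds are the ones quoted above, and the filter checks the 0.7 / 0.3 switch in both directions.

```r
# Toy per-group average beta values for four regions (made-up numbers).
avg_beta <- data.frame(
  region = c("r1", "r2", "r3", "r4"),
  group1 = c(0.85, 0.20, 0.55, 0.10),
  group2 = c(0.15, 0.25, 0.60, 0.90)
)
methylated_cutoff <- 0.7    # threshold quoted in the text
unmethylated_cutoff <- 0.3  # threshold quoted in the text

# Keep regions that are methylated in one group and unmethylated in the other.
switches <- with(avg_beta,
  (group1 > methylated_cutoff & group2 < unmethylated_cutoff) |
  (group2 > methylated_cutoff & group1 < unmethylated_cutoff))
selected <- as.character(avg_beta$region[switches])
print(selected)
```

In practice this filter would sit alongside a p-value / FDR criterion rather than replace it.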

SWAN Paper: http://genomebiology.com/2012/13/6/r44

score 0 · Answer 3 · 2012-08-31

Hi,

I also just started working on Illumina 450K data and came across this post, which I see is from 5 months ago. Since I am totally new to this area, I am trying to figure out the best approach to analyse my 450K data, which has more than 700 samples. I did come across minfi, lumi, methylumi, and IMA, but I am not quite sure which to use, and I have the same questions you raised here a few months back.

So I just thought of asking you now, as you might have already worked with quite a few things on that by now.

1) Which package did you use? Or did you use more than one, in combination? I am trying to get my hands on "minfi" right now, considering the recent paper about "SWAN" which seems to be one of the good approaches. But I want to know your experience with it and your suggestion.

2) There is another recent paper, "http://www.ncbi.nlm.nih.gov/pubmed?term=complete%20pipeline%20for%20infinium%20human%20methylation", which describes a complete preprocessing pipeline using an original SQN approach. The paper says that it performs both sample normalization and efficient Infinium I/II shift correction. Has anyone used this? If yes, how did you find it?