Question

Differential DNA methylation for each sample using Illumina HumanMethylation450 BeadChip dataset

0

5.2 years ago

geocarvalho ▴ 400

Hello, I recently started studying DNA methylation and I am using the GSE72245 dataset. Using the limma package, I have a matrix that marks each CpG as hyper-methylated (1) or hypo-methylated (-1), obtained from the lmFit, eBayes, and decideTests functions, as shown:

CpG Basal HER2 LumA LumB

cg13869341 1 1 1 1

cg01014490 -1 -1 -1 -1

cg17505339 1 1 1 1

I would like to know whether it is possible to get the same kind of matrix, with hyper-methylated (1), hypo-methylated (-1), and no-signal (0) calls, for each sample instead of for the groups (Basal, HER2, LumA, LumB), more like:

CpG GSM1858429 GSM1858430 GSM1858431 GSM1858432 ...

cg13869341 1 1 1 1 ...

cg01014490 -1 -1 -1 -1 ...

cg17505339 1 1 1 1 ...

Sorry if the question is trivial.

R methylation humanmethylation450 limma • 2.5k views

ADD COMMENT • link updated 5.2 years ago by Charles Warden 8.3k • written 5.2 years ago by geocarvalho ▴ 400

1

To do that, you probably just need to set a cut-off for hyper- and hypo-methylated. For example:

Beta > 0.8 = hyper-methylated
Beta < 0.2 = hypo-methylated

Kevin

ADD REPLY • link 5.2 years ago by Kevin Blighe 89k
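
A minimal sketch of the cut-off idea above, applied per sample rather than per group. It assumes the beta values are already available as a nested mapping (CpG ID to sample ID to beta); the function names and the example values are hypothetical and are not taken from GSE72245. Values between the two cut-offs are coded as 0 here, which is only one possible convention.

```python
# Sketch: classify each beta value per sample with fixed cut-offs.
HYPER_CUTOFF = 0.8  # cut-offs suggested in the reply above; adjust as needed
HYPO_CUTOFF = 0.2


def call_methylation(beta, hyper=HYPER_CUTOFF, hypo=HYPO_CUTOFF):
    """Return 1 (above hyper), -1 (below hypo), or 0 (in between)."""
    if beta > hyper:
        return 1
    if beta < hypo:
        return -1
    return 0


def call_matrix(betas):
    """betas: dict of CpG ID -> dict of sample ID -> beta value."""
    return {
        cpg: {sample: call_methylation(b) for sample, b in per_sample.items()}
        for cpg, per_sample in betas.items()
    }


# Hypothetical beta values, for illustration only.
example_betas = {
    "cg13869341": {"GSM1858429": 0.91, "GSM1858430": 0.85},
    "cg01014490": {"GSM1858429": 0.05, "GSM1858430": 0.45},
}
calls = call_matrix(example_betas)
```

Note that this labels absolute methylation levels in each sample. It is not a differential test like the limma output, which compares groups.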

0

Could I consider what is between those values as no-signal?

ADD REPLY • link 5.2 years ago by geocarvalho ▴ 400

0

Well, the values below 0.2 would be no signal (no methylation). Anything between 0.2 and 0.8 would be normal methylation.

ADD REPLY • link 5.2 years ago by Kevin Blighe 89k

1

I am sorry, but I am a little bit confused:

1) The beta values should be between 0 and 1 (matching what Kevin says, but I am not sure how you are getting 1 and -1 values for each probe in the original question)

2) COHCAP makes use of some methylated thresholds, but I think it is more like "hypo-methylated" is homozygous unmethylated and "hyper-methylated" is homozygous methylated.

For example, in promoters, I would expect more unmethylated CpG sites (if ~70% of protein-coding genes are expressed), so "normal" would probably be more like "hypo-methylated".

I see .idat files in the GEO submission, so you can use minfi or GenomeStudio to define new beta values.

3) There are detection p-values to define whether the probe was detected. Imposing a more stringent detection p-value cutoff is one reason why you might want to use GenomeStudio. However, beta values below 0.2 do not correspond to low absolute signal from either channel. That is what you would want to use the detection p-values for.

ADD REPLY • link 5.2 years ago by Charles Warden 8.3k
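
To illustrate point 3 above, here is a rough sketch of how detection p-values (rather than low beta values) could be used to flag "no signal". It assumes you already have a matrix of detection p-values with the same CpG and sample IDs as the beta matrix. The 0.01 cut-off, the function name, and the example numbers are assumptions for illustration and are not specified in the thread.

```python
DETECTION_P_CUTOFF = 0.01  # assumed cut-off; choose your own stringency


def mask_undetected(betas, det_p, cutoff=DETECTION_P_CUTOFF):
    """Replace a beta value with None where its detection p-value > cutoff.

    betas and det_p: dict of CpG ID -> dict of sample ID -> value.
    """
    masked = {}
    for cpg, per_sample in betas.items():
        masked[cpg] = {
            sample: (beta if det_p[cpg][sample] <= cutoff else None)
            for sample, beta in per_sample.items()
        }
    return masked


# Hypothetical values: both betas are low, but only one probe was detected.
example_betas = {"cg01014490": {"GSM1858429": 0.05, "GSM1858430": 0.07}}
example_det_p = {"cg01014490": {"GSM1858429": 0.001, "GSM1858430": 0.2}}
masked = mask_undetected(example_betas, example_det_p)
```

In this toy example both samples have a low beta value, but only the second is treated as undetected, because the decision is based on the detection p-value and not on the beta value itself.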

1

Thanks for your input Charles

ADD REPLY • link 5.2 years ago by Kevin Blighe 89k

0

Thanks, Charles. A clarification about what I did: I used the .idat files as input for the minfi package and its beta values as input for differential methylation with the limma package. The matrix in my question is the output after using the lmFit, eBayes, and decideTests functions from limma.

Using the beta values, is there a threshold or a program that would give me a matrix classifying each CpG in each patient as hyper-methylated, hypo-methylated, or not methylated?

ADD REPLY • link 5.2 years ago by geocarvalho ▴ 400

1

I think something doesn't seem right, but it is hard for me to say for certain.

For the site test, I believe minfi applies a logit transform (or some sort of transform to make the data look more normally distributed) and then uses limma. While this has some advantages, you can also miss some sites that show real differences on the 0-to-1 scale (although I think the proportion of sites where this is a big issue is small).
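
As a rough illustration of why the two scales can disagree, the sketch below assumes the transform in question is the base-2 logit of beta (often called the M-value). The function name, the `eps` clamp, and the example numbers are hypothetical choices for this sketch, not minfi's implementation.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Base-2 logit of a beta value; eps keeps betas of 0 or 1 finite."""
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


# A 0.20 shift in the middle of the beta range...
mid_shift_beta = 0.6 - 0.4
mid_shift_m = beta_to_m(0.6) - beta_to_m(0.4)

# ...versus a 0.04 shift near the lower end of the range.
edge_shift_beta = 0.05 - 0.01
edge_shift_m = beta_to_m(0.05) - beta_to_m(0.01)

# On the logit scale the small shift near 0 is the larger of the two.
print(mid_shift_beta, mid_shift_m)
print(edge_shift_beta, edge_shift_m)
```

So a site with a sizeable change in mid-range beta can look less striking after the transform than a site with a tiny change near 0 or 1, which is one reason to check candidate sites on the original beta scale as well.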

Strictly speaking, COHCAP can work with 1 sample, but I would not say that is the preferable way to apply the method. Also, that is for regions, rather than sites.

However, I think that the .wig files that COHCAP generates (or that I believe programs like RnBeads can generate) are helpful for reviewing your data. For example, if you use minfi (or limma), that may be a good way to confirm there is a noticeable difference in the original beta values (while taking spatial distance and gene annotations into consideration, using a program like IGV).

ADD REPLY • link 5.2 years ago by Charles Warden 8.3k

score 2 · Answer 1 · 2020-06-09

I partially mention this in my other comment, but these are some programs that you can use for differential methylation of 450k (and probably EPIC) Illumina Methylation array data:

minfi/bumphunter - 450k and EPIC

RnBeads - 450k and EPIC

COHCAP - 450k (and EPIC, if you use a custom annotation)

IMA - 450k only (as far as I know)

I thought I remembered seeing a review recently, but I don't remember the exact paper (and my Disqus feed is currently having some issue loading my comments, and I am not 100% certain if I commented on the paper that I am thinking about).

Also, please start with the .idat files. Even though minfi may use a transformation for the site test (based upon limma), the normalized methylation should be beta values between 0 and 1.
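
For reference, a small sketch of how a beta value is commonly defined from the methylated and unmethylated channel intensities, which is why it stays between 0 and 1. The offset of 100 follows the usual Illumina convention and is an assumption here (packages differ in their default), and the intensities are made-up numbers.

```python
def beta_value(meth, unmeth, offset=100):
    """Beta = meth / (meth + unmeth + offset); None if the denominator is 0."""
    denominator = meth + unmeth + offset
    if denominator == 0:
        return None
    return meth / denominator


# Hypothetical intensities: a weak probe and a strong probe, same ratio.
weak = beta_value(meth=50, unmeth=950)
strong = beta_value(meth=5000, unmeth=95000)
print(weak, strong)
```

Both probes end up with a beta value below 0.2 even though their absolute signal differs by two orders of magnitude, so a low beta value on its own says nothing about whether the probe was detected.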