Hello, I recently started studying DNA methylation and I am using the GSE72245 dataset. With the limma package, I obtained a matrix of hyper-methylated (1) and hypo-methylated (-1) calls as the result of the lmFit, eBayes and decideTests functions, as shown:
CpG Basal HER2 LumA LumB
cg13869341 1 1 1 1
cg01014490 -1 -1 -1 -1
cg17505339 1 1 1 1
I would like to know whether it is possible to get the same kind of matrix, with hyper-methylated (1), hypo-methylated (-1) and no-signal (0) calls, for each sample instead of each group (Basal, HER2, LumA, LumB), something like:
CpG GSM1858429 GSM1858430 GSM1858431 GSM1858432 ...
cg13869341 1 1 1 1 ...
cg01014490 -1 -1 -1 -1 ...
cg17505339 1 1 1 1 ...
Sorry if the question is trivial.
To do that, you probably just need to set a cut-off for hyper- and hypo-methylation. For example:
Kevin
Could I consider the values between those cut-offs as no-signal?
Well, the values below 0.2 would be no signal (no methylation). Anything between 0.2 and 0.8 would be normal methylation.
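A minimal per-sample sketch of the cut-off idea, assuming you already have a probe x sample table of beta values (for example from minfi's getBeta()). The 0.2 and 0.8 cut-offs are taken from this thread and are an assumption, as is the coding beta > 0.8 → 1, beta < 0.2 → -1, anything in between → 0; whether "below 0.2" should be called hypo-methylated or no signal is a labelling choice. The probe names, sample names and beta values are made up.

```python
# Minimal sketch: per-sample -1/0/1 calls from beta values using fixed cut-offs.
# ASSUMPTIONS: cut-offs of 0.2 and 0.8 (from this thread) and a nested dict
# layout {probe: {sample: beta}}; all names and values below are hypothetical.

HYPO_CUTOFF = 0.2
HYPER_CUTOFF = 0.8


def call_beta(beta, hypo=HYPO_CUTOFF, hyper=HYPER_CUTOFF):
    """Return 1 if beta > hyper, -1 if beta < hypo, 0 in between, None if missing."""
    if beta is None:
        return None
    if beta > hyper:
        return 1
    if beta < hypo:
        return -1
    return 0


def call_matrix(betas):
    """Apply call_beta to every probe x sample entry, keeping the same layout."""
    return {
        probe: {sample: call_beta(b) for sample, b in row.items()}
        for probe, row in betas.items()
    }


betas = {
    "probe_1": {"sample_A": 0.91, "sample_B": 0.85, "sample_C": 0.40},
    "probe_2": {"sample_A": 0.05, "sample_B": 0.12, "sample_C": 0.55},
}
for probe, row in call_matrix(betas).items():
    print(probe, row)
```

These are absolute calls for each sample, not differential calls relative to a reference group, so they answer a different question from decideTests.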
I am sorry, but I am a little bit confused:
1) The beta values should be between 0 and 1 (matching what Kevin says, but I am not sure how you are getting 1 and -1 values for each probe in the original question).
2) COHCAP makes use of some methylation thresholds, but I think it is more like "hypo-methylated" means homozygous unmethylated and "hyper-methylated" means homozygous methylated.
For example, in promoters I would expect more unmethylated CpG sites (if ~70% of protein-coding genes are expressed), so "normal" would probably look more like "hypo-methylated".
I see .idat files in the GEO submission, so you could use minfi or GenomeStudio to define new beta values.
3) There are detection p-values to define whether the probe was detected. Imposing a more stringent detection p-value cut-off is one reason why you might want to use GenomeStudio. However, beta values below 0.2 do not correspond to low absolute signal in either channel (a low beta only means the methylated signal is small relative to the unmethylated signal). That is what you would want to use the detection p-values for.
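A minimal sketch of how detection p-values could be combined with per-sample calls: a call is kept only where the probe was detected in that sample, otherwise it is set to missing. In minfi, detectionP() on the RGChannelSet gives a probe x sample matrix of detection p-values; here the 0.01 cut-off, the nested-dict layout and all values are assumptions for illustration.

```python
# Minimal sketch: set per-sample calls to None (missing) where the probe was
# not detected in that sample. ASSUMPTIONS: a 0.01 detection p-value cut-off
# and the nested-dict layout {probe: {sample: value}}; all values are made up.

DETECTION_CUTOFF = 0.01


def mask_undetected(calls, detection_p, cutoff=DETECTION_CUTOFF):
    """Keep a call only if its detection p-value is at or below the cut-off."""
    return {
        probe: {
            sample: call if detection_p[probe][sample] <= cutoff else None
            for sample, call in row.items()
        }
        for probe, row in calls.items()
    }


# Both samples have a low beta (call -1), but only sample_B failed detection,
# so a low beta on its own does not mean "no signal".
calls = {"probe_1": {"sample_A": -1, "sample_B": -1}}
detection_p = {"probe_1": {"sample_A": 1e-16, "sample_B": 0.3}}
print(mask_undetected(calls, detection_p))
```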
Thanks for your input, Charles.
Thanks, Charles. Some clarifications about what I did: 1) I used the .idat files as input for the minfi package, and its beta values as input for differential methylation with the limma package; the matrix above is the output after using the lmFit, eBayes and decideTests functions from limma. Using the beta values, is there a threshold or a program that would give me a matrix with, for each patient and CpG, a hyper-methylated, hypo-methylated or not-methylated call?
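For reference, this is roughly what decideTests does with its defaults (method = "separate", adjust.method = "BH", p.value = 0.05, lfc = 0), as I understand it: for each coefficient or contrast of the fit, it BH-adjusts the moderated-t p-values across probes and reports the sign of the effect (+1 or -1) for probes below the cut-off, and 0 otherwise. Each column is therefore a group comparison, not a sample, and which direction counts as "up" depends on how the design matrix and contrasts were set up. A minimal Python re-implementation of that rule, with made-up p-values and effects:

```python
# Minimal sketch of limma's default decideTests rule (method = "separate"):
# within one coefficient/contrast column, BH-adjust the p-values, then call
# +1 / -1 by the sign of the effect if adjusted p < 0.05, otherwise 0.
# The p-values and log-fold changes below are made up for illustration.


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, like R's p.adjust(p, "BH")."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def decide_separate(pvalues, effects, p_cutoff=0.05):
    """Return -1/0/1 calls for one contrast column (one call per probe)."""
    adjusted = bh_adjust(pvalues)
    calls = []
    for p_adj, effect in zip(adjusted, effects):
        if p_adj < p_cutoff and effect != 0:
            calls.append(1 if effect > 0 else -1)
        else:
            calls.append(0)
    return calls


# One hypothetical contrast (e.g. one subtype vs. a reference), three probes.
p_values = [0.001, 0.02, 0.6]
log_fc = [0.30, -0.25, 0.10]
print(decide_separate(p_values, log_fc))  # [1, -1, 0]
```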
I think something doesn't seem right, but it is hard for me to say for certain.
For the site-level test, I believe minfi uses a logit transform (or some sort of transform to make the data look more normally distributed) followed by limma. While this has some advantages, you can also miss some sites that show real differences on the 0-to-1 scale (although I think the proportion of sites where this is a big issue is small).
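To make the transform concrete: the M-value is the base-2 logit of beta, M = log2(beta / (1 - beta)). As far as I know, minfi's getM() on a MethylSet returns log2 of methylated over unmethylated signal, which equals this logit when no offset is added to beta. A minimal Python sketch; the small clamp that avoids log(0) at beta = 0 or 1 is an assumption of this sketch:

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Base-2 logit: M = log2(beta / (1 - beta)), with beta clamped to [eps, 1 - eps]."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2 ** m / (2 ** m + 1)


# The same 0.1 change in beta, in the middle of the range and near the top.
for b_low, b_high in [(0.45, 0.55), (0.85, 0.95)]:
    delta_m = beta_to_m(b_high) - beta_to_m(b_low)
    print(f"beta {b_low}->{b_high}: delta beta = {b_high - b_low:.2f}, delta M = {delta_m:.2f}")
```

The same 0.1 change in beta corresponds to a much larger change in M near 0.9 than near 0.5, which is why a test on one scale can rank sites differently from a test on the other.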
Strictly speaking, COHCAP can work with 1 sample, but I would not say that is the preferable way to apply the method. Also, that is for regions, rather than sites.
However, I think the .wig files that it generates (or that I believe programs like RnBeads can generate) are helpful in reviewing your data. For example, if you use minfi (or limma), they may be a good way to confirm that there is a noticeable difference in the original beta values (while taking spatial distance and gene annotations into consideration, using a program like IGV).
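If you want to build such a track yourself from per-sample beta values, a .wig file is plain text: a track line, then one variableStep block per chromosome listing 1-based positions and values. A minimal sketch, assuming you already have probe coordinates (for example from the array's annotation); the function name, track name and the coordinates in the example are hypothetical:

```python
import os
import tempfile
from collections import defaultdict


def write_variable_step_wig(path, track_name, sites):
    """Write (chrom, 1-based position, beta) tuples as a variableStep .wig track.

    Sites with a missing beta (None) are skipped; positions are sorted
    within each chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, beta in sites:
        if beta is not None:
            by_chrom[chrom].append((pos, beta))
    with open(path, "w") as handle:
        handle.write(f'track type=wiggle_0 name="{track_name}"\n')
        for chrom in sorted(by_chrom):
            handle.write(f"variableStep chrom={chrom}\n")
            for pos, beta in sorted(by_chrom[chrom]):
                handle.write(f"{pos} {beta:.3f}\n")


if __name__ == "__main__":
    # Hypothetical coordinates and betas for one sample.
    sample_sites = [("chr1", 10468, 0.82), ("chr1", 10470, 0.79), ("chr2", 500, 0.15)]
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "sample_A.wig")
        write_variable_step_wig(out, "sample_A beta", sample_sites)
        with open(out) as handle:
            print(handle.read())
```

Writing one file per sample (or per group mean) keeps each track on the same 0-to-1 scale, so differences can be compared directly in IGV.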