I would like your opinion on detecting methylation in a HumanMethylation450 dataset. I have read a few papers and some R code. My data have only one replicate per condition (healthy vs. disease).
Is there any way to determine whether a site is methylated or not? What threshold does one normally use to decide whether a probe is methylated (I mean for the same probe)?
@Kevin Blighe thanks for your explanation. What do you mean by beta? If we have a probe, we have the expression for it. Do you mean I should fit a linear relation between disease and normal and get beta + x*alpha? Can you please explain a bit more? Thanks.
No, the Beta value is a measure of the level of methylation at the probe's CpG site, ranging from 0 (unmethylated) to 1 (fully methylated). Take a look here: Interpretation of Beta values: Methylation data
I am aware that 'beta' is also used in regression modeling, but I am not referring to that here.
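For concreteness, here is a minimal Python sketch of how a Beta value is derived from a probe's methylated (M) and unmethylated (U) signal intensities. It assumes Illumina's convention, Beta = M / (M + U + 100), where the offset of 100 stabilises probes with low total signal. The function name and the example intensities are hypothetical; in practice, packages such as minfi compute this for you, and their default offset may differ.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), with negative intensities clipped to 0.

    Assumes Illumina's convention of offset = 100, so values lie in [0, 1).
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for a single probe.
print(beta_value(9000, 900))   # 0.9  -> heavily methylated
print(beta_value(300, 5600))   # 0.05 -> mostly unmethylated
```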
Which values have you got? I would not use the term 'expression' for methylation, as it does not make sense. Methylation at a genomic locus, particularly at promoter CpG islands, is generally associated with repression of transcription/expression at the locus in question.
@Kevin Blighe do you have any reference for the cut-off? Also, what if I want to look at methylation in general, for example on the disease probes and on the healthy probes? Is there any way to see the differences?
Hey, they allude to 0.15 as a threshold here. You'll see references to it scattered throughout the literature. I guess it's a bit like the 5% FDR and log2 fold-change of 2 used in expression studies: values that people generally go for, but that are by no means fixed thresholds.
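A minimal sketch of applying such a cut-off to one healthy and one disease sample, assuming the 0.15 refers to the absolute difference in Beta (delta Beta) for the same probe. The probe names, Beta values, and the helper `flag_delta_beta` are hypothetical. With a single replicate per condition, this only ranks differences by size; it says nothing about statistical significance.

```python
# Hypothetical Beta values for the same probes in one healthy and one disease sample.
healthy = {"probe_1": 0.10, "probe_2": 0.45, "probe_3": 0.80, "probe_4": 0.52}
disease = {"probe_1": 0.35, "probe_2": 0.50, "probe_3": 0.55, "probe_4": 0.60}


def flag_delta_beta(healthy, disease, threshold=0.15):
    """Return {probe: delta} for probes where |disease - healthy| >= threshold.

    A positive delta means higher methylation in disease; negative means lower.
    """
    flagged = {}
    for probe in sorted(healthy.keys() & disease.keys()):
        delta = disease[probe] - healthy[probe]
        if abs(delta) >= threshold:
            flagged[probe] = delta
    return flagged


for probe, delta in flag_delta_beta(healthy, disease).items():
    print(f"{probe}: delta Beta = {delta:+.2f}")
```

Here probe_1 and probe_3 pass the cut-off (in opposite directions), while probe_2 and probe_4 do not.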
Yes, you can also, of course, just plot the Beta values and compare them visually. I did this once and 'randomly' selected 0.6 as a cut-off for high/low methylation.
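A minimal sketch of a binary high/low call at that 0.6 cut-off, listing the probes whose call differs between the two samples. As noted, the cut-off is arbitrary; the probe names, values, and helper functions are hypothetical. For the visual comparison itself, a histogram or density plot of each sample's Beta values is a common choice.

```python
# Hypothetical Beta values for the same probes in each sample.
healthy = {"probe_1": 0.10, "probe_2": 0.45, "probe_3": 0.80, "probe_4": 0.52}
disease = {"probe_1": 0.35, "probe_2": 0.50, "probe_3": 0.55, "probe_4": 0.65}


def classify(betas, cutoff=0.6):
    """Call each probe 'high' if Beta >= cutoff, otherwise 'low'."""
    return {probe: ("high" if b >= cutoff else "low") for probe, b in betas.items()}


def status_changes(healthy, disease, cutoff=0.6):
    """Return {probe: (healthy_call, disease_call)} for probes whose call differs."""
    h = classify(healthy, cutoff)
    d = classify(disease, cutoff)
    return {p: (h[p], d[p]) for p in sorted(h.keys() & d.keys()) if h[p] != d[p]}


for probe, (h_call, d_call) in status_changes(healthy, disease).items():
    print(f"{probe}: {h_call} in healthy -> {d_call} in disease")
```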
Hey, I wanted to get back to you on this because I have been reading an interesting manuscript about Beta and M values. The authors concluded that the two have an approximately linear relationship for Beta between 0.2 and 0.8, but that outside this range M values are better suited to parametric statistics: their distribution (across the 27k chip and their study samples) was more homoscedastic and therefore more in line with the assumptions of simple statistical tests, such as the t-test.
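To see why the middle range behaves differently, note that, ignoring the small intensity offsets, the M value is the base-2 logit of Beta: M = log2(Beta / (1 - Beta)). A minimal sketch (function names are hypothetical) that converts between the two and prints the local slope dM/dBeta = 1 / (ln 2 * Beta * (1 - Beta)):

```python
import math


def beta_to_m(beta):
    """M = log2(Beta / (1 - Beta)), ignoring the intensity offsets."""
    if not 0.0 < beta < 1.0:
        raise ValueError("Beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform: Beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def slope(beta):
    """Local slope dM/dBeta of the transform at a given Beta."""
    return 1.0 / (math.log(2) * beta * (1.0 - beta))


for b in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"Beta={b:.2f}  M={beta_to_m(b):+.2f}  dM/dBeta={slope(b):.1f}")
```

Between Beta 0.2 and 0.8 the slope stays between roughly 5.8 and 9.0, so the mapping is close to linear there; near 0 or 1 it exceeds 100, which is where the two scales diverge most.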
That said, in my recommendations to you here, I mentioned a Wilcoxon signed-rank test, which is non-parametric and does not assume normally distributed data, so its use is still justified.
For more, see here: https://www.ncbi.nlm.nih.gov/pubmed/21118553