Hi,
I am working with DNA methylation data and have the beta values. I am doing a DMR (differentially methylated region) analysis. I was wondering whether any further filtering is required, such as filtering out CpGs with low standard deviation or probes on the X chromosome. I would be interested if someone could tell me how exactly removing the latter would impact the analysis.
As in gene expression studies, where a P value (or adjusted P value) is used in conjunction with a fold-change difference (usually log base 2) to filter differentially expressed genes (DEGs), in methylation studies we look at both the P value and the difference in means. The usual cut-off for the absolute difference in means is around 0.15 to 0.20 when considering β values. That is, calculate the mean of the probe in one group and then subtract it from the mean in the other group.
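To make the two-part filter concrete, here is a minimal sketch in plain Python. The function names, the probe IDs, the beta values and the adjusted P values are all made up for illustration, and the 0.05 and 0.20 cut-offs are just example settings; the adjusted P values are assumed to come from whatever test you have already run.

```python
from statistics import mean


def delta_beta(group_a, group_b):
    """Difference in mean beta for one probe: mean(group_a) - mean(group_b)."""
    return mean(group_a) - mean(group_b)


def filter_probes(betas_a, betas_b, adj_p, p_cutoff=0.05, delta_cutoff=0.20):
    """Keep probes that pass both the adjusted P and |delta beta| cut-offs.

    betas_a / betas_b: dict of probe ID -> list of beta values per group.
    adj_p: dict of probe ID -> adjusted P value (assumed precomputed).
    Returns a dict of probe ID -> delta beta for the probes that pass.
    """
    kept = {}
    for probe, a_values in betas_a.items():
        d = delta_beta(a_values, betas_b[probe])
        if adj_p[probe] < p_cutoff and abs(d) >= delta_cutoff:
            kept[probe] = d
    return kept


# Hypothetical example data (not real TCGA values).
tumour = {
    "cg_example_1": [0.80, 0.75, 0.85],
    "cg_example_2": [0.50, 0.52, 0.48],
}
normal = {
    "cg_example_1": [0.40, 0.45, 0.35],
    "cg_example_2": [0.49, 0.50, 0.51],
}
adj_p = {"cg_example_1": 0.001, "cg_example_2": 0.60}

selected = filter_probes(tumour, normal, adj_p)
print(selected)
```

With these made-up numbers only the first probe survives, because the second one fails both the P value and the difference-in-means cut-off.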
Things such as standard deviation and X chromosome markers would, I imagine, have already been dealt with in the pre-processing and normalisation steps. If markers do have low standard deviation, then they most likely would not show up as significantly differentially methylated, but a low standard deviation in itself doesn't necessarily indicate that there's a problem with the probe.
Which type of data is it that you're analysing - TCGA data?
Thanks for the reply, Kevin. Yes, I am working with TCGA data with beta values. Looking at the X chromosome probes, I could see clear differences between males and females in the density plots (as expected), so I thought it better to filter them out, since I am interested in the difference in disease condition across the overall population.
Also, another question, if you could answer it: once we have DMRs, how do we quantify them in terms of beta values? I am using the minfi package. I want to use the DMRs to build a classifier, but the output of bumphunter gives areas and P values for regions. We don't get beta values as such, which is not feasible for test data, where we have only the beta values and no area.
Your point on the X chromosome markers is valid and, unless they have been specifically dealt with, it would be best to remove them. I just realised that I've only worked on predominantly or exclusively female cancers from the TCGA, so I've never had to consider chrX that much. In other data types, people certainly do just remove them, though.
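Removing them could look something like the sketch below, again in plain Python. It assumes you have a probe-to-chromosome lookup taken from the array annotation; the `probe_chrom` dictionary, the probe IDs and the function name here are hypothetical placeholders.

```python
def drop_chromosome_probes(betas, probe_chrom, excluded=("chrX",)):
    """Return a copy of `betas` without probes on the excluded chromosomes.

    betas: dict of probe ID -> list of beta values.
    probe_chrom: dict of probe ID -> chromosome name (assumed to come from
        the array annotation). Probes missing from this lookup are kept.
    """
    return {
        probe: values
        for probe, values in betas.items()
        if probe_chrom.get(probe) not in excluded
    }


# Hypothetical example data.
betas = {
    "cg_example_1": [0.80, 0.75],
    "cg_example_2": [0.10, 0.55],
    "cg_example_3": [0.30, 0.32],
}
probe_chrom = {
    "cg_example_1": "chr1",
    "cg_example_2": "chrX",
    "cg_example_3": "chr7",
}

autosomal = drop_chromosome_probes(betas, probe_chrom)
print(sorted(autosomal))
```

Passing `excluded=("chrX", "chrY")` would drop both sex chromosomes in one go, if that is what you want.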
I cannot comment on bumphunter or minfi as I have never used them. In my work, we only found 1 really great methylated gene of interest; thus, we didn't have to do anything further. Eventually you'll also reach the point where you just have to take what you've got, try to make sense of it, and reach a conclusion. If you have way too many that are statistically significantly different, then obviously that's difficult right now.