I recently analysed the methylation of some patient samples using the Illumina EPIC methylation microarray. After doing the DMR analysis I got a lot of pseudogenes and miRNAs in my list. Some of these were significantly hyper- or hypomethylated.
My question is: are these pseudogenes useful, or can I just delete them from my list? Has anyone come across this problem, and what did you do?
It depends on what it is you are trying to measure. If you're trying to examine the correlation between methylation and expression, pseudogenes don't sound useful, since they should not be expressed. What is the goal of your analysis?
What I want to do is look at which genes are differentially methylated, make heatmaps of them to show the difference in methylation across the treatment groups, and then do some pathway enrichment analysis on them. So I guess if I'm going to look at pathways, pseudogenes are useless as they shouldn't lead to anything... but because these pseudogenes showed really high (or really low) beta values, I was wondering whether they are significant and whether I should include them in the heatmaps?
Some pseudogenes are functional and expressed...
Oh, I thought the definition of a pseudogene was something like - "something that looks like a gene, but is either nonfunctional or not expressed." If not, then what is a pseudogene?
AFAIK they can be expressed (and even translated) without being functional, and some that were earlier classified as non-functional are now turning out to be functional. So in the latter case you could argue that the definition is correct and the annotation is wrong.
There is some literature about this:
Thanks for those links; I will be better-educated in the future!
With regard to the OP's question, I would suggest that a region that is significantly differentially methylated is interesting regardless of which element (protein-coding gene/pseudogene/miRNA/...) is present there. It's not biologically sound to exclude elements arbitrarily.
I don't know how likely false positives are with this array-based methylation profiling... And furthermore, it indeed depends on what the OP wants to do with the results.
Pathway enrichment is such a boring thing to do with a gene list and should never be the endpoint of your analysis. Intuitively, pseudogenes might not contribute to those pathways (because pseudogenes won't be classified into any of them), so leaving them in will not hurt your analysis either...
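To make the "keep everything, but know what each region is" idea concrete, here is a minimal sketch in plain Python. It assumes you already have each DMR annotated with the overlapping element and its biotype; the records, gene names, biotype labels and function names below are all made up for illustration and are not the output of any particular package.

```python
from collections import defaultdict

# Hypothetical DMR records (illustrative values only):
# (region id, overlapping element, biotype, mean beta difference)
dmrs = [
    ("DMR1", "GENE_A", "protein_coding", 0.32),
    ("DMR2", "PSEUDO_B", "pseudogene", -0.41),
    ("DMR3", "MIR_C", "miRNA", 0.27),
    ("DMR4", "GENE_D", "protein_coding", -0.18),
]


def group_by_biotype(records):
    """Group DMRs by element type so nothing is silently dropped."""
    groups = defaultdict(list)
    for region, gene, biotype, delta_beta in records:
        groups[biotype].append((region, gene, delta_beta))
    return dict(groups)


def genes_for_enrichment(records, biotypes=("protein_coding",)):
    """Pick only the element types a pathway database is likely to cover."""
    return sorted({gene for _, gene, biotype, _ in records if biotype in biotypes})


by_type = group_by_biotype(dmrs)

# All DMRs stay in the heatmap; only the enrichment input is restricted.
heatmap_regions = [region for region, _, _, _ in dmrs]
enrichment_input = genes_for_enrichment(dmrs)
```

The point of the design is that the pseudogene and miRNA DMRs remain in the table (and in the heatmap), while the choice of what goes into enrichment is an explicit, documented filter rather than a deletion.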
Thanks @WouterDeCoster for your advice, I really appreciate it... I'm really new to this work, so I have to say I'm quite ignorant of what can be done with this data.
You mentioned that pathway analysis shouldn't be the only thing to do with a gene list. Can you suggest what else I can do with this data? Are there any papers I can refer to that show further analysis of gene lists?
Thanks, R
Well, I wrote it shouldn't be an endpoint of your analysis, but the next step isn't obvious. Your pathway enrichment will yield some affected pathways, and quite often those are a) not informative or b) already known to be involved. Somehow you should get this result back to the biology/wet lab, to determine/validate the involvement of that pathway.
But most likely you started this experiment with a biological hypothesis?