Hi, I am working with some DNA methylation data and have a few questions about gene annotation.
1) After preprocessing and quality control, we have a final data set consisting of 760 500 probes, not 850 000. How do I find the total number of genes covered by the EPIC array after this preprocessing/filtering?
2) If I have a list of genes of interest, how do I find out whether they are covered by the EPIC array (i.e. have at least one probe annotated to them)?
3) If I have a list of genes, how do I find the total number of probes annotated to them?
The reason I am asking is that I want to perform a Fisher's exact test or a chi-square test. If I want to test whether the number of differentially methylated CpGs annotated to genes associated with, for example, cancer is higher than expected by chance, is it correct to use the number of CpGs/probes or the number of genes?
Let me add that this is really not my field and I have only taken an introductory course in R so far. Very grateful for any good advice and tips!
Thank you so much for taking the time to answer. We have the annotation for all significant DMPs, but how do we find the total number of genes?
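I think the total number of genes can be obtained by counting the unique gene names in the "UCSC_RefGene_Name" column of the manifest file, after subsetting the manifest to the probes that remain in your filtered data set. The dplyr package in R will be helpful for this. Note that a single CpG locus can be annotated to several genes, or to the same gene more than once, so you will have to split each entry on its separator first; as far as I know the manifest uses semicolons rather than commas for this.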
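To make the counting concrete, here is a minimal sketch in base R (dplyr would work just as well). The small `manifest` data frame, the probe IDs, `kept_probes` and `genes_of_interest` are all made up for illustration, and the semicolon separator is an assumption you should check against your own copy of the manifest. In practice `manifest` would be the EPIC annotation read into a data frame and `kept_probes` the probe IDs left after your filtering.

```r
# Toy stand-in for the manifest (hypothetical values; real column names
# "Name" and "UCSC_RefGene_Name" should be checked in your file).
manifest <- data.frame(
  Name = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  UCSC_RefGene_Name = c("TP53;TP53", "BRCA1", "", "TP53;WRAP53", "MYC"),
  stringsAsFactors = FALSE
)

# Hypothetical: probe IDs that survived preprocessing/QC.
kept_probes <- c("cg01", "cg02", "cg03", "cg04")

# Restrict the annotation to the retained probes.
ann <- manifest[manifest$Name %in% kept_probes, ]

# Split each entry on ";" (assumed separator) and keep each gene once per probe.
genes_per_probe <- lapply(
  strsplit(ann$UCSC_RefGene_Name, ";", fixed = TRUE),
  unique
)
names(genes_per_probe) <- ann$Name

# Question 1: number of unique genes among the retained probes.
all_genes <- unique(unlist(genes_per_probe))
all_genes <- all_genes[!is.na(all_genes) & all_genes != ""]
n_genes <- length(all_genes)

# Question 2: which genes of interest still have at least one probe?
genes_of_interest <- c("TP53", "MYC", "EGFR")
covered <- genes_of_interest %in% all_genes
names(covered) <- genes_of_interest

# Question 3: number of retained probes annotated to any gene in the list,
# and the number of probes per gene.
probe_hits <- vapply(
  genes_per_probe,
  function(g) any(g %in% genes_of_interest),
  logical(1)
)
n_probes_in_list <- sum(probe_hits)
probes_per_gene <- table(unlist(genes_per_probe))

print(n_genes)
print(covered)
print(n_probes_in_list)
print(probes_per_gene)
```

In this toy example MYC is reported as not covered because its only probe was removed by the filtering, which is why the manifest is subset to the retained probes before counting. Probes without any gene annotation (empty entries) simply contribute nothing to the gene counts.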