5.9 years ago
janhuang.cn
I used the R package IlluminaHumanMethylationEPICanno.ilm10b4.hg19 to extract the annotation of the CpGs in my data. Is there any documentation on how Illumina defines these annotations? Are they based solely on the distance from the locus to the gene?
Thank you very much.
Thanks. The column heading says "UCSC_RefGene_Name: Target gene name(s), from the UCSC database. *Note: multiple listings of the same gene name indicate splice variants". But is there any information on how they (UCSC or Illumina) assign it? For example, cg18478105 (chr20:61847650) is annotated to YTHDF1 (UCSC_RefGene_Name), while UCSC shows YTHDF1 at chr20:61826782-61847538. cg18478105 does not fall in this range. Is there a criterion of, say, +/- 500 bp or 1000 bp?
I believe the annotation type is separated by a semicolon (at least it is within the CSV annotation file).
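To make the semicolon layout concrete, here is a minimal sketch of splitting the fields. It assumes the gene names (UCSC_RefGene_Name) and the region labels (UCSC_RefGene_Group) are paired by position within their semicolon-separated lists; the helper name and the example values are hypothetical, not taken from the manifest.

```python
def pair_annotations(names, groups):
    """Pair each gene name with the region label at the same list position.

    Assumption: both fields are semicolon-separated and aligned by position.
    """
    name_list = names.split(";") if names else []
    group_list = groups.split(";") if groups else []
    return list(zip(name_list, group_list))


# Hypothetical field values, for illustration only.
for gene, region in pair_annotations("GENE1;GENE1;GENE2", "TSS200;5'UTR;Body"):
    print(gene, region)
```

A repeated gene name with different region labels would then correspond to different transcripts of the same gene, which matches the "splice variants" note in the column description.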
There are TSS200 and TSS1500 annotations, so my guess is that those are what you are interested in. You might also want to look at the 5' UTR and 1st exon annotations.
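As a rough check on the cg18478105 example, the sketch below computes how far upstream of the transcription start site (TSS) the probe sits. It rests on two assumptions that are not stated in this thread: that YTHDF1 is on the minus strand (so its TSS is the higher coordinate), and that TSS200 means within 200 bp upstream of the TSS while TSS1500 means 200 to 1500 bp upstream. The function names and the exact boundary handling are illustrative only.

```python
def upstream_distance(cpg_pos, gene_start, gene_end, strand):
    """Distance of the CpG upstream of the TSS (negative means downstream)."""
    if strand == "+":
        return gene_start - cpg_pos  # TSS is the lower coordinate
    return cpg_pos - gene_end        # minus strand: TSS is the higher coordinate


def tss_window(distance):
    """Classify an upstream distance using the assumed window definitions."""
    if 0 <= distance <= 200:
        return "TSS200"
    if 200 < distance <= 1500:
        return "TSS1500"
    return None  # outside both promoter windows


# Coordinates from the question; the "-" strand is an assumption.
d = upstream_distance(61847650, 61826782, 61847538, "-")
print(d, tss_window(d))  # 112 TSS200
```

Under those assumptions the probe lies 112 bp upstream of the TSS, which would explain why it is annotated to YTHDF1 even though it falls outside the transcript coordinates.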