Hi,
I'm looking at an EWAS file obtained from an Infinium HumanMethylation450 (450k) BeadChip. In the documentation that Illumina provides, every cg_id is mapped to a number of genes (from zero to several). Information about where in the gene the cg_id is located is also provided.
For instance,
- cg01207734 is mapped to two genes (MSH3;DHFR) in different regions (Body;TSS200)
- cg01220655 is mapped twice to the same gene (FLT4;FLT4) at the same location (Body;Body)
Why is that? I mean, why would a single CpG site be mapped "twice" to the same gene?
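To make the structure of these annotations concrete, here is a minimal Python sketch that pairs each gene name with its region. It assumes the gene and region fields are semicolon-separated lists in the same order, as the two examples above suggest; the function name `pair_annotations` is hypothetical and not part of any Illumina tool.

```python
def pair_annotations(genes, groups):
    """Pair each gene name with the region listed at the same position.

    Assumption: both fields are semicolon-separated and position-matched.
    """
    gene_list = genes.split(";") if genes else []
    group_list = groups.split(";") if groups else []
    if len(gene_list) != len(group_list):
        raise ValueError("gene and region fields have different lengths")
    return list(zip(gene_list, group_list))


# The two probes quoted above.
print(pair_annotations("MSH3;DHFR", "Body;TSS200"))
print(pair_annotations("FLT4;FLT4", "Body;Body"))
```

Written this way, the FLT4 case shows up as two identical (gene, region) pairs rather than two different annotations.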
Thanks a lot!!
Hi Charles, I'm processing data from a 450K BeadChip to identify differentially methylated probes (DMPs) between our cases and controls, and I get more than 10,000 DMPs. When I calculate the distribution of DMPs in relation to genes (for example, what percentage of DMPs are located in the TSS200 region), I face the same question: one probe is annotated with more than one relation to the same gene.
For example:
ID: cg09132215
Gene: TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1
UCSC_RefGene_Group: TSS200;TSS200;TSS200;TSS1500;TSS200
How should I handle the location for cg09132215? Should I count it as $num_TSS200=$num_TSS200+4 and $num_TSS1500=$num_TSS1500+1? ($num_TSS200: the number of DMPs located in TSS200; $num_TSS1500: the number of DMPs located in TSS1500)
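To see what is at stake, the same probe can be counted under different conventions. The following is only a sketch in Python, assuming the gene and group fields are position-matched, semicolon-separated lists; `count_regions` and its `mode` names are hypothetical, and which convention is appropriate depends on the question being asked.

```python
from collections import Counter


def count_regions(genes, groups, mode="per_probe"):
    """Count regions for one probe under a chosen convention.

    per_entry: every listed entry counts (the +4 / +1 option above).
    per_gene:  each distinct (gene, region) pair counts once.
    per_probe: each distinct region counts once for the probe.
    """
    gene_list = genes.split(";")
    group_list = groups.split(";")
    if len(gene_list) != len(group_list):
        raise ValueError("gene and group fields have different lengths")
    pairs = list(zip(gene_list, group_list))

    if mode == "per_entry":
        regions = [region for _, region in pairs]
    elif mode == "per_gene":
        regions = [region for _, region in set(pairs)]
    elif mode == "per_probe":
        regions = list({region for _, region in pairs})
    else:
        raise ValueError("unknown mode: " + mode)
    return Counter(regions)


genes = "TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1"
groups = "TSS200;TSS200;TSS200;TSS1500;TSS200"
for mode in ("per_entry", "per_gene", "per_probe"):
    print(mode, dict(count_regions(genes, groups, mode)))
```

For cg09132215, `per_entry` gives TSS200 = 4 and TSS1500 = 1, while `per_gene` and `per_probe` both give TSS200 = 1 and TSS1500 = 1. Note that with `per_entry` the counts no longer sum to the number of DMPs, so percentages computed from them describe annotation entries rather than probes.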
Any suggestions would be greatly appreciated!