Hi, I'm processing data from a 450K BeadChip to identify differentially methylated probes (DMPs) between our cases and controls, and I get more than 10,000 DMPs. When calculating the distribution of DMPs in relation to genes (for example, what percentage of DMPs are located in the TSS200 region), I ran into a question: a single probe can be annotated with more than one region, sometimes for the same gene, and sometimes the regions are identical.
For example:
ID: cg09132215
Gene: TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1
UCSC_RefGene_Group: TSS200;TSS200;TSS200;TSS1500;TSS200
What should I do about the location of cg09132215? Should I count it as $num_TSS200 = $num_TSS200 + 4 and $num_TSS1500 = $num_TSS1500 + 1? ($num_TSS200: the number of DMPs located in TSS200; $num_TSS1500: the number of DMPs located in TSS1500.)
Any suggestions would be greatly appreciated!
But I'm still confused about how to count the distribution of DMPs. If I count $num_TSS200 = $num_TSS200 + 4 and $num_TSS1500 = $num_TSS1500 + 1, it seems like I'm counting this one DMP as 5 probes. Should I instead take the first annotation as the standard annotation for cg09132215 (TSS200 in TAS1R1) and count only $num_TSS200 = $num_TSS200 + 1?
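To make the options concrete, here is a minimal Python sketch of the counting schemes discussed above, applied to the cg09132215 annotation. The variable names are only illustrative, and the third scheme (counting each distinct gene/region pair once) is an extra possibility not mentioned above. It does not say which scheme is the right one; it only shows what each would add to the totals.

```python
from collections import Counter

# Annotation strings copied from the cg09132215 example above.
genes = "TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1".split(";")
groups = "TSS200;TSS200;TSS200;TSS1500;TSS200".split(";")

# Scheme 1: count every annotation entry (this one probe adds 5 counts).
per_entry = Counter(groups)

# Scheme 2: count each distinct region once per probe.
per_probe = Counter(set(groups))

# Scheme 3 (assumption, not from the thread): count each distinct
# (gene, region) pair once.
per_gene_region = Counter(region for _, region in set(zip(genes, groups)))

# Scheme 4: use only the first annotation listed for the probe.
first_only = Counter([groups[0]])

print("per entry:      ", dict(per_entry))
print("per probe:      ", dict(per_probe))
print("per gene/region:", dict(per_gene_region))
print("first only:     ", dict(first_only))
```

With scheme 2 or 3 a probe can still land in more than one region, so the percentages across regions may sum to more than 100% of the DMPs; scheme 4 keeps one region per probe but depends on the order of the entries in the annotation.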
What method did you end up using to account for these multiple locations?