Question

Why Is A Cg Island Mapped Twice To The Same Gene?

10.9 years ago

dolores ▴ 20

Hi,

I'm taking a look at an EWAS file obtained from an Infinium Human Methylation 450k BeadChip. In the documentation that Illumina provides, every cg_id is mapped to a number of genes (from zero to several). Information about where in the gene the cg_id is located is also provided.

For instance,

cg01207734 is mapped to two genes (MSH3;DHFR) in different regions (Body;TSS200)
cg01220655 is mapped twice to the same gene (FLT4;FLT4) at the same location (Body;Body)

Why is that? I mean, why would a cg_island be mapped "twice" to the same gene?

Thanks a lot!!

mapping • 2.0k views

updated 10.9 years ago by Charles Warden 8.3k • written 10.9 years ago by dolores ▴ 20

score 0 · Answer 1 · 2013-11-28

10.9 years ago

Charles Warden 8.3k

I wouldn't worry about it. I believe the RefSeq transcript ID is technically different (listed in a different column). There are actually a lot of annotations like that; I've seen a gene name duplicated upwards of 9 times for a particular probe on the 450k array.

Regardless of what the annotation map says, you can always check the interpretation yourself by viewing the genome coordinates for the CpG island (using the UCSC Genome Browser, IGV, etc.).
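
One way to see why a gene name repeats is to line up the semicolon-separated fields position by position. The following is a minimal sketch, not the official way to read the manifest. It assumes the relevant columns are named `UCSC_RefGene_Name`, `UCSC_RefGene_Accession`, and `UCSC_RefGene_Group` (check your manifest version), and it uses placeholder accessions (`NM_placeholder_1`, `NM_placeholder_2`) instead of the real FLT4 transcript IDs. The helper name `split_annotations` is hypothetical.

```python
def split_annotations(names, accessions, groups):
    """Pair the i-th entry of each semicolon-delimited annotation field."""
    n = names.split(";")
    a = accessions.split(";")
    g = groups.split(";")
    if not (len(n) == len(a) == len(g)):
        raise ValueError("annotation fields have different lengths")
    return list(zip(n, a, g))


# Illustrative row for cg01220655. The accessions are placeholders
# (assumption), not the values from the real manifest.
row = {
    "UCSC_RefGene_Name": "FLT4;FLT4",
    "UCSC_RefGene_Accession": "NM_placeholder_1;NM_placeholder_2",
    "UCSC_RefGene_Group": "Body;Body",
}

records = split_annotations(
    row["UCSC_RefGene_Name"],
    row["UCSC_RefGene_Accession"],
    row["UCSC_RefGene_Group"],
)

# Each record is one (gene, transcript, region) annotation.
for gene, accession, group in records:
    print(gene, accession, group)

# If the accessions differ, the "duplicate" gene entries refer to
# different transcripts of the same gene.
distinct_transcripts = {accession for _, accession, _ in records}
```

If `distinct_transcripts` has as many entries as `records`, each repeated gene name corresponds to its own transcript, which would match the explanation above.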

Hi Charles, I'm processing data from a 450K BeadChip to identify differentially methylated probes (DMPs) between our cases and controls, and I get more than 10,000 DMPs. When calculating the distribution of DMPs in relation to genes (for example, what percentage of DMPs are located in the TSS200 region), I face the same question: one probe is annotated with more than one relation to the same gene.

For example:

ID: cg09132215
Gene: TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1
UCSC_RefGene_Group: TSS200;TSS200;TSS200;TSS1500;TSS200

What should I do about the location for cg09132215? Should I count it as $num_TSS200=$num_TSS200+4 and $num_TSS1500=$num_TSS1500+1? ($num_TSS200: the number of DMPs located at TSS200; $num_TSS1500: the number of DMPs located at TSS1500)

Any suggestions will be greatly appreciated!

7.9 years ago by RC ▴ 20
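
The counting question above can be made concrete by comparing conventions side by side. This is only a sketch of three possible conventions, not a recommendation from the thread. The function names and the `probes` dictionary layout are hypothetical, and the only data used is the cg09132215 example quoted above.

```python
from collections import Counter


def count_per_annotation(probes):
    """Count every annotation entry, so repeated transcripts add up."""
    counts = Counter()
    for genes, groups in probes.values():
        counts.update(groups.split(";"))
    return counts


def count_per_gene_region(probes):
    """Count each distinct (gene, region) pair once per probe."""
    counts = Counter()
    for genes, groups in probes.values():
        pairs = set(zip(genes.split(";"), groups.split(";")))
        counts.update(region for _, region in pairs)
    return counts


def count_per_probe(probes):
    """Count each region at most once per probe, ignoring genes."""
    counts = Counter()
    for genes, groups in probes.values():
        counts.update(set(groups.split(";")))
    return counts


# The single probe quoted in the post: (gene field, region field).
probes = {
    "cg09132215": (
        "TAS1R1;TAS1R1;TAS1R1;NOL9;TAS1R1",
        "TSS200;TSS200;TSS200;TSS1500;TSS200",
    ),
}

per_annotation = count_per_annotation(probes)    # TSS200: 4, TSS1500: 1
per_gene_region = count_per_gene_region(probes)  # TSS200: 1, TSS1500: 1
per_probe = count_per_probe(probes)              # TSS200: 1, TSS1500: 1
```

The per-annotation convention is the "+4 / +1" counting from the post, and it weights a probe by how many transcripts overlap it. The other two count a probe once per region, so a probe can still contribute to more than one region, and percentages computed over probes may sum to more than 100%. Which convention fits depends on the question being asked of the DMPs.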