2 days ago
jain72744
Hello, I have extracted the CpG probes for methylation data from TCGA, along with the corresponding hg19 annotation file. One row of the annotation looks like this:
cg00050873 chrY 9363356 - cg00050873 32735311 31717405 ACAAAAAAACAACACACAACTATAATAATTTTTAAAATAAATAAACCCCA ACGAAAAAACAACGCACAACTATAATAATTTTTAAAATAAATAAACCCCG I A Red chrY:9363680-9363943 N_Shore TATCTCTGTCTGGCGAGGAGGCAACGCACAACTGTGGTGGTTTTTGGAGTGGGTGGACCC[CG]GCCAAGACGGCCTGGGCTGACCAGAGACGGGAGGCAGAAAAAGTGGGCAGGTGGTTGCAG CGGGGTCCACCCACTCCAAAAACCACCACAGTTGTGCGTTGCCTCCTCGC TSPY4;FAM197Y2 NM_001164471;NR_001553 Body;TSS1500 Y:9973136-9976273
But for some CpG probes, the gene names are missing. How can I add gene names for those probes so that I can run functional enrichment?
Not all CpGs are within or proximal to genes.
So how do I compare them against differential expression from transcriptomics data?
Methylation array data (450K, EPIC, etc.) comes with existing annotation from Illumina, including the position of each probe relative to its nearest gene. A probe can be in the gene body, upstream of the TSS, or in an island, shore, shelf, etc. The annotation isn't always straightforward: TSSs can overlap, a probe can be positionally associated with multiple genes, and some probes are quite far from any gene. Probes on the Illumina arrays were chosen for many reasons, and some were selected for functional relevance rather than for their positional relationship to genes.
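As a rough illustration of how a probe can map to several genes, here is a minimal Python sketch that splits the semicolon-delimited gene and gene-region fields from the example row above (TSPY4;FAM197Y2 and Body;TSS1500) into pairs. It assumes the two fields are parallel lists, so the first gene goes with the first region and so on; the function name gene_contexts is made up for this example. A probe with empty fields simply yields no pairs, which is the "missing gene name" case.

```python
def gene_contexts(gene_field, group_field):
    """Pair each gene with its region label (hypothetical helper).

    Assumes both fields are semicolon-delimited and positionally parallel.
    """
    genes = [g for g in gene_field.split(";") if g]
    groups = [g for g in group_field.split(";") if g]
    pairs = []
    # zip stops at the shorter list, so a malformed row cannot raise here
    for pair in zip(genes, groups):
        if pair not in pairs:  # drop repeated gene/region combinations
            pairs.append(pair)
    return pairs


# Values taken from the example row for cg00050873
pairs = gene_contexts("TSPY4;FAM197Y2", "Body;TSS1500")

# A probe with no gene annotation gives an empty list
unannotated = gene_contexts("", "")
```

Keeping the region label next to each gene makes it easy to restrict an enrichment analysis to, say, promoter-proximal probes only.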
You can access the Illumina annotation, or you can annotate the genes yourself. For the purpose of differential expression comparisons, you might want to think about differentially methylated regions rather than individual probes.
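If you do annotate the genes yourself, one simple approach is to assign each unannotated probe to the nearest TSS within some distance cutoff. The sketch below is only an illustration of that idea, not Illumina's method: the function name nearest_gene, the 50 kb cutoff, and the gene names and coordinates in the toy table are all made up, and in practice the TSS table would come from a real gene model for the same genome build (hg19 here).

```python
import bisect


def nearest_gene(chrom, pos, tss_by_chrom, max_dist=50_000):
    """Return (gene, distance) for the closest TSS, or None if too far.

    tss_by_chrom maps a chromosome name to a list of (tss_position, gene)
    tuples sorted by position. max_dist is an arbitrary cutoff (assumption).
    """
    entries = tss_by_chrom.get(chrom, [])
    if not entries:
        return None
    positions = [p for p, _ in entries]
    i = bisect.bisect_left(positions, pos)
    # Only the TSS just before and just after the probe can be the nearest
    candidates = entries[max(i - 1, 0): i + 1]
    tss, gene = min(candidates, key=lambda e: abs(e[0] - pos))
    dist = abs(tss - pos)
    return (gene, dist) if dist <= max_dist else None


# Hypothetical TSS table; names and coordinates are placeholders
toy_tss = {"chrY": [(1_000_000, "GENE_A"), (2_000_000, "GENE_B")]}

close_hit = nearest_gene("chrY", 1_010_000, toy_tss)
too_far = nearest_gene("chrY", 1_500_000, toy_tss)
```

Probes that come back as None are the ones that are genuinely far from any gene; forcing a gene name onto them would mostly add noise to an enrichment or expression comparison.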