I am looking into the TCGA methylation data and want to understand how to parse it and, ideally, map the measured beta values to single HUGO gene symbols.
My issues are as follows:
1) Some Stable Entity IDs list multiple gene names. For example, the breast cancer (BRCA) data contains a row with these values:
Stable Entity ID | Name | Description | Transcript ID
cg00008493 | KIAA1409;COX8C | Body;5'UTR | NM_020818;NM_182971
2) Many Stable Entity IDs map to the same gene. For example, in the attached image, several Stable Entity IDs map to the same gene (DLX5).
For a research project I'd love to associate each gene with a single methylation value. Put differently, for each patient I want to create a vector where each entry is the methylation value for a given gene. Is there a principled way to do this?
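To make the goal concrete, here is a minimal sketch of the naive approach I can think of: split the semicolon-separated gene names so each probe/gene pair gets its own row, then average the beta values of all probes that map to a gene. Everything except cg00008493 and the gene names is made up for illustration (the other probe IDs, the patient columns, the beta values, and the helper name `gene_level_beta`), and I am assuming pandas with probes as rows and patients as columns. I am not sure a plain mean over probes is justified, which is really what I am asking.

```python
import pandas as pd

# Toy annotation table. Only cg00008493 and the gene names come from the data;
# the other two probe IDs are placeholders.
annot = pd.DataFrame({
    "probe": ["cg00008493", "cg_hypothetical_1", "cg_hypothetical_2"],
    "gene": ["KIAA1409;COX8C", "DLX5", "DLX5"],
})

# Toy beta matrix: rows are probes, columns are patients (all values made up).
beta = pd.DataFrame(
    {"patient_A": [0.80, 0.20, 0.40], "patient_B": [0.60, 0.10, 0.30]},
    index=["cg00008493", "cg_hypothetical_1", "cg_hypothetical_2"],
)


def gene_level_beta(annot, beta):
    """Average the beta values of all probes annotated to each gene."""
    # Split "A;B" into a list and expand it to one row per (probe, gene) pair.
    pairs = annot.assign(gene=annot["gene"].str.split(";")).explode("gene")
    # Drop probes without a gene and repeated (probe, gene) pairs, which occur
    # when a probe lists the same gene once per transcript.
    pairs = pairs.dropna(subset=["gene"]).drop_duplicates()
    # Attach each probe's beta values, then take the per-gene mean per patient.
    merged = pairs.merge(beta, left_on="probe", right_index=True, how="inner")
    return merged.drop(columns="probe").groupby("gene").mean()


# Rows are genes, columns are patients; each column is one patient's vector.
gene_beta = gene_level_beta(annot, beta)
print(gene_beta)
```

In this sketch a probe annotated to two genes (cg00008493) contributes its value to both, and the Body/5'UTR information in the Description column is ignored entirely; restricting to promoter-region probes before averaging would be one alternative.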