Hi,
I'm a little confused about how to interpret the methylation data provided by TCGA.
I'm trying to get a general idea of the methylation state of the MGMT promoter. Where I'm running into difficulty is aggregating the data across the multiple elements (probes) reported for a single gene.
For example, each level 3 file has the following column headers:
Composite Element REF
Beta_value
Chromosome
Start
End
Gene_Symbol
Gene_Type
Transcript_ID
Position_to_TSS
CGI_Coordinate
Feature_Type
I can easily pull out the methylation sites of interest by grepping for the gene symbol:
grep -w MGMT level-3-methyl-data.txt > hits.txt
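Since grep matches text anywhere on a line and drops the header row, a column-aware filter might be safer. Here is a minimal sketch in Python, assuming the file is tab-delimited with the headers listed above and that multiple symbols in Gene_Symbol are joined with ';'. The function name, the probe IDs, the second gene symbol, and the values in the sample are all made up for illustration.

```python
import csv
import io

def rows_for_gene(handle, gene):
    """Yield rows whose Gene_Symbol field lists `gene` exactly."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Assumption: several symbols may share one field, joined by ';'.
        if gene in row["Gene_Symbol"].split(";"):
            yield row

# Made-up rows for illustration only (not real TCGA values).
sample = (
    "Composite Element REF\tBeta_value\tGene_Symbol\tPosition_to_TSS\n"
    "cg_example_1\t0.12\tMGMT;MGMT\t-200;-350\n"
    "cg_example_2\t0.80\tNOTMGMT1\t1500\n"   # hypothetical symbol containing "MGMT"
    "cg_example_3\tNA\tMGMT\t40\n"
)

hits = list(rows_for_gene(io.StringIO(sample), "MGMT"))
print([row["Composite Element REF"] for row in hits])
```

With a real file, `open("level-3-methyl-data.txt", newline="")` would take the place of the `io.StringIO` sample.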
When I review the data, it reports the methylation status of single base pairs, which may or may not lie within the associated gene's promoter region. Is it reasonable to integrate these per-site values into a more comprehensive view of methylation on the MGMT promoter, and if so, how? I'm not sure this is even a sensible idea, so if I'm thinking about this the wrong way, please just let me know.
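To make concrete what I mean by aggregating, here is a rough sketch of one possible approach: keep the probes that fall within a window around the TSS and summarize their beta values. Everything specific here is an assumption on my part, not something I have confirmed: that Gene_Symbol and Position_to_TSS are parallel ';'-joined lists, that negative positions mean upstream, and that a window of 1500 bp upstream to 500 bp downstream is a sensible promoter definition. The sample rows are invented.

```python
import csv
import io
from statistics import mean

def promoter_betas(handle, gene, upstream=1500, downstream=500):
    """Collect beta values for probes within a window around a TSS of `gene`.

    Assumptions: tab-delimited input; Gene_Symbol and Position_to_TSS are
    parallel ';'-joined lists; negative offsets are upstream of the TSS;
    the window bounds are an arbitrary illustrative choice.
    """
    betas = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Beta_value"] in ("", "NA"):
            continue  # probe has no usable measurement
        symbols = row["Gene_Symbol"].split(";")
        positions = row["Position_to_TSS"].split(";")
        for symbol, position in zip(symbols, positions):
            try:
                offset = int(position)
            except ValueError:
                continue  # non-numeric placeholder instead of an offset
            if symbol == gene and -upstream <= offset <= downstream:
                betas.append(float(row["Beta_value"]))
                break  # count each probe once, even with several transcripts
    return betas

# Made-up rows for illustration only (not real TCGA values).
sample = (
    "Composite Element REF\tBeta_value\tGene_Symbol\tPosition_to_TSS\n"
    "cg_example_1\t0.25\tMGMT;MGMT\t-200;-350\n"
    "cg_example_2\t0.5\tMGMT\t400\n"
    "cg_example_3\t0.9\tMGMT\t25000\n"       # gene body, outside the window
    "cg_example_4\tNA\tMGMT\t-100\n"         # missing value, skipped
    "cg_example_5\t0.75\tOTHER_GENE\t-50\n"  # hypothetical other gene
)

betas = promoter_betas(io.StringIO(sample), "MGMT")
print(len(betas), mean(betas))
```

Whether a mean (or median) over promoter probes is biologically meaningful is exactly the part I am unsure about; the code only shows the mechanics.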
Hi, I have read the paper you shared above, but I still don't know which probe IDs can represent MGMT methylation status, considering that MGMT has 100+ probe IDs. I filtered for the MGMT promoter region (using the Start and End columns) based on NCBI GenBank coordinates and still have 16 probes. Should I use the cg12434587 and cg12981137 probes (the ones mentioned in the paper you shared) to represent MGMT methylation? If so, should I use the average of cg12434587 and cg12981137? Thanks
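For concreteness, taking a plain average of those two probes would look something like the sketch below. It assumes a tab-delimited file with the Composite Element REF and Beta_value columns; the function name, the third probe ID, and all beta values are invented for illustration. It only shows the arithmetic and does not settle whether a simple average is how the paper combines the two probes.

```python
import csv
import io

def mean_beta(handle, probe_ids):
    """Average Beta_value over the given probe IDs, skipping missing values."""
    wanted = set(probe_ids)
    values = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Composite Element REF"] in wanted and row["Beta_value"] not in ("", "NA"):
            values.append(float(row["Beta_value"]))
    # Return None rather than guessing when no probe has a usable value.
    return sum(values) / len(values) if values else None

# Beta values below are made up; only the two probe IDs come from the thread.
sample = (
    "Composite Element REF\tBeta_value\tGene_Symbol\n"
    "cg12434587\t0.25\tMGMT\n"
    "cg12981137\t0.75\tMGMT\n"
    "cg_other_probe\t0.10\tMGMT\n"
)

average = mean_beta(io.StringIO(sample), ["cg12434587", "cg12981137"])
print(average)
```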