5.0 years ago
saad.zaheer531
Consider the following two rows of a methylation data set:
| Composite Element REF | Beta_value | Chromosome | Start | End | Gene_Symbol | Gene_Type | Transcript_ID | Position_to_TSS | CGI_Coordinate | Feature_Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| cg06743060 | 0.992868 | chr21 | 45746710 | 45746711 | PCBP3 | protein_coding | ENST00000400314.4;ENST00000465077.4;ENST000005... | -1314;103015;11317;13984;73154 | CGI:chr21:45745872-45747012 | Island |
| cg27198114 | 0.992101 | chr12 | 132555544 | 132555545 | FBRSL1 | protein_coding | ENST00000434748.2 | 65992 | CGI:chr12:132555310-132555571 | Island |
I want to know what the Beta_value tells me here. Should I interpret it as the likelihood that this particular position in the gene is methylated?
Suppose I have a data set of 500,000 rows with beta values ranging from 0.00001 to 0.9999, and I want to discard the rows with Beta_value < 0.5 because the gene in those rows might not be methylated. Would that be the right thing to do?
Would you suggest something else?
I have always found beta values from arrays difficult to interpret, but with NGS bisulphite sequencing data a value like this can be read as "a fraction of 0.992868 of the cells (about 99.3%) is methylated at this position". Bisulphite conversion efficiency is always a concern, but I have generally ignored it once the sample has passed initial QC.
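To make the "fraction of cells" reading concrete, here is a minimal sketch of how a beta value is commonly derived from bisulphite sequencing read counts at a single CpG: methylated reads divided by total reads. The function name and the read counts are hypothetical and chosen only for illustration; array pipelines compute beta from probe intensities instead and may add a small offset to the denominator.

```python
def beta_from_counts(methylated_reads: int, unmethylated_reads: int) -> float:
    """Return the fraction of reads that support methylation at one CpG.

    Hypothetical helper for illustration; real pipelines also handle
    coverage filters and conversion-efficiency checks.
    """
    total = methylated_reads + unmethylated_reads
    if total == 0:
        # No reads cover this position, so the fraction is undefined.
        raise ValueError("no coverage at this position")
    return methylated_reads / total


# Made-up counts: 139 methylated reads out of 140 covering the CpG.
beta = beta_from_counts(139, 1)
print(round(beta, 4))
```

With deep enough coverage, this ratio approximates the proportion of cells (strictly, of sampled DNA molecules) carrying a methylated cytosine at that site.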
The absence of methylation is also information about epigenomic modification. Intermediate methylation levels, between 0 and 1, are informative too; some researchers look for them specifically, under the name "Partially Methylated Domains". I would not discard these rows.
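As an alternative to dropping rows, one option is to keep every row and label it by methylation state. The sketch below assumes cut-offs of 0.2 and 0.8, which are illustrative choices rather than values taken from this data set; the function name and the third probe ID are hypothetical as well.

```python
def classify_beta(beta: float, low: float = 0.2, high: float = 0.8) -> str:
    """Label a beta value instead of discarding it.

    The 0.2 / 0.8 cut-offs are assumptions for illustration only.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError(f"beta must be in [0, 1], got {beta}")
    if beta < low:
        return "unmethylated"
    if beta > high:
        return "methylated"
    return "partially methylated"


# The first two rows come from the question; the third is a made-up example.
rows = [
    ("cg06743060", 0.992868),
    ("cg27198114", 0.992101),
    ("cg_hypothetical", 0.45),
]

labelled = [(probe, beta, classify_beta(beta)) for probe, beta in rows]
for probe, beta, state in labelled:
    print(probe, beta, state)
```

All rows stay in the data set this way, so unmethylated and partially methylated positions remain available for later analysis.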