How is methylation annotated, and what does the number in a cg... annotation mean?
3 days ago
Peter • 0

Hi everyone, I come from a computer science background, so my knowledge of genomics is limited. I’m planning to develop an algorithm that can be applied to methylation data, but my questions might be quite basic. I would appreciate your help in understanding this correctly.

Recently, I looked into methylation datasets and noticed that sites are typically annotated with the prefix 'cg' followed by a string of digits, like 'cg04303809'. I understand that 'cg' refers to methylation at CpG sites in DNA sequences, but I'm unsure what the digits that follow mean. Do they represent positions in the reference genome? I'm also curious whether there is any sequential relationship between these methylation annotations. For example, does the methylation status at one CpG site affect the methylation status at another CpG site elsewhere? If such relationships exist, could you please point me to any references or resources that discuss them? Thank you very much.

Methylation annotation • 221 views
3 days ago
Ming Tommy Tang ★ 4.4k

Where did you download the data? I think the number is just an index for the CpG site. There are also CpG islands, regions containing multiple CpG sites, which occur in some promoters. I do not know how CpG sites in different genomic regions interact with each other, but there could be long-range genomic interactions between regions whose CpG sites are both unmethylated or both methylated.


Thank you for your reply. If the number is an index for the CpG sites within a sequence, do you think the values are contiguous and ordered? For example, given cg02494853, cg01707559 and cg04016144, is cg01707559 really located between cg02494853 and cg04016144? As for the data, I used a dataset from Kaggle; according to the description, it is part of a challenge. Here is the link: https://www.kaggle.com/datasets/marquis03/age-assessment-and-disease-risk-prediction?select=ai4bio_trainset
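One way to check the ordering question yourself, rather than rely on the digits, is to compare the order implied by the IDs with the order implied by genomic coordinates. Below is a minimal Python sketch of that comparison. It assumes you can build a lookup from probe ID to (chromosome, position) from whatever annotation file accompanies your array data; the `Manifest` type and the function names are hypothetical and only for illustration, not part of any library.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup: probe ID -> (chromosome, position).
# Assumption: you fill this from the annotation file shipped with your data.
Manifest = Dict[str, Tuple[str, int]]


def probe_number(probe_id: str) -> int:
    """Return the digits of an ID such as 'cg04303809' as an integer."""
    if not probe_id.startswith("cg") or not probe_id[2:].isdigit():
        raise ValueError(f"not a cg-style probe ID: {probe_id!r}")
    return int(probe_id[2:])


def order_by_id(probe_ids: Iterable[str]) -> List[str]:
    """Sort probes by the number embedded in their ID."""
    return sorted(probe_ids, key=probe_number)


def order_by_position(probe_ids: Iterable[str], manifest: Manifest) -> List[str]:
    """Sort probes by (chromosome, position) taken from the lookup.

    Chromosome names are compared as plain strings here, so 'chr10'
    sorts before 'chr2'; only the within-chromosome order is meaningful.
    """
    return sorted(probe_ids, key=lambda p: manifest[p])


def id_order_matches_position(probe_ids: Iterable[str], manifest: Manifest) -> bool:
    """True only if sorting by ID and sorting by position give the same list."""
    ids = list(probe_ids)
    return order_by_id(ids) == order_by_position(ids, manifest)
```

If `id_order_matches_position` returns `False` for the probes in your dataset, the digits should be treated as opaque labels, and any notion of "neighbouring" sites should come from the coordinates in the annotation rather than from the IDs.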
