Reporting methylation of both Cs and Gs, or only Cs
5.3 years ago
kashiff007 ★ 1.9k

Hi All,

I have just started working with whole-genome bisulfite sequencing (WGBS) methylation data. I ran Bismark, and its output reports methylation for Cs and for the Gs that immediately follow them. I am guessing that the methylation reported at a G actually comes from the C on the opposite strand (please correct me if I am wrong). My question: when reporting methylation in a region, should I consider both, or should I keep only the C rows for further analysis?

Thanks.

Methylation bismark BS-seq next-gen sequencing • 1.3k views
5.3 years ago

As you surmised, the "G" is actually the C on the - strand. The methylation of a region should include both strands, unless you know that whatever you're studying is affected only by methylation on a single strand. Note, however, that the two Cs in a CpG (the C and the subsequent G in Bismark's files) typically show roughly symmetric methylation ratios, so a common strategy is to simply combine them into CpG-level methylation levels. Bismark likely has a method for doing that (if not, you can do it with MethylDackel).
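To make the strand-merging idea concrete, here is a minimal sketch in Python. It is not Bismark's or MethylDackel's implementation. The record layout (chromosome, 1-based position, strand, methylated count, unmethylated count) and the function name `merge_cpg_strands` are assumptions chosen for illustration, and the counts are made up.

```python
from collections import defaultdict


def merge_cpg_strands(records):
    """Combine per-strand cytosine counts into one entry per CpG.

    records: iterable of (chrom, pos, strand, n_meth, n_unmeth) with
    1-based positions (an assumed layout, not a specific tool's format).
    A '-' strand record at position p sits on the G of the CpG as seen
    on the forward strand, so it is assigned to the CpG starting at p - 1.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, n_meth, n_unmeth in records:
        start = pos if strand == "+" else pos - 1
        merged[(chrom, start)][0] += n_meth
        merged[(chrom, start)][1] += n_unmeth

    result = []
    for (chrom, start), (n_meth, n_unmeth) in sorted(merged.items()):
        total = n_meth + n_unmeth
        # Methylation level is undefined when there is no coverage.
        level = n_meth / total if total else None
        result.append((chrom, start, n_meth, n_unmeth, level))
    return result


# Illustrative counts only: two CpGs, each covered on both strands.
records = [
    ("chr1", 100, "+", 8, 2),
    ("chr1", 101, "-", 7, 3),
    ("chr1", 250, "+", 1, 9),
    ("chr1", 251, "-", 0, 10),
]
merged = merge_cpg_strands(records)
for row in merged:
    print(row)
```

Summing the counts before dividing weights each strand by its coverage, which is usually preferable to averaging two per-strand ratios.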


Thanks for your clear answer. I would like to understand why I have to consider both methylation levels (from the C and the subsequent G) in the analysis. Say I want to look at the methylation level inside the promoter of a gene. Since genes are strand-specific, would it not make sense to take only the Cs (from all CpGs) in the promoter? Similarly, if the gene is on the negative strand, one would take the G methylation levels, which are actually the Cs on the negative strand.

I assume I should take both (the Cs and the subsequent Gs) when looking at the methylation level inside a peak (e.g. H3K9ac), because these peaks have modified histones wrapped with both strands of DNA, and hence the presence/occupancy of these peaks is affected by both strands.


Transcription factors and other DNA-interacting proteins don't typically interact with a single strand, but rather with the major or minor groove of double-stranded DNA. It's rather unusual for genes to be affected only by methylation on the same strand as them (not to mention that it's the reverse strand that serves as the template). Further, your coverage for a CpG will be roughly double that of a single C, which greatly aids statistical power.
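As a rough illustration of the coverage point, the sketch below compares the standard error of a methylation estimate at single-strand coverage with that at doubled coverage. It assumes a simple binomial model for read counts, and the values (a methylation level of 0.7 and 10 reads per strand) are hypothetical.

```python
import math


def methylation_se(p, coverage):
    """Standard error of a methylation proportion under a binomial model."""
    return math.sqrt(p * (1 - p) / coverage)


p = 0.7             # hypothetical true methylation level
single_strand = 10  # hypothetical reads covering one C
merged_cpg = 20     # both strands combined

se_single = methylation_se(p, single_strand)
se_merged = methylation_se(p, merged_cpg)
ratio = se_single / se_merged  # equals sqrt(2) for doubled coverage

print(f"SE single strand: {se_single:.3f}")
print(f"SE merged CpG:    {se_merged:.3f}")
print(f"ratio:            {ratio:.3f}")
```

Under this model, doubling the coverage shrinks the standard error by a factor of sqrt(2), regardless of the methylation level.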
