Considerations for epiallele studies (DNA methylation bisulfite sequencing)
8.2 years ago

I am doing a study of bisulfite sequencing data from a bunch of samples, and I'm interested in looking into epialleles, epipolymorphisms, and measures of DNA methylation disorder.

One thing I keep seeing in papers is that every study seems to limit its analysis to epialleles of 4 CpGs; even when the reads span more than 4 CpGs, they are trimmed to 4 CpGs.

I was wondering why they don't analyze longer epialleles if they have the capability. I understand that to get a true measure of the epiallele composition of a locus with > 4 CpGs you would need higher sequencing coverage, because there are more possible epialleles. But if you see a locus of 6 CpGs (and therefore 2^6 = 64 possible epialleles) that consistently has only 2 epialleles (say CCCCCT and CCCCCC), each covered at 30x, isn't it reasonable to say that this sequence is tightly regulated, because even though vastly more epialleles are possible, you only see 2/64?

I guess I am just wondering why studies of methylation disorder (methylation heterogeneity) limit their analyses to 4-CpG blocks when they could calculate similar statistics for longer epialleles. Any insights on this topic would be helpful, because I have found literally zero explanation in the papers I've read.

bisulfite sequencing dna methylation epiallele • 2.3k views
8.2 years ago
Shicheng Guo ★ 9.5k

As you have mentioned, it is difficult to find 6 contiguous CpGs on one read, since NGS read length is usually 150 bp. Even with paired-end sequencing the covered length is less than 300 bp, and sometimes the two reads overlap. If you only count the regions of the human genome that have > 5 CpGs within a 300 bp window, only a small fraction of the genome will be included, and therefore you cannot draw solid genome-wide conclusions.
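To get a feel for how restrictive that filter is, here is a rough sketch that counts runs of k consecutive CpGs that fit inside a given span. The function names (`cpg_positions`, `count_spans`) and the toy sequence are my own for illustration; for a real estimate you would run this over a reference genome and use your actual fragment length.

```python
import re

def cpg_positions(seq):
    """Return 0-based start positions of every CG dinucleotide in seq."""
    return [m.start() for m in re.finditer("CG", seq.upper())]

def count_spans(positions, k, max_span):
    """Count runs of k consecutive CpGs that fit within max_span bp.

    The span is measured from the C of the first CpG to the G of the
    last CpG, inclusive (hence the +2).
    """
    return sum(
        1
        for i in range(len(positions) - k + 1)
        if positions[i + k - 1] - positions[i] + 2 <= max_span
    )

# Toy sequence (hypothetical): four closely spaced CpGs, then one isolated CpG.
toy = "ACGTTCGAACGGGCG" + "T" * 10 + "CG"
pos = cpg_positions(toy)
four_in_20bp = count_spans(pos, k=4, max_span=20)
```

Comparing `count_spans(pos, 4, 300)` with `count_spans(pos, 6, 300)` on real chromosomes should show how quickly the number of usable loci drops as k grows.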

You could validate your hypothesis with third-generation (long-read) methylation sequencing. Ideally that would give you kb- or Mb-scale methylation alleles; however, it seems the accuracy of third-generation BS-seq is not yet good enough.

As to why they trim the length to 4: it is mainly because the statistics and related metrics need epialleles of the same length in order to compare different regions and samples. If the lengths differ between regions or samples, it is hard to make inferences. Also, as you mentioned, there are 2^4 = 16 possible patterns for 4 CpGs, and if you observe only CCCC or TTTT, it is more likely not stochastic.
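A small sketch of why the length matters for the metrics. I am assuming here that epipolymorphism is computed as 1 - Σ p_i² over the observed epiallele frequencies and that entropy is plain Shannon entropy in bits; check the exact definitions used by the paper you are following. The function name `epiallele_stats` is hypothetical.

```python
from collections import Counter
from math import log2

def epiallele_stats(patterns):
    """Summarise one locus from equal-length patterns ('C' = methylated, 'T' = unmethylated)."""
    n_cpg = len(patterns[0])
    if any(len(p) != n_cpg for p in patterns):
        raise ValueError("all epialleles must span the same number of CpGs")
    total = len(patterns)
    freqs = [count / total for count in Counter(patterns).values()]
    return {
        "n_cpg": n_cpg,
        "possible": 2 ** n_cpg,            # number of possible epialleles
        "observed": len(freqs),            # number actually seen
        # Assumed definition: 1 - sum of squared frequencies.
        "epipolymorphism": 1 - sum(f * f for f in freqs),
        # Upper bound, reached when all 2^n patterns are equally frequent.
        "max_epipolymorphism": 1 - 1 / 2 ** n_cpg,
        "entropy_bits": -sum(f * log2(f) for f in freqs),
        "max_entropy_bits": float(n_cpg),
    }

# The locus from the question: two epialleles, 30 reads each.
reads = ["CCCCCT"] * 30 + ["CCCCCC"] * 30
six = epiallele_stats(reads)
# The same reads trimmed to their last 4 CpGs.
four = epiallele_stats([r[-4:] for r in reads])
```

Both versions give the same raw epipolymorphism for this locus, but the attainable maximum differs between 4 and 6 CpGs, so raw values from windows of different lengths sit on different scales unless you normalise them.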


I understand that it is difficult to find regions with > 4 contiguous CpGs, but I don't understand why they specifically exclude regions that DO have > 4 contiguous CpGs. I think it may have something to do with the statistics having different biases for epialleles of different lengths, but I haven't quite figured it out yet. Do you have any ideas on why they intentionally trim longer alleles to 4-CpG ones?

8.0 years ago

Hi, I recently published 2 papers on epialleles. In these papers I analyze the epialleles of a specific gene, in particular six CpGs. You can find my papers at: https://www.ncbi.nlm.nih.gov/pubmed/27858532 https://www.ncbi.nlm.nih.gov/pubmed/27884103.

Let me know if this has been helpful.

Best

8.0 years ago
rkostadi ▴ 60

Who are "they"? Can you provide a reference? There's no reason to exclude longer CpG methylation haplotypes from the analysis. I consider 5- and 6-CpG ones too. In the human genome the average distance between neighboring CpGs is about 100 bp and the median is about 40 bp; that's why paired-end sequencing tends to capture 4-CpG haplotypes better. Also, you can have overlapping epihaplotypes. Either way, the aim is usually to detect sites that change frequency, and as you point out, a 6-CpG haplotype has more power to detect a frequency change.
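For the overlapping haplotypes, something along these lines would work. This is only a sketch: I am assuming each read has already been reduced to a tuple of (index of the first CpG it covers, pattern string over consecutive CpGs), and the names `overlapping_epialleles` and `tally` are made up for the example.

```python
from collections import Counter

def overlapping_epialleles(read_pattern, k=4):
    """Yield (offset, window) for every run of k adjacent CpGs in one read."""
    for i in range(len(read_pattern) - k + 1):
        yield i, read_pattern[i:i + k]

def tally(reads, k=4):
    """Count k-CpG epialleles per window, keyed by the index of the window's first CpG.

    reads: iterable of (first_cpg_index, pattern), with 'C' = methylated
    and 'T' = unmethylated (assumed input format).
    """
    counts = {}
    for first_cpg, pattern in reads:
        for offset, window in overlapping_epialleles(pattern, k):
            counts.setdefault(first_cpg + offset, Counter())[window] += 1
    return counts

# Toy reads (hypothetical): two reads covering CpGs 0-5, one covering CpGs 2-5.
toy_reads = [(0, "CCCCCT"), (0, "CCCCCC"), (2, "CCCT")]
counts = tally(toy_reads, k=4)
```

A read covering 6 CpGs contributes to three overlapping 4-CpG windows, so the longer reads are still used even when the statistic is fixed at 4 CpGs. Setting `k=6` keeps only the reads long enough to give a full 6-CpG haplotype.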
