DNA methylation (RRBS/WGBS) prediction from DNA sequence (WGS)
21 months ago
Lalaland ▴ 40

I have read several published papers on predicting DNA methylation from existing methylation patterns. However, I would like to ask whether there are any methods for predicting DNA methylation levels (beta values) and/or status (0 or 1) from whole-genome sequencing (WGS) data.

I have WGS data for 1,000 individuals, and DNA methylation data for only 500 of those same 1,000 individuals.

I wish to train a prediction model on the 500 individuals with both WGS and DNA methylation data, and then predict methylation for the other 500 individuals, who have no DNA methylation data.

Any help is truly appreciated!
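
For a single CpG site, the train-then-predict design described above can be sketched as below. This is only a toy illustration under assumptions that are not part of the question: one nearby SNP coded as an allele dosage (0/1/2) is the sole predictor, the model is a plain least-squares line, and all IDs and values are made up. A real model would use many more features per site.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_beta(intercept, slope, dosage):
    """Predict a beta value and clip it to the valid [0, 1] range."""
    return min(1.0, max(0.0, intercept + slope * dosage))


# Hypothetical data: every individual has a genotype, only some have betas.
genotypes = {"ind1": 0, "ind2": 1, "ind3": 2, "ind4": 1, "ind5": 0, "ind6": 2}
betas = {"ind1": 0.2, "ind2": 0.5, "ind3": 0.8}

# Train on individuals with both data types, predict for the rest.
train_ids = sorted(i for i in genotypes if i in betas)
predict_ids = sorted(i for i in genotypes if i not in betas)

intercept, slope = fit_line(
    [genotypes[i] for i in train_ids],
    [betas[i] for i in train_ids],
)
predicted = {i: predict_beta(intercept, slope, genotypes[i]) for i in predict_ids}
```

The point of the sketch is the split itself: the individuals with methylation data define the training set, and predictions are made only for individuals whose IDs are absent from the methylation table.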

methylation Prediction DNA WGS WGBS Imputation • 1.5k views
21 months ago

I haven't followed the field closely in recent years, but I think you would need to include more predictors than the sequence alone for this model to produce reliable output.

Firstly, the genomic context (cis-regulatory element, promoter, gene body, intergenic, etc.) and cell type surely matter, so your model would need to be trained on relatively long contiguous sequences; unaligned short reads would likely not suffice. Additionally, the model needs to consider the state of the cell (mitotic age, potentially senescence, cancer, etc.), or it will be limited to predicting methylation for one kind of population. Lastly, there have been long debates about the fraction of "functional" methylation in the cell, with estimates as low as 20%. So expect a lot of randomness and noise.

Two papers you should have a look at are this review on the dynamics of methylation during mammalian development and this one on how a complex interplay of transcription factors shapes the methylome. If you're masochistically inclined and can stand long, dreary monologues, you can additionally have a look at Chapter 12, sections 2 and 3 of my thesis (PDF download), where I discuss all kinds of methylation features and their causes known up to 2019. This was meant to cover up the sad truth that I failed to discover anything noteworthy in a crappy 3 vs. 3 sample WGBS dataset despite wasting 8 years of my life on it.


Update: I guess this preprint is highly relevant for your undertaking: MuLan-Methyl - Multiple Transformer-based Language Models for Accurate DNA Methylation Prediction. Unfortunately, the web app is currently under maintenance.


Thank you very much, Matthias! I will definitely check those links out myself!

21 months ago
Nana ▴ 30

Not my area, but there's a random forest classifier that mixes genomic and methylation features, including SNPs (Zhang et al., 2015. Genome Biology).

And DeepCpG was trained for imputation on single-cell bisulfite sequencing data, but they describe a module that predicts methylation from DNA sequence alone (Angermueller et al., 2017. Genome Biology). From the repo, the main model input seems to be sparse CpG methylation for imputation. Maybe you could adapt it for your purpose.
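
To give an idea of what sequence-only input could look like, here is a minimal sketch that finds CpG sites, cuts a fixed-size window around each one, and one-hot encodes it. The function names, the tiny flank size, and the example sequence are all made up for illustration; this is not DeepCpG's actual API or preprocessing.

```python
def cpg_positions(seq):
    """Return 0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def window(seq, center, flank):
    """Window of length 2 * flank + 1 around center, padded with N at the ends."""
    left = center - flank
    right = center + flank + 1
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(seq))
    core = seq[max(0, left):min(len(seq), right)].upper()
    return pad_left + core + pad_right


def one_hot(seq):
    """Encode A/C/G/T as 4-element rows; any other base (e.g. N) is all zeros."""
    alphabet = "ACGT"
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]


sequence = "TTACGGACGT"  # made-up example sequence
sites = cpg_positions(sequence)
windows = [window(sequence, pos, flank=3) for pos in sites]
encoded = [one_hot(w) for w in windows]
```

In practice the windows would be much wider, and for per-individual prediction the sequence would have to carry that individual's variants rather than the plain reference.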

As an aside, I would be surprised if complete methylation imputation from DNA alone is reliable enough for most downstream questions.


Nana, thanks very much for your suggestions and references! I plan to try the random forest classifier by Zhang et al. (2015) first, since it can predict both methylation levels (between 0 and 1) and status (either 0 or 1).
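
For reference, the two targets mentioned above (level and status) relate roughly as in the sketch below. It assumes the level is computed as the fraction of methylated reads at a site and that status is obtained by thresholding at 0.5; the threshold, the function names, and the read counts are illustrative assumptions, not taken from the paper.

```python
def beta_value(methylated_reads, unmethylated_reads):
    """Methylation level as the fraction of methylated reads; None if uncovered."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return methylated_reads / total


def status(beta, threshold=0.5):
    """Binary status: 1 if beta >= threshold, else 0. None is passed through."""
    if beta is None:
        return None
    return 1 if beta >= threshold else 0


# Hypothetical (methylated, unmethylated) read counts at three CpG sites.
counts = [(18, 2), (3, 17), (0, 0)]
levels = [beta_value(m, u) for m, u in counts]
calls = [status(b) for b in levels]
```

Sites without coverage are kept as None here rather than being forced to 0, so they can be dropped before training.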

