Question

Genome-wide enrichment of histone marks and correlation with gene expression: Approach/Tools?

2.3 years ago

Ankit ▴ 500

Hi everyone,

I have ChIP-seq data for multiple histone marks (H3K9me3, H3K27me3, H3K27ac and so on; n=10). I want to quantify enrichment of these marks at genomic features, e.g. promoters, gene bodies, and custom bins of n bp, so that I can correlate the enrichment with differentially expressed genes and infer whether the histone mark profile influences gene expression. If you can suggest an R package or tool that does this end to end, that would be the best option.

If not,

Here are the approaches I am following:

Using the BAM files of the histone marks to predict chromatin states with ChromHMM, then overlapping the states with genomic features to identify which regions fall in which chromatin state (the BED file has the coordinates of the chromatin states in 200 bp spans). This approach seems OK, but the same promoter often overlaps multiple chromatin states, so it is difficult to determine the exact state of a particular promoter region.

Using the BAM files and gene or promoter coordinates, I created a table of depth-normalized counts. But from this table, how can I tell which histone mark is more abundant than another, and whether that directly reflects a particular chromatin state?

I would appreciate any help.

Thank you

histone genomic rna-seq chip-seq chromhmm • 782 views

score 0 · Answer 1 · 2022-08-09

Short answer: I haven't managed to find any libraries for performing this kind of analysis end-to-end. Given that the input is multi-variate (many histone marks), the standard approach is to build a model predicting expression from histone modifications (and potentially additional local features such as TF motifs), and then interrogate the model to make inferences.

Most methods for predicting gene expression from chromatin state (histone modifications +/- accessibility) use a binning strategy (counts normalized for library size, potentially also normalized to background or no-antibody control) as opposed to pre-selection of features via ChromHMM; though the ChromHMM output could prove useful as an additional feature or for model interrogation.
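To make the binning idea concrete, here is a minimal sketch of the normalization step, assuming you already have per-bin read counts for a ChIP sample and its control. The function name `log2_enrichment`, the pseudocount of 1, and all counts and library sizes below are made up for illustration; they are not taken from any of the tools mentioned in this thread.

```python
import numpy as np

def log2_enrichment(chip_counts, control_counts, chip_total, control_total,
                    pseudocount=1.0):
    """Return per-bin log2(ChIP CPM / control CPM).

    chip_total and control_total are the total mapped reads of each
    library (assumed known, e.g. from the BAM statistics).
    """
    chip = np.asarray(chip_counts, dtype=float)
    ctrl = np.asarray(control_counts, dtype=float)
    # Library-size normalization: counts per million mapped reads.
    chip_cpm = chip / chip_total * 1e6
    ctrl_cpm = ctrl / control_total * 1e6
    # The pseudocount avoids division by zero and damps noise in empty bins.
    return np.log2((chip_cpm + pseudocount) / (ctrl_cpm + pseudocount))

# Hypothetical counts for four bins (e.g. four promoters) of one mark.
chip = [10, 200, 30, 60]
ctrl = [25, 25, 25, 25]
enrichment = log2_enrichment(chip, ctrl, chip_total=2_000_000,
                             control_total=1_000_000)
```

Running this once per mark gives a bins-by-marks matrix on a comparable scale, which is the kind of input the models discussed in this answer expect. Whether to normalize against input, a no-antibody control, or library size alone depends on what controls you have.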

Gene expression has been predicted from histone marks using linear models (also), support vector machines, random forests, and neural networks of various architectures (A, B, C, D). Code is available for applying SVMs (http://archive.gersteinlab.org/proj/chromodel/index.html) and attention-based NNs (https://github.com/QData/AttentiveChrome or https://github.com/QData/DeepChrome), but there are no corresponding R packages. intePareto is an R package for a very simple Z-score based analysis of RNA and histone state.
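As a rough illustration of the simplest model family listed above (a linear model), the sketch below fits expression against standardized per-gene mark signal with ordinary least squares and reads the coefficients as per-mark weights. Everything here is simulated: the data, the effect sizes in `true_w`, and the choice of three marks are assumptions made only so the example runs, and none of it reproduces the published methods.

```python
import numpy as np

rng = np.random.default_rng(0)
marks = ["H3K27ac", "H3K27me3", "H3K9me3"]
n_genes = 500

# Simulated promoter signal per gene and mark (stand-in for real
# normalized counts or log2 enrichment values).
X = rng.normal(size=(n_genes, len(marks)))
# Assumed effect sizes, used only to generate toy expression values.
true_w = np.array([1.5, -1.0, -0.5])
y = X @ true_w + rng.normal(scale=0.5, size=n_genes)

# Standardize each mark so coefficients are comparable across marks.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
design = np.column_stack([np.ones(n_genes), Xz])  # intercept + marks

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = design @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# "Interrogating the model": sign and size of each mark's weight.
weights = dict(zip(marks, coef[1:]))
```

With real data, `X` would be the genes-by-marks matrix of normalized signal and `y` the expression values (e.g. log-transformed). You would also want held-out genes or cross-validation before reading anything into the weights, since histone marks are strongly correlated with each other.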