Cluster regions based on similar/dissimilar ChIP-seq signal profile across the whole region and not just overall mean signal.
9 weeks ago
rls_08 ▴ 40

When evaluating my ChIP-seq signal across a set of regions, the heatmap metaplot suggests that the signal has at least 3 distinct enrichment profiles (a single peak, a bimodal peak, and no enrichment), as shown in the image below. I would like to derive at least 3 clusters of regions based on these profiles. However, when I run k-means with k = 3 (--kmeans 3) in deeptools plotProfile, the clusters do not seem to capture the differences in profile shape but rather the overall mean signal across all bins in a region. I am not sure whether this is how deeptools works; it is only a guess based on what I see in my data. Do you have any recommendations on which tool(s) to use to better cluster these regions?

enter image description here

clustering chip-seq deeptools • 442 views

Take a look at https://github.com/jokergoo/ComplexHeatmap/issues/57. I am also writing a long blog post covering all things related to ChIP-seq heatmaps.

9 weeks ago
LChart 4.7k

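You can pull the scale-regions matrix (the computeMatrix output) directly into scikit-learn:

import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift

# The matrix file has one '@' header line, then 6 BED columns followed by per-bin signal
dat = pd.read_csv('output.tsv.gz', sep='\t', comment='@', header=None)
dat = dat[dat.iloc[:, 6:].notna().all(axis=1)]  # drop regions with missing bins
signal = dat.iloc[:, 6:].to_numpy(dtype=float)
norms = np.linalg.norm(signal, axis=1)
keep = norms > 0  # all-zero regions cannot be scaled to unit length
dat, signal, norms = dat[keep], signal[keep], norms[keep]
# Scale each region to unit length so clustering sees profile shape, not overall signal
sig_nrm = signal / norms[:, None]
# Fit on a random subsample for speed, then assign every (normalized) region to a cluster
rng = np.random.default_rng()
train = rng.choice(sig_nrm, size=min(1000, len(sig_nrm)), replace=False)
labels = MeanShift().fit(train).predict(sig_nrm)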

On my randomly-selected ChIP-seq data I get 3 clusters of very different sizes. You can play around with the clustering algorithm.
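
As one way of playing around with the algorithm, here is a minimal sketch that swaps in KMeans with 3 clusters instead of MeanShift. It runs on synthetic profiles rather than real data, so treat it as an illustration only: unit_normalize and make_profiles are hypothetical helper names, and the peak shapes, amplitudes, and noise level are assumptions chosen to mimic the single-peak, bimodal, and no-enrichment cases from the question.

```python
import numpy as np
from sklearn.cluster import KMeans


def unit_normalize(signal):
    """Scale each row to unit L2 norm; all-zero rows stay zero."""
    signal = np.asarray(signal, dtype=float)
    norms = np.linalg.norm(signal, axis=1, keepdims=True)
    return np.divide(signal, norms, out=np.zeros_like(signal), where=norms > 0)


def make_profiles(rng, n_per_class=100, n_bins=50):
    """Synthetic stand-in for a per-region signal matrix (assumed shapes)."""
    x = np.linspace(-1, 1, n_bins)
    single = 0.1 + np.exp(-(x / 0.2) ** 2)
    bimodal = (0.1 + np.exp(-((x - 0.5) / 0.15) ** 2)
               + np.exp(-((x + 0.5) / 0.15) ** 2))
    flat = np.full(n_bins, 0.1)
    rows, truth = [], []
    for label, shape in enumerate((single, bimodal, flat)):
        # Overall signal strength varies 20-fold within each class
        amp = rng.uniform(1, 20, size=(n_per_class, 1))
        noise = rng.normal(0, 0.01, size=(n_per_class, n_bins))
        rows.append(amp * (shape + noise))
        truth.extend([label] * n_per_class)
    return np.vstack(rows), np.array(truth)


rng = np.random.default_rng(0)
signal, truth = make_profiles(rng)

# Cluster on the unit-normalized rows so only profile shape matters
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(unit_normalize(signal))

# Cross-tabulate cluster assignments against the known synthetic classes
for k in range(3):
    print(k, np.bincount(truth[labels == k], minlength=3))
```

Because each row is divided by its own norm before clustering, two regions with the same shape but very different signal strength end up close together, which is the behaviour the question asks for. On real data the number of clusters and the choice of algorithm would still need checking against the heatmap.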
