As the title says, I am looking for approaches for calling large heterochromatin domains (like lamina-associated domains, which range from roughly 0.1 to 10 Mb). I would like to start from bigwig files containing the log2(IP library CPM / input library CPM) signal. The idea is to determine the broad regions of enrichment (signal > 0) as in the image below. I have searched various papers but have not found a clear explanation of how such domains are called.
I have tried MACS and SICER/EPIC, but they don't use the normalized bigwig data for the analysis.
When the signal depends on enrichment relative to input, these tools don't seem very reliable. In the case of heterochromatin, I have seen that calling from the BAMs misses a lot of "obvious" domains that only become evident after the log2(IP/input) transformation of the depth-normalized libraries. Looking at the BAM pileup alone, it is difficult to determine where the local regions of enrichment are. Do you know whether any of these tools take read depth and the local environment into account when calling broad domains?
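For reference, this is roughly what I mean by the transformation: a minimal sketch with pyBigWig, assuming depth-normalized CPM coverage bigwigs named ip.CPM.bw and input.CPM.bw, a 10 kb bin and a small pseudocount (all placeholders). deepTools bamCompare can produce an equivalent log2-ratio track directly.

```python
import math
import pyBigWig

BIN = 10_000     # bin width in bp (assumed)
PSEUDO = 0.01    # pseudocount to avoid log2 of zero (assumed)

ip = pyBigWig.open("ip.CPM.bw")       # depth-normalized IP coverage (assumed name)
inp = pyBigWig.open("input.CPM.bw")   # depth-normalized input coverage (assumed name)

for chrom, length in ip.chroms().items():
    for start in range(0, length, BIN):
        end = min(start + BIN, length)
        ip_mean = ip.stats(chrom, start, end, type="mean")[0] or 0.0
        in_mean = inp.stats(chrom, start, end, type="mean")[0] or 0.0
        # log2(IP CPM / input CPM) per bin, bedGraph-style output
        ratio = math.log2((ip_mean + PSEUDO) / (in_mean + PSEUDO))
        print(f"{chrom}\t{start}\t{end}\t{ratio:.3f}")

ip.close()
inp.close()
```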
Just filter the bigwig for regions whose scores meet your threshold, then use bedtools merge to join regions that are adjacent or within a certain distance of each other to get your contiguous domains. This kind of manual, threshold-based calling isn't unusual, especially for broad marks like H3K9me3. Normally, though, I've seen the genome split into bins (e.g. 10 kb) first, with reads counted over those larger bins for both the IP and input libraries, but that isn't far off from starting with your bigwig.
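A rough sketch of that threshold-and-merge step with pyBigWig, assuming your log2-ratio track is called log2ratio.bw; the bin size, cutoff and merge distance are placeholders to tune, and the merging is equivalent to writing the kept bins to a BED file and running bedtools merge -d.

```python
import pyBigWig

BIN = 10_000          # bin width in bp (assumed)
CUTOFF = 0.0          # keep bins with mean log2(IP/input) above this (assumed)
MERGE_DIST = 25_000   # join kept bins separated by gaps up to this size (assumed)

bw = pyBigWig.open("log2ratio.bw")    # assumed file name
domains = []

for chrom, length in bw.chroms().items():
    n_bins = (length + BIN - 1) // BIN
    means = bw.stats(chrom, 0, length, nBins=n_bins, type="mean")
    current = None                    # open domain: [chrom, start, end]
    for i, mean in enumerate(means):
        start, end = i * BIN, min((i + 1) * BIN, length)
        if mean is not None and mean > CUTOFF:
            if current and start - current[2] <= MERGE_DIST:
                current[2] = end      # extend the open domain across a small gap
            else:
                if current:
                    domains.append(tuple(current))
                current = [chrom, start, end]
    if current:
        domains.append(tuple(current))

bw.close()

# BED output of the merged domains
for chrom, start, end in domains:
    print(f"{chrom}\t{start}\t{end}")
```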