I'm trying to represent my ChIP-seq counts, normalized or not, in fixed genomic bins, but I don't know how to do so.
I have already processed my data and used findPeaks followed by pos2bed.pl to produce .bedGraph files that contain this information. However, I'd like to have counts summarized for each 10 kb bin across the genome (this resolution is fine for my purposes). My .bedGraph files contain some of this information, but it is not spread across equally sized 10 kb bins.
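To make concrete what "summarizing a bedGraph into fixed 10 kb bins" involves, here is a minimal Python sketch. It assumes a standard four-column bedGraph (chrom, start, end, value) and a hypothetical `chrom_sizes` dictionary of chromosome lengths; the function name `bin_bedgraph` is made up for this example. Each bin gets the base-pair-weighted mean of the bedGraph values overlapping it, with uncovered bases counted as 0. This is one possible summary, not the behavior of any particular tool.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10 kb bins, as in the question


def bin_bedgraph(lines, chrom_sizes, bin_size=BIN_SIZE):
    """Return {(chrom, bin_start, bin_end): mean_signal} for every bin.

    mean_signal is the bp-weighted mean of bedGraph values over the bin;
    bases not covered by any interval count as 0.
    chrom_sizes is a hypothetical {chrom: length} dict supplied by the user.
    """
    # Accumulate value * overlapping bp for each (chrom, bin index).
    totals = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header lines
        chrom, start, end, value = line.split()[:4]
        start, end, value = int(start), int(end), float(value)
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # bedGraph end is exclusive
        for b in range(first_bin, last_bin + 1):
            b_start = b * bin_size
            b_end = b_start + bin_size
            overlap = min(end, b_end) - max(start, b_start)
            totals[(chrom, b)] += value * overlap

    # Emit every bin of every chromosome, including empty ones.
    result = {}
    for chrom, size in chrom_sizes.items():
        for b in range((size + bin_size - 1) // bin_size):
            b_start = b * bin_size
            b_end = min(b_start + bin_size, size)  # last bin may be shorter
            result[(chrom, b_start, b_end)] = totals[(chrom, b)] / (b_end - b_start)
    return result


if __name__ == "__main__":
    demo = [
        "chr1\t0\t5000\t2",
        "chr1\t5000\t15000\t4",
    ]
    for key, mean in sorted(bin_bedgraph(demo, {"chr1": 25_000}).items()):
        print(key, mean)
    # ('chr1', 0, 10000) 3.0
    # ('chr1', 10000, 20000) 2.0
    # ('chr1', 20000, 25000) 0.0
```

In practice, `bedtools makewindows` can generate the 10 kb windows from a genome file, and a dedicated tool will be much faster on a whole genome; the sketch only shows the underlying arithmetic.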
I was looking at HOMER's annotatePeaks.pl -hist <bin size>, which does summarize data in bins, but only around peaks, which is not really what I want. I am specifically interested in having the data represented in bins throughout the genome (i.e., not only the bins within a distance d of a peak). I'm sure there is a tool for this, but I'm not aware of which one to use.
Could someone advise on how I could bin my data?
@Prakash, thanks a lot for your suggestion. Just to make sure, you mean something like this:
My question, then, is how the summaries are computed. For instance, does each bin show the mean of the counts in that region? I couldn't find this information.
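One reason the answer matters: "mean per bin" can be defined in more than one way, and the definitions disagree when bedGraph intervals only partly overlap a bin. The Python sketch below (hypothetical function names, intervals given as `(start, end, value)` tuples on one chromosome) contrasts a plain mean of the overlapping values with a base-pair-weighted mean. Which definition a given tool uses should be checked in that tool's documentation.

```python
def unweighted_mean(bin_start, bin_end, intervals):
    """Plain mean of the values of intervals that overlap the bin.

    Overlap length is ignored: a 1 bp overlap counts as much as a full one.
    """
    vals = [v for s, e, v in intervals if s < bin_end and e > bin_start]
    return sum(vals) / len(vals) if vals else 0.0


def weighted_mean(bin_start, bin_end, intervals):
    """Per-base mean: each value is weighted by the bp of the bin it covers;
    bases not covered by any interval count as 0."""
    total = 0.0
    for s, e, v in intervals:
        overlap = min(e, bin_end) - max(s, bin_start)
        if overlap > 0:
            total += v * overlap
    return total / (bin_end - bin_start)


if __name__ == "__main__":
    # A short high-signal interval followed by a long low-signal one.
    ivs = [(0, 1000, 10.0), (1000, 10000, 1.0)]
    print(unweighted_mean(0, 10000, ivs))  # 5.5
    print(weighted_mean(0, 10000, ivs))    # 1.9
```

If the bedGraph values are raw read counts rather than coverage, a per-bin sum may be the more natural summary than either mean.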
Thanks a lot
Edit: found this thread with some useful information
This will give the mean coverage across your binned genomic regions. You can also use genomeCoverageBed (bedtools genomecov) to get coverage normalized per million reads.
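For reference, "normalized per million" scaling of binned counts comes down to one division. The Python sketch below (the function name `counts_per_million` is hypothetical) converts raw per-bin read counts to counts per million. By default it divides by the sum over all bins; it is an assumption on my part which total you want, and passing the library's total mapped-read count instead is the more usual choice when some reads fall outside the bins.

```python
def counts_per_million(bin_counts, total_reads=None):
    """Scale raw per-bin read counts to counts per million (CPM).

    total_reads defaults to the sum over all bins; pass the library's total
    mapped-read count instead if reads outside the bins should be included.
    """
    if total_reads is None:
        total_reads = sum(bin_counts.values())
    if total_reads == 0:
        return {k: 0.0 for k in bin_counts}  # avoid division by zero
    return {k: c * 1_000_000 / total_reads for k, c in bin_counts.items()}


if __name__ == "__main__":
    counts = {("chr1", 0, 10000): 250, ("chr1", 10000, 20000): 750}
    print(counts_per_million(counts))
    print(counts_per_million(counts, total_reads=2_000_000))
```

Normalizing this way makes bins comparable between libraries of different sequencing depth; it does not correct for input/background signal.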