I have mapped high-resolution ChIP-seq data to transcription start sites (TSSs) using windowBed. I now want to bin the data relative to the TSSs, using bin sizes of my choosing, so that I can generate heat maps and run k-means clustering on it.
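For anyone who wants to do the binning themselves, here is a minimal Python sketch of the idea. It assumes (illustrative choices, not from the original post) that each read has already been reduced to a single (chrom, position) pair, for example its 5' end, and that TSSs are given as (gene, chrom, position, strand) tuples; parsing the windowBed output into these tuples is left out. The window is the total width centred on the TSS, and minus-strand genes are flipped so every row reads upstream to downstream.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def bin_reads_around_tss(reads, tsss, window, bin_size):
    """Count reads in fixed-size bins across a window centred on each TSS.

    reads: iterable of (chrom, position) pairs (one position per read).
    tsss:  iterable of (gene, chrom, tss_position, strand) tuples.
    window: total width in bp, i.e. window // 2 upstream and downstream.
    Returns {gene: [counts per bin]} ordered upstream -> downstream.
    """
    if window % 2 or window % bin_size:
        raise ValueError("window must be even and a multiple of bin_size")
    half = window // 2
    n_bins = window // bin_size

    # Group and sort read positions per chromosome for fast range lookups.
    by_chrom = defaultdict(list)
    for chrom, pos in reads:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    matrix = {}
    for gene, chrom, tss, strand in tsss:
        counts = [0] * n_bins
        positions = by_chrom.get(chrom, [])
        # Candidate reads within `half` bp of the TSS on either side.
        lo = bisect_left(positions, tss - half)
        hi = bisect_right(positions, tss + half)
        for pos in positions[lo:hi]:
            # Signed distance in the gene's orientation (negative = upstream),
            # so minus-strand genes are flipped to read upstream -> downstream.
            rel = pos - tss if strand == "+" else tss - pos
            if -half <= rel < half:
                counts[(rel + half) // bin_size] += 1
        matrix[gene] = counts
    return matrix
```

For example, `bin_reads_around_tss(reads, tsss, window=1000, bin_size=100)` gives ten 100 bp bins per gene. Each value in the returned dictionary is one gene's row, which is the gene-by-bin layout a heat map or k-means needs.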
Update: HOMER can do this very quickly. In a single command it can take reads in BED format, position them relative to the TSS (or other features), and generate either histograms or a matrix suitable for clustering. Both the window around the TSS and the bin size can be specified.
The specific HOMER program that does this is annotatePeaks.pl.
Here, tss specifies a TSS-centric analysis; the path to the genome points to the genome as downloaded and indexed with the configureHomer.pl script from the command line; the range around the TSS is the total width of the window, centred on the TSS, into which reads are mapped (e.g. 1000 means 500 bp upstream and 500 bp downstream of the TSS); -p specifies that the peak file is in BED format; and -ghist produces a gene-by-gene histogram (i.e. a matrix that can be sorted or clustered by other programs).
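Once you have a gene-by-bin matrix (from -ghist or from your own binning), k-means can be run on its rows. The sketch below is plain Lloyd's k-means in standard-library Python, with a deterministic initialisation (the first k distinct rows) chosen only to keep the example reproducible; in practice you would probably use an established implementation, normalise or log-transform the counts first, and try several random restarts. It assumes the matrix has already been loaded into a list of numeric rows; the -ghist file's exact column layout is not covered here, so check it yourself. The helper `order_for_heatmap` is a hypothetical name for grouping genes by cluster before plotting.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row, centroids):
    """Index of the centroid closest to `row` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))


def kmeans(rows, k, max_iter=100):
    """Lloyd's k-means on a list of numeric rows; returns (labels, centroids).

    Initialisation uses the first k distinct rows, purely so the example is
    deterministic; random restarts or k-means++ are more robust.
    """
    centroids = []
    for row in rows:
        candidate = [float(x) for x in row]
        if candidate not in centroids:
            centroids.append(candidate)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct rows")

    labels = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [nearest(row, centroids) for row in rows]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its member rows.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def order_for_heatmap(genes, labels):
    """Sort genes by cluster label (then name) so clusters form blocks."""
    return [gene for _, gene in sorted(zip(labels, genes))]
```

With a `{gene: counts}` dictionary, something like `genes = list(matrix)` followed by `labels, _ = kmeans([matrix[g] for g in genes], k=4)` and `order_for_heatmap(genes, labels)` gives a row order for the heat map.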
In a related question, "Annotatepeaks Function From Homer", someone suggested using AnnotateGenomicRegions. I'm not sure whether it can be run from the command line. At the very least, it appears to be a quick and easy way to obtain the gene annotation for each read; you can then sort the results and count how many times each gene appears using your favorite language. At first glance, though, I don't see how one would bin reads over specific regions of the gene with this program.
edit: For anyone who doesn't know how to count the number of unique genes, I might use an awk script...
The "annotations_1380035469204_3516.txt" file would be the output of AnnotateGenomicRegions; you could then take the first annotation (second column) and count how often each value occurs. I've noticed that a given read sometimes has multiple annotations, so you will have to decide how to handle those cases. Good luck!
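As a Python alternative to the awk approach, here is a small sketch that counts the first annotation (second column) per read. It assumes the AnnotateGenomicRegions output is tab-delimited, with one read per line and any additional annotations in later columns; that layout is an assumption, so check it against your own file. Reads with multiple annotations are counted only by their first one, which is just one possible way to handle those cases.

```python
import csv
from collections import Counter


def count_first_annotation(path, column=1, skip_header=False):
    """Count how often each value in `column` (0-based; 1 = second column) occurs.

    Assumes a tab-delimited file with one read per line; only the first
    annotation (the given column) is counted, later columns are ignored.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if skip_header:
            next(reader, None)  # drop a header line if the file has one
        for fields in reader:
            # Skip short or empty lines rather than raising IndexError.
            if len(fields) > column and fields[column]:
                counts[fields[column]] += 1
    return counts
```

For example, `count_first_annotation("annotations_1380035469204_3516.txt").most_common()` lists genes from most to least frequent.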
+1 You could simply enter the coordinates of the TSSs and then the mapped reads from your ChIP-seq data.