Tool For Binning windowBed Output For K-Means Clustering
11.3 years ago
bede.portz ▴ 540

I have mapped high-resolution ChIP-seq data to transcription start sites (TSSs) using windowBed. I now want to bin the data relative to the TSSs, in bin sizes of my choosing, so that I can generate heat maps and run k-means clustering on it.

What tools exist for doing this?

Thanks!

bedtools chip-seq clustering heatmap • 3.3k views
11.3 years ago
vj ▴ 520

You can take a look at seqMINER, a standalone program.


+1. You could simply enter the coordinates of the TSSs and then the mapped reads from your ChIP-seq data.

11.2 years ago
bede.portz ▴ 540

Update: HOMER can carry out this procedure very quickly. In a single command it can align reads as a BED file to the TSS (or other features) and generate histograms or a matrix suitable for clustering. The window around the TSS can be specified, as can the bin size.

The HOMER program that does this is annotatePeaks.pl.

Usage:

annotatePeaks.pl tss ~/pathToGenome -size <range around TSS> -hist <bin size> -ghist -p <read/peak file> > output.txt

Here, tss specifies a TSS-centric analysis. The path to the genome points to the genome as downloaded and indexed by the configureHomer.pl script from the command line. -size sets the total window around each TSS into which reads are mapped (e.g. 1000 covers 500 bp upstream and 500 bp downstream of the TSS), and -hist sets the bin size. -p supplies the read/peak file, here in BED format, and -ghist produces a gene-by-gene histogram (i.e. a matrix that can be sorted or clustered by other programs).
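
Once output.txt exists, sorting it takes only a few lines of shell. As a minimal sketch, assuming the matrix is tab-delimited with one header row, gene IDs in the first column and one count column per bin (check this against your own file; sort_ghist_by_signal is just a name made up for illustration), this ranks genes by total signal across the window:

```shell
# Sort a -ghist style matrix by total signal per gene, highest first.
# Assumed layout (verify on your own output): tab-delimited, one header
# row, gene ID in column 1, one count column per bin after that.
sort_ghist_by_signal() {
    matrix="$1"
    # Keep the header row on top.
    head -n 1 "$matrix"
    # Prepend each row's sum, sort numerically on it, then drop it again.
    tail -n +2 "$matrix" |
        awk -F'\t' -v OFS='\t' '{ s = 0; for (i = 2; i <= NF; i++) s += $i; print s, $0 }' |
        sort -t "$(printf '\t')" -k1,1nr |
        cut -f2-
}
```

Call it as sort_ghist_by_signal output.txt > sorted.txt. The sorted matrix can then go into whatever clustering or heat map program you prefer.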

I hope this helps.

Bede


Thanks for following up with a solution; that is always great to see.

11.3 years ago
Jason ▴ 940

In another thread, "Annotatepeaks Function From Homer", someone suggested using AnnotateGenomicRegions. I'm not sure whether you can use it from the command line. At the very least, it appears to be a very quick and easy way to obtain the gene annotation for each read; you can then sort the results and count the number of times each gene appears, in your favorite language. At first glance, I don't see how one would bin specific areas of the gene using this program.

Edit: for anyone who doesn't know how to count how many times each gene appears, I would use a short awk pipeline:

 awk '{ print $2 }' annotations_1380035469204_3516.txt | sort | uniq -c | sort -n

The "annotations_1380035469204_3516.txt" file is the output of AnnotateGenomicRegions; the pipeline takes the first annotation (second column) and counts how often each one occurs. I've noticed that there are sometimes multiple annotations for a given read, so you will have to think about what to do in those cases. Good luck!
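
If the goal is binning rather than per-gene totals, windowBed output can also be binned directly in awk. This is only a sketch under assumptions that are not stated anywhere in this thread: windowBed was run with a 6-column BED of single-base TSSs as -a and a 6-column BED of reads as -b, so each output line has 12 columns, with the TSS in columns 1-6 and the read in columns 7-12. bin_reads_by_tss is a made-up name. Each read is assigned by its midpoint, and offsets are flipped for minus-strand genes so that negative bins are always upstream.

```shell
# Count reads per (gene, bin) from windowBed output.
# Assumed columns: 1-6 = TSS (chr, start, end, name, score, strand),
#                  7-12 = read (chr, start, end, name, score, strand).
bin_reads_by_tss() {
    # $1 = windowBed output file, $2 = bin size in bp
    awk -F'\t' -v OFS='\t' -v bin="$2" '
    {
        tss = $2                              # TSS position (start of the -a interval)
        mid = int(($8 + $9) / 2)              # read midpoint
        off = mid - tss                       # signed distance from the TSS
        if ($6 == "-") off = -off             # keep upstream negative on both strands
        b = int(off / bin)                    # int() truncates toward zero ...
        if (off < 0 && b * bin != off) b--    # ... so step down to get a true floor
        count[$4 OFS (b * bin)]++             # key: gene name, bin start
    }
    END { for (k in count) print k, count[k] }
    ' "$1" | sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

Running bin_reads_by_tss windowbed_output.txt 10 prints the gene, the bin start (bp relative to the TSS) and the read count, one line per non-empty bin. Empty bins are not printed, so they would need to be filled with zeros before building a matrix for clustering.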

11.3 years ago
bede.portz ▴ 540

I have installed seqMINER, but the documentation doesn't explain how to alter bin sizes. I want to map the data to TSSs in small bins (5-10 bp).

Any advice on usage?

Thanks.


There is an option under Tools --> Options --> Clustering options --> Wiggle step. See if changing that helps.
