Hi all,
Given ChIP-Seq data of a transcription factor, what tools are used to count the number of binding sites of the transcription factor in the whole genome?
Thanks,
Jessica
I don't think your question has a simple, closed answer. In a perfect world, you run your ChIP-seq data (as aligned reads, BAM or BED) through a peak caller, e.g. MACS, and each region it identifies is a binding site, as mentioned by tangming2005.
However, the situation is typically far from perfect for a number of reasons.
In practice, you could treat as "true" binding sites the peaks that are identified in different replicates and/or that overlap a known sequence motif recognized by your transcription factor (see also the irreproducible discovery rate, IDR).
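To make the replicate idea concrete, here is a minimal Python sketch that keeps only the peaks from one replicate that overlap a peak in a second replicate. It is a plain overlap filter, not IDR, and the function name `reproducible_peaks` and the toy coordinates are made up for illustration. Peaks are assumed to be (chrom, start, end) tuples in half-open BED coordinates.

```python
from collections import defaultdict


def reproducible_peaks(rep1, rep2):
    """Return peaks from rep1 that overlap at least one peak in rep2.

    Peaks are (chrom, start, end) tuples in half-open BED coordinates,
    so two peaks that merely touch (end == start) do not overlap.
    """
    # Group replicate 2 peaks by chromosome for quick lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in rep1:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, ())):
            kept.append((chrom, start, end))
    return kept


# Toy data (hypothetical coordinates, for illustration only).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
rep2 = [("chr1", 150, 250), ("chr2", 50, 80)]
print(len(reproducible_peaks(rep1, rep2)))
```

The linear scan per peak is fine for small examples. For genome-scale peak sets, a dedicated interval tool would be a better choice.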
In my opinion, asking "Where are the binding sites?" is not fruitful because of the problems above. It is better to ask which binding sites differ between conditions (treatments, stages, tissues, whatever). This way the quirks associated with ChIP, peak callers, etc. are averaged out across replicates and conditions.
You might be interested to read my tutorial on how to use CENTIPEDE to determine if a transcription factor is bound to a genomic site by making use of DNase-Seq data.
A practical way to decide whether your peak is a true peak rather than nonspecific binding is to check whether there is a motif associated with your transcription factor at the peak. This can be done using the MEME Suite. Of course, this solution assumes that your ChIP is for a protein that binds the DNA directly.
You could use FIMO in the MEME suite to scan for motif models (JASPAR, etc.) across your genome of interest. Take the search result and convert it to a BED file. Then do set operations with BEDOPS tools (like bedmap) to find putative TF binding sites that overlap your ChIP-seq peaks.
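As a rough sketch of the "convert it to a BED file" step, the Python below turns FIMO tab-separated output into BED-style records. It assumes the TSV has the column names `motif_id`, `sequence_name`, `start`, `stop`, `strand` and `score`, that coordinates are 1-based and inclusive, and that `sequence_name` is the chromosome name; check these against the output of your FIMO version. The function name `fimo_tsv_to_bed` and the example row are made up for illustration.

```python
import csv


def fimo_tsv_to_bed(tsv_text):
    """Convert FIMO TSV text into a list of BED-style tuples.

    Assumes 1-based inclusive start/stop in the input; BED is 0-based
    half-open, so only the start is shifted by one.
    """
    # Drop blank lines and '#' comment lines before parsing.
    lines = [ln for ln in tsv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")

    bed = []
    for row in reader:
        start = int(row["start"]) - 1  # 1-based inclusive -> 0-based
        end = int(row["stop"])         # inclusive stop == half-open end
        bed.append((row["sequence_name"], start, end,
                    row["motif_id"], row["score"], row["strand"]))
    return bed


# Hypothetical single-hit example, for illustration only.
example = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\t"
    "score\tp-value\tq-value\tmatched_sequence\n"
    "MA0139.1\tCTCF\tchr1\t101\t119\t+\t15.2\t1e-6\t0.01\tACGT\n"
)
for record in fimo_tsv_to_bed(example):
    print("\t".join(str(field) for field in record))
```

The resulting records would still need to be sorted before being handed to BEDOPS tools such as bedmap for the overlap with your ChIP-seq peaks.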
What format is your data in? BAM? BED?
It is in the bed format.
So it is already a peak file. Then each line is a putative binding site. I don't quite understand your question; please state it more clearly.
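If a raw count of putative sites is all that is needed, a minimal Python sketch is to count the interval lines in the BED file. It assumes one peak per line with at least three whitespace-separated fields, and skips blank lines and `#`, `track` and `browser` header lines. The function name `count_peaks` and the example lines are made up for illustration.

```python
def count_peaks(bed_lines):
    """Count interval records in BED-formatted lines.

    Blank lines and header lines ('#', 'track', 'browser') are skipped;
    a record needs at least chrom, start and end fields.
    """
    count = 0
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if len(line.split()) >= 3:
            count += 1
    return count


# Hypothetical in-memory example; with a real file you could pass
# an open file handle instead, e.g. count_peaks(open("peaks.bed")).
example = [
    "track name=peaks",
    "chr1\t100\t200\tpeak1",
    "",
    "chr2\t5000\t5300\tpeak2",
]
print(count_peaks(example))
```

This only counts what the peak caller reported, so the caveats about replicates and motifs raised in the answers still apply to how the number is interpreted.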