Count the number of transcription factor binding sites
10.3 years ago
Jessica ▴ 70

Hi all,

Given ChIP-Seq data of a transcription factor, what tools are used to count the number of binding sites of the transcription factor in the whole genome?

Thanks,
Jessica

sequencing • 4.5k views

What format is your data in? BAM? BED?

It is in the bed format.

So it is already a peak file, and each line is then a putative binding site. I don't quite understand your question; could you state it more clearly?
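
If the BED file is peak-caller output, counting putative binding sites amounts to counting intervals. A minimal Python sketch, assuming a plain BED file in which any header lines start with `track`, `browser`, or `#`:

```python
def count_peaks(bed_lines):
    """Count BED intervals, skipping blank lines and UCSC header/comment lines."""
    n = 0
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n += 1
    return n


example = """track name=peaks
chr1\t100\t600\tpeak1
chr1\t900\t1400\tpeak2
# a comment line
chr2\t50\t300\tpeak3
"""
print(count_peaks(example.splitlines()))  # 3
```

With a real file (the name `peaks.bed` is hypothetical): `with open("peaks.bed") as fh: print(count_peaks(fh))`. Overlapping or duplicated intervals are each counted, so merge them first if you want distinct regions.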

10.3 years ago

I don't think your question has a simple, clear-cut answer. In a perfect world, you would run your ChIP-seq data (as aligned reads, BAM or BED) through a peak caller, e.g. MACS, and each region identified would be a binding site, as mentioned by tangming2005.

However, the situation is typically far from perfect for a number of reasons.

  1. ChIP enrichment is often quite nonspecific and noisy, depending on the quality of the antibody. It's not unusual for >90% of the reads to fall in the background, i.e. outside peaks.
  2. Some genomic regions tend to be enriched whatever antibody you use (an artifact that might be due to the way the reference genome is assembled, especially with respect to repetitive regions).
  3. Different peak callers/algorithms can give different numbers of peaks, and the difference can even be orders of magnitude. The same goes for using different parameters within the same peak caller.
  4. Typically, the more deeply you sequence, the more peaks you identify, because small bumps become statistically significant.
  5. Even if the ChIP works perfectly and the peak callers are ideal, there may be opportunistic sites where the transcription factor binds without much biological relevance (as an aside, possibly related: some ChIP-seq experiments generate many more peaks than there are genes in the whole genome).
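
The first point can be quantified as the fraction of reads in peaks (FRiP), a common ChIP-seq quality metric. A minimal in-memory sketch, assuming reads and peaks have already been parsed into `(chrom, start, end)` tuples in BED's 0-based, half-open coordinates; for real data a dedicated tool such as bedtools is more practical:

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in index.items()}
    hits = 0
    for chrom, start, end in reads:
        iv = index.get(chrom)
        if not iv:
            continue
        # Last merged peak starting before the read ends; since merged peaks are
        # disjoint and sorted, it has the largest end among such candidates.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and iv[i][1] > start:
            hits += 1
    return hits / len(reads) if reads else 0.0


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 300, 350), ("chr1", 250, 260),
         ("chr2", 49, 60), ("chr3", 0, 10)]
print(frip(reads, peaks))  # 0.6
```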

In practice, you could consider as "true" binding sites the peaks which are identified in different replicates and/or which overlap a known sequence motif recognized by your transcription factor (see also the irreproducible discovery rate).
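
The replicate criterion can be approximated with a simple overlap filter. This is not the IDR method, which ranks peaks by signal and models their consistency between replicates statistically; it is only a sketch of the simpler "present in both replicates" rule, assuming peaks as `(chrom, start, end)` tuples in half-open coordinates:

```python
from collections import defaultdict


def reproducible_peaks(rep1, rep2):
    """Return peaks from rep1 that overlap at least one peak in rep2."""
    rep2_by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        rep2_by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in rep1:
        # Half-open intervals overlap iff each starts before the other ends.
        if any(s < end and e > start for s, e in rep2_by_chrom.get(chrom, ())):
            kept.append((chrom, start, end))
    return kept


rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
rep2 = [("chr1", 150, 250), ("chr2", 20, 30)]
print(reproducible_peaks(rep1, rep2))  # [('chr1', 100, 200)]
```

The linear scan per chromosome is fine for illustration; for genome-wide peak sets, sort the intervals or use an interval tool.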

In my opinion, asking "Where are the binding sites?" is not very fruitful because of the problems above. A better question is which binding sites differ between conditions (treatments, stages, tissues, whatever). This way the quirks associated with ChIP, peak callers, etc. are averaged out across replicates and conditions.

9.3 years ago
Kamil ★ 2.3k

You might be interested to read my tutorial on how to use CENTIPEDE to determine if a transcription factor is bound to a genomic site by making use of DNase-Seq data.

Thanks for the tutorial!

Ming

9.0 years ago
Fidel ★ 2.0k

A practical way to decide whether a peak is a true peak rather than nonspecific binding is to check whether it contains the motif associated with your transcription factor. This can be done using the MEME suite. Of course, this approach assumes that your ChIP is for a protein that binds DNA directly.
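
To illustrate what "checking for a motif" means, here is a minimal log-odds scan of one peak sequence against a position weight matrix on both strands. It is only a sketch: the motif counts are a hypothetical toy example, the uniform 0.25 background and pseudocount are assumptions, and MEME suite tools additionally compute p-values, which this does not. Peak sequences would first have to be extracted from the genome, e.g. with bedtools getfasta.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (dicts over A/C/G/T) into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


def best_hit(seq, pwm):
    """Return (score, start, strand) of the best window; start is forward-strand 0-based."""
    w = len(pwm)
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    best = (-math.inf, -1, "+")  # returned unchanged if seq is shorter than the motif
    for strand, s in (("+", seq), ("-", rc)):
        for i in range(len(s) - w + 1):
            window = s[i:i + w]
            if any(b not in "ACGT" for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][b] for j, b in enumerate(window))
            start = i if strand == "+" else len(s) - w - i
            if score > best[0]:
                best = (score, start, strand)
    return best


# Hypothetical toy motif resembling GATA.
counts = [{"G": 9, "A": 1}, {"A": 10}, {"T": 9, "C": 1}, {"A": 10}]
pwm = log_odds_matrix(counts)
print(best_hit("CCCCGATACCCC", pwm))  # best window starts at 4 on the + strand
```

A peak would then be called "motif-supported" if its best score exceeds some threshold you choose; picking that threshold is exactly what FIMO's p-values help with.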

You could use FIMO in the MEME suite to scan for motif models (JASPAR, etc.) across your genome of interest. Take the search result and convert it to a BED file. Then do set operations with BEDOPS tools (like bedmap) to find putative TF binding sites that overlap your ChIP-seq peaks.
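
The conversion and overlap steps can be sketched in plain Python. Assumptions: FIMO hits have already been read into `(sequence_name, start, stop)` tuples using FIMO's 1-based, inclusive coordinates (check the column layout of your FIMO version), and the genome was scanned per chromosome so `sequence_name` is the chromosome. Counting hits per peak is similar in spirit to what bedmap's `--count` operation reports:

```python
from collections import defaultdict


def fimo_to_bed(records):
    """Convert 1-based inclusive (name, start, stop) to 0-based half-open BED tuples."""
    return [(name, start - 1, stop) for name, start, stop in records]


def count_motif_hits(peaks, hits):
    """For each peak, count the motif hits that overlap it by at least 1 bp."""
    hits_by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        hits_by_chrom[chrom].append((start, end))
    result = []
    for chrom, start, end in peaks:
        n = sum(1 for s, e in hits_by_chrom.get(chrom, ()) if s < end and e > start)
        result.append(((chrom, start, end), n))
    return result


def peaks_with_motif(peaks, hits):
    """Keep only peaks containing at least one motif hit."""
    return [peak for peak, n in count_motif_hits(peaks, hits) if n > 0]


peaks = [("chr1", 100, 200), ("chr1", 400, 500), ("chr2", 0, 100)]
hits = fimo_to_bed([("chr1", 151, 160), ("chr1", 190, 205), ("chr2", 101, 110)])
print(peaks_with_motif(peaks, hits))  # [('chr1', 100, 200)]
```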
