Question

Rank Chip-Seq Peaks Based On Motif Occurrence?

0

11.6 years ago

daniel.soronellas ▴ 330

Hi everybody,

I am posting today after a long time without finding a proper answer to my question.

The workflow for analyzing TF ChIP-seq data is, more or less, becoming clear:

ChIP --> Library preparation --> Sequencing --> Pre-filtering and QC check --> Mapping to reference genome --> Find enrichment signals (peaks) --> Motif Discovery to cross validate if the top ranked peaks present the TF motif

But I'm interested in ranking all my peaks by some indicator of whether or not they contain the desired TF motif, and in subsetting the peak list to the real TF binding sites. I have been trying MEME/MEME-ChIP/TOMTOM/MAST and also HOMER, but I'm getting confused because I'm not sure these suites can give me what I want (or maybe I misunderstood something).

So, my question is: is there any way to rank the peaks according to the presence or absence of the TF motif, so that it can be added to the peak-calling statistics?

Thanks for your help and suggestions!

chip-seq motif peak-calling • 4.9k views

ADD COMMENT • link updated 11.6 years ago by spacemorrissey ▴ 280 • written 11.6 years ago by daniel.soronellas ▴ 330

score 3 · Answer 1 · 2013-04-12

3

11.6 years ago

Anthony Mathelier ▴ 910

You can run MEME or MEME-ChIP on the top 600 sequences (ranked by ChIP-seq peak score) and then run MAST with the resulting motif on the whole set of peaks. This gives you the list of potential sites for your motif in the sequences.

Otherwise, you could use RSAT (http://rsat.ulb.ac.be/rsat/) and its peak-motifs analysis, which looks for over-represented motifs directly in the whole set of peaks and reports the positions of the motif instances in the sequences, along with other information.
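The first step, picking the top-scoring peaks to use as the motif-discovery training set, could be sketched in Python roughly as follows. The tuple layout `(chrom, start, end, score)`, the function name `top_peaks`, and the example values are assumptions made for illustration; the real column layout depends on your peak caller, and the sequences still have to be extracted and passed to MEME separately.

```python
def top_peaks(peaks, n=600):
    """Return the n highest-scoring peaks, best first.

    Each peak is assumed to be a (chrom, start, end, score) tuple;
    this layout is hypothetical and depends on the peak caller.
    """
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


# Made-up example peaks, not real data.
peaks = [
    ("chr1", 100, 300, 12.5),
    ("chr1", 900, 1100, 48.0),
    ("chr2", 50, 250, 30.1),
]

# Keep the two best peaks here; the answer above suggests 600 on real data.
training = top_peaks(peaks, n=2)
for chrom, start, end, score in training:
    print(f"{chrom}\t{start}\t{end}\t{score}")
```

The motif learned from this subset is then scanned against every peak, so the choice of `n` only affects motif discovery, not which peaks get scored.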

0

I like your solution using MAST. Thanks for your suggestion, I will try it.

ADD REPLY • link 11.6 years ago by daniel.soronellas ▴ 330

score 1 · Answer 2 · 2013-04-13

This can be a complicated question, depending on the particular TF you are looking at. Many TFs, even those with strong motifs, have been shown to bind segments that do not contain the motif. That being said, once you have a motif that you are interested in, there are a few programs that will find all occurrences of that motif genome-wide. Cladimo does this, and I believe that FIMO does this as well. Then you can simply intersect the motif occurrences with your peaks. Another thing to keep in mind is that, depending on the size of your motif, you may have multiple motif occurrences within a peak.

Having a motif in your peak is reassuring, but if you are trying to figure out whether your peaks are true signal or noise, you probably want to look at as many sources of information as you can. Overlap with DNase I hypersensitive sites (DHS) or with peaks for other co-factors works well for this, if those data are available.
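As a rough illustration of the intersection step, here is a minimal Python sketch that counts motif occurrences inside each peak and ranks peaks by their best motif score. It assumes half-open coordinates, that a hit counts only if it lies entirely within the peak, and that higher scores are better; the tuple layouts, the function name `rank_peaks_by_motif`, and the example values are all hypothetical and not the output format of any particular tool.

```python
from collections import defaultdict


def rank_peaks_by_motif(peaks, hits):
    """Rank peaks by motif content.

    peaks: iterable of (name, chrom, start, end)   -- assumed layout
    hits:  iterable of (chrom, start, end, score)  -- assumed layout
    Returns (name, n_hits, best_score) tuples, peaks with a motif first,
    ordered by best score; peaks without any hit come last.
    """
    # Group motif hits by chromosome so each peak scans only its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, score in hits:
        by_chrom[chrom].append((start, end, score))

    ranked = []
    for name, chrom, p_start, p_end in peaks:
        # Keep scores of hits fully contained in the peak interval.
        inside = [
            score
            for h_start, h_end, score in by_chrom.get(chrom, [])
            if h_start >= p_start and h_end <= p_end
        ]
        best = max(inside) if inside else None
        ranked.append((name, len(inside), best))

    # Sort by (has a hit, best score), highest first.
    ranked.sort(
        key=lambda r: (r[1] > 0, r[2] if r[2] is not None else float("-inf")),
        reverse=True,
    )
    return ranked


# Made-up example data.
peaks = [
    ("peak1", "chr1", 100, 300),
    ("peak2", "chr1", 900, 1100),
    ("peak3", "chr2", 50, 250),
]
hits = [
    ("chr1", 150, 160, 9.2),
    ("chr1", 200, 210, 11.0),
    ("chr2", 60, 70, 7.5),
    ("chr1", 1095, 1105, 12.0),  # extends past the end of peak2, so not counted
]

ranked = rank_peaks_by_motif(peaks, hits)
for name, n_hits, best in ranked:
    print(name, n_hits, best)
```

The hit count is reported alongside the best score because, as noted above, a single peak can contain several motif occurrences. For genome-scale data, a dedicated interval tool would be more efficient than this nested scan.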