Hello,
From my H3K27ac ChIP-seq data, I have identified 500 super enhancer regions using HOMER's findPeaks -style super. Across these 500 regions, I found 4 enriched binding motifs using HOMER's findMotifsGenome.pl.
First, I would like to find all significant binding motifs within a single super enhancer region (which can be as large as 85 kb). So far, I have tried MEME and Tomtom from the MEME Suite, RSAT, and TRAP. Which motif discovery algorithm should I use to identify many or all motifs within such a long region? Ideally, the algorithm should compare the discovered motifs against a motif database to identify which TFs each motif corresponds to. It would also be a big plus if the discovered motifs could be visualized in a genome viewer like IGV.
- MEME: I set MEME to find the top 20 motifs within the region. However, most of the motifs found are repetitive DNA sequences. Is there a way to filter out repeat sequences during motif discovery in MEME?
- RSAT: I ran its motif discovery algorithm against 3 different TF binding motif databases: the JASPAR core nonredundant vertebrates database, footprintDB, and HOMER's. The 3 runs discovered the same motifs (with the same sequences), but reported completely different TF matches for each motif. Which database would be best to use as a reference for human cells?
- TRAP: it only accepts sequences up to 5 kb, so it is not suitable for my very large regions.
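To make the repeat-filtering question above concrete, here is a minimal sketch of one possible pre-processing step: hard-masking repeats before motif discovery. It assumes the region sequence was extracted from a soft-masked genome FASTA, where repeats are written in lowercase. The function name hard_mask_repeats is hypothetical, and whether a given motif finder skips runs of N is something to confirm in that tool's documentation.

```python
def hard_mask_repeats(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N; leave uppercase bases as-is.

    Assumption: lowercase letters mark repeats, as in soft-masked genome FASTA files.
    """
    return "".join("N" if base.islower() else base for base in seq)


# Toy sequence (made up for illustration): the lowercase stretch stands in for a repeat.
example = "ACGTacgtacgtGGCC"
masked = hard_mask_repeats(example)
print(masked)
```

The masked sequence keeps its original length, so motif positions reported on it still map directly back to the region's coordinates.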
Second, I would like to identify all the super enhancers that contain the top enriched motif. My current plan is to run all 500 super enhancer regions through FIMO in the MEME Suite, but this approach doesn't seem optimal. I would really appreciate any recommendations.
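For reference, the FIMO-based plan could be sketched roughly as below. This assumes FIMO's tab-separated output with the columns motif_id, sequence_name, start, stop, strand and p-value, and that each sequence is named chrom:start-end with a 0-based region start (as a bedtools-style extraction would produce). The function names, the motif ID MA0000.1, and the p-value cutoff are hypothetical placeholders, not recommendations.

```python
import csv
import io
from collections import defaultdict


def regions_with_motif(fimo_tsv, motif_id, max_p=1e-4):
    """Group FIMO hits for one motif by the region (sequence_name) they fall in."""
    hits = defaultdict(list)
    # Skip blank lines and the '#' comment lines that can follow the table.
    rows = (line for line in fimo_tsv if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        if row["motif_id"] == motif_id and float(row["p-value"]) <= max_p:
            hits[row["sequence_name"]].append(
                (int(row["start"]), int(row["stop"]), row["strand"])
            )
    return dict(hits)


def to_bed(hits, label):
    """Convert region-relative, 1-based hit positions to genomic BED lines for IGV."""
    lines = []
    for name, sites in hits.items():
        chrom, span = name.split(":")
        region_start = int(span.split("-")[0])  # assumed 0-based
        for start, stop, strand in sites:
            bed_start = region_start + start - 1  # 1-based inclusive -> 0-based
            bed_end = region_start + stop
            lines.append(f"{chrom}\t{bed_start}\t{bed_end}\t{label}\t0\t{strand}")
    return lines


# Made-up two-hit table standing in for a real FIMO output file.
example = io.StringIO(
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "MA0000.1\tTF_A\tchr1:1000-2000\t11\t18\t+\t12.5\t1e-05\t0.01\tACGTACGT\n"
    "MA0000.1\tTF_A\tchr2:500-900\t5\t12\t-\t3.1\t0.01\t0.9\tACGAACGT\n"
    "# hypothetical trailing comment line\n"
)
hits = regions_with_motif(example, "MA0000.1")
bed_lines = to_bed(hits, "MA0000.1")
print(sorted(hits))
print("\n".join(bed_lines))
```

The keys of hits are the super enhancers that contain at least one hit passing the cutoff, and the BED lines can be saved to a file and loaded as a track in IGV.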
Thank you!