Motif analysis in repeat-rich ChIP-seq data?
5.5 years ago

Hi all,

I’m analyzing ChIP-seq data and currently performing a de novo motif analysis using MEME. The output is dominated by simple-repeat motifs (e.g. GAGAGAGAGAGA), and the motif of the TF I ChIPped only shows up as the 10th motif. The data is very repeat-rich, but repeat masking also removes my TF motif, because it lies within these repeats. I’m afraid this will prevent me from identifying any co-occurring motifs of interest, so I was wondering if there is a better approach.

Is there a way to tell MEME not to report these simple repeats as motifs? I haven’t found out how to do this yet.

Do you recommend another tool which is more suitable for repeat-rich sequences?

Thank you very much!

Rob

ChIP-Seq motif meme • 1.5k views

Can't this be biologically meaningful? Which TF is this?


It sure could be biologically relevant, but I expected the number one motif to be that of the Glucocorticoid Receptor (the chipped TF).

5.5 years ago
ATpoint 85k

You probably have the peak coordinates and extracted the sequences with something like bedtools getfasta. You could first identify the genomic coordinates of these repeats, e.g. using any of the solutions from "A: Code golf: detecting homopolymers of length N in the (human) genome" (modified to match the tandem patterns you encountered), and then use bedtools intersect to filter out any peaks that overlap these blacklisted coordinates. Then get the sequences of the remaining peaks and re-run the motif search.
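
To make the idea concrete, here is a minimal Python sketch of the two steps: finding simple tandem repeats in a sequence and dropping peaks that overlap them. It is not taken from the linked thread. The function names (`find_tandem_repeats`, `filter_peaks`), the 1-3 bp repeat unit and the 12 bp minimum run length are all assumptions chosen for illustration, and the overlap filter only mimics what `bedtools intersect -v` does on real BED files.

```python
import re


def find_tandem_repeats(seq, max_unit=3, min_len=12):
    """Return sorted (start, end, unit) tuples for tandem repeats in seq.

    A hit is a unit of 1..max_unit bases repeated back to back so that the
    whole run spans at least min_len bases. Coordinates are 0-based,
    half-open (BED style). Thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    hits = set()
    for unit_len in range(1, max_unit + 1):
        # Number of copies needed to cover min_len bases (ceiling division).
        min_copies = -(-min_len // unit_len)
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1)
        )
        for match in pattern.finditer(seq):
            hits.add((match.start(), match.end(), match.group(1)))
    return sorted(hits)


def filter_peaks(peaks, repeats):
    """Keep only peaks that do not overlap any repeat interval.

    peaks and repeats are lists of (chrom, start, end) tuples with
    0-based, half-open coordinates.
    """
    kept = []
    for chrom, start, end in peaks:
        overlapping = any(
            r_chrom == chrom and start < r_end and r_start < end
            for r_chrom, r_start, r_end in repeats
        )
        if not overlapping:
            kept.append((chrom, start, end))
    return kept
```

On a whole genome you would run the repeat scan per chromosome, write the hits to a BED file and let bedtools do the overlap filtering, since the pure-Python loop above is quadratic and only meant to show the logic. Note that this removes whole peaks, so peaks where the TF motif sits inside a repeat are lost as well.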

5.5 years ago

Seems like dust might be a helpful tool for this. You could also browse the excellent MEME suite Q&A page or post your own question there.
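
For intuition about what a DUST-style filter measures, here is a simplified Python sketch of a triplet-based low-complexity score. It is written in the spirit of DUST and is not the exact implementation used by any particular tool. The function name `dust_like_score` is made up for this example, and any cutoff you apply to the score would be your own choice.

```python
from collections import Counter


def dust_like_score(seq):
    """Score sequence complexity from overlapping triplet counts.

    Each triplet seen c times contributes c * (c - 1) / 2, and the sum is
    divided by (number of triplets - 1). Repetitive sequence reuses the
    same few triplets and scores high; sequence in which every triplet is
    unique scores 0.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    total = sum(c * (c - 1) / 2 for c in counts.values())
    return total / (len(triplets) - 1)
```

A GA repeat such as GAGAGAGAGAGA only contains the triplets GAG and AGA, so it scores far higher than a mixed sequence of the same length. Scoring peak sequences in windows like this is one way to see how much of each peak would be masked before deciding on a filtering strategy.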

5.5 years ago

Try GimmeMotifs. It combines different motif prediction tools (including MEME) and compares the identified motifs to a background set of sequences. It usually works very well for ChIP-seq data (disclaimer: I wrote the software).
