Question

De-novo motif discovery and repeats

6.9 years ago

rbronste ▴ 420

Just a general question: often, when I run something like MEME to search for de novo motifs in my data, the top few motifs I get back look like the following:

Motif 1 regular expression
--------------------------------------------------------------------------------
T[GC]T[GC]T[GC]T[GC]T[GC]T[GC]TGT[GC]T[GC]T[GC]T[GC]T

I am wondering whether there is an appropriate way to scan for enriched de novo motifs while avoiding repetitive sequence like this. I realize some TFs have consensus sequences of this kind, but I assume most do not. Thoughts?

Thanks.

motif meme fimo de novo • 1.7k views

updated 6.9 years ago by Alex Reynolds 36k • written 6.9 years ago by rbronste ▴ 420

score 0 · Answer 1 · 2017-12-29

You might look at regions that match these patterns and check whether they coincide with footprints, ChIP-seq signal, or other annotations associated with gene regulation. You might also filter out repeat-masked regions. Basically, the idea is to integrate annotations to eliminate motifs that are unlikely to be functional.
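
As a rough illustration of the repeat-filtering idea, here is a minimal sketch that drops input sequences dominated by masked bases before they go to a motif finder. It assumes the FASTA is soft-masked (repeats in lowercase, as in many genome downloads) or hard-masked (repeats as N). The helper names (`parse_fasta`, `masked_fraction`, `filter_repeats`) and the 0.5 cutoff are hypothetical choices for this example, not part of MEME or any other tool.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)


def filter_repeats(
    records: Iterable[Tuple[str, str]], max_masked: float = 0.5
) -> Dict[str, str]:
    """Keep only records whose masked fraction is at most max_masked.

    The 0.5 default is an arbitrary assumption; tune it for your data.
    """
    return {
        name: seq
        for name, seq in records
        if masked_fraction(seq) <= max_masked
    }


# Toy input: peak2 is almost entirely a soft-masked TG repeat.
example = """>peak1
ACGTTGCAACGTAGCT
>peak2
tgtgtgtgtgtgtgTG
>peak3
ACGTacgtACGTACGT
""".splitlines()

kept = filter_repeats(parse_fasta(example), max_masked=0.5)
print(sorted(kept))  # ['peak1', 'peak3']
```

The surviving sequences can then be written back out and passed to the motif finder. This only removes repeats that were already annotated by masking; checking the remaining motif hits against footprints or ChIP-seq peaks is a separate step.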