I can't figure out how to get MEME to penalize mismatches more heavily in the shared motifs that it finds. I have a test example of five 5000-letter DNA sequences (or "sites" in MEME terminology), all of which share the exact 15-letter motif TTTCCATTTTTAGTA. However, that motif (which is most of the CArG3 binding site of Arabidopsis) contains a lot of repeated letters, so MEME gives it a high E-value (1.2). By comparison, a 24-letter motif with only five exactly matching letters, but few repeated letters, gets E = 1.1e-4.
I'd like to emphasize exact matches in the same way you can with the -perc_identity 100 option in BLAST. Many binding site motifs have repeated bases and get heavily down-weighted by MEME's algorithm.
Anyone know of a way to weight toward the number of fully-shared bases in MEME, so that my exactly shared motif will float to the top? I've tried every MEME parameter and can't make it happen.
Alternatively, if not, is there another motif searcher that will allow me to emphasize exact motif matches? I'm using 5000-letter sequences because that's a good size of the upstream gene flanking region to include TF binding sites.
Try feeding the motif you're interested in into FIMO? I believe it has an option to not allow any mismatches.