I am trying to discover potential motifs in 73k sequences, each 50 bp long (I expect at least 10 motifs). However, I am not certain which motifs to expect to be enriched in my sequences.
I am afraid this would impact the motif discovery in MEME. Any suggestions on how to address this?
MEME has an option for discriminative motif finding: you generate a set of equally sized background sequences and look for motifs that are present in the test set but not in the background. This should improve the specificity of the MEME run.
Another issue is the search mode. MEME has 3 modes: zero or one occurrence per sequence (zoops), exactly one occurrence per sequence (oops), and any number of repeats (anr). If you think a motif may be missing from some of your sequences, choose zoops mode.
You should also take the run size into account: the MEME web interface is limited to 60000 bp, so you should probably install MEME locally to run this job.
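To make the mode choice concrete, here is a minimal sketch that only assembles the argument list for a local run, without executing it. The flags used (-dna, -oc, -mod, -nmotifs, -minw, -maxw, -revcomp) are standard MEME command-line options, but the file names, the motif widths and the helper function itself are my own placeholders, so check them against `meme -h` on your installation.

```python
def build_meme_command(fasta_path, out_dir, mode="zoops", n_motifs=10,
                       min_width=6, max_width=15):
    """Return the argument list for a local MEME run (nothing is executed)."""
    if mode not in {"oops", "zoops", "anr"}:
        raise ValueError(f"unknown MEME mode: {mode}")
    return [
        "meme", fasta_path,
        "-dna",                      # nucleotide alphabet
        "-oc", out_dir,              # output directory, overwritten if present
        "-mod", mode,                # oops, zoops or anr
        "-nmotifs", str(n_motifs),   # how many motifs to report
        "-minw", str(min_width),     # assumed width range, adjust as needed
        "-maxw", str(max_width),
        "-revcomp",                  # also search the reverse strand
    ]


# Hypothetical file and directory names, for illustration only.
cmd = build_meme_command("test_sequences.fasta", "meme_out")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once MEME is on your PATH.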
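As a quick sanity check on the run size, the sketch below counts the total number of bases in FASTA-formatted lines and compares a rough estimate for your dataset with the 60000 bp figure quoted above. The function name and the toy records are made up for illustration.

```python
def total_bases(fasta_lines):
    """Sum sequence lengths over FASTA lines, skipping header lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


WEB_LIMIT_BP = 60000  # web interface limit quoted in this thread

# Toy records, for illustration only.
example = [">seq1", "ACGTACGTAC", ">seq2", "GGGTTTAAAC"]
print(total_bases(example))  # 20

# Rough size of the dataset from the question: 73k sequences of 50 bp.
estimated = 73000 * 50
print(estimated, estimated > WEB_LIMIT_BP)  # 3650000 True
```

For a real file, pass the open file handle instead of the toy list, e.g. `total_bases(open("test_sequences.fasta"))`.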
I made a local installation of MEME and was able to run a sample test. Now I have some concerns on how to run the test.
I have about 7500 sequences of about 50 bp each for motif discovery. I am interested in motifs centered around the 15th and 40th positions of the sequence, and I am expecting around 6 to 7 motifs.
Can you help me understand how to generate the background sequences and perform a comparative MEME run, so that I can find motifs in the test set more accurately?
Is there any significance to parameters like -bfile and -psp? How can I use them?
The psp and bfile parameters allow you to direct MEME towards the right motif (you should read the MEME documentation to understand how it works and how these parameters influence the search).
The background sequences should be as close as possible to the test sequences. If you used a script to generate the test sequences, try to use the same script again, but this time choose random starting points or random genes.
You could also try Weeder (it currently appears to be offline). The stand-alone version can take a much larger number of sequences and has a reasonable model for the genome background.
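To give an idea of what goes into a -bfile, here is a minimal sketch that computes order-0 (single nucleotide) frequencies from a set of sequences and writes them as one "letter frequency" line per base, which is how I understand the MEME background model format. The function names are mine, and the MEME suite ships its own utility, fasta-get-markov, which is the safer way to produce this file (including higher-order models).

```python
from collections import Counter


def order0_background(sequences):
    """Return A/C/G/T frequencies over all sequences, ignoring other symbols."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    return {base: counts[base] / total for base in "ACGT"}


def format_bfile(freqs):
    """Format frequencies as 'letter frequency' lines (assumed bfile layout)."""
    lines = ["# order 0"]
    lines += [f"{base} {freqs[base]:.4f}" for base in "ACGT"]
    return "\n".join(lines) + "\n"


# Toy sequences, for illustration only.
freqs = order0_background(["ACGTAC", "GGGTTA"])
print(format_bfile(freqs), end="")
```

Ideally the frequencies come from sequences that resemble your test set (same genome, same kind of regions) rather than from the test set itself.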
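If you do not have such a script, the "random starting points" idea can be sketched as follows: draw windows of the same length as your test sequences from random positions in a reference sequence. The reference string here is a toy placeholder; in practice it would be the genome or gene set your test sequences came from, and the function names are hypothetical.

```python
import random


def sample_background(reference, n_sequences, length, seed=0):
    """Draw fixed-length windows from random start positions in `reference`."""
    if len(reference) < length:
        raise ValueError("reference is shorter than the requested window")
    rng = random.Random(seed)  # fixed seed so the background is reproducible
    records = []
    for i in range(n_sequences):
        start = rng.randrange(len(reference) - length + 1)
        records.append((f"bg_{i}_{start}", reference[start:start + length]))
    return records


def to_fasta(records):
    """Render (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy reference, for illustration only.
reference = "ACGT" * 100
background = sample_background(reference, n_sequences=5, length=50)
print(to_fasta(background), end="")
```

As far as I know, the resulting FASTA file can be given to recent MEME versions as a control set with -neg together with a discriminative objective function (-objfun de), or turned into position-specific priors with psp-gen and passed via -psp. Please confirm both against the documentation of your installed version.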
EDIT:
The Weeder website was updated without the author's knowledge. It is now operational and accessible.
Why do you think that using the MEME suite is not appropriate?
I am not saying it is not appropriate. I am wondering whether such a large data set, with an expected 10 different motifs, would be okay.
In other words, is it okay to do such a MEME run, and how can I improve the results?