This is a follow-up to my earlier question, "Resurrecting DNA motif finding project".
I'm looking for sets of aligned DNA sequence motifs to test my search algorithm. The algorithm looks for correlations across the whole motif, so it performs best when:
a) The motif is short: preferably between 10 and 30 characters (bases) long. Anything shorter or longer probably would not work well.
b) The set is large, ideally several hundred sequences. The longer the motif, the larger the set needs to be.
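As a rough illustration, the two criteria above can be written as a quick check on a candidate motif set. This is a minimal sketch under stated assumptions: `meets_criteria` is a hypothetical helper, the aligned motifs are assumed to all have the same length, and `min_count=300` is an arbitrary stand-in for "several hundred". It does not model the rule that longer motifs need larger sets.

```python
def meets_criteria(motifs, min_len=10, max_len=30, min_count=300):
    """Hypothetical check of criteria (a) and (b) for an aligned motif set.

    Assumptions: aligned motifs share one length; min_count=300 is an
    arbitrary stand-in for "several hundred" and does not scale with length.
    """
    # Criterion (b): the set must be large enough.
    if len(motifs) < min_count:
        return False
    # Aligned motifs are assumed to all have the same length.
    lengths = {len(m) for m in motifs}
    if len(lengths) != 1:
        return False
    (length,) = lengths
    # Criterion (a): the motif length must fall in the preferred range.
    return min_len <= length <= max_len
```

For example, `meets_criteria(["ACGTACGTACGT"] * 300)` passes, while a set of 500 four-base motifs fails criterion (a).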
If you know of motifs like these, please list them. It would be helpful if a link could be provided to the data, preferably as a FASTA file, and also a description of the biological significance of the motifs. A description of the conserved regions would also be helpful.
I'm not a biologist, so please don't assume a lot of biological background. Thanks.
Thanks, Sean. This looks interesting. Now all I have to do is figure out what I need to download. I can't tell whether FASTA files are available; I don't see any.
The files in http://jaspar.genereg.net/html/DOWNLOAD/sites/ look like FASTA files, though they have a *.sites extension. Have I got this right? Is this what I need?
Yes, those are FASTA format files.
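Since the *.sites files are plain FASTA, they can be read without special tools. Below is a minimal sketch of a FASTA reader; `read_fasta` and `uppercase_core` are hypothetical helpers, not part of any JASPAR tooling. It assumes (verify against the actual files) that each site is written in uppercase inside lowercase flanking sequence, so `uppercase_core` simply collects the uppercase letters. The example headers and sequences in the usage note are made up.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header ends the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uppercase_core(seq):
    """Collect the uppercase letters of seq.

    Assumption: the binding site is one contiguous uppercase run
    surrounded by lowercase flanking sequence.
    """
    return "".join(c for c in seq if c.isupper())
```

With made-up input such as `">site1\nccgtCCATATATGGagt\n>site2\naaCCAAATATGGtt"`, passing `text.splitlines()` to `read_fasta` returns two records, and `uppercase_core` on the second sequence gives `"CCAAATATGG"`. The extracted cores could then be checked for length and count before use.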