Following up on this question: http://biostar.stackexchange.com/questions/598/tools-for-chipseq-scale-motif-finding
I've got a large number of unaligned eukaryotic regulatory sequences and I want to do de novo motif discovery on them. Reads that have no mapping, or that wouldn't make a peak, have already been filtered out of these sequences.
Most tools I've seen require aligned sequences and/or search only for a list of pre-defined motifs.
In its simplest form, what I am looking for is a program that would read file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp each, and produce motif predictions without needing to align the sequences to a reference or scan for known motifs.
Does anybody know of a tool that can do de novo motif discovery on this many unaligned FASTA sequences?
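To make the input and output concrete, here is a minimal sketch of the shape of program I mean. It is only a naive k-mer count, not a real motif finder (no background model, no degenerate positions), and the function names `read_fasta` and `top_kmers` and the defaults `k=8`, `n=10` are made up for illustration.

```python
from collections import Counter


def read_fasta(lines):
    """Yield one upper-cased sequence string per FASTA record."""
    seq = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq:
                yield "".join(seq)
                seq = []
        else:
            seq.append(line.upper())
    if seq:
        yield "".join(seq)


def top_kmers(seqs, k=8, n=10):
    """Return the n k-mers found in the largest number of sequences.

    Each k-mer is counted at most once per sequence, and k-mers that
    contain anything other than A, C, G, T (e.g. N) are skipped.
    """
    alphabet = set("ACGT")
    counts = Counter()
    for s in seqs:
        seen = {s[i:i + k] for i in range(len(s) - k + 1)}
        counts.update(km for km in seen if set(km) <= alphabet)
    return counts.most_common(n)


# Hypothetical usage on the file described above:
#     with open("file.fa") as fh:
#         for kmer, n_seqs in top_kmers(read_fasta(fh)):
#             print(kmer, n_seqs)
```

A real tool would have to go well beyond this, but the interface is the point: one unaligned FASTA file in, ranked candidate motifs out, with no reference genome and no motif database involved.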
How large were your ChIP fragments, and how far did you sequence in? Since ChIP-seq reads run from the ends of each fragment inwards, do you think the unaligned reads will even contain the potential regulatory motifs?
Is this prokaryotic or eukaryotic data?
Reads that have no mapping, or that wouldn't make a peak, have already been filtered out of these unaligned regulatory sequences, so most of the data with no potential is already gone.
It's eukaryotic data.