I've been gathering evidence that several genes are directly regulated by a particular transcription factor I find interesting. I've obtained the transcription factor's binding motif from the literature/PAZAR and downloaded the 1000 bases upstream (5') of each of my putative target genes.
Obviously I can write (and have written) a regular expression in Perl to match occurrences of this motif in my FASTA sequences. But not all of my target genes appear to match my TF's motif in the sequence I've downloaded. I'm aware of promoter analysis tools (e.g. MEME Suite), but I'm unsure how to use them with the information I have, and I'm not finding the documentation all that helpful (sorry!).
Basically I have:
position frequency matrix e.g.
A [3 3 0 0 0 1 3 0 0 4 0 0 0 4 ]
C [1 0 2 0 2 2 1 0 0 0 1 0 4 0 ]
G [0 1 2 0 2 0 0 0 4 0 3 0 0 0 ]
T [0 0 0 4 0 1 0 4 0 0 0 4 0 0 ]
(or just "TGA[CG]TCA" if that's easier to use/understand)
approx. 20 FASTA sequences of promoter region e.g.
>Gene1 promoter
ATGCATGCATGCATGCATGCATGCATGCATGC #(...to a total of 1000 bases)
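For illustration, here is a minimal Python sketch of how a matrix like the one above could be scored against a promoter sequence instead of using a strict regular expression. It is only one possible approach, not what any particular tool does: the counts are turned into log-odds scores, and every window on both strands is scored. The pseudocount (0.25), the uniform background (0.25 per base), the score threshold, and the function names `pfm_to_pwm`, `revcomp` and `scan` are all assumptions chosen for the example.

```python
import math

# Position frequency matrix copied from the question (4 sites, 14 columns).
PFM = {
    "A": [3, 3, 0, 0, 0, 1, 3, 0, 0, 4, 0, 0, 0, 4],
    "C": [1, 0, 2, 0, 2, 2, 1, 0, 0, 0, 1, 0, 4, 0],
    "G": [0, 1, 2, 0, 2, 0, 0, 0, 4, 0, 3, 0, 0, 0],
    "T": [0, 0, 0, 4, 0, 1, 0, 4, 0, 0, 0, 4, 0, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.25, background=0.25):
    """Convert counts to log2-odds scores (pseudocount and background are assumed)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((pfm[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def revcomp(seq):
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """Return (start, strand, window, score) for every window scoring >= threshold.

    `start` is a 0-based coordinate on the forward strand for both strands.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in "ACGT" for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][b] for j, b in enumerate(window))
            if score >= threshold:
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = pfm_to_pwm(PFM)
    # Toy sequence with the highest-count base at each position embedded in T's.
    for hit in scan("TTTT" + "AACTCCATGAGTCA" + "TTTT", pwm, threshold=15.0):
        print(hit)
```

A graded score like this tolerates near-matches that a regular expression rejects outright, at the cost of having to choose a threshold; with only four sites behind the matrix, any threshold is a rough guess.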
Many thanks for any help/advice
Great! Thank you - I was trying to use FIMO but I didn't know how to generate a motif file. Your answer is really helpful, thanks a lot (P.S. I cannot upvote or accept this answer for some reason)
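For anyone else stuck on the motif file step, here is a rough Python sketch that writes the count matrix from the question in the MEME Suite's minimal text motif format, which FIMO reads. The motif name `myTF`, the uniform background frequencies, and the function name `pfm_to_meme` are assumptions made for the example; check the layout against the MEME Suite format documentation before relying on it.

```python
# Position frequency matrix copied from the question (4 sites, 14 columns).
PFM = {
    "A": [3, 3, 0, 0, 0, 1, 3, 0, 0, 4, 0, 0, 0, 4],
    "C": [1, 0, 2, 0, 2, 2, 1, 0, 0, 0, 1, 0, 4, 0],
    "G": [0, 1, 2, 0, 2, 0, 0, 0, 4, 0, 3, 0, 0, 0],
    "T": [0, 0, 0, 4, 0, 1, 0, 4, 0, 0, 0, 4, 0, 0],
}


def pfm_to_meme(pfm, name):
    """Render a count matrix as a minimal MEME-format motif (uniform background assumed)."""
    width = len(pfm["A"])
    nsites = sum(pfm[b][0] for b in "ACGT")  # number of sites behind the counts
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {width} nsites= {nsites} E= 0",
    ]
    # One row per motif position, columns in A C G T order, each row summing to 1.
    for i in range(width):
        col_total = sum(pfm[b][i] for b in "ACGT")
        lines.append(" ".join(f"{pfm[b][i] / col_total:.6f}" for b in "ACGT"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pfm_to_meme(PFM, "myTF"))
```

Saving the output to a file (say `motif.meme`, a made-up name) should then allow a run along the lines of `fimo motif.meme promoters.fa`, where `promoters.fa` stands for the FASTA file of promoter sequences.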