Firstly, I want to differentiate between MEME and FIMO. MEME finds motifs (i.e. sequence patterns) that appear frequently in your sequence file, whereas FIMO takes a known set of motifs (sequence patterns specific to particular TFs) and checks whether those motifs appear in your sequence file.
As far as I know, based on experience, FIMO takes a motif file that is NOT in FASTA format. The query in which FIMO looks for those motifs, however, IS a FASTA file. To get the motif file, download matrix.dat (from BIOBASE's TRANSFAC database), or you can download it from here. Then run transfac2meme (which is part of the MEME suite) as follows:
transfac2meme matrix.dat > matrix.meme
The matrix.meme file contains a letter-probability matrix for each motif (one row per motif position, with columns in the order A, C, G, T), like this:
Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000
MOTIF V_MYOD_01 MyoD
letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 0
0.200000 0.400000 0.400000 0.000000
0.400000 0.200000 0.400000 0.000000
0.600000 0.000000 0.200000 0.200000
0.000000 1.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.800000 0.200000
0.000000 0.200000 0.800000 0.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.200000 0.400000 0.400000
0.000000 0.400000 0.000000 0.600000
0.200000 0.000000 0.600000 0.200000
MOTIF V_E47_01 E47
letter-probability matrix: alength= 4 w= 15 nsites= 11 E= 0
0.363636 0.363636 0.272727 0.000000
0.181818 0.454545 0.363636 0.000000
0.272727 0.181818 0.363636 0.181818
(and so on for other motifs)
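To make the layout of that file concrete, here is a minimal Python sketch that reads the matrices back out of MEME-format text and derives a consensus string from one of them. The function names `parse_meme_motifs` and `consensus` are made up for this illustration and are not part of the MEME suite; the parser only assumes the layout shown above (a `MOTIF` line followed by rows of four numbers) and ignores everything else.

```python
def parse_meme_motifs(text):
    """Return {motif_id: [[pA, pC, pG, pT], ...]} from MEME-format text.

    Assumes the layout shown above: a 'MOTIF <id> ...' line followed
    by rows of four probabilities. All other lines are skipped.
    """
    motifs = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            current = fields[1]
            motifs[current] = []
        elif current is not None and len(fields) == 4:
            try:
                motifs[current].append([float(x) for x in fields])
            except ValueError:
                pass  # four words, but not a matrix row
    return motifs


def consensus(matrix, alphabet="ACGT"):
    """Most probable letter per position (ties go to the earlier letter)."""
    return "".join(alphabet[row.index(max(row))] for row in matrix)


# The V_MYOD_01 block copied from the matrix.meme excerpt above.
example = """\
Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000
MOTIF V_MYOD_01 MyoD
letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 0
0.200000 0.400000 0.400000 0.000000
0.400000 0.200000 0.400000 0.000000
0.600000 0.000000 0.200000 0.200000
0.000000 1.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.800000 0.200000
0.000000 0.200000 0.800000 0.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.200000 0.400000 0.400000
0.000000 0.400000 0.000000 0.600000
0.200000 0.000000 0.600000 0.200000
"""

motifs = parse_meme_motifs(example)
myod_consensus = consensus(motifs["V_MYOD_01"])
print(len(motifs["V_MYOD_01"]), myod_consensus)  # 12 CAACAGGTGGTG
```

The consensus is only a rough summary, because several positions in this matrix are ties or near-ties. FIMO itself scores against the full matrix, not against a consensus string.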
Then use the matrix.meme file to check which motifs appear in your sequence file:
fimo [options] matrix.meme query.fasta
Output of FIMO when I ran it with matrix.meme on a FASTA file containing MyoD peaks:
#pattern name  sequence name  start    stop     strand  score    p-value   q-value  matched sequence
V_MYOD_01      chr1           6204681  6204692  -       11.2641  7.11e-05           ACTCAGGTGTCT
V_MYOD_01      chr1           6205087  6205098  -       14.0614  1.35e-05           CGTCAGGTGCTG
V_MYOD_01      chr1           6277494  6277505  +       10.6425  8.78e-05           TGACAGGTGTTG
V_MYOD_01      chr1           6810137  6810148  +       12.1965  4.79e-05           CAGCAGCTGCTG
V_MYOD_01      chr1           6810137  6810148  -       12.1965  4.79e-05           CAGCAGCTGCTG
V_MYOD_01      chr1           7196368  7196379  +       17.0917  1.15e-07           CAACAGGTGTTG
V_MYOD_01      chr1           7535485  7535496  -       12.1965  4.79e-05           GAGCAGCTGCTG
V_MYOD_01      chr1           8009701  8009712  -       12.1188  4.99e-05           AAACAGCTGTCA
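If you want to post-process hits like these, a small Python sketch along the following lines may help. It assumes the rows look exactly as pasted above: whitespace-separated, with the q-value column left empty, so each data row has eight fields. The names `Hit` and `parse_fimo_rows` and the 5e-5 cutoff are made up for this example; other FIMO versions or output options may produce a different column layout.

```python
from collections import namedtuple

# One record per FIMO hit; field names are chosen for this sketch only.
Hit = namedtuple("Hit", "motif seq start stop strand score pvalue matched")


def parse_fimo_rows(text):
    """Parse FIMO text rows laid out as in the output above.

    Assumption: fields are whitespace-separated and the q-value
    column is empty, so every data row has exactly eight fields.
    """
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # blank line or header
        if len(fields) != 8:
            continue  # layout differs from the assumed one; skip it
        motif, seq, start, stop, strand, score, pvalue, matched = fields
        hits.append(Hit(motif, seq, int(start), int(stop), strand,
                        float(score), float(pvalue), matched))
    return hits


# Three rows copied from the output above.
rows = """\
#pattern name sequence name start stop strand score p-value q-value matched sequence
V_MYOD_01 chr1 6204681 6204692 - 11.2641 7.11e-05 ACTCAGGTGTCT
V_MYOD_01 chr1 6205087 6205098 - 14.0614 1.35e-05 CGTCAGGTGCTG
V_MYOD_01 chr1 7196368 7196379 + 17.0917 1.15e-07 CAACAGGTGTTG
"""

hits = parse_fimo_rows(rows)
strong = [h for h in hits if h.pvalue < 5e-5]   # arbitrary example cutoff
best = min(hits, key=lambda h: h.pvalue)
print(len(strong), best.matched)  # 2 CAACAGGTGTTG
```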
Thanks for the long and detailed reply, but I think you misinterpreted my question;
Usage:
Input:
<motifs> is the name of a file containing a list of motifs, in MEME format.
<database> is the name of a file containing a collection of sequences in FASTA format. The character - can be used to indicate that the sequence data should be read from standard input. This can only be used if the motif file contains a single motif.
I have the <motifs> input in meme format already - note that this is supposed to be in html, txt, or xml format: http://meme.nbcr.net/meme/doc/meme-format.html
I do NOT have the <database> file in FASTA format. The database that I downloaded from the link on the MEME suite site: http://ebi.edu.au/ftp/software/MEME/Databases/motifs/motif_databases.12.1.tgz gives me databases in .meme format. The link you gave me and the directions you gave me also yield a database in .meme format, when the required is in fasta format. The command line should look like:
where meme.html can also be meme.txt or meme.xml.
FIMO is used when you already HAVE a sequence file (in FASTA format) and you want to search for known motifs in it. So FIMO cannot be used here in your case. The FASTA file (which is your query, and which you should already have because it is, after all, a query set) is the <database>.
And as far as the confusion goes, <motifs> takes meme output as well as matrix.meme files.
Thanks for the quick reply, although I am rather confused now -- to start from the beginning, I had several sequences which were all together in FASTA format that I ran through the MEME program to find a common motif in those sequences. Now, I want to find if any known transcription factors will bind to that motif I just found. I thought FIMO fit the job, but apparently it doesn't - what other technique should I use?
So you have the sequence file in FASTA, right? Let's name it query.fasta. That is your <database>. Now what you can do is create a matrix.meme file like I suggest in my answer, and run:
After this you will get an output showing which known motifs occur in your sequences. Fimo will give you output like this. You will not only get which motifs occur in your sequences but also the matched sequence pattern for each motif. You can then cross-validate these results with the MEME output by checking whether any of the patterns obtained in MEME are found in the FIMO output too.
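That cross-check can be done by eye, but a rough Python sketch of it might look like the one below. It simply asks whether a pattern reported by MEME occurs, on either strand, inside any of the sequences FIMO matched. The helper names are invented for this example, and the pattern `CAGGTG` is a hypothetical stand-in for whatever MEME reported for your data. Plain substring matching is a simplification that ignores degenerate positions in the motif.

```python
def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def hits_containing(pattern, matched_sequences):
    """Return the FIMO matched sequences that contain the pattern
    or its reverse complement as an exact substring."""
    pattern = pattern.upper()
    rc = reverse_complement(pattern)
    return [s for s in matched_sequences
            if pattern in s.upper() or rc in s.upper()]


# Matched sequences copied from the FIMO output in the answer.
fimo_matches = ["ACTCAGGTGTCT", "CAGCAGCTGCTG", "AAACAGCTGTCA"]

# Hypothetical MEME pattern, used only for illustration.
meme_pattern = "CAGGTG"

shared = hits_containing(meme_pattern, fimo_matches)
print(shared)  # ['ACTCAGGTGTCT']
```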