Repeat sequences, such as the 45S rDNA gene and many tandem repeats, are distributed throughout the genome. The copies of a repeat can be identical in sequence, or they may differ at a few nucleotide positions.
I know they can be scanned with FIMO (a program from the MEME Suite) to find their locations in the whole genome. But FIMO requires a MEME motif file, which mainly contains a "letter-probability matrix". An example MEME motif file is shown below:
MEME version 4
ALPHABET= ACGT
strands: +
Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25
MOTIF MA0002.1 RUNX1
letter-probability matrix: alength= 4 w= 18 nsites= 18 E= 1.1e-006
0.611111 0.000000 0.055556 0.333333
0.555556 0.000000 0.111111 0.333333
0.222222 0.166667 0.222222 0.388889
0.000000 0.111111 0.000000 0.888889
0.000000 0.055556 0.944444 0.000000
0.111111 0.000000 0.000000 0.888889
0.055556 0.000000 0.888889 0.055556
0.833333 0.111111 0.055556 0.000000
0.111111 0.388889 0.277778 0.222222
0.333333 0.055556 0.500000 0.111111
0.111111 0.222222 0.111111 0.555556
0.277778 0.222222 0.222222 0.277778
0.111111 0.055556 0.722222 0.111111
0.388889 0.166667 0.055556 0.388889
0.055556 0.000000 0.111111 0.833333
0.055556 0.777778 0.000000 0.166667
0.777778 0.000000 0.222222 0.000000
0.277778 0.611111 0.055556 0.055556
I have read the MEME motif format documentation (http://meme-suite.org/doc/meme-format.html) in detail, but I have trouble understanding how the "letter-probability matrix" in the MEME motif format is generated. I also could not find any mention there of software or a script that directly converts FASTA format to a MEME motif.
So, I have two questions regarding this:
- How do I convert a single FASTA file into a MEME motif file?
- If I have multiple FASTA files containing copies of such a tandem repeat that differ at only a few positions, how do I convert them into one MEME motif file?
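
To make the question concrete, here is a minimal sketch of what I imagine such a conversion might look like. It assumes the repeat copies are already aligned, all the same length, and contain only A, C, G and T. Each matrix row is then just the fraction of copies carrying each base at that position, so a single sequence would give rows of 0s and 1s. The function names (`read_fasta`, `letter_probability_matrix`, `to_meme`), the motif name `repeat_1`, the uniform background and the `E= 0` placeholder are my own assumptions and not part of MEME Suite.

```python
from collections import Counter

ALPHABET = "ACGT"  # column order used in the MEME matrix rows


def read_fasta(text):
    """Parse FASTA-formatted text into a list of uppercase sequences."""
    seqs, chunks = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                seqs.append("".join(chunks).upper())
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        seqs.append("".join(chunks).upper())
    return seqs


def letter_probability_matrix(seqs):
    """Return one row of A, C, G, T frequencies per alignment column."""
    if not seqs:
        raise ValueError("no sequences given")
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    if any(set(s) - set(ALPHABET) for s in seqs):
        raise ValueError("only A, C, G and T are handled in this sketch")
    matrix = []
    for i in range(width):
        counts = Counter(s[i] for s in seqs)
        matrix.append([counts[base] / len(seqs) for base in ALPHABET])
    return matrix


def to_meme(seqs, name="repeat_1"):
    """Format the sequences as minimal MEME motif text (hypothetical helper)."""
    matrix = letter_probability_matrix(seqs)
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # assumed uniform background
        "",
        f"MOTIF {name}",
        "letter-probability matrix: "
        f"alength= 4 w= {len(matrix)} nsites= {len(seqs)} E= 0",
    ]
    lines += [" ".join(f"{p:.6f}" for p in row) for row in matrix]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two made-up repeat copies that differ at the last position.
    example = ">copy1\nACGT\n>copy2\nACGA\n"
    print(to_meme(read_fasta(example)))
```

I am not sure whether exact 0 and 1 probabilities like these are appropriate for FIMO, or whether pseudocounts should be added, so I would welcome corrections to this approach.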