Hi all,
For simplicity, I have a single sample that I have run meme-chip on. Also for simplicity, let's assume a single motif is returned as enriched. The output of the meme-chip run includes meme results and fimo results. I want to:
1) Identify the genomic locations of all occurrences of the enriched motif.
2) Choose an appropriate q-value cut-off for annotation of target genes.
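For step 1 and 2, a minimal sketch of filtering FIMO hits by q-value and turning them into BED-style genomic intervals. It assumes FIMO's tab-separated output (fimo.tsv) with a header row containing `sequence_name`, `start`, `stop`, `strand`, `q-value` and `matched_sequence` plus `#` comment lines, as recent MEME Suite versions write it. It also assumes the sequence names encode the peak region as `chr:start-end` with a 0-based start (the way `bedtools getfasta` names them); `fimo_to_bed` is a hypothetical helper, not part of MEME Suite.

```python
import csv
import re

# Assumed sequence-name format, e.g. "chr1:1000-1500" (0-based region start).
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

def fimo_to_bed(lines, q_cutoff=0.001):
    """Yield (chrom, start, end, matched_sequence, strand) for FIMO hits with
    q-value <= q_cutoff. FIMO start/stop are assumed 1-based, inclusive and
    relative to the scanned sequence, so they are shifted by the region start."""
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    for row in csv.DictReader(body, delimiter="\t"):
        if float(row["q-value"]) > q_cutoff:
            continue
        m = REGION.match(row["sequence_name"])
        if m is None:
            continue  # name does not encode genomic coordinates; skip it
        offset = int(m.group("start"))
        yield (m.group("chrom"),
               offset + int(row["start"]) - 1,  # 0-based BED start
               offset + int(row["stop"]),       # BED end is exclusive
               row["matched_sequence"],
               row["strand"])
```

For example, `with open("fimo.tsv") as fh: bed = list(fimo_to_bed(fh, q_cutoff=0.01))`, where the file name is a placeholder for whichever fimo.tsv meme-chip wrote. Keeping the cut-off as a parameter makes it easy to compare how many target genes survive at 0.001, 0.01 and so on.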
Meme (which looks for enriched motifs) has constructed a PWM of the enriched motif from 5 sequences (q-value <= 0.001). The PWM captures the variable positions within those 5 sequences. This PWM is used as the input to Fimo (which scans for all matches to the PWM). Fimo returns 10 sequences that match the PWM (q-value <= 0.001).
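A rough sketch of the counting step a PWM starts from: tallying bases column by column across the 5 aligned sites (a position frequency matrix). It ignores MEME's own pseudocounts and background model, and the site list is simply copied from this post. The point is that each column is counted on its own, so the matrix records how often each base appears at a position but nothing about which variants occurred together in the same site.

```python
from collections import Counter

# The five MEME sites from this post, lower-cased (capitals only mark variation).
SITES = ["cgtagcta", "cgtagcta", "agtagcta", "cgtaacta", "cgtaacta"]

def count_matrix(sites):
    """Per-position base counts for equal-width, aligned sites."""
    sites = [s.lower() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal width")
    return [Counter(s[i] for s in sites) for i in range(width)]

for pos, counts in enumerate(count_matrix(SITES), start=1):
    print(pos, dict(counts))
```

Position 1 comes out as 4 c / 1 a and position 5 as 3 g / 2 a; the other six columns are fully conserved.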
What has me confused is that 7 of the 10 sequences from Fimo exactly match one of the 5 sequences from Meme.
So why did meme fail to find the 3 extra ones that fimo found?
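A quick way to check the overlap is to compare the sequence strings case-insensitively (the capitals only mark variation). Note this compares sequences, not coordinates: a FIMO hit counted as "matched" here may still sit at a different genomic location from the MEME site with the same sequence.

```python
MEME_SITES = ["cgtagcta", "cgtagcta", "Agtagcta", "cgtaActa", "cgtaActa"]
FIMO_HITS = ["cgtagcta", "cgtagcta", "Agtagcta", "cgtaActa", "cgtaActa",
             "AgtaActa", "AgtaActa", "AgtaActa", "cgtagcta", "agtagcta"]

meme_set = {s.lower() for s in MEME_SITES}
matched = [h for h in FIMO_HITS if h.lower() in meme_set]
extra = [h for h in FIMO_HITS if h.lower() not in meme_set]

print(len(matched), len(extra))         # 7 3
print(sorted({h.lower() for h in extra}))  # ['agtaacta']
```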
Thank you all
Kenneth
NOTE: - capitalisation is just to highlight variation
- each sequence has its own coordinates (chr/start/end columns not included here)
MEME RESULT
cgtagcta
cgtagcta
Agtagcta
cgtaActa
cgtaActa
PWM (constructed from the 5 sequences)
MgtaRcta # IUPAC codes: M = a or c (position 1), R = a or g (position 5)
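Written with standard IUPAC codes (M = a or c, R = a or g), the ambiguous consensus is MgtaRcta. Expanding it shows every concrete 8-mer it admits; the lookup table below covers only the letters needed for this example.

```python
from itertools import product

# Minimal IUPAC table for this example only: M = a/c, R = a/g.
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "m": "ac", "r": "ag"}

def expand(consensus):
    """All concrete sequences matched by a degenerate IUPAC consensus."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in consensus.lower()))]

words = expand("MgtaRcta")
print(words)  # ['agtaacta', 'agtagcta', 'cgtaacta', 'cgtagcta']
```

Only agtaacta is absent from the five MEME sites, and it is the AgtaActa sequence that FIMO reports three times: the pattern allows both variants at once even though no MEME site had them together.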
FIMO RESULT
cgtagcta # matches the PWM + 1 of the 5 meme sequences exactly
cgtagcta # matches the PWM + 1 of the 5 meme sequences exactly
Agtagcta # matches the PWM + 1 of the 5 meme sequences exactly
cgtaActa # matches the PWM + 1 of the 5 meme sequences exactly
cgtaActa # matches the PWM + 1 of the 5 meme sequences exactly
AgtaActa # matches the PWM only
AgtaActa # matches the PWM only
AgtaActa # matches the PWM only
cgtagcta # matches the PWM + 1 of the 5 meme sequences exactly
agtagcta # matches the PWM + 1 of the 5 meme sequences exactly
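A sketch of why FIMO can accept AgtaActa: a PWM score is a sum of independent per-column log-odds, so a site combining the two minor variants can still score close to a genuine MEME site. This assumes a uniform background (0.25 per base) and a flat pseudocount of 0.5, which is not MEME's exact model, so the numbers are only illustrative. One possible further reason MEME never listed these sites is that meme-chip may not have given MEME the sequences they sit in, but that depends on your run settings.

```python
import math
from collections import Counter

MEME_SITES = ["cgtagcta", "cgtagcta", "agtagcta", "cgtaacta", "cgtaacta"]
FIMO_SEQS = ["cgtagcta", "agtagcta", "cgtaacta", "agtaacta"]  # distinct hits, lower-cased
BASES = "acgt"

def log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Per-column log2(p(base) / background); assumed uniform background."""
    n = len(sites)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: math.log2((counts[b] + pseudocount)
                                 / (n + 4 * pseudocount) / background)
                    for b in BASES})
    return pwm

def score(seq, pwm):
    # Columns are summed independently: no term links position 1 to position 5.
    return sum(col[b] for col, b in zip(pwm, seq.lower()))

pwm = log_odds_pwm(MEME_SITES)
best = sum(max(col.values()) for col in pwm)
for s in FIMO_SEQS:
    print(s, round(score(s, pwm), 2), "of max", round(best, 2))
```

Under these assumptions the scores come out at roughly 12.3, 10.7, 11.8 and 10.2, so agtaacta trails the MEME site agtagcta by less than half a bit, which is easily within a p-value or q-value threshold that admits agtagcta.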
Probably I'm missing something, but
are the same. They match both the PWM and the sequences; the only difference is the CAPITALIZATION of one of the inner A's.
edited...that was a typo...thx
What happens if you increase the p-value threshold of meme from 0.001?