Which Statistical Test Should Be Applied To Determine Whether A Certain Motif Is Significant Among A Set Of DNA Sequences?
10.7 years ago
dustar1986 ▴ 380

Hi,

I have 5000 DNA sequences, and I used MEME to find enriched motifs among them.

It returned 134 sequences sharing a certain motif.

Which statistical test should I apply to determine whether 134 out of 5000 is significant?

Thanks in advance.

motif meme statistics • 4.3k views

You can compute a p-value for them.


Got it. Thanks a lot.

10.7 years ago

MEME should return an E-value for each motif it finds. I think the E-value is the statistic you are looking for. From this page:

The E-value is an estimate of the expected number of motifs with the given log likelihood ratio (or higher), and with the same width and number of occurrences, that one would find in a similarly sized set of random sequences. (In random sequences each position is independent with letters chosen according to the background letter frequencies.) Motifs with E-values larger than 0.01 (1e-2) are possibly just statistical artifacts, and not real motifs

So if the E-value for your motif is far below 0.01, then finding your motif in 134 of 5000 sequences is a significant enrichment.
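As a rough illustration of that check, here is a minimal Python sketch. It assumes the motif summary line in MEME's text output carries an `E-value = ...` field; the sample line and its numbers are invented for the example, and `motif_evalue` and `is_significant` are hypothetical helper names, not part of MEME.

```python
import re

# Hypothetical summary line in the style of MEME's text output.
# The motif, site count, llr and E-value below are invented for illustration.
SAMPLE = "MOTIF ACGTGACGT MEME-1\twidth =   9  sites = 134  llr = 1520  E-value = 3.2e-045"

# Capture the number that follows "E-value =".
EVALUE_RE = re.compile(r"E-value\s*=\s*([0-9.eE+-]+)")


def motif_evalue(line):
    """Return the E-value found in a summary line, or None if absent."""
    match = EVALUE_RE.search(line)
    return float(match.group(1)) if match else None


def is_significant(evalue, threshold=0.01):
    """Apply the 0.01 rule of thumb quoted above (threshold is adjustable)."""
    return evalue is not None and evalue < threshold


evalue = motif_evalue(SAMPLE)
print(evalue, is_significant(evalue))
```

The threshold is kept as a parameter because 0.01 is only the rule of thumb from the quoted documentation; a stricter cutoff may suit some analyses better.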


This is the right answer, provided you have set the background genome correctly, because the baseline for randomness looks quite different between mammals and fungi or viruses.
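To see how much the baseline can differ, one could compare simple letter frequencies between sequence sets. The sketch below computes a zero-order background (independent letter frequencies, as in the quoted E-value definition). The function name `background_frequencies` and the example sequences are assumptions made for illustration, and this does not reproduce MEME's own background file format.

```python
from collections import Counter


def background_frequencies(sequences, alphabet="ACGT"):
    """Zero-order background: relative frequency of each letter.

    Characters outside the alphabet (e.g. N) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no letters from the alphabet found")
    return {letter: counts[letter] / total for letter in alphabet}


# Invented toy sequences: one AT-rich set and one GC-rich set.
at_rich = background_frequencies(["ATATTAAT", "TTAACGAT"])
gc_rich = background_frequencies(["GCGCCGGC", "CCGGATGC"])
print(at_rich)
print(gc_rich)
```

A motif that looks surprising against a uniform background may be unremarkable against a strongly AT-rich or GC-rich one, which is why the choice of background affects the reported E-value.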


Thanks a lot. This is exactly what I wanted. I should have read the manual more carefully.

