P-value calculation for DNA motifs
3.4 years ago
siu ▴ 160

Dear All, I want to calculate the statistical significance of finding a DNA motif in a promoter sequence of a specific length. For example, the motif "ATCGAT" occurs 5 times in a 2000 bp promoter. What kind of statistical test can be used to assess the enrichment of this motif in the promoter sequence?

Is there any python script for this?

Please help

I will be very grateful for this.

Thanks in advance

python Genomics DNA R • 646 views
3.4 years ago
Mensur Dlakic ★ 28k

A simple calculation for the number of possible DNA k-mers is 4 raised to the power of k (4^k). That means there are 256 possible tetramers, 1024 pentamers and 4096 hexamers. Statistically speaking, if all four bases are equally likely, any given hexamer would be expected to occur about once in 4096 nucleotides, or roughly 0.5 times in 2000 bp, so 5 occurrences in 2000 bp is statistically significant.
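As a rough illustration of that reasoning, here is a minimal Python sketch that turns the expected count into a p-value. It is not a definitive method: it assumes equal base frequencies (0.25 each), independent positions, a single strand, and a Poisson approximation for the number of hits. The function names `expected_count` and `poisson_tail` are made up for this example. Because "ATCGAT" can overlap with itself, the Poisson model is only approximate, and a real analysis would normally use a background model fitted to the genome in question.

```python
import math

def expected_count(seq_len, k, p_base=0.25):
    """Expected number of hits of one fixed k-mer on a single strand,
    assuming every base has probability p_base (an assumption)."""
    positions = seq_len - k + 1          # number of possible start positions
    return positions * p_base ** k

def poisson_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam)."""
    cdf_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                    for i in range(observed))
    return 1.0 - cdf_below

lam = expected_count(2000, 6)            # hexamer in a 2000 bp promoter
p_value = poisson_tail(5, lam)           # 5 or more occurrences by chance
print(f"expected count: {lam:.3f}")
print(f"P(X >= 5): {p_value:.2e}")
```

With these assumptions the expected count is about 0.49, and the chance of seeing 5 or more hits is well below 0.001.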

It is a different question whether it is biologically significant. Your motif is short, which is usually the case with eukaryotic TF binding sites. Yet your motif is a palindrome, which is usually not the case with eukaryotic TFs. All that, plus a neat 2000 bp promoter size, sounds like a made-up example rather than a real one, so I think this might be homework. I will let you figure out the rest.
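For anyone who wants to check the palindrome point or count hits in their own sequence, a small sketch follows. The helper names `reverse_complement` and `count_overlapping` are hypothetical, and the code assumes an uppercase sequence containing only A, C, G and T.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq))

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)   # step by 1 to allow overlaps
    return count

motif = "ATCGAT"
# A DNA palindrome equals its own reverse complement.
is_palindrome = reverse_complement(motif) == motif
print(is_palindrome)
```

Since the motif equals its own reverse complement, scanning the opposite strand finds the same sites again, so hits should be counted on one strand only.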
