Hello everyone, undergrad here. I am looking for advice on my approach to a simple permutation test. I have a list of 1000 mesophyll-specific promoters, of which 64 contain motif X. I want to test whether this represents an enrichment (so that I could argue motif X is associated with mesophyll specificity). I have created 1000 lists of 1000 random promoters each, drawn from the whole genome, and searched for motif X within them. I now have 1000 numbers, one per random list, telling me how many promoters in that list contain motif X. I want to see whether there is a statistically significant difference between the frequency of motif X in the mesophyll-specific promoters and its frequency in the thousand random promoter lists.
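For concreteness, here is a minimal Python sketch of how the random lists could be built and counted. It is only an illustration under assumptions: the genome size (30,000 promoters) and the number of motif-containing promoters (1,500) are made-up placeholders, `null_motif_counts` is a hypothetical helper name, and each list is assumed to be drawn without replacement.

```python
import random


def null_motif_counts(has_motif, list_size=1000, n_lists=1000, seed=0):
    """Count motif-containing promoters in each of n_lists random lists.

    has_motif: one boolean per promoter in the genome
               (True if that promoter contains motif X).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = []
    for _ in range(n_lists):
        # Draw list_size promoters without replacement from the genome.
        sample = rng.sample(has_motif, list_size)
        counts.append(sum(sample))  # True counts as 1
    return counts


# Toy genome with MADE-UP numbers: 30,000 promoters, 1,500 containing motif X.
genome = [True] * 1500 + [False] * 28500
counts = null_motif_counts(genome)
print(len(counts), min(counts), max(counts))
```

In a real analysis `has_motif` would come from the actual motif search over all genomic promoters rather than from a toy list.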
Null hypothesis: mesophyll-specific promoters are not enriched in motif X compared to a reference distribution computed across 1000 lists of random promoters. I think I can compute the p-value directly: p-value = (number of random promoter lists containing more than 64 promoters with motif X) / 1000.
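The calculation can be sketched in Python as below. The first function follows the formula exactly as I wrote it (strictly more than the observed count). The second is a commonly used alternative for empirical p-values that is not part of my original plan: it counts random lists with at least as many motif hits as observed and adds one to both numerator and denominator, so the p-value can never be exactly zero. The function names and the toy `null_counts` values are made up for illustration.

```python
def p_value_as_posted(null_counts, observed):
    """Fraction of random lists with strictly MORE motif hits than observed."""
    exceed = sum(1 for c in null_counts if c > observed)
    return exceed / len(null_counts)


def p_value_with_ties_and_plus_one(null_counts, observed):
    """Alternative convention (an assumption, not from the post):
    count lists with AT LEAST the observed number of hits,
    then add one to numerator and denominator."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# MADE-UP null distribution: 955 lists with 50 hits, 10 with 64, 35 with 65.
null_counts = [50] * 955 + [64] * 10 + [65] * 35
observed = 64  # motif X hits among the 1000 mesophyll-specific promoters

p_as_posted = p_value_as_posted(null_counts, observed)
p_conservative = p_value_with_ties_and_plus_one(null_counts, observed)
print(p_as_posted, p_conservative)
```

The two versions only differ noticeably when some random lists tie with the observed count or when very few lists exceed it.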
Example: I find that, of the 1000 random lists, 35 contain more than 64 promoters with motif X, so 35/1000 = 0.035. I would then conclude that, for a list of 1000 mesophyll-specific promoters, the number of promoters containing motif X is statistically significant (p-value = 0.035) when compared with a reference distribution computed across 1000 lists of random promoters. Thank you in advance for any help with this. Any advice is appreciated! :)