How to calculate an enrichment p-value for a motif
8 months ago
Apex92 ▴ 300

I have a data frame that looks like the one below.

Category,Total_Genes,UUAGGG_motif
Background,22591,18190
SetA,122,102
SetB,198,182
SetC,90,82

I have counted the number of genes containing the motif in each category. Now I want to calculate three p-values (SetA vs. Background, SetB vs. Background, and SetC vs. Background) to see in which of the three categories the motif is enriched, taking the size of each category into account.
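
For reference, one common way to formalize this comparison is a one-sided hypergeometric test. This is only a sketch, and it assumes that each set is a subset of the background genes. The symbols are defined here for illustration: N is the number of background genes, K the number of background genes with the motif, n the size of the set, and k the number of genes in the set with the motif. The enrichment p-value is then the probability of seeing k or more motif-containing genes when n genes are drawn at random without replacement:

```latex
P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}}
```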

I came up with this approach in R - is this the correct way? Thank you in advance.

# phyper() comes from the stats package, which base R loads by default,
# so no extra library is needed

# Number of background genes
N <- 22591
# Number of background genes with motif
K <- 18190

# Set A
n_A <- 122
k_A <- 102

# Set B
n_B <- 198
k_B <- 182

# Set C
n_C <- 90
k_C <- 82

# Perform hypergeometric test for Set A
p_value_A <- 1 - phyper(k_A - 1, K, N - K, n_A, lower.tail = TRUE)

# Perform hypergeometric test for Set B
p_value_B <- 1 - phyper(k_B - 1, K, N - K, n_B, lower.tail = TRUE)

# Perform hypergeometric test for Set C
p_value_C <- 1 - phyper(k_C - 1, K, N - K, n_C, lower.tail = TRUE)
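
The three calls above can also be written as one vectorized call. The following is a minimal sketch, not a definitive implementation: the data frame `counts` and its column names are made up for illustration, and the Benjamini-Hochberg correction and the fold-enrichment column are assumptions that the original post does not mention. It uses `lower.tail = FALSE` to get the upper tail directly, which should give the same values as `1 - phyper(...)` but avoids subtracting from 1 when p-values are very small.

```r
# Counts copied from the table in the question
# (`counts` and its column names are illustrative, not from the post)
counts <- data.frame(
  Category    = c("SetA", "SetB", "SetC"),
  Total_Genes = c(122, 198, 90),
  With_Motif  = c(102, 182, 82)
)
N <- 22591  # number of background genes
K <- 18190  # number of background genes with the motif

# Upper-tail probability P(X >= k) for every set in one call;
# phyper() is vectorized over its arguments
counts$p_value <- phyper(counts$With_Motif - 1, K, N - K,
                         counts$Total_Genes, lower.tail = FALSE)

# Adjust for testing three sets (BH is an assumption made here;
# the question does not specify a correction method)
counts$p_adj <- p.adjust(counts$p_value, method = "BH")

# Motif fraction in each set relative to the background fraction,
# as a simple effect size to read alongside the p-value
counts$fold <- (counts$With_Motif / counts$Total_Genes) / (K / N)

print(counts)
```

Reporting the fold value next to the p-value helps here because the background motif fraction is already high (about 80%), so even a significant result may correspond to a modest enrichment.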
Enrichment statistics • 296 views

This seems like a good time to use the MEME Suite of tools. You can use something like IUPAC2MEME to get your PWM, then a tool like CentriMo with your gene list and your background list.
