Gene Set Size - When Is It Too Small?
10.9 years ago
PoGibas 5.1k

I have a small subset of genes that share a specific characteristic (e.g., a TFBS in their UTRs). I checked for enrichment of this TFBS in the whole set using a permutation test (p value = 0). However, only a small subset of genes carry the TFBS, and I don't know whether it is worth analyzing these genes further (e.g., expression, conservation), since the set is so small.


Example

Total number of genes in set = 20000
Number of genes with TFBS = 8
Permutation test p value = 0 (i.e., the whole set of 20000 genes is enriched for this TFBS compared to a genomic background)
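
A note on the p value of 0: an empirical permutation p value of exactly 0 usually means only that no permutation reached the observed statistic, so the estimate is limited by the number of permutations that were run. Below is a minimal Python sketch of the commonly used +1 correction. The function name, the 1000 permutations and the null values are all hypothetical and are not taken from the analysis above.

```python
def permutation_p_value(observed, null_stats):
    """One-sided empirical p-value with the +1 correction.

    The observed statistic is counted as one of the permutations,
    so the result can never be exactly 0.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Hypothetical null distribution: 1000 permuted statistics, all between 0 and 4.
null_stats = [i % 5 for i in range(1000)]

# Hypothetical observed statistic: 8 genes with the TFBS.
p = permutation_p_value(8, null_stats)
print(p)
```

With m permutations, the smallest value this estimate can take is 1/(m + 1), so it is probably safer to report "p < 1/m" than "p = 0".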


Questions

How can I determine whether the set size is statistically valid (8 genes out of 20000)? Is there a test for this in R?
Is it worth analyzing such a small set of genes and trying to show how interesting and important their biology is?

analysis enrichment subjective • 3.5k views
10.9 years ago

Instead of a permutation test, a Fisher exact test or hypergeometric test is more commonly used to calculate gene set enrichment.
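
As a rough illustration of the hypergeometric approach, here is a minimal Python sketch of the one-sided (over-representation) test using only the standard library. All four counts are hypothetical placeholders, since the thread does not give the background numbers. In R, the same tail probability should be available as `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`, or from `fisher.test` on the corresponding 2x2 table with `alternative = "greater"`.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background
    K: background genes carrying the TFBS
    n: genes in the set of interest
    k: genes in the set of interest carrying the TFBS
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical counts, chosen for illustration only.
N = 20000  # background genes
K = 40     # background genes with the TFBS
n = 200    # genes in the set of interest
k = 8      # genes in the set of interest with the TFBS

p_enrich = hypergeom_upper_tail(k, N, K, n)
print(f"expected hits by chance: {K * n / N:.2f}, observed: {k}, p = {p_enrich:.3g}")
```

The test compares the observed overlap with the overlap expected by chance, so even 8 genes can be significant if the expected number is well below 1. The set size by itself does not make the result valid or invalid.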

When doing something like GO enrichment (which should use a similar principle), I don't set a hard cutoff for the number of genes in the original gene set (in GO). I typically like to see highly significant values (such as p < 1e-5), which usually means that multiple genes from the set are present in the differentially expressed gene list (similar to your 20000 gene list, I assume). However, I return the entire list of results with p < 0.05. Sometimes biologists want to know whether a single gene is affected (if that gene is known to be really important).

BTW, you can try the TRANSFAC enrichment tool in GATHER if you have a list of official gene symbols:

http://gather.genome.duke.edu/

I personally like the upstream regulator function in IPA (based upon literature annotations rather than predicted motifs), but that is commercial software.
