Question

Functional Enrichment With Large Numbers Of Genes

1

13.2 years ago

Rubal7 ▴ 850

Hello all,

I have a large list of genes that come from genomic regions identified using a population genetic test statistic on genome-wide sequence data (not expression data). I would like to see if this list of genes is enriched for any particular biological categories. However, the list of genes is very large (hundreds of genes), and it seems that DAVID and Panther are unable to handle such a large list. Does anyone know of gene list enrichment software that, as far as people are aware, is not constrained to a limited number of genes?

Thanks in advance,

Rubal.

enrichment function genome pathway • 3.3k views

updated 13.2 years ago by tiagoantao ▴ 690 • written 13.2 years ago by Rubal7 ▴ 850

score 1 · Answer 1 · 2012-05-07

I have used Ontologizer on large data sets without any problems. Also, you might want to try searching through the archives of this site (if you haven't already). I know that is not specific, but GO-related questions come up frequently and you might be able to dig up something useful. Good luck.

score 0 · Answer 2 · 2012-05-07

I am going to take a leap of faith here and imagine that your statistic is something like Fst, iHS or XP-EHH. If that is the case, then you will have a set of genes around your statistical areas of interest (most statistical tests do not have the spatial precision to pinpoint a gene, only a window). This means that you will have genes clustering around your statistic. Neighbouring genes often share similar GO terms, so a single signal can contribute several genes with the same term and inflate that GO term.

All this to say that you might have to do your analysis window-based (typically 200 kb windows in humans) rather than gene-based. Most GO tools are not made with pop gen statistics in mind and are gene-based (as you know by now).
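To make the window idea concrete, here is a minimal sketch in Python, not a tested pipeline. Everything in it is an assumption for illustration: genes are given as hypothetical `(chrom, midpoint, go_terms)` tuples, windows are fixed and non-overlapping, a gene belongs to the window containing its midpoint, and a plain hypergeometric test over windows stands in for whatever test you prefer. The point is only that a window counts once for a GO term, no matter how many genes in it carry that term.

```python
from math import comb

WINDOW_SIZE = 200_000  # 200 kb; assumed fixed, non-overlapping windows


def window_id(chrom, pos, size=WINDOW_SIZE):
    """Map a position to the (chrom, index) of the window containing it."""
    return (chrom, pos // size)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def window_enrichment(all_genes, candidate_windows, term, size=WINDOW_SIZE):
    """Test one GO term using windows, not genes, as the sampling unit.

    all_genes: iterable of (chrom, midpoint, set_of_go_terms) tuples
               (a hypothetical input format for this sketch).
    candidate_windows: window ids flagged by the pop gen statistic.
    Returns (number of candidate windows carrying the term, p-value).
    """
    all_genes = list(all_genes)
    # Background: every window that contains at least one gene.
    all_windows = {window_id(c, p, size) for c, p, _ in all_genes}
    # A window carries the term if any gene in it does; it counts once.
    term_windows = {window_id(c, p, size)
                    for c, p, terms in all_genes if term in terms}
    candidates = set(candidate_windows) & all_windows
    k = len(candidates & term_windows)
    p = hypergeom_sf(k, len(all_windows), len(term_windows), len(candidates))
    return k, p


# Toy data with made-up GO ids: three clustered genes share "GO:A".
toy_genes = [
    ("chr1", 50_000, {"GO:A"}),
    ("chr1", 90_000, {"GO:A"}),
    ("chr1", 150_000, {"GO:A"}),
    ("chr1", 450_000, {"GO:B"}),
    ("chr2", 100_000, {"GO:A"}),
    ("chr2", 700_000, {"GO:C"}),
]
toy_candidates = [("chr1", 0), ("chr1", 2)]
k, p = window_enrichment(toy_genes, toy_candidates, "GO:A")
print(k, p)
```

In the toy data the three clustered chr1 genes contribute a single hit for "GO:A", where a gene-based count would have seen three. With real data you would still need multiple-testing correction across terms, and probably some handling of genes that span a window boundary.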

This might not be your case, but if you are using standard pop gen stuff, you might have to add an extra layer of analysis.

I am aware that this is the opposite of an answer: I am raising yet another problem. But if you are doing standard pop gen stuff, you will have to deal with the lack of spatial precision of the statistics and use window-based approaches instead of gene-based approaches.

I suggest reading the Grossman & Sabeti paper in Science for an idea of the problem of spatial precision with pop gen stats (selection, in this case). Please note that I am not suggesting that you use their solution; it is just a useful read on the problem of precision.

There are papers doing GO analysis with pop gen stats and window approaches. I do not have any to hand, but you can search for them. Window-based approaches (not GO) are well represented in Pickrell et al., "Signals of recent positive selection in a worldwide sample of human populations".