Functional Enrichment With Large Numbers Of Genes
12.6 years ago
Rubal7 ▴ 850

Hello all,

I have a large list of genes that come from genomic regions identified using a population genetic test statistic on genome-wide sequence data (not expression data). I would like to see whether this list of genes is enriched for any particular biological categories. However, the list is very large (hundreds of genes), and it seems that DAVID and Panther are unable to handle such a large list. Does anyone know of gene list enrichment software that, as far as people are aware, is not constrained to a limited number of genes?

Thanks in advance,

Rubal.

enrichment function genome pathway • 3.0k views
12.6 years ago
SES 8.6k

I have used Ontologizer on large data sets without any problems. Also, you might want to try searching through the archives of this site (if you haven't already). I know that is not specific, but GO-related questions come up frequently and you might be able to dig up something useful. Good luck.

12.6 years ago
tiagoantao ▴ 690

I am going to take a leap of faith here and imagine that your statistic is something like Fst, iHS or xpEHH. If that is the case, then you will have a set of genes around your statistical areas of interest (most statistical tests do not have the spatial precision to pinpoint a gene, only a window). This means that you will have gene clustering around your statistic. You might therefore have several genes with similar GO terms in the same search area, which inflates the count for that GO term.
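
As a rough illustration of the inflation described above, the sketch below counts GO terms once per gene and once per window. All gene names, window IDs and term labels are made-up placeholders, not real annotations.

```python
from collections import defaultdict

# Hypothetical toy data: three neighbouring genes share one window and one term.
gene_window = {"geneA": "win1", "geneB": "win1", "geneC": "win1", "geneD": "win2"}
gene_go = {
    "geneA": {"TERM_X"},
    "geneB": {"TERM_X"},
    "geneC": {"TERM_X"},
    "geneD": {"TERM_Y"},
}

def count_terms_by_gene(gene_go):
    """Count each term once for every gene annotated with it."""
    counts = defaultdict(int)
    for terms in gene_go.values():
        for term in terms:
            counts[term] += 1
    return dict(counts)

def count_terms_by_window(gene_window, gene_go):
    """Count each term at most once per window, however many genes carry it."""
    window_terms = defaultdict(set)
    for gene, window in gene_window.items():
        window_terms[window] |= gene_go.get(gene, set())
    counts = defaultdict(int)
    for terms in window_terms.values():
        for term in terms:
            counts[term] += 1
    return dict(counts)

gene_counts = count_terms_by_gene(gene_go)
window_counts = count_terms_by_window(gene_window, gene_go)
print(gene_counts)    # TERM_X is counted three times
print(window_counts)  # TERM_X is counted once
```

In this toy case a single signal (win1) contributes three hits to TERM_X in the gene-based count but only one in the window-based count.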

All this to say that you might have to do your analysis window-based (typically 200 kb windows in humans) rather than gene-based. Most GO tools are not made with pop gen statistics in mind and are gene-based (as you know by now).
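
One possible way to sketch a window-based test is a one-sided hypergeometric test where the sampling unit is the window rather than the gene. This is only a minimal sketch under assumptions: the function names, the window-to-terms mapping and the toy data are hypothetical, the genome is assumed to be pre-divided into non-overlapping windows, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n windows from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def window_enrichment(all_window_terms, candidate_windows):
    """Return {term: p-value} for terms seen in the candidate windows.

    all_window_terms: dict mapping every window ID to its set of terms
    candidate_windows: set of window IDs flagged by the pop gen statistic
    """
    N = len(all_window_terms)
    n = len(candidate_windows)
    terms = set().union(*(all_window_terms[w] for w in candidate_windows))
    results = {}
    for term in terms:
        K = sum(1 for t in all_window_terms.values() if term in t)
        k = sum(1 for w in candidate_windows if term in all_window_terms[w])
        results[term] = hypergeom_sf(k, N, K, n)
    return results

# Hypothetical toy genome: 10 windows, TERM_X present in 3 of them.
all_window_terms = {f"w{i}": set() for i in range(10)}
for w in ("w0", "w1", "w2"):
    all_window_terms[w].add("TERM_X")

pvals = window_enrichment(all_window_terms, {"w0", "w1"})
print(pvals)
```

With very large or unequal windows (for example, long runs of homozygosity), fixed-size windows are a poor fit, and a permutation scheme that preserves region lengths may be more appropriate than this simple test.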

This might not be your case, but if you are using standard pop gen stuff, you might have to add an extra layer of analysis.

I am aware that this is the opposite of an answer: I am raising yet another problem. But if you are doing standard pop gen stuff, you will have to deal with the lack of spatial precision in the statistics and use window-based approaches instead of gene-based approaches.

I suggest reading the Grossman & Sabeti paper in Science for an idea of the problem of spatial precision with pop gen (selection, in this case) stats. Please note that I am not suggesting that you use their solution; it is just a useful read on the problem of precision.

There are papers doing GO analysis with pop gen stats and window approaches. I do not have any at hand, but you can search for them. Window-based approaches (not GO) are well represented in Pickrell et al., "Signals of recent positive selection in a worldwide sample of human populations".


Thanks for raising this concern. You are right, and the issue of windows does complicate the search for validity in GO analyses. I'll go back to these papers as food for thought. The windows I have are also particularly large, some several megabases, as I am looking for the longest regions of homozygosity, which means I have very large gene lists that will be diluting the true signal.


Later, if you need it, I have some code (Python) to get all GO terms for a genomic region and calculate enrichment. I have not published it, but I would have no problem giving it to you.
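
For readers wondering what such a lookup could involve, here is a minimal sketch. It is not the unpublished code mentioned above; the `Gene` record, the function names and the toy coordinates are all hypothetical, and real gene coordinates and GO annotations would have to come from an annotation source of your choice.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def genes_in_region(genes, chrom, start, end):
    """Return genes overlapping the closed interval [start, end] on chrom."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]

def go_terms_in_region(genes, gene_go, chrom, start, end):
    """Union of the terms of every gene overlapping the region."""
    terms = set()
    for gene in genes_in_region(genes, chrom, start, end):
        terms |= gene_go.get(gene.name, set())
    return terms

# Hypothetical toy annotation.
genes = [
    Gene("geneA", "chr1", 1_000, 5_000),
    Gene("geneB", "chr1", 250_000, 260_000),
    Gene("geneC", "chr2", 1_000, 5_000),
]
gene_go = {"geneA": {"TERM_X"}, "geneB": {"TERM_Y"}, "geneC": {"TERM_Z"}}

region_terms = go_terms_in_region(genes, gene_go, "chr1", 0, 200_000)
print(region_terms)
```

The linear scan is fine for a handful of regions; for genome-wide use, an interval index or sorted coordinates would be a more sensible choice.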


Thanks, that could be very useful. I may get in touch soon.
