Hi there,
There are a couple of posts circulating, but I couldn't find a definitive answer for a non-model organism scenario.
How would one go about finding out which terms, be they PFAM, KEGG or others, are enriched in a group of genes of interest, given a universe to use as the background for calculating the enrichment?
I am familiar with the topGO approach, which accepts the genes of interest as a simple list of IDs of some kind (they may be made-up names) and the universe in a tab-delimited format: the same IDs, each followed on the same row by its GO IDs, separated by commas.
universe:
gene1 GO:0003677, GO:0004803, GO:0006313 ...
gene2 GO:0000160, GO:0003677, GO:0000160 ...
...
genes of interest:
gene1
gene2
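For illustration, the layout above can be read into a plain gene-to-terms mapping. This is only a minimal sketch: the function name `parse_term_map` is made up, and the `gene3` row with PFAM/KEGG-style identifiers is a hypothetical addition to show that nothing in the format is GO-specific.

```python
def parse_term_map(lines):
    """Parse 'gene<whitespace>term1, term2, ...' rows into {gene: set_of_terms}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        parts = line.split(None, 1)  # split gene ID from the rest of the row
        gene = parts[0]
        terms = parts[1] if len(parts) > 1 else ""
        # A set drops repeated terms on the same row (e.g. GO:0000160 listed twice)
        mapping[gene] = {t.strip() for t in terms.split(",") if t.strip()}
    return mapping


example_rows = [
    "gene1\tGO:0003677, GO:0004803, GO:0006313",
    "gene2\tGO:0000160, GO:0003677, GO:0000160",
    "gene3\tPF00072, K07657",  # hypothetical row mixing PFAM and KEGG style IDs
]
term_map = parse_term_map(example_rows)
```

Genes without any annotation can be kept in the mapping with an empty set so that they still count towards the universe size.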
I've found myself wondering whether there is a package that can take any kind of term (PFAM, KEGG, GO, XX) and test whether a subset of IDs of interest is significantly enriched for it relative to a broader set. Annotation could happen at a later stage.
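To make the idea concrete, a term-agnostic test of this kind can be sketched as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) per term. This is not what any particular package does; the function names `hypergeom_sf` and `enrich`, and the toy universe with illustrative GO/PFAM/KEGG-style IDs, are assumptions made for the example. The p-values are raw, so a multiple-testing correction such as Benjamini-Hochberg would normally follow.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(universe, interest):
    """universe: {gene: set_of_terms}; interest: iterable of gene IDs.

    Returns (term, k, K, p_value) tuples sorted by p-value, where k is the
    number of genes of interest carrying the term and K the number in the
    whole universe. Terms are opaque strings, so any ID scheme works.
    """
    interest = set(interest) & set(universe)  # ignore IDs outside the universe
    N, n = len(universe), len(interest)
    in_universe, in_interest = {}, {}
    for gene, terms in universe.items():
        for term in terms:
            in_universe[term] = in_universe.get(term, 0) + 1
            if gene in interest:
                in_interest[term] = in_interest.get(term, 0) + 1
    results = [
        (term, k, in_universe[term], hypergeom_sf(k, N, in_universe[term], n))
        for term, k in in_interest.items()
    ]
    return sorted(results, key=lambda r: r[3])


# Toy universe; gene names and term assignments are made up for illustration
universe = {
    "gene1": {"GO:0003677", "PF00072"},
    "gene2": {"GO:0003677", "K07657"},
    "gene3": {"PF00072"},
    "gene4": {"K07657"},
    "gene5": set(),  # unannotated genes still count towards the universe
    "gene6": {"GO:0000160"},
}
results = enrich(universe, ["gene1", "gene2"])
for term, k, K, p in results:
    print(f"{term}\t{k}/{K}\tp={p:.4f}")
```

Because the organism never enters the calculation, the only inputs are the gene-to-term mapping for the universe and the list of IDs of interest.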
Any assistance, suggestions or pointers would be appreciated.
Thank you for your suggestions. STRING looks quite impressive.
I'm currently looking at some novel microbes and fungi. I do the gene calling myself, so most of the genes are not initially publicly identifiable. I could perhaps use sequences as input to STRING, but I would like to do things in a high-throughput manner, so a predefined organism set would not really suit me. I also don't see why an enrichment method should depend on an organism at all, other than to supply a predefined gene universe.
Thank you!