Beginner question here. I'm using GOrilla to perform gene ontology enrichment analysis on 5 sets of genes (from stages of differentiation). I notice some enriched biological processes, like cellular process or metabolic process, as parent terms, and other, less enriched processes as child terms.
What I want to know is: should I pick the most enriched (generic) term or the most specific, less enriched child term? Or both, and make a super long list?
(My samples are early differentiating cells, and I expect many biological processes unrelated to differentiation, but I'm working only on processes relevant to differentiation.)
Not quite sure I understand what you mean but if the parent term, say "cellular process", is found enriched with some statistical significance and the child term "base-excision repair" looks enriched but without statistical significance, you shouldn't report the child term as enriched (or if you do, give the p-value so that people can evaluate how likely it is to be a chance finding), i.e. you can't transfer enrichment from parent to child.
Also if you're bothered by the redundancy aspect of ontology terms, there are methods that deal with this in a principled way. I generally use the method from this paper.
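To make the "give the p-value" advice concrete, here is a minimal sketch of a plain one-sided hypergeometric enrichment test for a single term. It is not necessarily the statistic GOrilla itself reports, and the function name and all counts below are made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_hits):
    """P(X >= n_hits) when drawing n_selected genes without replacement
    from n_background genes, n_annotated of which carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / total

# Hypothetical counts: 10,000 background genes, 200 in the gene set.
# A broad parent term annotates 3,000 genes, a narrow child term only 40.
p_parent = hypergeom_enrichment_p(10_000, 3_000, 200, 90)
p_child = hypergeom_enrichment_p(10_000, 40, 200, 2)
print(f"parent p = {p_parent:.3g}, child p = {p_child:.3g}")
```

Each term gets its own p-value from its own counts, which is why a significant parent says nothing about whether a given child is significant. In practice the p-values would also need correcting for multiple testing across all terms tested.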
As f815081 said, parent terms are to be avoided when they are too broad (e.g. cellular process). But in some cases, "intermediate parents" can summarize your results pretty well.
For instance, in the example below, we have 5 enriched GO terms. In this case, we should avoid reporting cellular process as it is not informative. Then if you want to be very specific, you can always report the three children at the bottom. However if you want a broader but informative view, reporting only DNA repair is enough.
cellular process
|
|____________ DNA repair
              |
              |____________ base-excision repair
              |____________ double-strand break repair
              |____________ Shu complex
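The two reporting choices described for the example tree above can be sketched in a few lines. This is only an illustration: the `PARENTS` map is hand-written from the toy tree (real GO terms can have several parents and should be read from the ontology file), and which terms count as "too broad" is a judgment call encoded here as an assumed set.

```python
# Toy hierarchy from the example tree: child -> list of parents (hand-written).
PARENTS = {
    "DNA repair": ["cellular process"],
    "base-excision repair": ["DNA repair"],
    "double-strand break repair": ["DNA repair"],
    "Shu complex": ["DNA repair"],
}

# Assumption: terms we consider too generic to be informative.
TOO_BROAD = {"cellular process"}

def ancestors(term, parents=PARENTS):
    """Return every term reachable by walking up from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def most_specific(enriched):
    """Enriched terms that have no enriched descendant (the 'leaves')."""
    enriched = set(enriched)
    covered = set()
    for term in enriched:
        covered |= ancestors(term) & enriched
    return sorted(enriched - covered)

def broadest_informative(enriched, too_broad=TOO_BROAD):
    """After dropping too-broad terms, keep those with no enriched ancestor left."""
    kept = set(enriched) - set(too_broad)
    return sorted(term for term in kept if not (ancestors(term) & kept))

enriched_terms = [
    "cellular process",
    "DNA repair",
    "base-excision repair",
    "double-strand break repair",
    "Shu complex",
]
print(most_specific(enriched_terms))
print(broadest_informative(enriched_terms))
```

On the toy tree, `most_specific` returns the three children and `broadest_informative` returns only DNA repair, matching the two options described above.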
Thanks so much, exactly what I was looking for. But since those terms are enriched, I'm going to include them in a table of the most enriched terms without emphasizing them, and emphasize the specific ones instead. Am I right?
Yes, exactly. In a table you have to put them all, but when you have to communicate with your colleagues or your boss or discuss results in a scientific paper, then focus on what is relevant.
I know this is an old question, but if I am only interested in the "intermediate parent" terms, is there a tool that will let me easily extract this information for a list of GO IDs?
How do you define "intermediate parent"? You can't rely on levels because a given child term can appear in different branches at different levels. The simplest way to work with a choice of terms is to make a flat list of the terms you're interested in plus a catch-all one such as "other processes/functions/compartments".
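A rough sketch of the flat-list idea: pick the terms you care about, then assign each enriched term to every chosen term it falls under, with a catch-all for the rest. The parent map, the chosen terms, and the function names are all hypothetical and hand-written for illustration; a real run would load the relationships from the ontology itself.

```python
# Hand-written toy relationships: child -> list of parents (illustrative only).
TOY_PARENTS = {
    "DNA repair": ["cellular process"],
    "base-excision repair": ["DNA repair"],
    "cell differentiation": ["cellular process"],
    "neuron differentiation": ["cell differentiation"],
    "lipid metabolic process": ["metabolic process"],
}

# The flat list of terms of interest, plus a catch-all bucket.
TERMS_OF_INTEREST = ["DNA repair", "cell differentiation"]
CATCH_ALL = "other processes"

def self_and_ancestors(term, parents):
    """The term itself plus everything reachable by walking up the hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def assign_categories(term, parents=TOY_PARENTS, wanted=TERMS_OF_INTEREST):
    """Return every chosen term that `term` falls under, else the catch-all."""
    reachable = self_and_ancestors(term, parents)
    hits = [w for w in wanted if w in reachable]
    return hits or [CATCH_ALL]

for go_term in ["base-excision repair", "neuron differentiation", "lipid metabolic process"]:
    print(go_term, "->", assign_categories(go_term))
```

Because the lookup walks all ancestors rather than counting levels, it does not matter at which depth a term sits or in how many branches it appears; a term under two chosen categories is simply listed under both.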
Go for specific terms, as they describe the function of your gene of interest most precisely. General terms are like a zoomed-out view of a biological process: they tell you broadly where your gene falls in the ontology.