Hi, I have a list of GO IDs and their over-represented annotations, but most of them are children or subdivisions of a main/parent term. How can I hide them, or perhaps statistically merge them under the main/parent category?
Example Set:
GO Term
GO:0006351 transcription, DNA-dependent
GO:0032774 RNA biosynthetic process
GO:0016070 RNA metabolic process
GO:0019222 regulation of metabolic process
GO:0050794 regulation of cellular process
GO:0050789 regulation of biological process
GO:0065007 biological regulation
GO:0048522 positive regulation of cellular process
GO:0031323 regulation of cellular metabolic process
GO:0090304 nucleic acid metabolic process
GO:0080090 regulation of primary metabolic process
GO:0060255 regulation of macromolecule metabolic process
GO:0006139 nucleobase-containing compound metabolic process
GO:0048518 positive regulation of biological process
So the more specific terms, like positive regulation of cellular process and positive regulation of biological process, could go under broad terms like regulation of cellular process and regulation of biological process.
Can anyone suggest a tool that can do this textually or graphically?
Cheers
P.S. Revigo can do it, but I am looking for something that can be run from the terminal or from R.
I would like to know why you want to do that. In general I think the opposite approach is more useful. In that case you would calculate the significant child terms first, remove (prune) them from the tree and then calculate whether the parent term is still significant. We actually have a paper on that, see: http://dx.doi.org/10.1093/bioinformatics/bts366 . Merging everything into the parent terms often leads to conclusions like: "we did a diet study and found that 'metabolism' was affected". Sigh...
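To make the prune-and-retest idea concrete, here is a minimal Python sketch. It is not the method from the linked paper, just one simple reading of "remove the significant child, then test the parent again": genes annotated to a significant child are dropped from the parent's gene set before the parent is re-tested with a one-sided hypergeometric test. The gene names, the gene sets, the 0.05 cutoff and the function names are all made up for illustration.

```python
from math import comb  # Python 3.8+


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: study genes, k: study genes annotated to the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def term_pvalue(term_genes, study, background):
    """Enrichment p-value of one term, given its annotated gene set."""
    return enrichment_pvalue(
        k=len(study & term_genes),
        n=len(study),
        K=len(term_genes),
        N=len(background),
    )


# Toy data (assumption): 100 background genes, a child term with 5 genes,
# and a parent term that contains the child's genes plus 5 more.
background = {f"g{i}" for i in range(100)}
child_genes = {f"g{i}" for i in range(5)}
parent_genes = {f"g{i}" for i in range(10)}
study = {"g0", "g1", "g2", "g3", "g50"}

ALPHA = 0.05  # assumed significance cutoff

p_child = term_pvalue(child_genes, study, background)
p_parent_before = term_pvalue(parent_genes, study, background)

# Prune: if the child is significant, remove its genes from the parent's
# gene set and test what is left of the parent.
pruned_parent = parent_genes - child_genes if p_child < ALPHA else parent_genes
p_parent_after = term_pvalue(pruned_parent, study, background)

print(f"child:                 p = {p_child:.2e}")
print(f"parent before pruning: p = {p_parent_before:.2e}")
print(f"parent after pruning:  p = {p_parent_after:.2e}")
```

In this toy case the parent only looks enriched because it inherits the child's genes; once those are removed, nothing is left to support it. A real analysis would also need multiple-testing correction and the full ontology rather than a single parent/child pair.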
Chris, I will read your paper, it looks promising. I take your point. I am still learning gene ontology and had the notion that only parent terms are important and that we are mostly not interested in the children. Just think of a case like "Regulation of Biological processes" followed by "Positive Regulation of Biological processes" and "Negative Regulation of Biological processes": there, one would like to see just the parent, wouldn't one?
Thanks
Cheers
The problem with doing that is deciding how far up the ancestor tree to go. ReviGo has a nice implementation to get semantic relevance out of the GO structure. I was planning to try to replicate their algorithm in Python but I just can't find the time. Another alternative is to use GO slim annotations instead, but I often find those to be too vague.
Hey, I assume one should go up to the main parent term in the tree and then jump to the next tree. The best way for now, I think, is to cherry-pick the terms one wants to see from the basket of highly significant terms and represent them either visually or textually.
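For what it's worth, "going up to the main parent" is easy to sketch in plain Python once you have a parent map. The `PARENTS` dictionary below is a hand-written toy covering a few of the terms from the question (an assumption for illustration, and it may not match the current ontology release); in practice it would be built from the real ontology file. The function names are also made up.

```python
# Toy is_a parent map for a few terms from the question.
# Assumption: hand-written for illustration, not loaded from the real ontology.
PARENTS = {
    "GO:0048522": {"GO:0048518", "GO:0050794"},  # positive regulation of cellular process
    "GO:0048518": {"GO:0050789"},                # positive regulation of biological process
    "GO:0050794": {"GO:0050789"},                # regulation of cellular process
    "GO:0019222": {"GO:0050789"},                # regulation of metabolic process
    "GO:0031323": {"GO:0019222", "GO:0050794"},  # regulation of cellular metabolic process
    "GO:0050789": {"GO:0065007"},                # regulation of biological process
    "GO:0065007": set(),                         # biological regulation
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def keep_top_terms(significant, parents):
    """Hide every term that has an ancestor in the significant list."""
    sig = set(significant)
    return [t for t in significant if not (ancestors(t, parents) & sig)]


def keep_specific_terms(significant, parents):
    """Opposite view: hide every term that is an ancestor of another listed term."""
    covered = set()
    for term in significant:
        covered |= ancestors(term, parents)
    return [t for t in significant if t not in covered]


significant = [
    "GO:0019222", "GO:0050794", "GO:0050789", "GO:0065007",
    "GO:0048522", "GO:0031323", "GO:0048518",
]

print("top-most terms:     ", keep_top_terms(significant, PARENTS))
print("most specific terms:", keep_specific_terms(significant, PARENTS))
```

With this toy map, collapsing upward leaves only GO:0065007 (biological regulation), which shows how quickly the result becomes uninformative, while keeping the most specific terms leaves GO:0048522 and GO:0031323. Neither filter uses any statistics; they only hide rows from an existing result list.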