To start, you should understand what the Gene Ontology actually is. GO is a means of providing consistent descriptions of genes and gene products across various databases and projects. GO itself is a language, not data; the data you need are the annotation sets that associate a species' genes with GO terms.
Second, this brings me to PANTHER. PANTHER isn't GO; it is a totally different set of annotations, so it makes sense that you're getting different results from the tools that rely on GO-annotated genes.
Next, make sure there are no differences in the annotation datasets used between platforms. Some websites may be using different versions of the annotation data, and some may be using custom or non-standard data.
AgriGO:
Raw GO annotation data is generated using BLAST, Pfam, and InterProScan by agriGO, or obtained from the B2G-FAR center or from the Gene Ontology.
BiNGO:
Q : The default annotations/ontologies in BiNGO are already several months old. I would like to use more recent annotations.
A : Download the most recent annotation and ontology files from the GO website. You can use these as custom annotation/ontology files.
So you need to understand that there may be differences in the GO associations each tool is working from.
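If you want to check this concretely, you can diff the association files yourself. Here is a minimal sketch (file names are placeholders for whatever GAF releases the tools report using) that compares the gene-to-term pairs in two GO annotation files:

```python
# Compare two GO association (GAF) files to see how much the gene-to-term
# mappings differ between annotation versions. Paths are hypothetical.
def load_gaf(path):
    """Return a set of (gene_id, go_term) pairs from a GAF file."""
    pairs = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("!"):            # skip GAF header/comment lines
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) > 4:
                pairs.add((cols[1], cols[4]))   # DB object ID, GO ID
    return pairs

old = load_gaf("goa_speciesX_old.gaf")   # hypothetical older release
new = load_gaf("goa_speciesX_new.gaf")   # hypothetical newer release

print("associations only in old:", len(old - new))
print("associations only in new:", len(new - old))
print("shared associations:     ", len(old & new))
```

Even a year between releases can shift thousands of associations, which alone can change an enrichment result.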
Finally, you need to understand the process that is going on here. The GO annotations are simple curations; GO enrichment is a totally different thing. This is where the specific algorithm and parameters each tool uses really start to matter. Are you getting 17 instead of 213 because some parameter is different, or just because they're different algorithms?
Never assume that just because two algorithms claim to solve the same problem you will get the same results. They do different things, make different assumptions, require different input data, and so on. The semantics also matter: what each algorithm is actually telling you can differ. Never make this assumption unless you have evidence and knowledge to support it.
Also, look back at your data and see which result makes sense. Maybe the tool that returned 213 results didn't account for p-value or fold change in your differential expression. What do the p-values and fold changes of the genes behind those 213 results look like? Are they all strongly DE, or just a few? If only a dozen of them are strongly DE, maybe 17 is better than 213.
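A sanity check like that takes a few lines. This is only a sketch; the column names, thresholds, and file names are assumptions you would swap for your own DE table and whatever gene list the tool reports behind its enriched terms:

```python
# How strongly DE are the genes driving the larger result set?
import pandas as pd

de = pd.read_csv("de_results.csv")      # hypothetical table with gene, log2FC, padj columns
hits = set(open("genes_behind_213_terms.txt").read().split())  # hypothetical gene list

subset = de[de["gene"].isin(hits)]
strong = subset[(subset["padj"] < 0.05) & (subset["log2FC"].abs() > 1)]

print(f"{len(strong)} of {len(subset)} genes behind those terms are strongly DE")
```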
A word of advice: never judge the performance of an algorithm by the plots the programmer makes from its output. What the algorithm is doing is the important thing. You can always make pretty plots from good data, but a pretty plot of bad data is just bad. Find the tool that gives you the best-quality results and deal with the plots later.
Pull the publications on these tools and the methods they use and figure out what they're actually doing. Look at the results you get versus the data you put in: do things make sense? Look at the annotation data used by each tool: is it the same? Can you upload the same GOA data into each tool so that aspect is controlled for? Check and experiment with parameters: how do the results vary across tools with different parameters?
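To see how sensitive the numbers are to those choices, you can run a bare-bones over-representation test yourself. This is not what any particular tool does, just a plain hypergeometric test with no multiple-testing correction; `gene2go` is a hypothetical dict mapping gene IDs to sets of GO terms (e.g. parsed from a GAF file):

```python
# Minimal hypergeometric over-representation test, to show how the choice of
# background set and cutoff moves the numbers around.
from scipy.stats import hypergeom

def enrich(study_genes, background_genes, gene2go, alpha=0.05):
    """Return GO terms over-represented in study_genes versus background_genes."""
    pop = set(background_genes)
    study = set(study_genes) & pop
    terms = {t for g in pop for t in gene2go.get(g, ())}
    results = []
    for term in terms:
        with_term = {g for g in pop if term in gene2go.get(g, ())}
        k = len(study & with_term)   # study genes annotated with this term
        p = hypergeom.sf(k - 1, len(pop), len(with_term), len(study))
        if p < alpha:
            results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])
```

Re-running this with a different background (all annotated genes versus only the genes detected in your experiment), a different alpha, or with/without multiple-testing correction can easily turn 213 terms into 17 without either tool being "wrong".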
A final bit of advice with these sorts of tools: always look at version history and release dates. BiNGO was last updated 4 years ago, but AgriGO was updated this past August. You should always make sure the software you're using references current databases and that updates for bug fixes have occurred. The GOA version is very important: GO, the species genome, the annotation of genes in that genome, and the final GOA product are all moving targets.
These are all good tips for any bioinformatics tool or use case.
Ok that is really funny. Quote of the month. Also sadly it is mostly true.
I agree and feel your pain - IMHO this field of bioinformatics is a bit of a mess.
Follow-up question: how on earth can you compare the lists of GO results? Some terms will probably be found in common, but others will be inexact matches, like parent/child nodes in the tree. Is there any quantifiable metric for the similarity of GO lists?
I think this is an open question for Biostars in another thread, but it might be worth asking here while people are thinking about the different kinds of results different GO tools produce.
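One crude way to put a number on it, which is just a sketch and not an established standard: expand each result list with its ancestor terms so parent/child near-misses count as partial overlap, then take a Jaccard index. Here `go_parents` is a hypothetical dict mapping each GO term to its direct parents (e.g. parsed from the go-basic.obo file):

```python
# Jaccard similarity of two GO term lists after expanding each with ancestors.
def ancestors(term, go_parents):
    """All ancestors of a term, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for parent in go_parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def go_list_similarity(terms_a, terms_b, go_parents):
    """Jaccard index of the ancestor-expanded term sets from two tools."""
    def expand(terms):
        return set().union(*(ancestors(t, go_parents) for t in terms)) if terms else set()
    a, b = expand(terms_a), expand(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

There are more principled GO semantic-similarity measures in the literature (the GOSemSim Bioconductor package implements several), but even this is better than eyeballing two lists.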
A few ways:
This could be partially due to gene identifiers that are "valid" with one tool and not another. See if different tools are throwing out swaths of genes prior to the test because they can't find a match for that ID in whatever their definition of a gene is.
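Most tools report (or let you export) the IDs they failed to map, so this is easy to check. A sketch, with placeholder file names standing in for your input list and each tool's list of recognized IDs:

```python
# Which of my submitted gene IDs did each tool actually keep for the test?
submitted = set(open("my_gene_list.txt").read().split())
tool_a_used = set(open("tool_a_mapped_ids.txt").read().split())   # hypothetical export
tool_b_used = set(open("tool_b_mapped_ids.txt").read().split())   # hypothetical export

print("dropped by tool A:", len(submitted - tool_a_used))
print("dropped by tool B:", len(submitted - tool_b_used))
print("kept by A but not B (first 10):", sorted(tool_a_used - tool_b_used)[:10])
```

If one tool silently drops a quarter of your genes before testing, that alone can explain a large difference in the enriched terms.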