So, for instance, if they predicted an annotation X, they would search for scientific papers that prove that the predicted annotation is correct.
I think I could use the same method for my annotation prediction experiments, but how would I do it?
How do I search the literature for evidence supporting a given annotation?
Should I just go to the PubMed website and use the annotation ID as a search keyword?
And which source's ID (Entrez Gene, etc.) should I use?
I think you should search the literature for the genes you annotated and compare your automatic annotations for those genes with the functions reported for them in the literature. You could also do it the other way around: find a set of genes that are very well described in the literature, use your tool to annotate that set, and see how well it performs.
Of course, depending on how your tool works, it might perform better on genes that are already well described in the literature, which could skew your accuracy estimate.
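If you want to script the literature search rather than click through the PubMed website, one option is NCBI's E-utilities `esearch` endpoint. Below is a minimal sketch that only builds the query URL and does not send any request. The function name `pubmed_query_url`, the choice to search title/abstract with the `[tiab]` field tag, and the gene symbol and function keyword in the example are my own assumptions for illustration, not something from the paper.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query_url(gene_symbol, function_term, max_results=20):
    """Build (but do not send) a PubMed search URL.

    Hypothetical helper: it looks for papers whose title or abstract
    mentions both the gene symbol and the predicted function keyword.
    """
    term = f"{gene_symbol}[tiab] AND {function_term}[tiab]"
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # the query string
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Illustrative inputs only: a gene symbol and a function keyword.
url = pubmed_query_url("TP53", "apoptosis")
print(url)
```

Searching by gene symbol plus a function keyword is usually more productive than searching by an annotation ID, because papers rarely mention database IDs in their text. You would still have to read the hits to decide whether they really support the annotation.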
Why don't you test your method against a control set?
For example, you could take a set of annotations that are already known to be correct. I don't know the details of your annotation prediction method, but in the case of the article cited, a control set could be a set of Gene Ontology annotations. Then you just run your predictor on the data and see how many correct predictions you get.
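As a rough sketch of what that comparison could look like, assuming both your predictions and the control set can be written as (gene ID, term ID) pairs: the function name `evaluate` and the toy data below are placeholders I made up, not real results.

```python
def evaluate(predicted, control):
    """Compare predicted annotations against a trusted control set.

    Both arguments are sets of (gene_id, term_id) pairs.
    Returns (precision, recall).
    """
    true_positives = predicted & control  # predictions confirmed by the control set
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(control) if control else 0.0
    return precision, recall


# Toy placeholder data, for illustration only.
control = {
    ("geneA", "GO:0006915"),
    ("geneA", "GO:0008283"),
    ("geneB", "GO:0006281"),
}
predicted = {
    ("geneA", "GO:0006915"),
    ("geneB", "GO:0006281"),
    ("geneB", "GO:0007049"),
}

precision, recall = evaluate(predicted, control)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Keep in mind that this counts a prediction as wrong whenever it is missing from the control set, even though it might be a correct annotation that simply has not been curated yet. That is where checking the literature for the remaining predictions comes in.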