Pfam is a database of protein families. Specifically, its curators use HMMER to build hidden Markov models (HMMs), each of which represents a conserved group of proteins (a family). When proteins are conserved, we assume they are functionally similar. This is a general assumption, and there are sequence-specific and species-specific exceptions, but on the whole it works.
So if you can establish that a conserved group of proteins (a family) shares some set of functions, you can assume that any member of that family also has those functions. If these assumptions hold, predicting function becomes a problem of predicting which families a protein belongs to. This is what the authors did: they knew the functions of the families, so to infer the potential functions of their proteins they had to find the families those proteins may belong to.
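To make the "family membership implies function" step concrete, here is a minimal sketch in Python. It assumes the family-to-function table comes as pfam2go-style mapping lines (the layout shown in the docstring is an assumption about that file), and that you already have a list of Pfam accessions per protein. The function names, protein names, and example lines are all made up for illustration; this is not necessarily how the authors did it.

```python
from collections import defaultdict


def parse_pfam2go(lines):
    """Build {Pfam accession: set of GO IDs} from pfam2go-style lines.

    Assumed line layout (comment lines start with "!"):
        Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
    """
    family_to_go = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        accession = left.split()[0].split(":", 1)[-1]  # "Pfam:PF00001" -> "PF00001"
        go_id = right.rsplit(";", 1)[-1].strip()       # text after the last ";"
        family_to_go[accession].add(go_id)
    return dict(family_to_go)


def infer_go_terms(protein_to_families, family_to_go):
    """Give each protein the union of the GO terms of its families."""
    inferred = {}
    for protein, families in protein_to_families.items():
        terms = set()
        for family in families:
            # Families with no known function simply contribute nothing.
            terms |= family_to_go.get(family, set())
        inferred[protein] = terms
    return inferred


# Toy data: mapping lines and protein names are illustrative only.
example_lines = [
    "!date: example",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:membrane ; GO:0016020",
    "Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672",
]
family_to_go = parse_pfam2go(example_lines)
protein_to_families = {"pep_1": ["PF00001"], "pep_2": ["PF00069", "PF99999"]}
inferred = infer_go_terms(protein_to_families, family_to_go)
```

Note that a protein only ever receives terms attached to its families, so a family without a mapping (PF99999 here) adds nothing rather than raising an error.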
As for the authors' data: when people do this kind of annotation, you should generally be able to find it in the supplementary information or, in some cases, by contacting the authors. Always check the supplement in these kinds of papers. For the paper I assume you're referring to, the information is in the supplement: http://www.biomedcentral.com/1471-2164/15/183/additional (see additional file 3).
Now, as for predicting function through homology: all methods take the same general form, but there are important distinctions. The general idea is to infer function by finding which "thing" of known function matches your "thing" of unknown function. The two most common ways are BLAST and HMMER/Pfam. The idea is the same: with BLAST you assign functions through specific sequences (BLAST hits), and with HMMER/Pfam you assign them through protein families (as described above).
However, there are important differences. With BLAST you usually infer function from a single best hit, which means your unknown is assigned all of the functions that specific protein has. With Pfam, you assign the functions of every Pfam family that hits with a significant score. This seems trivial, but it can be important. Pfam only gives you the functions that the proteins in a family share; BLAST gives you the functions that are known for that protein in that specific species.
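The difference between "one best hit" and "all significant hits" can be sketched as follows. This assumes the BLAST results are in tabular format (-outfmt 6) with the default 12 columns, where column 12 is the bit score, and that the Pfam hits have already been reduced to (protein, family, E-value) tuples. The sequence IDs, the function names, and the 1e-5 cutoff are placeholders, not recommendations.

```python
def best_blast_hits(rows):
    """Keep the single highest-bit-score subject for each query.

    Each row is one line of BLAST tabular output (-outfmt 6, default
    columns): column 1 is the query, column 2 the subject, column 12
    the bit score.
    """
    best = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def significant_pfam_hits(hits, max_evalue=1e-5):
    """Keep every family whose E-value passes the cutoff.

    `hits` is an iterable of (protein, family, evalue) tuples. The
    default cutoff is an arbitrary placeholder.
    """
    families = {}
    for protein, family, evalue in hits:
        if evalue <= max_evalue:
            families.setdefault(protein, set()).add(family)
    return families


# Toy data: all identifiers and scores below are made up.
blast_rows = [
    "pep_1\tsp|P11111|A_HUMAN\t62.0\t300\t110\t4\t1\t300\t5\t304\t1e-120\t410",
    "pep_1\tsp|P22222|B_MOUSE\t58.0\t295\t120\t5\t3\t297\t8\t302\t1e-100\t350",
]
pfam_hits = [
    ("pep_1", "PF00001", 1e-30),
    ("pep_1", "PF00069", 2e-8),
    ("pep_1", "PF07714", 0.3),  # fails the cutoff, so it is dropped
]

best = best_blast_hits(blast_rows)
families = significant_pfam_hits(pfam_hits)
```

Here pep_1 would inherit the annotations of exactly one protein from BLAST, but the shared functions of two families from Pfam.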
The key difference is "in that specific species": you may pick up contextual information specific to the species of the known protein. The kicker is that it can be hard to tell whether these "extra" terms are there because the protein does something unique in its host species, or because that species simply has better or more complete annotations. Very few species have concerted efforts to annotate their genomes with GO terms (http://geneontology.org/page/download-annotations).
If your species isn't phylogenetically close to one of the species with GO annotation efforts, I would use both approaches. BLAST your genes against, say, UniProt and collect GO terms from the best BLAST hit of each predicted peptide. I would also run HMMER on the predicted peptides and infer functions that way.
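If you do run both approaches, one way to keep the results usable is to record which method supports each term, so that BLAST-only terms (the possibly species-specific ones) stay easy to spot. A rough sketch, assuming each method has already been reduced to a {protein: set of GO IDs} dictionary; the function names and the toy annotations are hypothetical.

```python
def merge_annotations(blast_go, pfam_go):
    """Combine two {protein: set of GO IDs} dicts.

    Returns {protein: {GO ID: set of sources}}, where a source is
    "blast" or "pfam".
    """
    merged = {}
    for source, annotations in (("blast", blast_go), ("pfam", pfam_go)):
        for protein, terms in annotations.items():
            per_protein = merged.setdefault(protein, {})
            for term in terms:
                per_protein.setdefault(term, set()).add(source)
    return merged


def supported_by_both(merged):
    """Keep only the terms that both methods agree on."""
    return {
        protein: {term for term, sources in terms.items() if len(sources) == 2}
        for protein, terms in merged.items()
    }


# Toy data: proteins and term assignments are made up.
blast_go = {"pep_1": {"GO:0004930", "GO:0007186"}, "pep_2": {"GO:0004672"}}
pfam_go = {"pep_1": {"GO:0004930", "GO:0016020"}}

merged = merge_annotations(blast_go, pfam_go)
consensus = supported_by_both(merged)
```

Whether to keep the union or only the consensus depends on how conservative you want the annotation to be; keeping the source labels lets you decide later.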
Blast2GO is an option, but it is very slow if you don't buy the full version; it can take months to annotate a large set of genes/proteins. There are other tools available, as previously mentioned; see if those can help.
If you have any programming/database experience, you can easily write a few scripts to handle this. I prefer this approach because it is easier to integrate into other analyses (on the transcriptome, etc., or later on).
Or, just use what someone else already did! The data you want is right there in the publication!
Hi Joe,
Regarding your very informative comment, I would like to ask about a few points:
Thank you very much in advance!