I'm working with the GO slim sets predefined by the GO Consortium, and I'd like to know how the terms were selected for each set. Do you know of any site or paper where the selection criteria are described?
In particular, I'd like to know why the generic GO slim contains no terms from the molecular function ontology.
The generic GO slim does not contain function terms, because in general functions can be considered "small processes".
When links are established between the function and process ontologies, function terms become children of the corresponding process terms. For example:
"protein kinase activity" is a child of "protein phosphorylation"
"transcription factor" function terms are children of the process "transcription"
"transmembrane transporter activity" is a child of "transmembrane transport"
See for example
http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0022857#term=ancchart
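To make the "functions are small processes" point concrete, here is a minimal sketch of how a function term ends up under a process slim term once such links exist. The graph below is a hand-written toy with term names only, not real GO data, and `ancestors` and `map_to_slim` are hypothetical helper names, not part of any GO tool.

```python
# Toy parent links (child -> parents). Names only, NOT real GO data.
PARENTS = {
    "protein kinase activity": {"protein phosphorylation"},
    "protein phosphorylation": {"protein modification process"},
    "protein modification process": {"biological_process"},
    "transmembrane transporter activity": {"transmembrane transport"},
    "transmembrane transport": {"transport"},
    "transport": {"biological_process"},
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(term, slim_terms, parents=PARENTS):
    """Return the slim terms that are the term itself or one of its ancestors."""
    return ancestors(term, parents) & set(slim_terms)


# A hypothetical process-only slim.
slim = {"protein modification process", "transport"}
print(map_to_slim("protein kinase activity", slim))
print(map_to_slim("transmembrane transporter activity", slim))
```

With links like these in place, a process-only slim still picks up genes annotated only to function terms, which is why the generic slim can get away without function terms.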
GO slims are indeed specific to a project or analysis, and this slim was created to give a general, high-level view of the "biological" role of a gene set.
We will shortly be creating a slim to address function specifically. However, it might not be very informative for most purposes, because the annotations are very unevenly distributed.
Most of the GO slims should have contact information in the file header so you can get directly in touch with the authors. As maintainer of the GO website, I can tell you that the GO slims are generally created by a certain group by selecting terms that are most useful to them. The generic GO slim was overhauled recently by members of the GO editorial team and Val (who has also responded to this question). This slim is intended to provide a small set of terms useful for the annotation or analysis of any organism.
I also asked the GO team and got this answer from the people working on the generic GO slim (I also asked for any documentation of how the sets were created):
Hi Marina - there's currently no documentation for how we selected the GO slim terms... we are going to write a paper in the fairly near future, so watch this space. Basically though, they were selected manually on two criteria:
Coverage. Do the GO slim terms cover most of the genes and proteins currently annotated to GO for all species?
Biological relevance. Is the GO slim term meaningful for use in biological analysis?
The idea was that GO slim generic would be a starting slim set which users could tailor for their own needs.
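If you tailor the generic slim for your own data, the coverage criterion is easy to check yourself. Below is a rough sketch that assumes coverage means "fraction of annotated genes with at least one annotation that maps to a slim term"; that definition is my reading, not something the GO team has documented. The gene names, terms and the `slim_coverage` function are made up for illustration.

```python
def slim_coverage(gene_terms, slim_terms, parents):
    """Fraction of annotated genes with an annotation that reaches a slim term.

    gene_terms: dict of gene -> set of annotated term names
    slim_terms: iterable of slim term names
    parents:    dict of term -> set of parent term names
    """
    slim_terms = set(slim_terms)

    def reaches_slim(term):
        # Walk up the parent links until a slim term is found.
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current in slim_terms:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return False

    annotated = [gene for gene, terms in gene_terms.items() if terms]
    if not annotated:
        return 0.0
    covered = sum(
        any(reaches_slim(term) for term in gene_terms[gene]) for gene in annotated
    )
    return covered / len(annotated)


# Toy data, NOT real annotations.
parents = {
    "protein phosphorylation": {"protein modification process"},
    "transmembrane transport": {"transport"},
    "DNA repair": {"DNA metabolic process"},
}
gene_terms = {
    "geneA": {"protein phosphorylation"},
    "geneB": {"transmembrane transport"},
    "geneC": {"DNA repair"},
    "geneD": {"protein phosphorylation", "DNA repair"},
}

print(slim_coverage(gene_terms, {"protein modification process", "transport"}, parents))
print(slim_coverage(
    gene_terms,
    {"protein modification process", "transport", "DNA metabolic process"},
    parents,
))
```

Adding or dropping slim terms and re-running this kind of check shows quickly whether a tailored slim still covers most of your gene set. Biological relevance, the second criterion, remains a manual judgement.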
BMC Bioinformatics. 2010 Oct 7;11:498.
Automatic, context-specific generation of Gene Ontology slims.
Davis MJ, Sehgal MS, Ragan MA.
PMID: 20929524
I hope this helps!
UPDATE:
http://www.geneontology.org/GO.slims.shtml
I hadn't realised that a lot of GO Slims are project/analysis specific, so you should take care to find the project specific method. Perhaps contact the authors directly.