We would like to select terms within a Gene Ontology namespace such that every selected term maps to a similar number of genes. The GO slim sets contain too many terms for our purpose, namely visual comparison.
Example: We would like to break down the "cellular component" namespace into 12 terms. The standard GO slim annotation for yeast lists 24 terms, too many to be displayed in e.g. a histogram. Which terms should be selected so that the number of genes per term is as balanced as possible?
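A simple algorithmic solution to the problem (selecting inner nodes by their combined weight, i.e. the number of genes annotated to a node or any of its descendants) runs into three difficulties: genes are assigned to several terms, suitable thresholds have to be chosen, and the resulting partitioning problem is hard to solve.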
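To make the simple approach concrete, here is a minimal sketch of one possible greedy variant: start at the namespace root and repeatedly replace the selected term with the most genes by its children, until the requested number of terms is reached. The function names (`propagate`, `balanced_cut`), the toy hierarchy and the gene IDs are all made up for illustration; nothing here comes from real GO annotations, and this is not meant as a definitive method.

```python
from typing import Dict, List, Set


def propagate(children: Dict[str, List[str]],
              direct: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each term to the genes annotated to it or to any descendant."""
    cache: Dict[str, Set[str]] = {}

    def genes(term: str) -> Set[str]:
        if term not in cache:
            acc = set(direct.get(term, ()))
            for child in children.get(term, ()):
                acc |= genes(child)
            cache[term] = acc
        return cache[term]

    for term in set(children) | set(direct):
        genes(term)
    return cache


def balanced_cut(root: str,
                 children: Dict[str, List[str]],
                 direct: Dict[str, Set[str]],
                 k: int) -> Dict[str, int]:
    """Greedily split the largest selected term until at most k terms remain."""
    total = propagate(children, direct)
    selected = [root]
    expanded: Set[str] = set()  # terms already split, never re-added
    while len(selected) < k:
        splittable = [t for t in selected if children.get(t)]
        if not splittable:
            break
        # Largest gene set first; ties broken by term name for determinism.
        biggest = max(splittable, key=lambda t: (len(total[t]), t))
        kids = [c for c in children[biggest]
                if c not in selected and c not in expanded]
        if len(selected) - 1 + len(kids) > k:
            break  # splitting would overshoot the requested number of terms
        selected.remove(biggest)
        expanded.add(biggest)
        selected.extend(kids)
    return {t: len(total[t]) for t in selected}


# Hypothetical toy hierarchy; gene g3 is annotated to two terms.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
direct = {
    "A1": {"g1", "g2", "g3"},
    "A2": {"g3", "g4"},
    "B1": {"g5"},
    "B2": {"g6", "g7"},
}
result = balanced_cut("root", children, direct, k=3)
print(result)
```

On the toy data the selected terms are B, A1 and A2 with 3, 3 and 2 genes. The counts sum to 8 although there are only 7 genes, because g3 is counted twice. The sketch also shows the other weak points: genes annotated directly to a term that gets split are dropped, and the loop simply stops when the next split would exceed `k`, so the result can contain fewer than `k` terms.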
This sounds like a problem that should already have been solved. Either the publication is drowned out by the many papers on enrichment analysis, or the data structure does not allow for meaningful solutions in most cases.