Suppose I have a list of genes obtained from a mouse experiment. Now I have to perform an enrichment analysis, and I can choose to perform it using mouse GO terms, human GO terms, etc.
Is it OK to use human GO terms on mouse genes?
If yes, then why bother creating specific mouse GO terms? I understand that mice may be used as a human model for some diseases, but then... why are the two sets of GO terms different?
If I can apply human GO terms to mouse genes, should I also perform enrichment analysis using zebrafish GO terms on mouse genes (or human genes!), provided those genes share a certain degree of similarity (such as homology)?
EDIT: I posted the same question on Bioinformatics Stack Exchange but got no answers there.
The MSigDB gene sets use human genes. However, many are based on mouse and rat studies (check the "organism" field for the individual gene sets). These were then converted to human symbols by the MSigDB team, so they consider the pathways to be sufficiently similar across those species.
So basically they just did a symbol transformation from one organism to another without considering homology/orthology, simply assuming that it is OK to do so, right? Is this meaningful? I suppose it can be, since mouse is used as an animal model, but things that work in mice do not always work in humans, so it seems that enrichment analysis is kind of... not reliable?
Some would argue that is the case even within the same species, but that would be an entirely separate topic.
Can I ask if you have some references on the topic?
I cannot recall any particular publication.
A lot of pathways (for example MSigDB C2) are based on a single study. Although I do not doubt that many are reliable, we also know that many are not reproducible. Additionally, even in a well-executed study, most genes are usually not validated, either in an independent cohort or with an alternate computational analysis.
I agree, I would say that no matter the final strategy (if any) concerning pathway enrichment analysis, it is better to treat the results as indicative trends rather than demonstrated truths about any pathway being affected.
OK, so basically whenever I need to perform an enrichment I should verify whether there is a sufficient degree of similarity between the genes in the different organisms. If this similarity is not high enough, I must assume that the enrichment will not provide meaningful results. Is that correct?
What I mean is that a possible strategy is to convert the human gene IDs into the other organism's IDs, retaining only those that are classified as homologous (using, for example, HUGO tables from the HCOP orthology tool, which incorporates information from many homology tools), and then run the enrichment on those converted pathways. I think this will work better with annotations such as GO, because these are more general and encompass knowledge from different organisms. Other, more human-specific annotations within MSigDB may work less well. This blog entry on the subject is interesting, and mentions how evolutionarily distant organisms may behave worse. Nonetheless, as stated, I think it will depend on the type of gene sets you are using, how they were initially derived, and the knowledge they contain.
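A minimal sketch of that conversion step, in case it helps. Everything here is an assumption for illustration: the function name `convert_gene_sets`, the toy ortholog table (a real one would be loaded from an orthology resource such as HCOP), the toy gene sets, and the placeholder gene `HYPOTHETICAL_HUMAN_ONLY`. The size limits are passed in as parameters and are not taken from any particular tool.

```python
def convert_gene_sets(gene_sets, orthologs, min_size=15, max_size=500):
    """Map human gene sets to another organism's symbols.

    Genes without an entry in `orthologs` are dropped. Converted sets whose
    size falls outside [min_size, max_size] are discarded. Also returns, per
    set, the fraction of original genes that had an ortholog.
    """
    converted = {}
    coverage = {}
    for name, genes in gene_sets.items():
        with_ortholog = [g for g in genes if g in orthologs]
        mapped = {orthologs[g] for g in with_ortholog}
        coverage[name] = len(with_ortholog) / len(genes) if genes else 0.0
        if min_size <= len(mapped) <= max_size:
            converted[name] = mapped
    return converted, coverage


# Toy human -> mouse table (illustrative only, not a real HCOP export).
toy_orthologs = {"TP53": "Trp53", "BRCA1": "Brca1", "MYC": "Myc"}

# Toy gene sets; HYPOTHETICAL_HUMAN_ONLY stands in for a gene with no ortholog.
toy_sets = {
    "SET_A": {"TP53", "BRCA1", "MYC"},
    "SET_B": {"TP53", "HYPOTHETICAL_HUMAN_ONLY"},
}

# A tiny min_size is used only so the toy data survives the filter.
converted, coverage = convert_gene_sets(toy_sets, toy_orthologs, min_size=2)
```

The coverage value gives a rough per-set indication of how much of the original set survives the conversion, which is one way to judge whether a converted set is still worth testing.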
It seems one of their conclusions is that the C2 gene sets are substantially more reliable than the well-curated Hallmark ones which is highly counter-intuitive.
You're right, I admit I skimmed through the post and had not noticed that it compared H sets to C2 sets. And it is indeed counter-intuitive. Maybe the top significant C2 sets are very big (many genes) and thus suffer less (significance-wise) from losing non-homologous genes than the top significant H sets, some of which may even fall below the 15-gene threshold and thus be removed?
It's hard to say whether they are actually removing the n<15 and n>500 sets as they mention at the beginning, but when performing enrichment on C2 sets I usually see a bias toward big sets (>100-200 genes) among the top significant results.
OK, that is clear: stick to GO and verify similarity between genes. Also, thanks for the link!