Is there an over-representation of cancer processes in pathway databases (e.g. IPA), and can this be reasonably adjusted for?
8.2 years ago

After many years of pathway analysis with a range of public and proprietary tools (IPA, GeneGo, GSEA, KEGG, GO etc.), I have developed a suspicion that pathway enrichment statistics, and results in general, may be biased by the high representation of cancer-related genes and pathways in the pathway databases. My question is twofold: 1) Do others share the view that there is an inherent bias towards cancer in pathway databases? 2) Can (or should) this be corrected for, and if so, how?

I've reviewed the literature on this and found nothing, but it is something I have heard anecdotally on many occasions.

Thanks in advance for your thoughts.

pathway cancer enrichment gsea go • 2.1k views
  1. I completely share this view, though my personal favorites are "Parkinson's related" and "Alzheimer's related", which inevitably just mean there's a change in metabolic stress.
  2. It depends on the conclusions you're trying to draw. If you're trying to determine if pathway X is more perturbed than pathway Y then this is a big issue (no clue how to correct for this). If you're just trying to find some perturbed pathways so you can do some follow-up experiments then it's probably less of an issue.
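One way to get a feel for how much the database composition matters is to look at how often each gene is annotated and to re-run a plain over-representation test against different background sets. The sketch below is only an illustration under stated assumptions: the pathway names and gene sets are toy data, the function names (`hypergeom_pvalue`, `annotation_counts`, `enrichment`) are made up for this example, and a one-sided hypergeometric test stands in for whatever statistic a given tool actually uses. It is not a correction method, just a sensitivity check.

```python
from collections import Counter
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k): chance of at least k pathway genes among n hits,
    when K of the N background genes belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def annotation_counts(pathways):
    """Count how many pathways each gene appears in."""
    return Counter(g for genes in pathways.values() for g in set(genes))


def enrichment(hits, pathways, background):
    """Over-representation p-value per pathway, restricted to `background`."""
    background = set(background)
    hits = set(hits) & background
    results = {}
    for name, genes in pathways.items():
        members = set(genes) & background
        k = len(hits & members)
        results[name] = hypergeom_pvalue(k, len(members), len(hits), len(background))
    return results


# Toy data (hypothetical gene sets, for illustration only).
pathways = {
    "cancer_A": {"TP53", "MYC", "KRAS"},
    "cancer_B": {"TP53", "MYC", "EGFR"},
    "stress": {"HSPA5", "ATF4"},
}
annotated = set().union(*pathways.values())
measured = annotated | {"GENE1", "GENE2", "GENE3", "GENE4"}  # placeholder unannotated genes
hits = {"TP53", "MYC"}

counts = annotation_counts(pathways)          # genes shared by many pathways stand out here
wide = enrichment(hits, pathways, measured)   # background = everything measured
narrow = enrichment(hits, pathways, annotated)  # background = annotated genes only
```

Comparing `wide` and `narrow` shows how much a pathway's p-value depends on the chosen background, and `counts` shows which genes are shared by several gene sets and can therefore push several related pathways up at once.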

Thanks Devon, yes, I also recognise the points you make! Regarding conclusions, it's probably both scenarios, and I completely agree with your assessment. I guess the issue that remains is that when, to take your example, you report enrichment of PD- and AD-related genes, many (e.g. reviewers) will take this at face value, but demonstrating that this represents metabolic stress requires enrichment in a well-annotated metabolic stress pathway to support the statement. Sometimes this may exist, sometimes not. I guess this underlines the importance of a strong gene ontology against which to compare disease results.

8.2 years ago
DG 7.3k

Yeah, it is more accurate to say that disease processes are over-represented. This isn't necessarily a bad thing: if those genes truly function together in pathways, then they are still relevant biological pathways. As with most things in biology, though, that is just how things are initially discovered. If these pathways are perturbed in your experiments, all you need to do is change your conceptualization. What are these pathways doing in normal healthy cells? What are they doing under different stress conditions? What are they doing in your experiment? The name of the pathway isn't necessarily that important, and in fact once you see that these pathways are affected you have a huge amount of literature to draw on to put your results in their biological context.


Thanks Dan, all good points. I think the salient point that emerges here is that disease processes and biological processes both need to be considered and cross-compared to determine which biological processes are driving disease and, to a lesser extent, vice versa.
