I'm currently doing RNA-seq, and one of the things I would like to do is to see what genes are differentially expressed in different pathways related to the EGFR network. The pathways I'm especially interested in are the MAPK/ERK, PI3K/AKT and JAK/STAT pathways. The problem is, for me, how do I define these pathways? What I would like to do is to get a gene list of all the genes in each of the pathways, so I can say "according to <source(s)>, the JAK/STAT pathway consists of <list of genes>". The idea is then to see how the different pathways differ in terms of gene expression between my various experimental parameters, i.e. taking the analysis down from a global scale to a pathway scale.
Getting such gene lists was, apparently, all too easy, since there are so many different sources to choose from - which is the problem. I've looked at some other questions here at BioStar (A: How To Get Snps Matrix For Population Genetic Ananlysis From Snps Variant Files, Gene Pathway Association File For Kegg, Extracting List Of Genes Associated With A Pathway In Kegg) and checked some of the web-based tools to do it (KEGG, GSEA). I now have quite a few lists of the three pathways to choose from, but they are quite different. I found lists from Biocarta, KEGG, GO annotation and PID, but not a single source had genes for all three pathways, at least not that I could find.
How would one go about solving this? Just picking one of the lists at random or arbitrarily seems... iffy. I'm sure I'm not the first one to have this issue. How have you / would you solve it? Thanks in advance!
The reason you're finding somewhat different lists comes down to both how broadly you want to define the pathway and what cell-type you're using in your definition. In the former case, keep in mind that pathways like this are artificial constructs and could actually include all genes if you really wanted. In the latter case, it should be apparent that upstream regulators of things like MAPK are going to be completely different across cell-types.
Very true, thanks for pointing that out. I'm mainly interested in colorectal cancer and a few related cell lines (HCT-116, RKO, CACO-2, HKE3). I would prefer "smaller" pathways (i.e. somewhere around 20-50 genes). As far as I could see in my so far limited time with the various tools, there is no way to search for "colorectal cancer" in the same way that you can search for "homo sapiens", or am I missing it?
I'm not surprised that you can't find anything specific to colorectal cancer. Most of the databases won't explicitly state what the source cell-type is for a given piece of information. Yes, this makes pretty much any solution aside from going through the literature less than ideal. You might just compare the lists from a number of sources and include genes mentioned at least X times.
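To make the "mentioned at least X times" idea concrete, here is a minimal sketch in Python. It assumes you have already exported each source's gene list yourself. The function name `consensus_genes`, the source names, and the gene lists below are all made up for illustration and are not actual database content.

```python
from collections import Counter


def consensus_genes(gene_lists, min_sources):
    """Return the genes that appear in at least `min_sources` of the lists.

    `gene_lists` maps a source name to a list of gene symbols.
    """
    counts = Counter()
    for genes in gene_lists.values():
        # Upper-case the symbols and use a set, so that a gene repeated
        # within one source (or written in a different case) counts once.
        counts.update({gene.upper() for gene in genes})
    return sorted(gene for gene, n in counts.items() if n >= min_sources)


# Hypothetical, hand-written example lists (NOT real database exports).
example_lists = {
    "source_a": ["JAK1", "JAK2", "STAT3", "SOCS3"],
    "source_b": ["JAK1", "STAT3", "STAT5A", "jak2"],
    "source_c": ["JAK1", "STAT3", "IL6"],
}

print(consensus_genes(example_lists, min_sources=2))
```

Raising `min_sources` gives a smaller, more conservative list, while lowering it gives a broader one. Note that this only matches identical symbols: if the sources use different identifiers (aliases, Entrez IDs, Ensembl IDs), they would need to be mapped to one naming scheme first.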
Ah... too bad. Do you have a favourite tool? Is there a tool that is purely based on literature, or do all of them have literature mining / computer-based backgrounds?
I don't have a favourite tool/database, unfortunately. The best one out there is probably IPA, from Ingenuity, but that's a commercial package. I only mention that as the likely best one since it incorporates manual curation by their staff. That's mostly for direct pathway analysis, though, and I don't know if you can get direct access to the underlying database for your needs.
Damn. Well, it would seem that the best way to do what I need is to just pick one of the pathways for whatever reason, state it, and then just go with it. Or do you have another idea?