Finding a list of genes contained in a given pathway
10.3 years ago
erikfas ▴ 20

I'm currently doing RNA-seq, and one of the things I would like to do is to see what genes are differentially expressed in different pathways related to the EGFR network. The pathways I'm especially interested in are the MAPK/ERK, PI3K/AKT and JAK/STAT pathways. The problem is, for me, how do I define these pathways? What I would like to do is to get a gene list of all the genes in each of the pathways, so I can say "according to <source(s)>, the JAK/STAT pathway consists of <list of genes>". The idea is then to see how the different pathways differ in terms of gene expression between my various experimental parameters, i.e. taking the analysis down from a global scale to a pathway scale.

Getting such gene lists was, apparently, all too easy, since there are so many different sources to choose from - which is the problem. I've looked at some other questions here at BioStar (A: How To Get Snps Matrix For Population Genetic Ananlysis From Snps Variant Files, Gene Pathway Association File For Kegg, Extracting List Of Genes Associated With A Pathway In Kegg) and checked some of the web-based tools to do it (KEGG, GSEA). I now have quite a few lists of the three pathways to choose from, but they are quite different. I found lists from Biocarta, KEGG, GO annotation and PID, but not a single source had genes for all three pathways, at least not that I could find.

How would one go about solving this? Just picking one of the lists at random or arbitrarily seems... iffy. I'm sure I'm not the first one to have this issue. How have you / would you solve it? Thanks in advance!

MAPK/ERK PI3K/Akt JAK/STAT Pathway KEGG • 5.0k views

The reason you're finding somewhat different lists is that they depend on both how broadly you want to define the pathway and what cell type you're using in your definition. In the former case, keep in mind that pathways like this are artificial constructs and could actually include all genes if you really wanted. In the latter case, it should be apparent that upstream regulators of things like MAPK are going to be completely different across cell types.


Very true, thanks for pointing that out. I'm mainly interested in colorectal cancer and a few related cell lines (HCT-116, RKO, CACO-2, HKE3). I would prefer "smaller" pathways (i.e. somewhere around 20-50 genes). As far as I could see in my so far limited time with the various tools, there is no way to search for "colorectal cancer" in the same way that you can search for "homo sapiens", or am I missing it?


I'm not surprised that you can't find anything specific to colorectal cancer. Most of the databases won't explicitly state what the source cell type is for a given piece of information. Yes, this makes pretty much any solution aside from going through the literature less than ideal. You might just compare the lists from a number of sources and include only genes mentioned at least X times.
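
The "mentioned at least X times" idea could look roughly like the sketch below. The function name consensus_genes and the three short input lists are placeholders made up for illustration (they are not real exports from KEGG, BioCarta or PID), and it assumes gene symbols have already been mapped to one naming scheme.

```python
from collections import Counter


def consensus_genes(gene_lists, min_sources):
    """Return genes that appear in at least `min_sources` of the given lists."""
    counts = Counter()
    for genes in gene_lists.values():
        # Use a set so a gene repeated within one source counts only once.
        counts.update({g.strip().upper() for g in genes if g.strip()})
    return sorted(g for g, n in counts.items() if n >= min_sources)


# Placeholder input: illustrative only, not real database exports.
example_lists = {
    "KEGG": ["JAK1", "JAK2", "STAT3", "SOCS3"],
    "BioCarta": ["JAK1", "STAT3", "STAT1"],
    "PID": ["JAK2", "STAT3", "jak1"],
}

print(consensus_genes(example_lists, min_sources=2))
```

Raising min_sources gives a smaller, stricter list; whichever threshold you pick is a judgment call that should be stated alongside the sources used.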


Ah... too bad. Do you have a favourite tool? Is there a tool that is purely based on literature, or do all of them have literature mining / computer-based backgrounds?


I don't have a favourite tool/database, unfortunately. The best one out there is probably IPA, from Ingenuity, but that's a commercial package. I only mention that as the likely best one since it incorporates manual curation by their staff. That's mostly for direct pathway analysis, though, and I don't know if you can get direct access to the underlying database for your needs.


Damn. Well, it would seem that the best way to do what I need is to just pick one of the pathways for whatever reason, state it, and then just go with it. Or do you have another idea?

10.3 years ago
Neilfws 49k

The TogoWS REST service comes up in some of the answers that you have already found at this site.

KEGG does contain a colorectal cancer pathway and this URI will retrieve the associated genes:

curl http://togows.dbcls.jp/entry/pathway/hsa05210/genes.json

Then it's a case of processing the JSON using your language of choice.
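
As one possible way to do that processing in Python, here is a minimal sketch. The response shape is an assumption, not something checked against the live service: the code expects either a JSON object mapping gene ID to description, or a list of such objects. The function name gene_ids_from_payload and the sample text are hypothetical stand-ins for the real curl output.

```python
import json


def gene_ids_from_payload(payload):
    """Collect gene identifiers from a decoded JSON payload.

    Assumed shape (not verified against the service): a dict mapping
    gene ID -> description, or a list of such dicts.
    """
    records = payload if isinstance(payload, list) else [payload]
    ids = []
    for record in records:
        if isinstance(record, dict):
            ids.extend(str(key) for key in record)
    return ids


# Hypothetical response text standing in for the output of the curl command.
sample_text = '[{"3845": "KRAS", "7157": "TP53"}]'
print(gene_ids_from_payload(json.loads(sample_text)))
```

In practice you would save the curl output to a file, decode it with json.load, and adjust the function if the actual structure turns out to be different.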


Thanks! I did check that quickly before, actually, but since the gene list that I get from using the URI is so much larger than the one I can see in the corresponding KEGG pathway (this http://www.genome.jp/kegg-bin/show_pathway?hsa04010 can't possibly have about 250 genes in it) I kind of gave up on it, just before asking this question. Why the discrepancy?


Yeah, the lists are all fine, but I mean in the actual KEGG pathway image. Maybe I just don't understand how they're drawn, but the MAPK pathway image has a little over 100 genes in it (link in my previous comment).

10.3 years ago
EagleEye 7.6k
A: Gene Set Clustering based on Functional annotation (GeneSCF)

You can download this tool; it includes a database in plain-text format containing KEGG, REACTOME and Gene Ontology with their corresponding genes (Entrez IDs and also gene symbols). Let me know if you need any help.

Thanks, but I'm not sure that this does what I'm looking for. If I understand your tool correctly, you give it a list of genes and it finds the enrichment/clustering of said genes in various pathways - correct? This is kind of the opposite of what I want, i.e. get a list of genes from a given pathway.


Yes, but you do not have to use the tool itself. There is an annotation folder in the tool, where you will find all pathways with their gene lists (tab-separated format). It has all the genes related to the pathways.

For example, you will have the PI3K-Akt signaling pathway with all 347 related genes listed. Simple grepping on the files will do: grep "PI3K-Akt signaling pathway".

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneSym.txt

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneID.txt
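
If you would rather do the grep step in Python, a rough equivalent might look like this. It only assumes that the files are tab-separated and that the pathway name appears somewhere on the relevant lines; the exact column layout of the GeneSCF files is not checked here, and the function name pathway_rows and the sample lines are invented for illustration.

```python
def pathway_rows(lines, pathway_name):
    """Return the tab-split fields of every line mentioning `pathway_name`."""
    needle = pathway_name.lower()
    return [
        line.rstrip("\n").split("\t")
        for line in lines
        if needle in line.lower()
    ]


# Hypothetical lines; the real file layout may differ.
sample_lines = [
    "PI3K-Akt signaling pathway\tAKT1\tPIK3CA\tPTEN\n",
    "MAPK signaling pathway\tMAPK1\tKRAS\n",
]

for fields in pathway_rows(sample_lines, "PI3K-Akt signaling pathway"):
    print(fields)
```

With a downloaded copy of one of the files above, you could pass the open file object in place of sample_lines and then pick out whichever fields hold the genes.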
