Remove All Genes Associated With A Kegg Pathway From Analysis
14.4 years ago
Mike Dewar ★ 1.6k

I'm interested in the change in gene expression in T cells during the immune response. One way you can distinguish different functions of T cells is by which surface proteins they express at different times along the way. The trouble is that these surface proteins are such excellent discriminators that we use a small number of them to sort the cells into different phenotypes. Hence, when I come along and apply my fancy differential gene expression tool to find out what's changing during the immune response, all I find is a whole ton of other surface receptors that change with the phenotype.

This has a couple of problems: the first is that it's kind of boring - all I've discovered is that lots of surface receptors change during the immune response, which I already know. The second is that my bio colleagues are less interested in surface proteins as they're not very well conserved between organisms.

So I was thinking of removing all of the genes that belong to a couple of KEGG pathways (the cytokine-cytokine receptor interaction and T cell receptor signaling pathways) before I start my analysis. Two questions:

1) Is this a bad idea for some reason that I haven't realised? Are there more appropriate ways of filtering lists of genes before analysing expression?

2) Assuming it's an awesome, well-motivated-by-the-biology-and-the-data idea, are there any tools, preferably in Bioconductor, that can give me a simple yes/no answer to the question: is this gene in either of these pathways?

gene kegg filter • 4.3k views

Oh, I think question 2 has been answered already - if this is the case I'll just check it works and remove this bit from my original question. Sorry for missing it before asking!


I gave you some suggestions for question #2

14.4 years ago
Will 4.6k

If you're looking to remove genes that are "uninteresting" then I would suggest using Gene Ontology instead of KEGG pathways. It tends to give a more "complete" list than the KEGG gene lists. I don't use Bioconductor on a regular basis so I don't know of any library that will do this ... but I'm sure there must be something.

Removing genes before doing an analysis is also a good way to help out your p-values: with a smaller, more focused background you run fewer tests, so the multiple-testing correction costs you less.

This is a pretty common thing to do in some GWAS analysis techniques. They remove SNPs that are not in coding regions, or near a gene, since any positive results will be "difficult to interpret".
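
To make the point about the smaller background concrete, here is a minimal Python sketch. It assumes a plain Bonferroni correction and made-up p-values; the gene names, the `excluded` set and the function name `significant_after_filter` are all hypothetical and do not come from any real analysis.

```python
def significant_after_filter(pvalues, excluded, alpha=0.05):
    """Drop excluded genes, then Bonferroni-correct the remaining p-values."""
    kept = {gene: p for gene, p in pvalues.items() if gene not in excluded}
    n_tests = len(kept)  # the correction scales with the number of tests left
    return sorted(
        gene for gene, p in kept.items() if min(1.0, p * n_tests) <= alpha
    )


# Hypothetical raw p-values, for illustration only.
pvalues = {
    "receptor_1": 0.001,
    "receptor_2": 0.002,
    "receptor_3": 0.003,
    "kinase_x": 0.02,
    "tf_y": 0.04,
}
# Hypothetical set of genes removed up front (e.g. pathway members).
excluded = {"receptor_1", "receptor_2", "receptor_3"}

unfiltered_hits = significant_after_filter(pvalues, set())
filtered_hits = significant_after_filter(pvalues, excluded)

print(unfiltered_hits)  # only the three receptors pass with 5 tests
print(filtered_hits)    # kinase_x passes once only 2 tests remain
```

Bonferroni is used here only because it makes the effect of the number of tests easy to see. The filter has to be chosen without looking at the expression results, otherwise the corrected p-values no longer mean what they should.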

In answer to question #2:

While I love parsing XML as much as the next programmer, there's a much easier way to get all of the genes in a pathway. You can download the gene IDs (Entrez IDs) for each pathway from the KEGG FTP site. I'm not sure which organism you're dealing with, but each pathway has a .list file which lists all of the genes in that pathway.
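
For the simple yes/no check, something along these lines might do. This is a rough Python sketch, not a tested recipe: it assumes each line of the .list file is tab-separated with a pathway ID in the first column and a gene ID in the second (for example `path:hsa04060<TAB>hsa:3569`), which you should verify against the file you actually download. The function names and the toy lines are made up for illustration, and human pathway IDs are used only as an example since the organism wasn't stated.

```python
import io


def load_pathway_genes(handle, pathways):
    """Collect the gene IDs listed under any of the given pathway IDs.

    Assumes tab-separated lines: <prefix>:<pathway id> TAB <prefix>:<gene id>.
    """
    genes = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pathway_id = fields[0].split(":")[-1]
        gene_id = fields[1].split(":")[-1]
        if pathway_id in pathways:
            genes.add(gene_id)
    return genes


def in_pathways(gene_id, pathway_genes):
    """Yes/no answer: is this gene in any of the loaded pathways?"""
    return str(gene_id) in pathway_genes


# Toy stand-in for a downloaded file; replace with open("your_file.list").
toy_file = io.StringIO(
    "path:hsa04060\thsa:3569\n"
    "path:hsa04660\thsa:915\n"
    "path:hsa00010\thsa:2645\n"
)
excluded = load_pathway_genes(toy_file, {"hsa04060", "hsa04660"})

print(in_pathways(3569, excluded))
print(in_pathways("2645", excluded))
```

Keeping the IDs in a set makes each lookup cheap, so filtering a whole expression table is just one membership test per gene.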


Thanks Will! Useful to know about the .list files...


Hi Will, can this gene list be fetched via the KEGG API/web services?


@Fengyuan: It can probably be retrieved through the KEGG SOAP API, but I find their design to be "unintuitive" (at least to me). I've always found that a simple wget of the FTP site gives me the results with less hassle, but YMMV.

14.4 years ago

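<?php
// $cytokinesCytokinesSignaling is assumed to be an array of gene IDs
// for the pathway, built beforehand (it is not defined in this snippet).
$myGeneId = "786";
if (in_array($myGeneId, $cytokinesCytokinesSignaling)) {
    // This gene belongs to the pathway, so do ......
} else {
    // This gene does not belong to the pathway, so do ......
}
?>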