How Do You Deal With Biological Context During Pathway Analysis?
12.9 years ago

An alternative way to phrase this question: How do you increase the quality (specificity & sensitivity) of pathway analysis results?

Our group often uses a pathway analysis approach to try to understand our data in more global terms. This has resulted in some interesting results in the past, so we keep at it. We may start with one of these 2 datasets:

1) gene expression data from an organ of interest (in our case, brain or microdissected part of the brain) at different developmental timepoints -- possibly with one key transcription factor knocked out, and compare to controls

2) gene content from de novo copy number variants from patients with a phenotype of interest

Both datasets result in a list of genes that are either (case 1) differentially expressed, or (case 2) present in altered copy number in patients. A typical next step is to ask pathway analysis to help identify other genes that may be involved in a given phenotype. I like to think this gives us a "holistic" approach, but sometimes am not so sure. Like many groups, I'll bet, we have paid for an Ingenuity license, but that often gives some humorous results, such as the time a colleague of mine kept getting back a pathway where multiple genes were related via interaction with "RNA polymerase II" (we suspect that when viewed this way, many many genes are related via interaction with RNA polymerase!) -- not so helpful.

But more frequently, the analysis algorithm provides some gene-gene relationship that is irrelevant to our particular organ of interest ("That's nice, but those genes are never expressed in the brain..."). Even more frequently, relationships are identified between genes for which we have no data showing that they are ever expressed in the same cell type at the same time in development. This results in a large amount of manual curation, and we're left wondering what part of systems biology is actually automated! There are the "academic" pathway analysis tools and resources, such as Pathway Commons, STRING, GeneMANIA, VisANT, and others, but these have their own limitation: some true relationships that have been demonstrated and published by multiple labs do not show up. The bottom line: there are a lot of false negatives and false positives with this in silico approach. We turn to the wet lab to sort out some of these leads, but we can't chase them all down, and we still need some way to identify the highest-yield genes for further validation experiments.
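To make the manual curation step concrete, here is a minimal sketch of the kind of context filter described above: keep a predicted gene-gene relationship only if both genes are detected in the same tissue at the same timepoint. The gene names, the edge list, and the `expressed_in` table are all hypothetical placeholders, not output from any of the tools mentioned.

```python
# Hypothetical toy data: these edges and expression calls are invented
# for illustration and do not come from any real database.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_D"),
]

# Each gene maps to the set of (tissue, stage) contexts where it is detected.
expressed_in = {
    "GENE_A": {("brain", "E12"), ("brain", "E15")},
    "GENE_B": {("brain", "E15")},
    "GENE_C": {("liver", "E15")},
    "GENE_D": {("brain", "E12")},
}


def filter_edges_by_context(edges, expressed_in, context):
    """Keep only edges whose two genes are both detected in `context`."""
    kept = []
    for gene_a, gene_b in edges:
        a_ok = context in expressed_in.get(gene_a, set())
        b_ok = context in expressed_in.get(gene_b, set())
        if a_ok and b_ok:
            kept.append((gene_a, gene_b))
    return kept


brain_e15_edges = filter_edges_by_context(edges, expressed_in, ("brain", "E15"))
print(brain_e15_edges)
```

A filter like this only removes false positives. It does nothing for relationships that are missing from the source database in the first place.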

How are labs dealing with the lack of biological context data in current pathway analysis algorithms? Are there other tools that people have developed internally?

We have been left with the conclusion that the best next step is building databases of organ-specific (and cell-type-specific) gene expression datasets from embryonic knock-downs in model animal systems, and then building a pathway analysis algorithm from the bottom up that will be more sensitive to biological context. This should improve specificity and sensitivity. I am starting to think that any pathway analysis algorithm that is not based on these kinds of organ-specific and timepoint-specific data is not so useful. Any other ideas?

Edit: Larry Parnell mentions DAPPLE in his answer below, so I've added a link to their site.

pathway analysis network • 4.3k views

Which types of pathways/networks are you particularly interested in? Metabolic? Signalling? Protein-protein interaction networks? Could you make that clear? I think that the sort of analysis you can do differs significantly from one to another.


We are interested in discovering pathways/networks that mediate human disease phenotypes (mostly neurodevelopmental), and we often can't predict from the outset whether these are metabolic, signaling, or something else. Often we end up uncovering, say, transcription factor regulatory networks or structural protein binding networks, but we go into the analysis hypothesis-free.

12.9 years ago

I agree with much of what you state and plan for future work. I have also seen such implausible links drawn between nodes/entities/genes. A "Homo sapiens pathway X" simply does not exist, so tissue and developmental (or other) time points have to be considered. Depicting pathway X with skeletal muscle, cardiac muscle, liver, etc. as sub-types may be the way to represent the relationships among genes, proteins and small molecules.

I'd also add that proteomics data has an important role here because a protein imported from another site can have a vital function at that distant site.

You invoked systems biology and rightly so. That brings mathematical models of pathway relationships and fluxes into the discussion. One could have one canonical pathway where certain terms, in the mathematical definition, are rendered at or near zero value because the pertinent nodes are not present/detected in that tissue at that time point. Such work is ongoing with respect to early Drosophila embryogenesis, for example.
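As a rough illustration of terms dropping to zero, the sketch below uses a simple mass-action style flux (rate constant times enzyme level times substrate level) for each step of a single canonical pathway. The step names, rate constants, and enzyme levels are hypothetical, and the rate law is an assumption chosen for brevity, not the form used in any published model.

```python
def pathway_fluxes(rate_constants, enzyme_levels, substrate_level):
    """Return a flux per step, computed as k * enzyme * substrate.

    A step whose enzyme is missing from `enzyme_levels` (not detected in
    this tissue at this time point) is treated as level 0.0, so its term
    vanishes while the canonical pathway definition stays the same.
    """
    return {
        step: k * enzyme_levels.get(step, 0.0) * substrate_level
        for step, k in rate_constants.items()
    }


# Hypothetical canonical pathway with three steps.
rate_constants = {"step1": 2.0, "step2": 0.5, "step3": 1.0}

# Hypothetical context: the enzyme for step2 is not detected in this tissue.
brain_enzymes = {"step1": 1.0, "step3": 4.0}

print(pathway_fluxes(rate_constants, brain_enzymes, substrate_level=3.0))
```

The same `rate_constants` can then be reused with a different enzyme table for each tissue or stage, which is one way to keep a single pathway definition with context-specific behaviour.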

With regard to the RNA polymerase example, when I build my networks, I remove one of the seed nodes and rebuild the network, comparing it to the "full" network. Lately, I do this work with DAPPLE. Each resulting iteration of the network, generated by dropping a different single seed node, is subjected to GO and other enrichment analysis in order to see whether a given node gives redundant (i.e., believable) interactors vs. those unique to that node.
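The leave-one-out idea can be sketched in a few lines. This is not how DAPPLE works internally. The `build_network` function here is a hypothetical stand-in that simply takes the union of each seed's direct interactors, and the interaction map is invented. The enrichment step is not shown.

```python
# Hypothetical interaction map: seed gene -> its direct interactors.
interactions = {
    "SEED1": {"X", "Y", "POLR2A"},
    "SEED2": {"Y", "Z"},
    "SEED3": {"Z", "X"},
}


def build_network(seeds, interactions):
    """Toy network builder: the union of direct interactors of all seeds."""
    nodes = set()
    for seed in seeds:
        nodes |= interactions.get(seed, set())
    return nodes


def leave_one_out(seeds, interactions):
    """For each seed, list the interactors that vanish when it is dropped."""
    full = build_network(seeds, interactions)
    unique_to_seed = {}
    for seed in seeds:
        remaining = [s for s in seeds if s != seed]
        reduced = build_network(remaining, interactions)
        unique_to_seed[seed] = full - reduced
    return unique_to_seed


result = leave_one_out(["SEED1", "SEED2", "SEED3"], interactions)
print(result)
```

In this toy example only `POLR2A` depends on a single seed, so it would be the interactor to treat with suspicion. The interactors supported by two seeds survive every iteration.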

Good question and thoughts, and here are just a few of mine.


Larry, thanks for your response and thoughts on this subject. You're right, proteomics can get left out if our view of biology is too gene-centric. We also use the "leave-one-out" approach to help identify important nodes. And thanks for mentioning DAPPLE, which is available through the Broad Institute. I look forward to running data through this tool.

12.1 years ago
xanderpico ▴ 580

I completely agree with your assessment of the limitations of canonical pathways. Check out WikiPathways. One of the main goals there is to break away from the canonical and support a more diverse set of pathways that are specific to a cell type, tissue, disease state, developmental stage, etc. That's a big task, but they already support the tagging of pathways with ontology terms to distinguish these different "flavors" of a given biological process. So, in effect, there should be not just one Apoptosis Pathway, but rather dozens (if not hundreds). Again, they are just starting to break out of the mold and there is plenty of work left to do, but the nice thing is that you (or anyone) can sign in and start making context-specific pathways today and immediately distribute and use them in your data visualizations and analyses.

http://wikipathways.org

And here's a presentation I gave on the WikiPathways model a few months ago:

http://www.slideshare.net/AlexanderPico/pico-ismb-2012-wikipathways


Yes, WikiPathways is great. I even created a couple of pathways to add, or tried to. I was never able to establish a connection with the editing tool to put what I had onto the site.


That's a shame, since the whole point of WikiPathways is to make it easy to contribute and curate pathways. One alternative to their web editing tool is the full Java version of the same editor, called PathVisio. You can then simply upload the .gpml file created in PathVisio to WikiPathways and avoid the web-based editor altogether.
