Tool:HIT'nDRIVE: Network based cancer driver genes prioritization algorithm using Hitting Time
0
0
Entering edit mode
7.4 years ago
raunakms ★ 1.1k

HIT'nDRIVE is a network based cancer driver gene prioritization algorithm. It is a combinatorial optimization method that integrates genomic changes with changes in transcriptome (expression outliers) to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the “random walk facility location” (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: “multi-hitting time”, the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. HIT'nDRIVE reduces RWFL problem to weighted multiset cover (WMSC) problem which it solves using integer linear programming (ILP).

This is a network propagation based algorithm (Read extensive review on the topic) that identifies the influential nodes in the network. It may have many potential applications in the area of complex disease research; beyond cancer. If anyone is interested in extensive application of our algorithm, we would be very happy to extend our help. Please feedback for any suggestions.

Software: https://github.com/sfu-compbio/hitndrive

Video [Tutorial talk]: https://youtu.be/29zZoYLDnwk (Presented at the UCLA Computational Genomics Summer Institute)

Video [seminar talk]: https://youtu.be/Ty3kgTCW9mQ

Slides [for the video above]: http://raunakms.github.io/pdf/HITnDRIVE_VanBUG_2016_11_03.pdf

Publications:

network driver-gene software • 2.9k views
ADD COMMENT
0
Entering edit mode

Thanks for sharing and am really interested in working on this.

I have a query: Can we able to get proteomics level information for those driver genes.

ADD REPLY
0
Entering edit mode

We have not used proteomics data yet. You can get proteomics data for the samples used in TCGA from CPTAC project https://cptac-data-portal.georgetown.edu/cptacPublic/

ADD REPLY
0
Entering edit mode

I have my own Genomics and Proteomics Data from the same patient sample. I should give a try with that and let you know.

ADD REPLY
0
Entering edit mode

I am by no means an expert in this field, but are there any similarities between the network propagation based algorithm to the MCL clustering algorithm? Both seems to be weighing nodes based on probable walk distances?

ADD REPLY
0
Entering edit mode

Seems there are no similarities between the network propagation based algorithm to the MCL clustering algorithm. I will be trying to get the cancer driver genes list from HIT'nDRIVE using genomics data.

Later, I have 2 options : 1. I shall convert those genes to frame translation and search with tandem mass spectra (ms2). or 2. I will try mapping (cancer driver genes) gene IDs to Protein IDs for the generation of protein database and search ms2 spectra against the generated cancer driver genes specific protein database.

ADD REPLY
0
Entering edit mode

In a very general sense, MCL clustering algorithm uses graph flow (i.e. network propagation) to cluster the graph.

In case of HIT'nDRIVE, we also use network propagation (random walk) on graph to measure (or quantify) distant interactions. Our main contribution is the use of distance measure - "Multi-hitting time". This is better explained in this video here: https://youtu.be/29zZoYLDnwk

ADD REPLY
0
Entering edit mode

You are right, I shall go through all the content and will get back to you.

Thanks

ADD REPLY
0
Entering edit mode

It seems the interesting details are in the RECOMB paper which is behind a paywall. Could you share this paper please ?
If I understand correctly, the algorithm is a form of clustering of a bipartite graph in which edge weights are derived from the hitting times computed from the protein interaction graph. Would that be a good description ?
Why/how did you choose to work with the hitting time ? The related commute time is easy to compute and leads to a kernel framework which would make it easy to add other data types to the protein interactions. Did you look into this ?

ADD REPLY
0
Entering edit mode

Yes, there could possibly be many approaches to solve the problem. We wanted to use maximum parsimony based combinatorial optimization approach. Here, we want to find the most parsimonious set of driver genes which can have maximum impact (i.e. cover) on a user defined proportion of expression-outlier genes. Clustering based approach may be less efficient to solve the problem. But you are most welcomed to try.

Hitting time and Commute time are entirely different. Hitting time is the distance traveled from source to the target for the first time. This much likely mimics the "biological signal" propagation, in cancerous system, from any given sequence altered driver gene (e.g mutated genes) to the expression-altered genes.

Commute time on the other hand is the distance traveled from source to the target and then from target back to the source. This, we believe, does not mimic how "biological signal" propagates. In cancer, the "biological signal" very less likely travels from a mutated gene to an expression-altered gene and back to the same mutated gene. Thus, using commute time makes less sense here.

RECOMB version can be accessed here https://www.researchgate.net/publication/260598036_HIT%27nDRIVE_Multi-driver_Gene_Prioritization_Based_on_Hitting_Time. But the recent Genome Research paper is a much improved version. Many details are in the supplementary.

ADD REPLY
0
Entering edit mode

Thanks for the link and pointing to the supplementary material. Commute time being the symmetric version of hitting time, I don't view them as entirely different. It's just that the symmetry usually makes things easier to compute and as I understand it, you derive the hitting time from the protein interactions network which is undirected. Since you use it to weight directed edges in a bipartite graph, it also makes sense to have an asymetric measure but I don't see it as a requirement. Using a kernel framework for the weighting scheme allows for easy integration of multiple data sources. Thanks also for sharing the code.

ADD REPLY

Login before adding your answer.

Traffic: 2008 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6