I am currently working on an algorithm to distinguish driver from passenger mutations. I have a set of new genes that serve as a validation set. After I have run the algorithm on these genes, I will have a set of predicted drivers and passengers. My question is: exactly when can I call a gene a driver, based on the number of driver and passenger mutations I get? Intuition tells me I can call a gene a driver even if it has a single driver mutation. Am I correct, or is there a cutoff?
Thanks for your answer. The problem is that I am working with mouse samples and only have the VCF data, and I have annotated driver/passenger mutations using SIFT, so it's very hard to validate the results. Can you also suggest some ways to validate this? I have a set of predicted driver/passenger mutations for each gene in the test data.
How have you annotated driver/passenger mutations using SIFT? Just by going by the SIFT prediction? I am not sure that will be acceptable to any reputable journal, or even for a PhD thesis (if that is what you are doing). There are in silico prediction tools that are specifically tailored for somatic variants. Take a look at the bottom, here: A: pathogenicity predictors of cancer mutations
Do you have a supervisor who is assisting you?
Thanks again for your help. I am calling a mutation a driver if it is 'deleterious' according to SIFT, or a passenger if it is 'tolerated'. So are you suggesting that a consensus of the predictions from all these tools would be more useful? This is a summer project and my supervisor is not available to help me, so I am going through papers and Biostars posts to get an idea.
Okay, it is a summer project. Are you hoping to publish it or is it purely for training? You should at least try some of the other tools that I mentioned, if you can, in particular GWAVA and FunSeq2. Using CADD is also generally good.
I think that it would be good to take a consensus: for example, choose 5 tools, and then require that at least 3 of them predict pathogenicity before a mutation is called a driver.
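To make the 3-of-5 rule concrete, here is a minimal Python sketch. It assumes you have already reduced each tool's output to a yes/no pathogenicity call per variant; how you binarise each tool's score (cutoffs, categories) is up to you and is not shown here. The function name `consensus_driver_call`, the placeholder `tool_5`, and the example calls are all made up for illustration and are not real results.

```python
from typing import Dict


def consensus_driver_call(predictions: Dict[str, bool], min_votes: int = 3) -> bool:
    """Return True if at least `min_votes` tools call the variant pathogenic.

    `predictions` maps a tool name to a pre-binarised call
    (True = predicted pathogenic, False = predicted benign/tolerated).
    """
    votes = sum(1 for is_pathogenic in predictions.values() if is_pathogenic)
    return votes >= min_votes


# Hypothetical calls for a single variant (illustrative only, not real data).
# "tool_5" is a placeholder for whichever fifth tool you choose.
example = {
    "SIFT": True,
    "CADD": True,
    "GWAVA": False,
    "FunSeq2": True,
    "tool_5": False,
}

print(consensus_driver_call(example))  # 3 of 5 tools agree -> True
```

Keeping `min_votes` as a parameter lets you check how sensitive your driver set is to the threshold, for example by comparing the 3-of-5 calls against a stricter 4-of-5 rule.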
Thanks a lot for your patience. I will definitely do a consensus and look for a more pronounced set of drivers.