Many studies have developed new methods to find candidate cancer driver genes. I have developed a new driver mutation detection algorithm. Is there any way to test my tool on benchmarking datasets and compare it against the various mutation detection algorithms already out there? I am interested only in gold-standard driver mutation datasets (not driver gene datasets). Can you point me to any research articles or give me some ideas on how to test my new tool?
Based on what kind of data? Expression, histone marks, open chromatin?
Point mutations, indels, CNAs, etc. The model I have developed is trained on COSMIC mutation data. However, the driver/passenger label that COSMIC assigns to each mutation is itself based on the predictions of another tool (namely FATHMM). I want to test my model on known cancer driver mutations.
I thought that we had already identified the driver genes behind the majority of cancers ... (?)
Are there databases that list the mutations (driver/passenger) in each of these known cancer driver genes? If so, are there studies that have taken these known driver/passenger mutations and reported how accurately their tools identify them? I want to compare my model against theirs.
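Whichever labelled set is used, comparing tools usually comes down to scoring each tool's predictions against the same binary labels. The following is a minimal Python sketch, assuming a list of mutations labelled 1 (driver) or 0 (passenger) and one numeric score per mutation from each tool. The function names, the 0.5 threshold, and the example scores are hypothetical placeholders, not real benchmark data or any published tool's API.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Metrics:
    sensitivity: float  # fraction of true drivers the tool calls driver
    specificity: float  # fraction of true passengers the tool calls passenger
    precision: float    # fraction of driver calls that are true drivers
    mcc: float          # Matthews correlation coefficient
    auc: float          # threshold-free ROC AUC


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen driver scores higher than a randomly chosen passenger (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one driver and one passenger")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(labels, scores, threshold=0.5):
    """Compare one tool's scores with benchmark labels (1 = driver, 0 = passenger).
    The 0.5 default threshold is an assumption; use each tool's own cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))

    def ratio(a, b):
        # Avoid division by zero when a class or prediction is empty.
        return a / b if b else 0.0

    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return Metrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
        mcc=ratio(tp * tn - fp * fn, mcc_denom),
        auc=roc_auc(labels, scores),
    )


if __name__ == "__main__":
    # Made-up placeholder data, NOT a real benchmark.
    labels = [1, 1, 1, 0, 0, 0]
    tools = {
        "my_tool": [0.9, 0.8, 0.4, 0.3, 0.2, 0.6],
        "other_tool": [0.7, 0.3, 0.2, 0.6, 0.1, 0.4],
    }
    for name, scores in tools.items():
        print(name, evaluate(labels, scores))
```

Running it prints one Metrics line per tool. Reporting AUC alongside the threshold-based metrics helps because different tools use different score scales and cut-offs. If the benchmark labels were themselves produced by a predictor (as with the FATHMM-based labels in COSMIC), these numbers measure agreement with that predictor rather than accuracy against experimentally confirmed drivers.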
Perhaps, in this regard, one ought to consider what the definition of a driver gene actually is. I have yet to see a clear definition from a statistical standpoint, or anything that allows us to quantify or qualify a driver gene. Instead, the term 'driver' is used loosely to describe a gene that may be involved in cancer progression or may promote tumourigenesis. Driver genes like TP53 are well known, well documented, and have clear roles in cancer progression. For most others, we have vast amounts of published data showing their heightened expression in tumours; however, each one requires functional studies to prove its role. Thus, even if you have developed a prediction algorithm, its results are still in silico and will require functional validation, i.e., in the wet lab.
Thanks for your reply. So how do I find driver mutations/genes that have been functionally validated in the wet lab? Are there any resources?
I don't think there are standardised databases that store this information. You would need to read the papers and extract the information manually.