This question may stray a little from the focus of Biostar. Still, I believe it concerns both cancer research and technical bioinformatics.
There are numerous cancer driver genes / mutations that, by definition, appear frequently in certain cancer types and drive tumorigenesis. Many of them were identified years ago.
However, new driver genes / mutations have continued to emerge from bioinformatic analyses in recent years. As far as I understand, the most straightforward way to look for cancer drivers is through population frequency. So, is it just because we are now able to sequence more tumor samples and obtain greater statistical power? Or are there other considerations in the field of cancer driver discovery? If sample size is the only factor here, does this mean that only large collaborative cancer projects (with large sample sizes) are useful for discovering new driver events, and that studies with smaller sample sizes stand little chance of doing so?
A minor question: I guess that to claim the discovery of a new cancer event in a study, there has to be at least some degree of elucidation of how the event promotes cancer mechanistically, rather than just saying that we observe an event at a frequency beyond expectation, right?
What do you mean by "far from 80 percents for many cancer types"? Can you point me to a reference paper? Also, I could not find the PanCancer Atlas (Cell 2018) paper. Would you mind sharing some details, like the PubMed ID or the title? Thanks
https://www.cell.com/consortium/pancanceratlas - here you go. There are 27 papers, but the one you need is "Comprehensive Characterization of Cancer Driver Genes and Mutations".
There I was talking about the power of detection.
If sample size is the only factor here, does this mean that only large collaborative cancer projects (with large sample sizes) are useful for discovering new driver events, and that studies with smaller sample sizes stand little chance of doing so?
This is a question about statistics, not about cancer drivers in particular, and the answer is a strong yes. The other thing is that cancer is a complex, multifactorial disease, and maybe thousands of genes contribute to it. Are we interested in genes that promote tumor growth by 0.001 percent compared with the non-mutated variant? So, increasing the cohort size does not necessarily mean detecting clinically meaningful drivers.
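To make the sample-size point concrete, here is a minimal sketch of a power calculation for a purely frequency-based test. Everything in it is an illustrative assumption rather than a description of how any published driver-detection tool works: a single uniform background rate (the fraction of tumors carrying a mutation in a gene by chance), a one-sided exact binomial test, a Bonferroni-style threshold for roughly 20,000 genes, and made-up rates of 1% (background) and 3% (driver). The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (needs 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def critical_count(n_samples, background_rate, alpha):
    """Smallest mutated-sample count k with P(X >= k | background) <= alpha.

    Returns n_samples + 1 if no count is extreme enough to be significant.
    """
    tail = 0.0
    k_crit = n_samples + 1
    # Accumulate the upper tail from the top down until it exceeds alpha.
    for k in range(n_samples, -1, -1):
        tail += binom_pmf(k, n_samples, background_rate)
        if tail > alpha:
            break
        k_crit = k
    return k_crit


def detection_power(n_samples, background_rate, driver_rate, alpha=0.05 / 20000):
    """Probability that a gene mutated at driver_rate reaches significance.

    alpha defaults to 0.05 corrected for an assumed ~20,000 tested genes.
    """
    k_crit = critical_count(n_samples, background_rate, alpha)
    return sum(binom_pmf(k, n_samples, driver_rate)
               for k in range(k_crit, n_samples + 1))


if __name__ == "__main__":
    # Assumed rates: 1% of tumors mutated by chance, 3% for a weak driver.
    for n in (100, 500, 2000, 5000):
        print(n, round(detection_power(n, 0.01, 0.03), 3))
```

Under these assumptions, power for the same weak driver is small in a cohort of 100 and high in a cohort of a few thousand. Note that the calculation says nothing about effect size or clinical relevance: a larger cohort only lets you detect smaller deviations from the background.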
Good point. I guess the other angle is to come up with new methods that can discover biological (mutational / expression / epigenetic) oncogenic patterns within a reasonable sample size.
That's for sure. We've learnt a lot, but it's not even close to enough. However, all these discoveries have to be validated in real organisms or cell lines, and this process is very, very slow. So we can develop new tools quite fast, but validating the findings may take decades.