The vast majority of somatic mutations in cancer are passenger mutations; only a small minority are driver mutations. Yes, a passenger mutation can disrupt the function of a protein, but this does not imply that it has a phenotypic effect that promotes tumor growth. For example, a nonsense mutation in an olfactory receptor may completely eliminate the protein, but because loss of an olfactory receptor does not produce a phenotype relevant to cancer, the mutation is a passenger. Because so many tumors have now been sequenced, at least one somatic mutation will have been observed in every gene in the genome. So one generally needs to be more quantitative about what "a lot" of mutations means. There are various statistical methods that formally test whether the number of mutations in a gene exceeds what is expected from the background mutation rate.
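To make "above the background rate" concrete, here is a minimal sketch of the simplest version of such a test, assuming mutations arrive independently at a single uniform background rate so that the count in a gene is approximately Poisson. The function names and the example numbers are hypothetical, and published methods are considerably more elaborate (they model things like gene-specific covariates and mutational context), so treat this as an illustration of the idea rather than a usable driver-gene caller.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i < k in log space to avoid overflow in i!.
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    # 1 - cdf loses precision for extremely small tails; fine for a sketch.
    return max(0.0, 1.0 - cdf)


def gene_mutation_pvalue(observed, gene_length_bp, n_samples, rate_per_bp):
    """One-sided p-value that a gene carries more mutations than expected.

    Assumption: a single uniform background rate (mutations per bp per
    sample) applies to the whole gene in every sample.
    """
    expected = rate_per_bp * gene_length_bp * n_samples
    return poisson_upper_tail(observed, expected)


# Hypothetical numbers: 6 mutations in a 1,500 bp gene across 500 tumors,
# with an assumed background of 1e-6 mutations per bp per tumor.
p = gene_mutation_pvalue(6, 1500, 500, 1e-6)
print(f"expected = {1e-6 * 1500 * 500:.2f} mutations, p = {p:.3g}")
```

In a real analysis the p-values would be computed for every gene and corrected for multiple testing before calling anything significant.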
Your use of the terms "pathogenic" and "likely pathogenic" makes me think you are using a database like ClinVar, which is not necessarily very comprehensive for somatic mutations. So a first step would be to annotate variants with cancer-specific databases like OncoKB (https://www.oncokb.org/). However, an issue with all curated databases is that they will likely miss the majority of driver mutations, because most driver mutations outside a few hotspots have not been studied in the literature. Thus, computational prediction of whether a mutation is a driver becomes the only practical option. I've previously developed a method, CHASMplus, to predict which missense mutations may be cancer drivers (https://chasmplus.readthedocs.io/en/latest/). You could annotate your variants with CHASMplus using OpenCRAVAT (https://opencravat.org/) by either submitting your VCF to the webserver or running a local copy. OpenCRAVAT also has other annotators that might help determine whether a mutation is a driver, for example annotations of mutation "hotspots".
Regarding whether a pathway is "disrupted" by mutations in cancer, you need to demonstrate that there is appreciable statistical evidence that many mutations in your pathway are indeed likely to be cancer drivers. Even better would be to then show that those driver mutations are associated with altered pathway activation as inferred by RNA-seq, etc. Depending on whether you are claiming a mechanistic argument, you might then need experimental evidence.
I would recommend being very careful about claiming that a passenger mutation disrupts a pathway, unless you are trying to make an argument about synthetic lethal interactions.
You can always show all somatic mutations in a plot. However, I would recommend coloring those that are known or predicted to be oncogenic differently from those of highly uncertain significance.
I would even say that evidence from other NGS-like assays such as RNA-seq is not sufficient, as you still do not know whether this drives any phenotype. At this point a statement about cancer relevance requires functional in vivo experiments, e.g. a mouse model with the required genetic alterations, or transplantation of pathway-perturbed ex vivo cells back into mouse recipients, followed by monitoring whether this induces/slows/accelerates cancer development/progression/relapse/resistance/... Without that it is all descriptive. Also try to be sure that these mutations are real: try some different variant callers and see whether the mutations still come up. Some validation by Sanger sequencing might make sense as well before investing heavily in functional experiments. Be careful with mutations that appear obvious but have not been reported yet. Sure, maybe you have a cool new finding and that'd be awesome, or maybe you are working on false-positive variants, for example in genes located in regions that attract false alignments due to low complexity or some other sort of bias. Whatever you find, if you want to send a message beyond just describing something, then you need functional in vivo validation.
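The "try different variant callers" check can be sketched as a simple concordance filter. This is only an illustration under assumptions: the caller names are placeholders, each call set is assumed to be already parsed into `(chrom, pos, ref, alt)` tuples, and variants are assumed to be normalized to the same representation (e.g. indels left-aligned) so that identical variants compare equal.

```python
from collections import Counter


def concordant_variants(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples.
    """
    counts = Counter()
    for variants in calls_by_caller.values():
        # set() so a caller reporting the same variant twice counts once.
        counts.update(set(variants))
    return {v for v, n in counts.items() if n >= min_callers}


# Hypothetical call sets from three placeholder callers.
calls = {
    "caller_a": [("chr7", 140753336, "A", "T"), ("chr1", 1000, "G", "C")],
    "caller_b": [("chr7", 140753336, "A", "T")],
    "caller_c": [("chr7", 140753336, "A", "T"), ("chr2", 2000, "T", "G")],
}
for variant in sorted(concordant_variants(calls)):
    print(variant)
```

Variants seen by only one caller are not necessarily false, but they are the ones most worth checking by eye or by Sanger sequencing before any functional work.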
Yes, I agree with your point that if a mechanistic argument is being made, then experiments would be needed. As this is mostly a bioinformatics website, I didn't want to get into details, as the specifics would usually depend on the pathway. And, indeed, accurate variant calling is a must for downstream analysis.
That being said, human cancer genetics with carefully considered statistical models is perhaps the best way to implicate drivers of cancer. After all, many cancer drugs with a proven benefit in improving patient survival do target such mutated drivers. Since one cannot perform in vivo experiments in humans, evidence in humans is necessarily observational. However, if you are careful about the statistical analysis of somatic mutations, the resulting predictions can validate experimentally at high rates compared to most other technologies (a lot of my computational predictions have backing experimental evidence). I'm not claiming that you should use RNA-seq to assess "drivers"; rather, if you have a putative "driver" based on the statistical analysis of mutations, then at minimum a few things should be logically consistent. For example, is the "activity" of the pathway consistent with the status of the putative driver mutation? If not, then you may be looking at a false positive. That doesn't prove causation but provides more evidence to justify experiments. I don't think jumping straight into in vivo validation is a good idea unless you have very convincing statistical evidence that is logically consistent.
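As a rough sketch of that consistency check, one could compare a per-sample pathway activity score between tumors with and without the putative driver mutation using a permutation test. Everything concrete here is an assumption for illustration: the activity scores are made-up numbers standing in for whatever RNA-seq-derived pathway score you use, and the function name is hypothetical. A difference in the expected direction supports consistency only; as noted above, it does not prove causation.

```python
import random


def permutation_test(mutant_scores, wildtype_scores, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean activity.

    Returns (observed mean difference, permutation p-value).
    """
    rng = random.Random(seed)
    n_mut = len(mutant_scores)
    n_wt = len(wildtype_scores)
    observed = sum(mutant_scores) / n_mut - sum(wildtype_scores) / n_wt
    pooled = list(mutant_scores) + list(wildtype_scores)
    hits = 0
    for _ in range(n_perm):
        # Shuffle labels: the first n_mut samples play the "mutant" group.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_mut]) / n_mut - sum(pooled[n_mut:]) / n_wt
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Made-up pathway activity scores for mutated and wild-type tumors.
mutant = [2.1, 2.4, 1.9, 2.6, 2.2]
wildtype = [0.3, 0.1, 0.5, 0.2, 0.4, 0.0]
diff, p_value = permutation_test(mutant, wildtype, n_perm=2000)
print(f"mean difference = {diff:.2f}, permutation p = {p_value:.4f}")
```

A permutation test is used here only because it makes few distributional assumptions; with real cohorts you would also want to account for confounders such as tumor type and purity.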
I'll note that the reverse is also true. No matter how many experiments you perform in mice, that evidence is insufficient to claim a causal driving role in HUMAN cancers. This remains true even if the gene/pathway expression identified in mouse is associated with patient prognosis, as is quite frequently reported in such papers to justify "clinical relevance". As pointed out in a recent paper (https://www.biorxiv.org/content/10.1101/2021.06.01.446243v1), such prognostic analyses provide virtually no indication of a driver role in cancer.
Yes, this is a good point, but for any mechanistic study you need an in vivo model. I guess it all comes down to what the OP wants to do, how this is planned to be published, and what the future plans for the research are. Elaborate stats are only possible if there is in fact clinical data of sufficient sample size to allow the analyses you suggest.
Agree. I think we have discussed the benefits/pitfalls of various approaches sufficiently that the OP can decide what best fits their intended study.
Dear Collin,
Thanks for your complete answer