I am working with 342,094 COSMIC mutation records and have found almost all the information I need about the genes. However, I cannot tell whether the data come from NGS or from microarrays, and the COSMIC database does not seem to specify this. If anyone knows, please help. Thanks in advance, and apologies for any mistakes.
COSMIC is manually curated, meaning that humans look over the published literature and make informed decisions about which mutations to include (or not), based on the evidence provided in the publications. It is a grand mixture of somatic mutations derived from Sanger sequencing, allele-specific genotyping and PCR, and, more recently, large-scale NGS efforts.
Expert curation data:
- Manually input from peer-reviewed publications by COSMIC expert curators
- Consists of comprehensive literature curation of selected Cancer Gene Census genes at release, followed by subsequent updates
- Includes additional data points relevant to each disease and publication
- Provides accurate frequency data, as mutation-negative samples are specified
- Also called non-systematic or targeted screen data
Genome-wide screen data:
- Uploaded from publications reporting large-scale genome screening data, or imported from other databases such as TCGA and ICGC
- Provides unbiased molecular profiling of diseases while covering the whole genome
- Provides objective frequency data by interpreting non-mutant genes across each genome
- Facilitates finding novel driver genes in cancer
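The split between the two data types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample table is made up, and the column names ("Gene name", "Genome-wide screen", "Pubmed_PMID") and the y/n flag values are assumed to resemble a COSMIC mutation export, so check them against the header of your own file. Note that such a flag would only separate genome-wide from targeted screens; it does not name the platform (NGS, Sanger, array), which would most likely have to be looked up in the cited publication.

```python
import csv
import io
from collections import Counter

# Tiny made-up table standing in for a COSMIC mutation export.
# Column names and y/n values are assumptions; verify against your file.
SAMPLE_TSV = (
    "Gene name\tMutation CDS\tGenome-wide screen\tPubmed_PMID\n"
    "GENE_A\tc.100A>T\ty\t111\n"
    "GENE_A\tc.200G>C\tn\t222\n"
    "GENE_B\tc.35G>A\tn\t333\n"
    "GENE_C\tc.50del\ty\t111\n"
)


def count_by_screen_type(handle, flag_column="Genome-wide screen"):
    """Count rows per screen category, based on a y/n flag column."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        flag = (row.get(flag_column) or "").strip().lower()
        if flag == "y":
            counts["genome_wide"] += 1
        elif flag == "n":
            counts["targeted"] += 1
        else:
            # Missing or unexpected value: keep it visible rather than guess.
            counts["unknown"] += 1
    return counts


counts = count_by_screen_type(io.StringIO(SAMPLE_TSV))
print(dict(counts))
```

For a real file, the same function could be given an open file handle instead of the in-memory sample.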
You should also read the main publication linked to COSMIC, authored by cancer experts in the United Kingdom: "A census of human cancer genes". Related to this, COSMIC provides a Cancer Gene Census (CGC) and divides its genes into Tier 1 and Tier 2, with Tier 1 having more evidence of a role in cancer.
Thus, the conclusion is that, if you see a mutation in COSMIC, you can be assured that it has passed an experienced set of human eyes and that it has a role in cancer.
I think COSMIC and the Cancer Gene Census (CGC) need more clarification here, to avoid over-interpretation. The CGC is a manually curated list of genes containing mutations that have a driving role in cancer. COSMIC contains somatic mutations from cancer sequencing studies; as such the vast majority of mutations are passenger mutations. Although CGC genes will be substantially enriched for driver mutations, they too will contain a mixture of passenger and driver mutations. As Kevin mentioned, mutations marked as "genome-wide screens" are from more comprehensive studies. However, if I remember correctly, there are several 'gotchas', such as discovery-prevalence screens (which only sequenced a large subset of coding genes), studies that do not report silent mutations, and studies of pre-cancerous or non-cancerous cysts.
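To make the gene-level versus mutation-level distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the gene names, the tier lookup (standing in for a CGC download), and the record layout are invented for illustration. The point is only that CGC membership annotates the gene, while the driver or passenger status of each individual mutation stays undetermined.

```python
# Hypothetical gene-to-tier lookup standing in for a Cancer Gene Census list.
CGC_TIER = {"GENE_A": 1, "GENE_B": 2}

# Hypothetical mutation records: (gene, CDS change, genome-wide screen flag).
MUTATIONS = [
    ("GENE_A", "c.100A>T", "y"),
    ("GENE_B", "c.35G>A", "n"),
    ("GENE_C", "c.50del", "y"),
]


def annotate(mutations, cgc_tier):
    """Attach gene-level CGC tier to each mutation without inferring driver status."""
    annotated = []
    for gene, cds, gws_flag in mutations:
        annotated.append(
            {
                "gene": gene,
                "cds": cds,
                "genome_wide": gws_flag == "y",
                # None means the gene is absent from the (hypothetical) CGC list.
                "cgc_tier": cgc_tier.get(gene),
                # Gene-level membership is not evidence about this mutation.
                "driver_status": "unknown",
            }
        )
    return annotated


annotated = annotate(MUTATIONS, CGC_TIER)
for record in annotated:
    print(record)
```

Keeping `driver_status` as "unknown" is a deliberate choice in this sketch: assigning it would need mutation-level evidence that neither the tier nor the screen flag provides.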
as such the vast majority of mutations are passenger mutations.
Yes, that's a good point. Due to this, if I'm researching some rare disease that's entirely unrelated to cancer and find an interesting gene, some background reading on this gene will invariably result in the unearthing of a handful of published studies relating it to cancer, even if it may have no role in driving cancer.
Edit: I even said as much to a top researcher in the cancer field recently, i.e., that people who believe we can understand cancer better just by looking at mutation profiles are incorrect. We can continue keeping tabs on all mutations found in tumours, but we're never making a huge push toward understanding the fundamental mechanisms that drive it. Not that we need to do this either, because we've already identified the main risk factors whose avoidance can prevent cancer (obesity, smoking, alcohol, diet generally, etc.). No one focuses on that.
I've been told by some prominent cancer geneticists that "if I can't find a link from a gene to cancer in a few steps, then I haven't done enough background research on the gene yet." The point was the need for rigorous association based on data. The sequencing of cancers represents an observed endpoint of the natural evolutionary trajectory. This is good information for understanding what the drivers are, because they should in some sense be disproportionately represented: by definition, drivers have a selective advantage. This doesn't necessarily give insight into how, why, or in what context. However, I would contend that identifying the small proportion of drivers first is advantageous, because it then becomes easier to disentangle contextual interactions or to pursue laborious mechanistic insight. I see the different approaches as synergistic, one providing material for the other.
Of course, passenger mutations may be informative too, for example about mutational signatures or potential neoantigens, but from the title of this post I inferred an emphasis on drivers.
Information about curation and FAQ.