Hey everyone,
I am trying to get my head around the COSMIC database. However, it has so many files that I don't really know what they are all about. I want to look at variants in coding genes that appear in it with a specific VAF. I haven't found any genotype (GT) information in it, so I cannot simply calculate the VAF myself.
So my first question is: what are the different files? There is the Genome Screen Mutants VCF, but also the Genome Screen Mutants (a .tsv file). Both should contain "coding point mutations from genome wide screens". What is the difference between these two files? They seem to contain overlapping, but not identical, data. I checked the FAQ but didn't find an answer there. Then there is also the Cancer Mutation Census, which should contain "all coding somatic mutations collected by COSMIC with biological and biochemical information from multiple sources". So what is that, then? Are these the same variants? Are they filtered in some way, or have any been added?
Second question: is there any way to calculate the VAF of the different variants? I haven't found that information in the VCF, and there are no genotype columns to calculate it from. I have read that people in publications often state things like "appeared 7 times in COSMIC". I guess I could use this count instead of the VAF (even though they are of course not the same thing), but I haven't found this information either.
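To make clear what I mean by VAF: if per-sample read counts were available (for example reference and alternate allele depths of the kind a VCF genotype column can carry, which I have not found in the COSMIC files), I would compute it roughly as below. This is only a sketch; the function name and the read counts are made up for illustration.

```python
def vaf_from_allele_depths(ref_reads: int, alt_reads: int) -> float:
    """Variant allele fraction: alt reads divided by all reads at the site."""
    total = ref_reads + alt_reads
    if total == 0:
        # No coverage means the fraction is undefined.
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Hypothetical example: 30 reads support the reference, 10 support the variant.
print(vaf_from_allele_depths(30, 10))  # 0.25
```

Without read counts like these per sample, I don't see how to get this number, which is why I am asking.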
Maybe someone could give me some hints where to look :).
Thanks