Hi all,
As part of my project, I want to find recurrent somatic mutations in cancer, not only in coding regions but across the whole genome, including noncoding somatic mutations, so I need WGS data.
My supervisor directed me to the data freely available on the ICGC website (DCC data release 28 is the newest release, which includes WGS calls from PCAWG): https://dcc.icgc.org/releases/release_28/Projects
If I go to a specific cancer project (e.g. SKCM-US for melanoma), there's a file called "simple_somatic_mutation.open.SKCM-US.tsv.gz", which seems to be what I want.
However, upon looking at the actual data, some things did not add up:
- The number of SNVs from WGS is suspiciously small
- The number of SNVs called from WES is, on many occasions, larger than the number of SNVs called from WGS on the SAME SAMPLE, which does not make sense, since WGS should also cover the exome
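
For anyone who wants to reproduce the comparison, here is a rough sketch of how the per-sample counts can be tallied. It assumes the TSV has the columns `icgc_mutation_id`, `icgc_sample_id`, `sequencing_strategy` and `mutation_type`, that SNVs are labelled "single base substitution", and that the strategies are labelled "WGS" and "WXS"; please check these against the header of your own download. Mutations are counted by distinct ID because, as far as I can tell, the file repeats a mutation once per annotated consequence, so raw row counts would be inflated. The function names are my own, not part of any ICGC tool.

```python
import csv
import gzip
from collections import defaultdict


def read_ssm(path):
    """Yield one dict per row of a gzipped, tab-separated SSM file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def count_snvs(rows, group_col="icgc_sample_id"):
    """Count distinct SNVs per (group, sequencing strategy).

    Column names and the SNV label are assumptions about the file layout.
    """
    seen = defaultdict(set)
    for row in rows:
        # Skip indels and other non-SNV mutation types.
        if row["mutation_type"] != "single base substitution":
            continue
        key = (row[group_col], row["sequencing_strategy"])
        # A set collapses rows that repeat the same mutation ID.
        seen[key].add(row["icgc_mutation_id"])
    return {key: len(ids) for key, ids in seen.items()}


# Example call (path as downloaded from the release page):
# counts = count_snvs(read_ssm("simple_somatic_mutation.open.SKCM-US.tsv.gz"))
```

If the WGS and WES calls for one tumour are stored under different sample IDs, passing `group_col="icgc_donor_id"` (again an assumed column name) groups them by donor instead.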
Therefore my question is: is the SSM data in the open-access DCC data release complete? Or has a substantial number of actual SNV calls been removed (perhaps because the release is open access)?
If anyone is familiar with ICGC/TCGA, can you clarify this please?
Thank you so much!