Anyone familiar with ICGC's DCC data releases?
19 months ago

Hi all,

As part of my project, I want to find recurrent somatic mutations in cancer, not only in coding regions but across the whole genome, including noncoding somatic mutations. For that I need WGS data.

My supervisor directed me to the data freely available on the ICGC website (DCC data release 28 is the newest release and includes WGS calls from PCAWG): https://dcc.icgc.org/releases/release_28/Projects

If I go to a specific cancer project (e.g. SKCM-US for melanoma), there's a file called "simple_somatic_mutation.open.SKCM-US.tsv.gz", which seems to be what I want.

However, upon looking at the actual data, some things did not add up:

  1. The number of SNVs from WGS is suspiciously small
  2. The number of SNVs called from WES is, on many occasions, larger than the number of SNVs called from WGS on the SAME SAMPLE, which does not make sense
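
To make the comparison in the two points above concrete, here is a minimal sketch of how per-sample SNV counts could be tallied for each sequencing strategy. It assumes the TSV has columns named `icgc_mutation_id`, `icgc_sample_id`, `sequencing_strategy` and `mutation_type`, and that SNVs are labelled `single base substitution`; these names and values are assumptions and should be checked against the file's header. It also deduplicates on mutation ID, on the assumption that the same mutation may appear on several rows (for example, one row per annotated consequence), which would otherwise inflate the counts.

```python
import csv
import gzip
from collections import defaultdict

# Assumed label for SNVs in the mutation_type column; verify against the file.
SNV_LABEL = "single base substitution"


def count_snvs(rows):
    """Count unique SNVs per (sample, sequencing strategy).

    `rows` is any iterable of dicts keyed by column name, e.g. a csv.DictReader.
    Column names used here are assumptions about the SSM file layout.
    """
    seen = defaultdict(set)
    for row in rows:
        if row["mutation_type"] != SNV_LABEL:
            continue  # skip indels and other non-SNV mutation types
        key = (row["icgc_sample_id"], row["sequencing_strategy"])
        # A set keeps each mutation ID once, even if it spans several rows.
        seen[key].add(row["icgc_mutation_id"])
    return {key: len(ids) for key, ids in seen.items()}


def count_snvs_in_file(path):
    """Stream a gzipped SSM TSV and return the per-sample SNV counts."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_snvs(csv.DictReader(handle, delimiter="\t"))
```

Calling `count_snvs_in_file("simple_somatic_mutation.open.SKCM-US.tsv.gz")` would then return a dictionary keyed by sample and strategy, which makes it easy to put the WGS and WES counts for the same sample side by side.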

Therefore my question is: is the SSM data in the open-access DCC data release complete, or does it exclude a substantial number of actual SNV calls (perhaps because it is open access)?

If anyone is familiar with ICGC/TCGA, can you clarify this please?

Thank you so much!

cancer TCGA PCAWG ICGC somatic