Publicly available somatic variant calls for kidney cancer using WGS
9.6 years ago
tralynca ▴ 50

Good day,

Does anyone know where I can find published somatic mutation calls for kidney cancer based on whole genome sequencing and NOT whole exome sequencing? I need them for the non-coding portion of the genome. Preferably not TCGA, because their data are controlled-access and the somatic variants are mixed with germline mutations.

Thank you in advance,

Tracey

somatic-variants WGS kidney-cancer • 3.2k views

Just to clarify for readers down the road, the TCGA somatic variants are not controlled-access. The BAM files, of course, are controlled-access, as will be the case for pretty much all human data. ALL studies using NGS will have somatic variants that are "contaminated" with germline variants, unfortunately; the extent will vary, of course, based on technical details.


Hi Sean,

Maybe I misunderstood, but the Data Levels and Data Types tab shows that the mutation files (VCF and MAF files, Level 2 data) for both whole genome and whole exome data are controlled-access (https://tcga-data.nci.nih.gov/tcga/tcgaDataType.jsp).

screenshot


You did ask about whole genome somatic variants. The exome somatic variants are available as somatic MAF files, but the whole genome somatic variants are not. That said, it is relatively straightforward to get access to the controlled-access data, so that really shouldn't stop your analysis.


Thanks for the feedback Sean. My supervisor is processing the request for the data. I was just hoping there was something else out there.

9.6 years ago

Hi Tracey, sorry to be so pessimistic. I think this will be difficult (if not impossible): the recommended depth of coverage (DC) for calling the low allele frequencies typical of somatic mutations is around 500x, so it is very unlikely that you will find a dataset in which whole tumour genomes were sequenced that deeply. Let's assume 1000x on average in order to reach 500x over most of the genome (which surely underestimates the sequencing effort needed):

You would need to sequence:
1000 x 3.4x10^9 bp = 3.4x10^12 bp = 3400 Gb
and you have (for instance):
MiSeq output ~ 15 Gb max
HiSeq 4000 output ~ 1500 Gb max
=> ~227 MiSeq runs / sample
=> 3 HiSeq 4000 runs / sample

And that is for a single sample. To have enough variant calls to say that something is specific to kidney cancer, you would need a set of several samples (roughly 15 at least, i.e. 45 HiSeq 4000 runs).
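The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-envelope estimate: the function name `runs_per_sample` is made up for illustration, and the genome size and per-run outputs are the rough figures quoted in this post, not vendor specifications.

```python
import math

def runs_per_sample(target_depth, genome_size_bp, run_output_gb):
    """Whole sequencing runs needed to reach an average depth on one sample."""
    required_gb = target_depth * genome_size_bp / 1e9  # total bases needed, in Gb
    return math.ceil(required_gb / run_output_gb)      # partial runs round up

GENOME_BP = 3.4e9  # genome size used in the post (an approximation)
DEPTH = 1000       # average depth assumed in the post

miseq_runs = runs_per_sample(DEPTH, GENOME_BP, 15)    # ~15 Gb per MiSeq run
hiseq_runs = runs_per_sample(DEPTH, GENOME_BP, 1500)  # ~1500 Gb per HiSeq 4000 run
cohort_hiseq_runs = 15 * hiseq_runs                   # 15 samples, as suggested above

print(miseq_runs, hiseq_runs, cohort_hiseq_runs)
```

Rounding up to whole runs is a simplification; in practice samples are multiplexed across lanes, so the real cost depends on how runs are shared.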


Thank you for your response, Manu. Is that supposed to be 1000X or 100X? Most articles state that 30-60X is a sufficient DC for WGS data.


I think Manu is just pointing out that, while 30-60x is what is typically done, a much higher depth than that is needed for low allele frequency variants. Studies using 30-60x for somatic variant calling are very likely underpowered to detect such variants.
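To put rough numbers on the power argument, here is a minimal sketch using a plain binomial model: each read covering a site carries the variant with probability equal to the variant allele frequency (VAF), and a call needs at least some minimum number of supporting reads. The function name, the 1% VAF, and the 3-read threshold are illustrative assumptions; sequencing error, tumour purity, and uneven coverage are all ignored.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    # Probability of seeing fewer supporting reads than the caller requires.
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few

# Hypothetical scenario: a subclonal variant at 1% VAF, 3 supporting reads required.
for depth in (30, 60, 500):
    print(depth, round(detection_probability(depth, 0.01, 3), 3))
```

Under these assumptions the chance of seeing enough supporting reads is well under 5% at 30x and above 80% at 500x, which is the gap being described here. Variants at higher allele fractions are of course much easier to detect at 30-60x.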


Makes sense. Thank you again Manu and Sean.

9.6 years ago

The ICGC has two whole-genome sequencing studies for renal cancer and renal cell cancer:

The data repository contains the somatic variant calls (SNVs and InDels, called simple somatic mutations by ICGC) for the two studies. Note that the studies may have used different processing and variant calling pipelines. In general, the calls are saved as tab-delimited files, with additional meta-information about the calling and the genomic annotation. If you are only interested in non-coding variants, you can filter for variants with the respective attributes (e.g. those in intergenic regions).
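As a rough illustration of that filtering step, the sketch below reads a tab-delimited file and keeps rows whose annotation is in a set of non-coding categories. The column name `consequence_type` and the category labels are assumptions about how the downloaded file is annotated, so check the header of your own file first. The two rows in `SAMPLE` are made-up toy data, not real calls.

```python
import csv
import io

# Assumed annotation labels for non-coding variants; adjust to the actual file.
NON_CODING = {
    "intergenic_region",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
}

def non_coding_variants(handle, consequence_column="consequence_type"):
    """Return the rows of a tab-delimited file annotated as non-coding."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row[consequence_column] in NON_CODING]

# Toy input standing in for a downloaded file (values invented for illustration).
SAMPLE = (
    "chromosome\tchromosome_start\tconsequence_type\n"
    "3\t10183700\tmissense_variant\n"
    "3\t52000000\tintergenic_region\n"
)

kept = non_coding_variants(io.StringIO(SAMPLE))
print(len(kept), kept[0]["consequence_type"])
```

For a real download you would pass an open file handle instead of the `io.StringIO` object. A variant can carry several annotation rows (one per transcript), so you may also want to deduplicate by position and alleles afterwards.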

Of course it depends very much on the question you want to address if these two studies are enough, but it should hopefully provide a good basis for your analysis.


Hi Julian,

I have been meaning to get back to you and thank you for your suggestion. I ended up using the ICGC data for my project.

Tracey


I'm having a look at it now. Thank you Julian.
