Extracting VCF of all variants in TCGA exomes from all cancer types
5.0 years ago
j.lunger18

Hi. Ideally, I would like to get a single VCF file covering all the exome sequences that TCGA has, across all cancer types. Even better, I would do this for only a certain region of the genome. Is there any way to do this? I have the gdc-client installed and available on the command line, but I can only seem to find UUIDs for individual cancer types.

tcga vcf • 1.4k views

If you are using R, consider the GenomicDataCommons package, which helps with finding and downloading datasets of interest. However, Kevin is correct that the MAFs are probably what you want. Note that the MAFs are filtered from the original variant files, which are available only after obtaining dbGaP access.

5.0 years ago

TCGA VCF files are not available as open access - only MAF (mutation annotation format) files are available, and these can be downloaded from the GDC (Genomic Data Commons) Data Portal.
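To find the open-access MAF files for every TCGA cancer type at once, rather than one project at a time, something like the following sketch against the GDC REST API may help. The field names (`cases.project.program.name`, `data_format`, `access`) and the response layout are what I understand the GDC `files` endpoint to use, so treat them as assumptions and check them against the current GDC API documentation. The function names and the `size` default are my own.

```python
import json
import urllib.request

# Assumed GDC endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_maf_query(program="TCGA", size=1000):
    """Build a search payload for open-access MAF files in one program."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.program.name",
                                     "value": [program]}},
            {"op": "in", "content": {"field": "data_format",
                                     "value": ["MAF"]}},
            {"op": "in", "content": {"field": "access",
                                     "value": ["open"]}},
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,cases.project.project_id",
        "format": "JSON",
        "size": size,  # raise this, or paginate, if more files match
    }


def fetch_maf_files(payload):
    """POST the payload to the GDC and return the list of matching files."""
    request = urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response layout: {"data": {"hits": [...]}}
    return body["data"]["hits"]


if __name__ == "__main__":
    for hit in fetch_maf_files(build_maf_query()):
        print(hit["file_id"], hit["file_name"], sep="\t")
```

The printed `file_id` values are the UUIDs, which can then be passed to `gdc-client download`.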

You can search online for tools that convert MAF to VCF, if that is definitely what you need.

If you keep everything as MAF, which is essentially a tab-delimited format, then you can simply use shell commands to merge everything together. If you convert the data to multiple VCFs, then you can use BCFtools to merge them.
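As a rough illustration of the MAF route, here is a minimal Python sketch that concatenates several MAF files into one and, optionally, keeps only variants in a region of interest. It assumes the standard MAF column names `Chromosome`, `Start_Position` and `End_Position`, that every file shares the same header, and that chromosome names in the region match those in the files (for example `chr17` rather than `17`). The function names are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if Path(path).suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def overlaps(row, region):
    """True if the variant in row overlaps (chrom, start, end), 1-based inclusive."""
    chrom, start, end = region
    return (row["Chromosome"] == chrom
            and int(row["Start_Position"]) <= end
            and int(row["End_Position"]) >= start)


def merge_mafs(maf_paths, out_path, region=None):
    """Concatenate MAF files under a single header; return rows written."""
    header = None
    written = 0
    with open(out_path, "w") as out:
        for maf in maf_paths:
            columns = None
            with open_text(maf) as handle:
                for line in handle:
                    # Skip '#' comment lines and blank lines.
                    if line.startswith("#") or not line.strip():
                        continue
                    fields = line.rstrip("\r\n").split("\t")
                    if columns is None:
                        # First remaining line of each file is its header.
                        columns = fields
                        if header is None:
                            header = columns
                            out.write("\t".join(header) + "\n")
                        elif columns != header:
                            raise ValueError(f"header mismatch in {maf}")
                        continue
                    if region is not None:
                        row = dict(zip(columns, fields))
                        if not overlaps(row, region):
                            continue
                    out.write("\t".join(fields) + "\n")
                    written += 1
    return written
```

For example, `merge_mafs(paths, "merged.maf", region=("chr17", 7660000, 7690000))` would keep only variants overlapping that interval; the coordinates here are purely illustrative.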

Kevin
