Hi all
For a cancer research project that some lab mates and I are conducting, I would like to obtain lists of cancer mutations along with their variant allelic frequency (AF) for a wide range of malignancies. Looking at the TCGA, I noticed that the freely accessible MAF file format does not include this data field, but the TCGA VCF standard does list AF as an optional entry in the INFO field. The VCF files are all under the restricted access tier, however.
Could someone with access to this data give me an indication of the percentage of TCGA malignancies that include AF in their mutation call files? Any pointers on other ways to find such data would also be appreciated highly!
Best,
Maarten
For future reference in this rather daunting task of finding pan-cancer MAFs: there is no standard column that stores VAF in TCGA MAFs (see this post). Instead, the column names vary between the GDACs that created the MAFs.
In addition to Sean's link, this page by the Broad seems to provide additional TCGA MAFs by different GDACs.
HGSC-generated files include columns named:
TTotCov
TVarCov
NTotCov
NVarCov
Broad Institute-generated files include columns named:
t_alt_count
t_ref_count
The Broad's list above does not include Sanger MAFs, of which at least one (example here) includes the fields n_ref_count and a_ref_count.
Between those, almost all malignancies should be covered with a MAF that includes VAF. For those that seem outdated on the Broad's list (e.g. COAD, revised in 2013), the reference on the TCGA page is just as old, but I didn't check all malignancies here.
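Since the read-count columns differ per center, here is a minimal Python sketch of how one might derive a tumor VAF from either naming scheme. It assumes that t_alt_count and t_ref_count are tumor alternate and reference read counts, and that TVarCov and TTotCov are tumor variant and total coverage. Those interpretations are my assumptions from the column names, not something documented in this thread. The function names tumor_vaf and read_maf_vafs are made up for illustration.

```python
import csv
import io


def _to_int(value):
    """Parse a read count; return None for missing or non-integer values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def tumor_vaf(row):
    """Return the tumor VAF for one MAF row (a dict), or None if unavailable.

    Assumed column meanings (not confirmed by any spec quoted here):
      Broad-style: t_alt_count, t_ref_count -> alt / (alt + ref)
      HGSC-style:  TVarCov, TTotCov         -> var / total
    """
    alt = _to_int(row.get("t_alt_count"))
    ref = _to_int(row.get("t_ref_count"))
    if alt is not None and ref is not None:
        total = alt + ref
        return alt / total if total > 0 else None

    var = _to_int(row.get("TVarCov"))
    tot = _to_int(row.get("TTotCov"))
    if var is not None and tot is not None and tot > 0:
        return var / tot

    # Unknown layout or unparseable counts (e.g. values such as "19|18").
    return None


def read_maf_vafs(handle):
    """Yield (Hugo_Symbol, VAF) pairs from a tab-separated MAF file handle."""
    # MAF files may start with '#' comment lines before the header row.
    lines = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(lines, delimiter="\t"):
        yield row.get("Hugo_Symbol"), tumor_vaf(row)


# Toy input with invented counts, only to show the call pattern.
example = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tt_alt_count\tt_ref_count\n"
    "TP53\t10\t30\n"
)
for gene, vaf in read_maf_vafs(example):
    print(gene, vaf)
```

Values that do not parse as a single integer are returned as None rather than guessed at, so rows with unusual encodings can be inspected by hand.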
Do you have any idea about "i_TVarCov"? Sometimes there are two numbers like 19|18. What does this mean?
Thanks! I've been looking for MAFs that contain additional fields in Firehose and the TCGA data portal, but no luck yet in finding any that include REF and ALT allele counts. I will get back to you as soon as I'm successful.
Try here:
https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files
Thanks once again! The UCSC-produced MAF files I looked at do indeed include REF and ALT allele counts. Is there a 1-to-1 correspondence between the presence of this field in the MAF file and its presence in the corresponding protected VCF file? In other words, would it still be useful to apply for access to the protected VCF files?
And I should have mentioned that ICGC maintains a much cleaner DCC than TCGA here: https://dcc.icgc.org/repository/icgc/current/
Try their .TSV files of somatic mutations. I believe they have allele counts for at least a subset of tumor types.