Hi all.
I'm currently looking at mutation data from TCGA. I downloaded the .json files from the GDC portal and parsed them to select and download specific subsets of files.
I'm rather confused because the primary tumor somatic mutation .vcf files from TCGA are in a different format than I'm used to. The headers are as follows:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
However, since I have indicated that I only want files from "primary tumor", what is this "NORMAL" column for?
I ask partly to understand, but partly because I eventually want to merge all the samples together downstream, and I can't do that when the sample names in every file are just "NORMAL" and "TUMOR"...
Will variants be listed that are only in NORMAL? Or does this entirely depend on the tool that was used for variant calling? I'm interested in looking at sites only, and I want the variants to only be those found in tumors.
I do not believe that variants present only in NORMAL are listed; however, you can easily check, since any such site would have a 0/0 genotype in the TUMOR column.
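For what it's worth, here is a rough sketch of that check in plain Python. It assumes a tab-separated VCF whose FORMAT column includes a GT key and whose tumor column is literally named "TUMOR", as in the header above; the function name `tumor_hom_ref_sites` and the two example records are made up for illustration and are not real TCGA data.

```python
import io

def tumor_hom_ref_sites(lines, tumor_name="TUMOR"):
    """Return (chrom, pos) for records whose tumor genotype is homozygous reference."""
    tumor_col = None
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and ## meta-information lines.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the tumor sample column by name rather than by position.
            tumor_col = fields.index(tumor_name)
            continue
        if tumor_col is None:
            raise ValueError("no #CHROM header line seen before the first record")
        # FORMAT (9th column) lists the per-sample keys, e.g. GT:DP.
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue  # assumption: records without GT cannot be checked this way
        gt = fields[tumor_col].split(":")[keys.index("GT")]
        # Treat phased (0|0) and unphased (0/0) genotypes the same.
        if gt.replace("|", "/") == "0/0":
            hits.append((fields[0], int(fields[1])))
    return hits

# Made-up records: the second one is variant in NORMAL only.
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:42",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.\tGT:DP\t0/1:28\t0/0:35",
])
print(tumor_hom_ref_sites(io.StringIO(example)))  # → [('chr1', 200)]
```

For a real file you would pass an open file handle instead of the `io.StringIO` object, for example `gzip.open(path, "rt")` if the VCF is gzipped. An empty result would suggest that the caller did not emit NORMAL-only sites, though whether GT is present at all depends on the variant caller.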